Startup Time, Single Binaries, and Why Rust Fits CLI Work

CLI tools have a set of properties that distinguish them from other software categories: they’re invoked directly by users and scripts many times a day, they need to run on machines that may not have the right language runtime, and they often need to feel fast even when doing nothing compute-intensive. A recent post on smiling.dev describes reaching for Rust instead of Python for this use case and finding the result meaningfully better. The reasons behind that experience are worth examining in detail.

The Startup Cost Is Real

Every Python CLI invocation pays a startup tax before executing a single line of user code. Running python3 -c "pass" on a typical modern system takes 30 to 50 milliseconds. That cost compounds as you import libraries: import argparse is cheap, under 5ms, but import click typically adds 60 to 100ms, and import rich can add another 150 to 200ms because it pulls in a dependency tree that includes Pygments and markdown-it-py. A Python CLI tool that uses click for argument parsing and rich for terminal output often takes 300 to 500ms before touching user data. The exact figures vary by machine and library version; python3 -X importtime prints a per-module breakdown if you want to measure your own tool.

For a tool that runs once as part of a build pipeline, 400ms is irrelevant. For a tool used as a shell alias, run in loops over file trees, or invoked during shell completion generation, 400ms is the dominant cost of every invocation. The sluggishness users report about Python-based CLI tools is not imagination; it is the interpreter initializing.

A Rust binary compiled with cargo build --release starts in under 5ms on the same hardware. Rust has no interpreter and only a minimal runtime, which is linked into the binary. There is no interpreter initialization, no .pyc caching check, no import system traversal. Against a 300 to 500ms Python startup, the gap is roughly two orders of magnitude.
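To check numbers like these on your own machine, a rough timing harness is enough. The sketch below uses only the standard library; the function names (time_once, median, median_startup) are made up for this illustration, and the measurement includes process spawn overhead, so treat the results as an upper bound rather than a precise benchmark.

```rust
use std::io;
use std::process::{Command, Stdio};
use std::time::{Duration, Instant};

/// Run `program` with `args` once and return the wall-clock time until it exits.
fn time_once(program: &str, args: &[&str]) -> io::Result<Duration> {
    let start = Instant::now();
    Command::new(program)
        .args(args)
        .stdin(Stdio::null())
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()?; // spawns the process and waits for it to finish
    Ok(start.elapsed())
}

/// Median of the samples (the upper median for even counts); None if empty.
fn median(samples: &mut [Duration]) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    samples.sort();
    Some(samples[samples.len() / 2])
}

/// Time `runs` invocations and report the median, which is less sensitive
/// to a slow first run (cold caches) than the mean.
fn median_startup(program: &str, args: &[&str], runs: usize) -> io::Result<Option<Duration>> {
    let mut samples = Vec::with_capacity(runs);
    for _ in 0..runs {
        samples.push(time_once(program, args)?);
    }
    Ok(median(&mut samples))
}

fn main() {
    // Assumed command: the bare interpreter doing nothing. Substitute your own tool.
    match median_startup("python3", &["-c", "pass"], 20) {
        Ok(Some(t)) => println!("python3 -c pass: median {:?}", t),
        Ok(None) => println!("no samples collected"),
        Err(e) => eprintln!("could not run python3: {e}"),
    }
}
```

For anything beyond a quick sanity check, a dedicated tool such as hyperfine handles warmup runs and statistical reporting more carefully.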

Distribution Is the Underrated Pain Point

The harder problem with Python CLIs is distribution. Shipping a Python-based CLI to users who are not developers requires choosing between poor options.

The pip install path works if the user has a compatible Python version and is comfortable with pip. It fails for non-developer users, creates global namespace pollution by default, and breaks when the system Python version differs from what the tool was developed against. pipx solves the isolation problem but adds a dependency on pipx itself, which most non-developers do not have.

PyInstaller and similar bundlers can produce standalone executables, but they bundle the full Python interpreter along with all dependencies. A modest Python CLI bundled with PyInstaller typically produces an executable of 30 to 80MB, and the startup overhead persists because the bundled interpreter still initializes on each invocation. In one-file mode the executable also unpacks itself to a temporary directory on every launch, which adds to the delay.

A Rust CLI built for a musl target on Linux is fully statically linked. On macOS and Windows the standard toolchain links only against system libraries that are always present. Either way the result is a self-contained executable with no language runtime to install. After stripping debug symbols, typical Rust CLI tools land between 2 and 15MB. The user downloads a file, marks it executable, puts it in their PATH, and that is it.

This is why ripgrep, fd, bat, and delta all ship as single binaries in their release artifacts. GitHub releases for these projects include a pre-built binary for each supported platform. Installation is a download and a PATH entry, nothing more.
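The per-platform artifacts are usually distinguished by Rust target triple. As a minimal sketch, an installer or self-updater could pick the right one as shown below. The triples are real Rust targets, but the asset naming scheme and the tool name "mytool" are assumptions for illustration and do not describe any particular project's releases.

```rust
/// Map an (os, arch) pair, spelled as in `std::env::consts`, to the Rust
/// target triple a prebuilt binary would typically be compiled for.
fn target_triple(os: &str, arch: &str) -> Option<&'static str> {
    match (os, arch) {
        ("linux", "x86_64") => Some("x86_64-unknown-linux-musl"),
        ("linux", "aarch64") => Some("aarch64-unknown-linux-musl"),
        ("macos", "x86_64") => Some("x86_64-apple-darwin"),
        ("macos", "aarch64") => Some("aarch64-apple-darwin"),
        ("windows", "x86_64") => Some("x86_64-pc-windows-msvc"),
        _ => None, // no prebuilt binary for this platform
    }
}

/// Hypothetical asset naming scheme: `<tool>-<version>-<triple>[.exe]`.
fn asset_name(tool: &str, version: &str, os: &str, arch: &str) -> Option<String> {
    let triple = target_triple(os, arch)?;
    let ext = if os == "windows" { ".exe" } else { "" };
    Some(format!("{tool}-{version}-{triple}{ext}"))
}

fn main() {
    use std::env::consts::{ARCH, OS};
    match asset_name("mytool", "1.2.3", OS, ARCH) {
        Some(name) => println!("download: {name}"),
        None => println!("no prebuilt binary for {}/{}", OS, ARCH),
    }
}
```

The whole installation story fits in one lookup table, which is a fair summary of why single-binary distribution is attractive.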

Argument Parsing With Compile-Time Guarantees

Python’s CLI library ecosystem is mature. argparse ships with the standard library. click is the community standard, offering clean decorator-based argument definitions. Typer builds on click and uses Python type annotations to generate parsers. These are well-designed tools, but they validate argument schemas at runtime.

Rust’s clap crate, currently at version 4.5.x, offers a derive-based API where argument schemas are defined as structs and enums:

use clap::Parser;

#[derive(Parser, Debug)]
#[command(author, version, about)]
struct Args {
    /// Path to process
    #[arg(short, long)]
    path: std::path::PathBuf,

    /// Number of threads
    #[arg(short, long, default_value_t = 4)]
    threads: usize,

    /// Output format
    #[arg(long, value_enum)]
    format: OutputFormat,
}

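#[derive(clap::ValueEnum, Clone, Debug)]
enum OutputFormat {
    Json,
    Text,
    Csv,
}

fn main() {
    // Parses std::env::args(); on invalid input clap prints an error and exits.
    let args = Args::parse();
    println!("{args:?}");
}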

The derive macros generate the parser from the struct at compile time, so the declared schema and the types the rest of the program sees cannot drift apart: a field whose type clap cannot parse is a compile error, and threads is always a usize by the time your code touches it. Parsing the user's input still happens at runtime, where clap rejects bad values and generates the error messages automatically. Because format is a ValueEnum, the user can only pass json, text, or csv, and anything else produces a clean error listing the valid options. Shell completion scripts can be generated from the same definition with the clap_complete crate.
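To make the enum handling less magical, here is roughly the kind of code a derive like ValueEnum saves you from writing. This is a hand-written, standard-library-only sketch, not clap's actual generated code; the InvalidFormat type, the VARIANTS table, the extension helper, and the error wording are all made up for illustration.

```rust
use std::fmt;
use std::str::FromStr;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum OutputFormat {
    Json,
    Text,
    Csv,
}

impl OutputFormat {
    /// Every accepted spelling, kept next to the enum it maps to.
    const VARIANTS: [(&'static str, OutputFormat); 3] = [
        ("json", OutputFormat::Json),
        ("text", OutputFormat::Text),
        ("csv", OutputFormat::Csv),
    ];
}

/// Error returned for a value that is not one of the accepted spellings.
#[derive(Debug, PartialEq, Eq)]
struct InvalidFormat(String);

impl fmt::Display for InvalidFormat {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let valid: Vec<&str> = OutputFormat::VARIANTS.iter().map(|(name, _)| *name).collect();
        write!(
            f,
            "invalid value '{}' for --format (possible values: {})",
            self.0,
            valid.join(", ")
        )
    }
}

impl std::error::Error for InvalidFormat {}

impl FromStr for OutputFormat {
    type Err = InvalidFormat;

    /// Case-sensitive lookup in the VARIANTS table.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Self::VARIANTS
            .iter()
            .find(|(name, _)| *name == s)
            .map(|(_, variant)| *variant)
            .ok_or_else(|| InvalidFormat(s.to_string()))
    }
}

/// The match must be exhaustive: adding a variant to OutputFormat without
/// handling it here is a compile error, not a runtime surprise.
fn extension(format: OutputFormat) -> &'static str {
    match format {
        OutputFormat::Json => "json",
        OutputFormat::Text => "txt",
        OutputFormat::Csv => "csv",
    }
}

fn main() {
    let raw = std::env::args().nth(1).unwrap_or_else(|| "json".to_string());
    match raw.parse::<OutputFormat>() {
        Ok(format) => println!("writing output.{}", extension(format)),
        Err(e) => eprintln!("error: {e}"),
    }
}
```

Once the string has been parsed into OutputFormat, no code downstream has to consider the possibility of an unknown format.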

The Python equivalent with click:

import click

@click.command()
@click.option('--path', type=click.Path(exists=True), required=True)
@click.option('--threads', type=int, default=4)
@click.option('--format', type=click.Choice(['json', 'text', 'csv']))
def main(path, threads, format):
    ...

if __name__ == '__main__':
    main()

This is clean and readable. But the type constraints live in decorators evaluated at runtime. A mismatch between what a decorator declares and how the value is used inside the function body is not caught by anything before the program runs; it fails at runtime, and only on the code path that hits it. Python’s type checkers can catch some of these issues with the right stubs, but the schema itself is advisory, not enforced.

Error Handling That Cannot Be Ignored

Rust’s Result<T, E> type combined with the ? operator makes error handling in CLI tools systematic. A function that reads a file, parses it, and writes output uses ? throughout, and errors propagate upward to be formatted and printed uniformly at the top level. Libraries like anyhow and thiserror make this practical for application code:

use anyhow::{Context, Result};

// parse_content and write_output are application-specific helpers, not shown here.
fn run(args: Args) -> Result<()> {
    // with_context builds its message lazily, only if the read fails.
    let content = std::fs::read_to_string(&args.path)
        .with_context(|| format!("Failed to read {}", args.path.display()))?;

    let parsed = parse_content(&content)
        .context("Failed to parse file contents")?;

    write_output(parsed, args.format)
        .context("Failed to write output")?;

    Ok(())
}

Each error carries its context message, and the original error is preserved in the chain. The compiler warns on unused Result values; silently discarding an error requires an explicit .ok() or let _ = .... Python exceptions provide similar chaining through raise X from Y, but nothing in the language flags a call site that forgot to handle one; the omission stays invisible until the exception surfaces at runtime.
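The context-plus-cause chain can be reproduced with the standard library alone. The sketch below is a simplified stand-in for what anyhow automates, not its implementation; AppError, count_lines, render_chain, and the default file name input.txt are hypothetical names chosen for this example.

```rust
use std::error::Error;
use std::fmt;
use std::fs;
use std::path::{Path, PathBuf};
use std::process::ExitCode;

/// An application error: a human-readable context message plus the underlying cause.
#[derive(Debug)]
struct AppError {
    context: String,
    source: Box<dyn Error + 'static>,
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.context)
    }
}

impl Error for AppError {
    /// Expose the original error so callers can walk the chain.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.source)
    }
}

/// Count non-empty lines in a file, attaching context if the read fails.
fn count_lines(path: &Path) -> Result<usize, AppError> {
    let content = fs::read_to_string(path).map_err(|e| AppError {
        context: format!("Failed to read {}", path.display()),
        source: Box::new(e),
    })?; // `?` returns early with the wrapped error
    Ok(content.lines().filter(|line| !line.trim().is_empty()).count())
}

/// Render an error followed by each of its causes, outermost first.
fn render_chain(err: &dyn Error) -> String {
    let mut out = err.to_string();
    let mut cause = err.source();
    while let Some(e) = cause {
        out.push_str(": ");
        out.push_str(&e.to_string());
        cause = e.source();
    }
    out
}

fn main() -> ExitCode {
    let arg = std::env::args().nth(1).unwrap_or_else(|| "input.txt".to_string());
    let path = PathBuf::from(arg);
    // Single place where every error is formatted and turned into an exit status.
    match count_lines(&path) {
        Ok(n) => {
            println!("{n} non-empty lines");
            ExitCode::SUCCESS
        }
        Err(e) => {
            eprintln!("error: {}", render_chain(&e));
            ExitCode::FAILURE
        }
    }
}
```

Formatting errors in one place at the top level is what gives a CLI consistent messages and a reliable non-zero exit status for scripts.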

The Ecosystem as Evidence

The pattern shows up repeatedly across Rust’s CLI ecosystem. Most of the tools replacing traditional Unix utilities over the past several years are written in Rust: eza as an ls replacement, dust for du, tokei for line counting, hyperfine for benchmarking, starship for shell prompt rendering, zoxide for directory jumping. These tools are replacing C tools as often as Python tools; the consistent choice of Rust across independent maintainers reflects what the language offers specifically for this use case.

Performance is part of it. ripgrep’s benchmarks against GNU grep show 5 to 10x improvements on many workloads, helped by SIMD-accelerated literal search, parallel directory traversal, and skipping files excluded by ignore rules. But startup time and clean distribution are the reasons these tools get adopted even by users who would not notice raw search throughput.

Where Python Remains the Right Choice

Python fits CLI tools that are embedded in Python ecosystems: Django management commands, data processing pipeline utilities, tools that wrap Python libraries directly. The startup cost becomes irrelevant when the Python process is already running. The distribution problem disappears when the users are Python developers who already have pip and are comfortable with virtual environments.

Typer in particular narrows the ergonomics gap significantly. Defining a CLI with Python type annotations and getting argument parsing, validation, and help text generation from them feels close to clap’s derive API in spirit, even though the annotations are checked only at runtime or by an optional type checker rather than enforced by a compiler. For internal tooling distributed to a known Python-capable audience, Typer is a strong choice.

The developer experience cost of Rust also matters. The ownership system and borrow checker add friction during initial development and iteration. If a tool is for internal use, will be maintained by one person, and targets users who have Python installed, the added complexity of Rust may not pay off.

The Underlying Pattern

The experience described in the smiling.dev article follows a pattern common enough in the Rust community to be worth naming directly: the initial complexity investment pays off faster for CLI tools than for most other use cases. The runtime properties of a compiled binary address the specific friction points that Python introduces when building tools other people install and run. Startup time, distribution, type-safe argument handling, and checked error propagation all follow from properties of the language and toolchain, with no extra work once the fundamentals are in place.

That is a narrower claim than “Rust is fast,” and a more useful one. The fit is not about raw throughput; it is about the mismatch between what Python was designed for and what CLI tools need to deliver.