
Fifteen Years of SSH Certificate Support and We're Still Copying Keys

Source: lobsters

The OpenSSH implementation has shipped SSH certificate support since version 5.4, released in 2010. Fifteen years later, most teams still distribute public keys into authorized_keys files and call it authentication.

Jan-Piet Mens wrote a clear walkthrough of the setup recently. The mechanics are simple enough to follow in an afternoon, but the more interesting question is what the certificate model changes about the underlying trust structure.

The Scaling Problem with authorized_keys

The authorized_keys approach scales with the product of users and servers. A team of 200 engineers accessing 1,000 servers means 200,000 individual key entries to maintain. When someone leaves, you audit every server. When a key is compromised, you audit every server. When you need to answer “who currently has access to production?”, you read every file.
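Written out as shell arithmetic, the figures above look like this. The comparison with the certificate model is a rough sketch: it assumes one CA line per server and one certificate per engineer, and the variable names are illustrative.

```shell
# Figures from the paragraph above; adjust to your own fleet.
engineers=200
servers=1000

# authorized_keys: one entry per (engineer, server) pair.
key_entries=$((engineers * servers))

# Certificate model (assumption): one trusted-CA line per server
# plus one certificate per engineer.
ca_objects=$((servers + engineers))

echo "authorized_keys entries: $key_entries"
echo "CA model objects:        $ca_objects"
```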

The deeper issue is that authorized_keys is decentralized by design. Any admin with root on any server can grant access to anyone by appending a line. There is no central authority, no expiry, and no audit trail. The access model accumulates silently until it becomes impossible to reason about.

What an SSH Certificate Contains

SSH certificates are not X.509 certificates. OpenSSH defines its own format, documented in PROTOCOL.certkeys in the OpenSSH source tree. The format is intentionally simpler: a signed blob containing a public key, an identity string, a list of principals, validity timestamps, critical options, extensions, and the CA’s signature over all of it.

For an Ed25519 user certificate, the wire format contains these fields in order:

"ssh-ed25519-cert-v01@openssh.com"
nonce           (random bytes, prevents hash-collision attacks)
pk              (32-byte Ed25519 public key being certified)
serial          (uint64, CA-assigned)
type            (1=user, 2=host)
key_id          (free-form string, logged in auth.log on every auth)
principals      (list of Unix usernames this cert authorizes)
valid_after     (uint64 Unix timestamp)
valid_before    (uint64 Unix timestamp)
critical_options (key-value pairs: force-command, source-address)
extensions      (key-value pairs: permit-pty, permit-agent-forwarding, etc.)
reserved        (empty string)
signature_key   (the CA's complete public key encoding)
signature       (CA signature over all preceding fields)

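Critical options and extensions are encoded as name/value pairs that must appear in lexicographic order by name. The specification requires this ordering, and the signature does not enforce it: the CA signs whatever bytes it is given, so a generator that emits the pairs out of order produces a validly signed but non-conforming certificate that parsers may reject. This matters if you are writing any tooling that generates certificates programmatically.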
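For tooling that builds the extensions list itself, a minimal sketch of the ordering step follows. It assumes the ordering is byte-wise, which matches the order in which ssh-keygen -L lists the standard extensions. sort_cert_names is a hypothetical helper, not part of OpenSSH.

```shell
# Print option or extension names in byte-wise order, one per line,
# dropping duplicates (each name may appear only once in a certificate).
sort_cert_names() {
    printf '%s\n' "$@" | LC_ALL=C sort -u
}

sort_cert_names permit-pty permit-user-rc permit-X11-forwarding \
    permit-port-forwarding permit-agent-forwarding
```

Forcing the C locale matters here: a locale-aware sort may place permit-X11-forwarding after the lowercase names, while byte order puts the uppercase X first.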

The principal binding is the central security property. By default, a certificate specifying principals=["alice"] can only authenticate as alice, regardless of PermitRootLogin settings on the server. The username constraint is in the signature, not in a configuration file someone can edit.

Setting Up the CA

All the tooling is in stock OpenSSH:

# Generate the CA key pair
ssh-keygen -t ed25519 -f /etc/ssh/user_ca -C "user-ca@example.com" -N ""

# Sign a user's public key
ssh-keygen -s /etc/ssh/user_ca \
    -I "alice@example.com" \
    -n "alice,deploy" \
    -V +8h \
    -z 42 \
    ~/.ssh/id_ed25519.pub

The -I flag sets the identity string, which appears in auth.log on every authentication. The -n flag sets the principals. The -V flag sets validity: +8h, +30m, +52w, and absolute date ranges like 20260101:20270101 all work. The -z flag sets a serial number useful for revocation tracking.

This produces ~/.ssh/id_ed25519-cert.pub. The SSH client picks it up automatically when it finds a cert named <keyfile>-cert.pub alongside the private key, requiring no client configuration changes.

On every server, one directive in /etc/ssh/sshd_config:

TrustedUserCAKeys /etc/ssh/user_ca.pub

Every server with this line trusts every certificate the CA signs. The authorized_keys file becomes optional for new servers, and TrustedUserCAKeys coexists cleanly with AuthorizedKeysFile, making gradual migration feasible without a flag day.

Inspect what you have issued at any time:

ssh-keygen -L -f ~/.ssh/id_ed25519-cert.pub

The output shows exactly what the server evaluates: validity window, principals, extensions, CA fingerprint, and serial number.

Short-Lived Certificates

The validity period is where certificates shift from marginally better key management to a materially different operational posture.

With 8-hour certificates, off-boarding becomes: stop issuing certificates, and existing certs expire without touching any server. With 1-hour certs, a stolen credential has a bounded damage window without revocation infrastructure in the critical path.
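A signing wrapper can enforce the lifetime policy before it ever calls ssh-keygen. The sketch below is one assumption about how such a check might look: ttl_to_seconds, ttl_allowed, and MAX_TTL_SECONDS are hypothetical names, and only a single number with an optional s, m, h, d, or w suffix is handled, not the combined forms that ssh-keygen also accepts.

```shell
# Maximum certificate lifetime this CA will sign (8 hours here).
MAX_TTL_SECONDS=${MAX_TTL_SECONDS:-28800}

# Convert a lifetime such as "8h", "30m" or "900" to seconds.
# Returns non-zero for anything it does not understand.
ttl_to_seconds() {
    tts_num=${1%[smhdw]}        # numeric part, one trailing unit removed
    tts_unit=${1#"$tts_num"}    # the unit that was removed (empty = seconds)
    case $tts_num in
        ''|*[!0-9]*|0?*) return 1 ;;   # empty, non-numeric, or leading zero
    esac
    case $tts_unit in
        ''|s) tts_mult=1 ;;
        m)    tts_mult=60 ;;
        h)    tts_mult=3600 ;;
        d)    tts_mult=86400 ;;
        w)    tts_mult=604800 ;;
    esac
    echo $((tts_num * tts_mult))
}

# Succeed only if the requested lifetime is positive and within policy.
ttl_allowed() {
    ta_secs=$(ttl_to_seconds "$1") || return 1
    [ "$ta_secs" -gt 0 ] && [ "$ta_secs" -le "$MAX_TTL_SECONDS" ]
}

# Usage, with the paths from the signing example earlier:
#   ttl_allowed "$ttl" && ssh-keygen -s /etc/ssh/user_ca -I "alice@example.com" \
#       -n "alice" -V "+$ttl" ~/.ssh/id_ed25519.pub
```

Keeping the check in the wrapper means a request for +52w is refused at the CA rather than discovered later in an audit.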

Netflix’s BLESS was an early influential implementation of this pattern: an AWS Lambda function issuing 2-minute SSH certificates after validating the requester’s IAM identity. The lifetime was short enough that stealing a certificate after a session was already established had no practical value.

HashiCorp Vault’s SSH secrets engine brings the pattern to non-AWS environments:

vault write ssh-client-signer/sign/my-role \
    public_key=@$HOME/.ssh/id_ed25519.pub

Vault handles authentication, role-based TTL enforcement, and audit logging. Every issuance is recorded with the requester’s identity and the certificate parameters.

Teleport and Smallstep’s step-ca go further by gating certificate issuance directly behind SSO and MFA, so the cert is an artifact of a completed identity verification flow rather than a separate credential to manage. Both treat short-lived certificates as a first-class feature rather than an add-on, and Teleport does the same for session recording.

Revocation

OpenSSH’s revocation mechanism is the Key Revocation List (KRL), documented in PROTOCOL.krl. It uses a compact binary format with bitsets for serial number ranges, making it efficient to store and check even when revoking large batches of certificates.

# Build a KRL from a revoked certificate
ssh-keygen -k -f /etc/ssh/revoked_keys \
    -s /etc/ssh/user_ca.pub \
    alice-cert.pub

# Add more revocations to an existing KRL
ssh-keygen -k -u -f /etc/ssh/revoked_keys \
    -s /etc/ssh/user_ca.pub \
    bob-cert.pub

# Check whether a cert is revoked
ssh-keygen -Q -f /etc/ssh/revoked_keys alice-cert.pub

In sshd_config:

RevokedKeys /etc/ssh/revoked_keys

Distributing the KRL still requires touching every server, which is the same O(servers) problem that authorized_keys management creates. For routine off-boarding, short-lived certs eliminate the need: stop issuing, wait for expiry. Reserve the KRL for security incidents where you need immediate revocation before the scheduled expiry window closes.
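During an incident the certificate file itself may not be at hand, but the serial number from the logs usually is. ssh-keygen -k also accepts KRL specification files, whose serial: lines revoke by serial number or range when the CA key is given with -s. The helper below is a hypothetical sketch that writes such a specification. Its name and the temporary path in the usage comment are assumptions.

```shell
# Write a KRL specification that revokes certificates by serial number.
# Each argument is a single serial ("42") or an inclusive range ("100-199").
krl_spec_for_serials() {
    for kss_serial in "$@"; do
        case $kss_serial in
            ''|*[!0-9-]*)   # light sanity check: digits and "-" only
                echo "bad serial: $kss_serial" >&2
                return 1 ;;
        esac
        printf 'serial: %s\n' "$kss_serial"
    done
}

# Usage, with the KRL and CA paths from the commands above:
#   krl_spec_for_serials 42 100-199 > /tmp/revoke.spec
#   ssh-keygen -k -u -f /etc/ssh/revoked_keys -s /etc/ssh/user_ca.pub /tmp/revoke.spec
```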

Host Certificates: The Overlooked Half

User certificates address the “trust the client” side. Host certificates address the “trust the server” side, and they receive less attention despite having equivalent security value.

The standard SSH experience for a new server is the TOFU (Trust On First Use) prompt: “The authenticity of host 'webserver01.example.com' can’t be established. Are you sure you want to continue connecting?” Most users accept without verifying the fingerprint. MITM attacks depend precisely on this behavior.

Host certificates eliminate it. Sign each server’s host key with a separate host CA:

ssh-keygen -s /etc/ssh/host_ca \
    -I "webserver01.example.com" \
    -h \
    -n "webserver01,webserver01.example.com,10.0.1.5" \
    -V +52w \
    /etc/ssh/ssh_host_ed25519_key.pub

In sshd_config:

HostCertificate /etc/ssh/ssh_host_ed25519_key-cert.pub

Distribute the host CA public key to clients via /etc/ssh/ssh_known_hosts:

@cert-authority *.example.com ssh-ed25519 AAAA...base64...

The @cert-authority marker tells the SSH client this is a CA key, not a specific host key. Clients with this line verify new servers without prompting. New servers in your infrastructure are authenticated correctly on the first connection, and MITM attempts produce a verification failure instead of a user prompt that gets clicked through reflexively.
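Building that known_hosts line from the CA's public key file is a one-liner worth scripting, since a .pub file carries a trailing comment the entry does not need. A minimal sketch, assuming the usual "type base64 comment" layout of a .pub file. cert_authority_line is a hypothetical helper and /etc/ssh/host_ca.pub is the public half of the host CA key used above.

```shell
# Print a known_hosts CA entry.
# $1: host pattern, e.g. '*.example.com'   $2: path to the host CA public key
cert_authority_line() {
    # Keep only the key type and base64 blob; drop the comment field.
    awk -v pattern="$1" \
        'NF >= 2 { print "@cert-authority", pattern, $1, $2; exit }' "$2"
}

# Usage:
#   cert_authority_line '*.example.com' /etc/ssh/host_ca.pub >> /etc/ssh/ssh_known_hosts
```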

The two CAs, user and host, serve independent purposes and can be managed separately. Keeping them separate limits blast radius: a compromise of the user CA does not affect host verification, and vice versa.

The Audit Trail

One concrete benefit that requires no extra infrastructure: the key_id field appears in /var/log/auth.log on every successful authentication:

sshd[1234]: Accepted publickey for alice from 10.0.1.5 port 54321 ssh2: ED25519-CERT SHA256:xxxx ID alice@example.com (serial 42) CA ED25519 SHA256:yyyy

With authorized_keys, auth.log records which key fingerprint authenticated but nothing about whose key it is without a separate key inventory. With certificates, the identity is embedded in the certificate and the certificate is in the log by design. Correlating “who logged into that server at 14:30 on Tuesday?” becomes a log search rather than a forensic exercise across multiple systems.
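As a sketch of what that log search can look like, the filter below pulls the login user, source address, serial, and key_id out of certificate logins. It assumes each entry occupies a single line in the log and follows the format of the sample entry. cert_logins is a hypothetical name, and the log path in the usage comment varies by distribution.

```shell
# Read sshd log lines on stdin and print, for certificate logins only:
#   <login user> <source address> <serial> <key_id>
# key_id comes last because it is free-form and may contain spaces.
cert_logins() {
    sed -n -E 's/.*Accepted publickey for ([^ ]+) from ([^ ]+) port [0-9]+ ssh2: [^ ]+-CERT [^ ]+ ID (.+) \(serial ([0-9]+)\) CA .*/\1 \2 \4 \3/p'
}

# Usage:
#   cert_logins < /var/log/auth.log
```

Logins made with plain keys do not match the -CERT pattern and are skipped, which also makes the filter a quick way to see who is still using authorized_keys.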

Serial numbers add another dimension: you can tie every authentication event to a specific issuance, which is useful when an identity provider issues certs on behalf of users and you want to trace a specific session back to the authentication event that produced it.

The Migration Path

Moving from authorized_keys to certificates does not require a cutover. TrustedUserCAKeys and AuthorizedKeysFile coexist: servers accept both traditional key auth and cert auth simultaneously. Add certificate support to existing servers, issue certs to willing users first, and phase out authorized_keys entries incrementally as old keys rotate out.
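To track the phase-out, it helps to have a number that should trend toward zero. A minimal sketch that counts live entries across authorized_keys files follows. count_key_entries is a hypothetical helper, and the paths in the usage comment are assumptions about where the files live.

```shell
# Count authorized_keys entries, ignoring blank lines and comments.
count_key_entries() {
    cat -- "$@" | grep -Ev '^[[:space:]]*(#|$)' | wc -l | tr -d ' '
}

# Usage:
#   count_key_entries /home/*/.ssh/authorized_keys /root/.ssh/authorized_keys
```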

For teams without the infrastructure to run Vault or Teleport, the bare OpenSSH tooling is sufficient to start. A CA is a single key file on a jump box or in a secret manager. The signing step is one command. Operational complexity scales with the deployment, not with the mechanism itself.

The source article covers the baseline setup clearly. What makes this worth adopting is the trust model: one CA key configured on all servers instead of a distributed, expiry-free, audit-free collection of public keys that accumulates until nobody knows what is in it or who still has access.
