The Two Trust Problems SSH Certificates Actually Fix

Source: lobsters

Most SSH setups are built on a patchwork of trust. You accept a server’s host key once, commit it to ~/.ssh/known_hosts, and hope you are not being redirected somewhere else. On servers, you accumulate authorized_keys files full of public keys from people who may have left the organization years ago. SSH certificates solve both problems with one conceptual change: a certificate authority.

jpmens.net’s recent walkthrough covers the basics of getting SSH certificates running. This post goes deeper into the full trust model, the pieces most tutorials skip, and the tooling that has grown up around certificate-based SSH at scale.

What SSH Certificates Actually Are

SSH certificates are not X.509 certificates. OpenSSH uses its own binary format, documented in PROTOCOL.certkeys in the openssh-portable source. The format is simpler than PKIX: a serialized structure containing the public key, metadata (serial number, key ID, valid principals, validity window, critical options, extensions), and the CA’s signature over all of that.

Two certificate types exist: user certificates and host certificates. Both use the same signing mechanism but serve different verification directions. User certificates prove that a CA vouches for a user’s public key; host certificates prove that a CA vouches for a server’s host key. Most tutorials cover user certificates and skip host certificates entirely. That leaves half the trust model unaddressed.

Creating a CA

A CA in SSH terms is just an ordinary SSH key pair. Convention has settled on Ed25519 for new setups:

ssh-keygen -t ed25519 -f user_ca -C "user CA"
ssh-keygen -t ed25519 -f host_ca -C "host CA"

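Keep these private keys somewhere safe, preferably offline or stored in hardware. Everything else in your infrastructure will trust signatures made by these keys. Losing one means re-establishing trust across your entire fleet, and a stolen one lets an attacker mint valid credentials until every server and client stops trusting that CA.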

Signing User Certificates

Once a user generates a key pair, you sign their public key:

ssh-keygen -s user_ca \
  -I "alice@example.com" \
  -n alice \
  -V -5m:+8h \
  alice_key.pub

This produces alice_key-cert.pub. The fields matter:

  • -I is the key ID, a human-readable audit label that appears in server logs when the certificate is used for authentication.
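  • -n sets valid principals: the usernames this certificate can authenticate as. ssh-keygen documents a certificate signed without -n as valid for all users, while sshd refuses certificates that lack principals when they are checked through TrustedUserCAKeys. Either way, set it explicitly.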
  • -V is the validity window. The example allows a five-minute back-dated start (useful for clock skew) and expires eight hours from issuance.

You can inspect any certificate with ssh-keygen -L -f alice_key-cert.pub, which prints all fields in human-readable form, including the CA fingerprint, validity bounds, principals, and options.
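To see the signing flow end to end without touching real keys, here is a minimal sketch that runs entirely in a temporary directory. The empty passphrases (-N '') and the workdir variable are assumptions made for the demo; a real CA key should be passphrase-protected or hardware-backed.

```shell
#!/bin/sh
# Throwaway demo: all files live in a temp directory and the keys have
# empty passphrases, which is acceptable only for experimentation.
set -eu
workdir=$(mktemp -d)

# 1. Create a demo user CA and a user key pair.
ssh-keygen -q -t ed25519 -N '' -C "demo user CA" -f "$workdir/user_ca"
ssh-keygen -q -t ed25519 -N '' -C "alice" -f "$workdir/alice_key"

# 2. Sign the user's public key; this writes alice_key-cert.pub next to it.
ssh-keygen -q -s "$workdir/user_ca" \
  -I "alice@example.com" \
  -n alice \
  -V -5m:+8h \
  "$workdir/alice_key.pub"

# 3. Print the certificate fields (type, key ID, principals, validity, options).
ssh-keygen -L -f "$workdir/alice_key-cert.pub"
```

The private key and the -cert.pub file share a base name here, which is the layout ssh looks for when it loads an identity.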

On the server side, a single directive in sshd_config tells OpenSSH which CA public keys to trust:

TrustedUserCAKeys /etc/ssh/trusted_user_ca.pub

That file contains the public key of your user CA. Every server carrying this line will accept any valid certificate signed by that CA, with no per-server key distribution needed. Adding a new team member means signing their key once, not updating dozens of authorized_keys files through configuration management.
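Before distributing a CA public key to servers, it can help to confirm that the certificates you issue were signed by that key. The sketch below compares the CA fingerprint with the "Signing CA" field that ssh-keygen -L prints. It uses throwaway keys in a temporary directory, and the awk field positions assume the usual ssh-keygen output layout.

```shell
#!/bin/sh
# Demo only: temp directory, empty passphrases.
set -eu
workdir=$(mktemp -d)
ssh-keygen -q -t ed25519 -N '' -f "$workdir/user_ca"
ssh-keygen -q -t ed25519 -N '' -f "$workdir/alice_key"
ssh-keygen -q -s "$workdir/user_ca" -I "alice@example.com" -n alice \
  -V +8h "$workdir/alice_key.pub"

# Fingerprint of the CA public key a server would list in TrustedUserCAKeys.
ca_fp=$(ssh-keygen -l -f "$workdir/user_ca.pub" | awk '{print $2}')

# Fingerprint recorded in the certificate's "Signing CA:" field.
cert_fp=$(ssh-keygen -L -f "$workdir/alice_key-cert.pub" \
  | awk '/Signing CA:/ {print $4}')

if [ "$ca_fp" = "$cert_fp" ]; then
  echo "certificate was signed by this CA: $ca_fp"
else
  echo "signing CA mismatch: $cert_fp != $ca_fp" >&2
fi
```

A mismatch usually means the certificate came from a different CA than the one the servers trust, for example after a CA rotation.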

Critical Options and Extensions

Certificates can carry restrictions and capabilities baked in at signing time. Critical options cause the server to reject the certificate if it does not recognize them. Extensions are advisory: an older server that does not understand an extension ignores it.

Common critical options:

# Restrict to specific source address ranges
ssh-keygen -s user_ca -I "ci-bot" -n deploy \
  -O source-address=10.0.0.0/8 \
  -V +1h deploy_key.pub

# Force a specific command regardless of what the client requests
ssh-keygen -s user_ca -I "backup-runner" -n backup \
  -O force-command="/usr/local/bin/backup.sh" \
  -V +24h backup_key.pub

Extensions control what the session is allowed to do. By default, a signed user certificate includes permit-pty, permit-port-forwarding, permit-agent-forwarding, permit-X11-forwarding, and permit-user-rc. You can strip these with -O clear and then explicitly re-add only what you need. A CI pipeline certificate that exists to run a deployment script has no business forwarding X11 or agents, and pairing -O clear with force-command keeps it from running arbitrary commands.
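A locked-down CI certificate along those lines might look like the following sketch. The script path /usr/local/bin/deploy.sh and the key names are hypothetical, and the keys are throwaway ones created in a temporary directory.

```shell
#!/bin/sh
# Demo only: temp directory, empty passphrases.
set -eu
workdir=$(mktemp -d)
ssh-keygen -q -t ed25519 -N '' -f "$workdir/user_ca"
ssh-keygen -q -t ed25519 -N '' -f "$workdir/deploy_key"

# -O clear drops every default extension, so nothing is permitted unless
# it is added back explicitly; none are added back here.
ssh-keygen -q -s "$workdir/user_ca" -I "ci-bot" -n deploy \
  -O clear \
  -O force-command="/usr/local/bin/deploy.sh" \
  -O source-address=10.0.0.0/8 \
  -V +1h "$workdir/deploy_key.pub"

# Show the resulting critical options and (empty) extension list.
cert_fields=$(ssh-keygen -L -f "$workdir/deploy_key-cert.pub")
printf '%s\n' "$cert_fields"
```

If the job needs a terminal, adding -O permit-pty after -O clear restores only that one extension.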

This is where SSH certificates start to look like a real access control layer rather than a key distribution convenience. The CA embeds policy into the credential at issuance time, and the server enforces it without needing to know anything specific about the user.

Host Certificates: The Forgotten Half

When you SSH to a new server without a host certificate, OpenSSH shows you a fingerprint and asks you to verify it manually. Most people type “yes” and move on. This is trust on first use (TOFU), and it is a weak model: you are making a security decision with no basis for comparison.

Host certificates eliminate TOFU. You sign the server’s host public key with your host CA:

ssh-keygen -s host_ca \
  -I "web01.example.com" \
  -h \
  -n "web01.example.com,web01,10.0.1.10" \
  -V +52w \
  /etc/ssh/ssh_host_ed25519_key.pub

The -h flag marks this as a host certificate rather than a user certificate. Valid principals here are the names and addresses by which clients will refer to this server. If a client connects to 10.0.1.10 but the certificate only lists web01.example.com, the client will not accept the certificate for that connection.

The server advertises the certificate in sshd_config:

HostCertificate /etc/ssh/ssh_host_ed25519_key-cert.pub

On the client side, instead of per-host entries in known_hosts, you add a CA marker:

@cert-authority *.example.com ssh-ed25519 AAAA...

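Every server in your fleet whose certificate is signed by this CA, and whose name matches the host pattern on that line, is now trusted without a TOFU prompt. The pattern matters: *.example.com covers web01.example.com but not the bare name web01 or the address 10.0.1.10, so list every form clients will actually type. You are not maintaining a known_hosts file that grows without bound; you are trusting a CA you control. Adding a new server means signing its host key once, not pushing a known_hosts update to every client.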
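The host-side flow can be rehearsed the same way as the user-side one. This sketch signs a stand-in host key and generates the matching client trust line. All paths are temporary, and the pattern list on the @cert-authority line is an assumption chosen to mirror the certificate's principals.

```shell
#!/bin/sh
# Demo only: temp directory, empty passphrases.
set -eu
workdir=$(mktemp -d)
ssh-keygen -q -t ed25519 -N '' -C "demo host CA" -f "$workdir/host_ca"

# Stand-in for /etc/ssh/ssh_host_ed25519_key on a real server.
ssh-keygen -q -t ed25519 -N '' -f "$workdir/ssh_host_ed25519_key"

# Sign the host key; -h makes this a host certificate.
ssh-keygen -q -s "$workdir/host_ca" -I "web01.example.com" -h \
  -n "web01.example.com,web01,10.0.1.10" -V +52w \
  "$workdir/ssh_host_ed25519_key.pub"

# Build the client-side trust line from the CA public key. The host
# patterns cover each name listed as a principal above.
printf '@cert-authority *.example.com,web01,10.0.1.10 %s\n' \
  "$(cat "$workdir/host_ca.pub")" > "$workdir/known_hosts"

cat "$workdir/known_hosts"
```

On a real deployment the generated line would go into the clients' known_hosts (per user or system-wide), and the -cert.pub file would be referenced by HostCertificate on the server.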

Revocation

One gap in a naive certificate setup is revocation. If a certificate is valid for a year and an employee leaves, you have a problem. Two approaches handle this.

Key Revocation Lists (KRLs) let you revoke specific certificates or public keys. ssh-keygen -k creates and updates KRL files, which are compact binary structures designed to remain small even with many revoked entries. Adding RevokedKeys /etc/ssh/krl to sshd_config makes the server check the list whenever a key or certificate is offered for authentication.

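# Create a KRL revoking a specific certificate.
# With -k, -z sets the KRL's own version number, not a certificate serial.
ssh-keygen -k -f /etc/ssh/krl -z 1 alice_key-cert.pub

# Update (-u) the existing KRL to add another revoked key, bumping the version
ssh-keygen -k -u -f /etc/ssh/krl -z 2 mallory_key.pub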
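A KRL is only useful if it revokes what you think it revokes, and ssh-keygen -Q tests keys or certificates against one. The sketch below builds a throwaway CA, issues two certificates with distinct serials, revokes one, and checks both. The names alice and bob and the status_of helper are illustrative assumptions.

```shell
#!/bin/sh
# Demo only: temp directory, empty passphrases.
set -eu
workdir=$(mktemp -d)
ssh-keygen -q -t ed25519 -N '' -f "$workdir/user_ca"

# Issue one certificate each for alice (serial 1) and bob (serial 2).
serial=0
for user in alice bob; do
  serial=$((serial + 1))
  ssh-keygen -q -t ed25519 -N '' -f "$workdir/${user}_key"
  ssh-keygen -q -s "$workdir/user_ca" -I "$user@example.com" -n "$user" \
    -z "$serial" -V +8h "$workdir/${user}_key.pub"
done

# Revoke only alice's certificate.
ssh-keygen -q -k -f "$workdir/krl" "$workdir/alice_key-cert.pub"

# ssh-keygen -Q exits non-zero when a listed key or certificate is revoked
# (or when an error occurs, which this helper also reports as "revoked").
status_of() {
  if ssh-keygen -Q -f "$workdir/krl" "$1" >/dev/null 2>&1; then
    echo ok
  else
    echo revoked
  fi
}

alice_status=$(status_of "$workdir/alice_key-cert.pub")
bob_status=$(status_of "$workdir/bob_key-cert.pub")
echo "alice: $alice_status, bob: $bob_status"
```

Running a check like this in the pipeline that publishes the KRL is a cheap way to catch a list that was regenerated without the entries it should carry.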

The simpler approach at scale is short certificate lifetimes. If certificates expire in eight hours, you do not need to revoke them when someone leaves; you stop issuing new ones. A certificate that expires tonight provides no access tomorrow, regardless of what happens to your KRL infrastructure. This approach requires an automated issuance pipeline, but it has the appealing property that your revocation infrastructure is as simple as “stop signing for this principal.”

The Tooling Ecosystem

Manual certificate signing does not scale past a handful of servers. Several mature tools exist to run CA services that issue certificates on demand.

HashiCorp Vault’s SSH secrets engine provides a CA that issues short-lived certificates after checking Vault policies. Users authenticate to Vault through whatever identity system your organization uses (LDAP, OIDC, cloud provider roles) and receive a signed certificate valid for a configurable window. The CA private key never leaves Vault’s encrypted storage.

Smallstep’s step-ca is a standalone CA with first-class SSH support alongside its TLS capabilities. Its step ssh login command handles the full issuance flow for a user, including an OIDC browser flow for authentication. On the TLS side it supports ACME for automated certificate renewal, and it offers analogous automation for SSH certificates.

Teleport takes a more integrated approach, embedding the CA, a certificate-aware SSH proxy, session recording, and role-based access control into one system. It manages both user and host certificates as part of its normal operation, and the audit log captures every session by certificate identity.

Netflix’s BLESS, though less actively maintained now, was an early influential system: it used AWS Lambda to sign certificates after verifying caller identity through IAM. The pattern it established, using short-lived certificates issued by a cloud function after verifying identity through a separate mechanism, influenced most of what came after it. AWS EC2 Instance Connect applies a variation on this: it pushes a temporary public key directly to the instance through the AWS API, bypassing certificates but achieving a similar short-lived access model.

The Security Model in Plain Terms

The trust hierarchy in a certificate-based SSH setup is straightforward once you see it whole. There are one or a few CA key pairs that you protect carefully. Everything else derives trust from those. Servers present signed host credentials. Users present signed user credentials. The CA is the single point of trust, and the system fails closed: if a credential is not signed by a trusted CA, the connection is rejected regardless of what the key looks like.

The traditional authorized_keys model distributes trust across every server. Each server maintains its own list of trusted public keys. Adding a new team member means updating authorized_keys on every machine they need access to, through configuration management, through a script, through whatever manual process you have built. Removing access for a departed employee means finding every machine they had access to and removing their key, which requires that your inventory of who-has-access-where be accurate and up to date.

The known_hosts model has the symmetric problem on the client side. Each client learns host keys one at a time through TOFU and stores them locally. Rotating a server’s host key, or migrating to new hardware with a different key, means that clients will see a warning and be blocked until they manually update their local known_hosts. A CA makes that rotation transparent: sign the new key with the same CA, and clients trust it automatically.

SSH certificates have been available since OpenSSH 5.4, released in 2010. The feature has been stable and well-documented for over fifteen years. The reason it is not universally deployed is not technical; the tooling to automate certificate issuance took time to mature, and the TOFU model, despite its weaknesses, works well enough that most teams never feel the pain sharply enough to change. At small scales, manually managed keys are fine. At larger scales, or anywhere security matters seriously, the CA model is worth the setup cost, and the setup cost has dropped considerably as tools like Vault, step-ca, and Teleport have become standard infrastructure components.
