SSH Certificates Have Been Ready Since 2010. Your Infrastructure Probably Isn't Using Them.

OpenSSH has supported certificate-based authentication since version 5.4, released in 2010. Sixteen years later, most engineering teams are still copying public keys into authorized_keys files and hoping someone remembers to clean them up when an employee leaves. The feature is not obscure, it is not experimental, and it is not hard to understand conceptually. The slow adoption comes down to operational tooling, and that story is worth telling in full.

JP Mens wrote about this recently, covering the core mechanics with his characteristic clarity. This post builds on that foundation and goes deeper into why the tooling gap existed, what has been built to close it, and where short-lived certificates change the security model entirely.

The Problem with authorized_keys at Scale

For a single developer with a handful of servers, authorized_keys is fine. You add your key, it works, you move on. The problems compound as the organization grows.

Every key in authorized_keys is permanent by default. There is no expiry field, no built-in revocation mechanism, no audit trail. When an engineer changes roles or leaves the company, someone has to remember to remove their key from every server they had access to. At ten servers that’s annoying. At a thousand it’s a compliance incident waiting to happen.

Distribution is the other half of the problem. Getting a new key onto the right set of servers requires either manual SSH access, configuration management tooling (Ansible, Chef, Puppet), or some bespoke synchronization script that someone wrote three years ago and nobody fully understands anymore. The key material itself has no metadata: no owner, no expiry, no scope, no indication of what it was for. A key that was added for a contractor six months ago looks identical to one added yesterday for a full-time engineer.

The Trust On First Use (TOFU) problem runs in the other direction too. When a developer first connects to a server they haven’t visited before, SSH presents the host fingerprint and asks them to verify it. In practice, almost everyone types yes without checking. That behavior is so universal that it has become the expected default, which means TOFU provides almost no protection against host impersonation or man-in-the-middle attacks in realistic environments.

How SSH Certificates Actually Work

An SSH certificate is a public key that has been signed by a Certificate Authority (CA). The signed certificate carries metadata: a validity window, a list of principals (usernames), optional source address restrictions, and a serial number. When a client presents a certificate, the server doesn’t need a copy of the public key in authorized_keys. It just needs to trust the CA that signed the certificate, and the certificate carries everything else.

Creating a CA is straightforward:

# Generate the CA key pair (keep the private key offline or in a secrets manager)
ssh-keygen -t ed25519 -f /etc/ssh/ca/user_ca -C "user-ca-2024"

Signing a user certificate looks like this:

# Sign alice's public key for 8 hours, valid for the 'alice' and 'deploy' principals
ssh-keygen -s /etc/ssh/ca/user_ca \
  -I "alice@corp-2024-01-15" \
  -n "alice,deploy" \
  -V +8h \
  ~/.ssh/alice_key.pub

This produces alice_key-cert.pub. You can inspect the certificate contents with:

ssh-keygen -L -f ~/.ssh/alice_key-cert.pub

The output shows the signing CA’s fingerprint, the principals, the validity window, the key ID, and any critical options or extensions. Everything is visible and auditable without querying any external system.

On the server side, you tell sshd to trust certificates signed by your CA:

# /etc/ssh/sshd_config
TrustedUserCAKeys /etc/ssh/ca/user_ca.pub

That one line replaces the need to distribute individual public keys to every server. A developer gets their key signed by the CA and can authenticate to any server configured to trust that CA, as any account whose name appears in the certificate’s principals list. Onboarding is now a CA operation, not a file distribution operation.
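By default sshd accepts a certificate only when one of its principals equals the account being logged into. sshd_config also has an AuthorizedPrincipalsFile directive that lists the accepted principals per account. The sketch below writes such a file; the function name, the /etc/ssh/auth_principals directory, and the role names are assumptions made for illustration, not part of any standard layout.

```shell
#!/bin/sh
# Write the list of certificate principals accepted for one local account.
# Usage: allow_principals DIR ACCOUNT PRINCIPAL...
# DIR is a hypothetical location such as /etc/ssh/auth_principals, paired with
# this sshd_config line (%u expands to the account being logged into):
#   AuthorizedPrincipalsFile /etc/ssh/auth_principals/%u
allow_principals() {
    dir=$1
    account=$2
    shift 2
    [ "$#" -gt 0 ] || { echo "no principals given for $account" >&2; return 1; }
    mkdir -p "$dir" || return 1
    # One principal per line.
    printf '%s\n' "$@" > "$dir/$account"
}

# Example (role names are invented for illustration):
#   allow_principals /etc/ssh/auth_principals deploy deploy release-engineers
```

With a mapping like this, a certificate can carry a role principal rather than a list of account names, and each server decides which roles may use which account.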

User Certificates vs Host Certificates

User certificates handle client authentication. Host certificates handle the other direction: proving to the client that the server is who it claims to be.

This is where the TOFU problem gets solved properly. You sign each server’s host key with a host CA, and you configure clients to trust that CA for a given hostname pattern. The known_hosts file gets a @cert-authority line instead of individual host fingerprints:

# Sign the server's host key
ssh-keygen -s /etc/ssh/ca/host_ca \
  -I "prod-web-01.corp" \
  -h \
  -n "prod-web-01.corp,10.0.1.5" \
  -V +52w \
  /etc/ssh/ssh_host_ed25519_key.pub

Then in each server’s sshd_config:

HostCertificate /etc/ssh/ssh_host_ed25519_key-cert.pub

And in the client’s ~/.ssh/known_hosts:

@cert-authority *.corp ssh-ed25519 AAAA...base64... host-ca-2024

Now when a developer connects to any *.corp host for the first time, SSH validates the host certificate against the CA key they already trust. No TOFU prompt. No manual fingerprint verification. The client can detect impersonation because an attacker’s host won’t have a certificate signed by your CA.
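As a sketch of the client-side step, the helper below builds a @cert-authority line from a CA public key file rather than pasting the base64 by hand. The function name is hypothetical, and it assumes the public key file has the usual single-line "type base64 comment" layout that ssh-keygen writes.

```shell
#!/bin/sh
# Print a known_hosts "@cert-authority" line for a host CA public key file.
# Usage: cert_authority_line PATTERN CA_PUB_FILE
cert_authority_line() {
    pattern=$1
    ca_pub=$2
    key_type='' key_data='' comment=''
    # A public key file holds "<type> <base64> [comment]" on one line.
    # read returns non-zero if the file lacks a trailing newline, so the
    # real check is whether the key data field was filled in.
    read -r key_type key_data comment < "$ca_pub" || [ -n "$key_data" ] || return 1
    [ -n "$key_data" ] || return 1
    printf '@cert-authority %s %s %s%s\n' \
        "$pattern" "$key_type" "$key_data" "${comment:+ $comment}"
}

# Example:
#   cert_authority_line '*.corp' /etc/ssh/ca/host_ca.pub >> /etc/ssh/ssh_known_hosts
```

Appending to /etc/ssh/ssh_known_hosts, OpenSSH's default system-wide known hosts file, applies the trust to every user on the machine; appending to ~/.ssh/known_hosts scopes it to one user.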

Host certificates are dramatically underused relative to user certificates. Most writeups on SSH certificates focus entirely on the user side, which leaves half the security model on the table.

Key Revocation Lists

Certificates with short validity windows largely sidestep the revocation problem: you just wait for the certificate to expire. For longer-lived certificates, or for cases where you need to revoke immediately (a stolen key, a compromised system), OpenSSH provides Key Revocation Lists.

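A KRL is a compact binary file that lists revoked certificates by serial number, key ID, or raw key. It’s far more efficient than manually hunting down and removing keys from authorized_keys files across a fleet. You generate it with ssh-keygen -k, and add -u to update an existing KRL rather than replace it. Revoking by serial assumes each certificate was given a distinct serial with -z at signing time; without -z, certificates get serial 0.

# Create a KRL revoking certificate serial 42 issued by the user CA.
# Revocations are read from a KRL specification file (krl.spec is an example name),
# and -s names the CA public key. With -k, -z would set the KRL version, not a serial.
printf 'serial: 42\n' > /etc/ssh/krl.spec
ssh-keygen -k -f /etc/ssh/krl -s /etc/ssh/ca/user_ca.pub /etc/ssh/krl.spec

# Reference the KRL in sshd_config
# RevokedKeys /etc/ssh/krl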

The catch is distribution: the KRL file itself needs to reach every server that needs to enforce it. This is the same distribution problem as authorized_keys, just for a single file instead of per-user keys. Configuration management tooling handles this reasonably well, but it’s still a push operation that requires coordination.
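Once a KRL has been pushed, it is worth confirming that it revokes what you expect. The sketch below runs the whole round trip in a scratch directory: it signs a certificate with serial 42, revokes that serial, and queries the KRL with ssh-keygen -Q. The function name and every file name are illustrative, and the keys are throwaway, passphrase-less demo keys.

```shell
#!/bin/sh
# Round trip in a scratch directory: sign a certificate with serial 42,
# revoke that serial in a KRL, then ask ssh-keygen whether it is revoked.
krl_round_trip() {
    dir=$(mktemp -d) || return 1

    # Throwaway CA and user key pairs (demo only, no passphrases).
    ssh-keygen -q -t ed25519 -N '' -C demo-ca -f "$dir/ca" || return 1
    ssh-keygen -q -t ed25519 -N '' -C demo-user -f "$dir/user" || return 1

    # -z sets the certificate serial at signing time.
    ssh-keygen -q -s "$dir/ca" -I demo-user -n demo -V +1h -z 42 \
        "$dir/user.pub" || return 1

    # KRL specification: revoke serial 42 for certificates from this CA (-s).
    printf 'serial: 42\n' > "$dir/revoked.spec"
    ssh-keygen -k -f "$dir/krl" -s "$dir/ca.pub" "$dir/revoked.spec" \
        >/dev/null 2>&1 || return 1

    # -Q exits non-zero when any tested key is revoked (or on error).
    if ssh-keygen -Q -f "$dir/krl" "$dir/user-cert.pub" >/dev/null 2>&1; then
        echo "not revoked"
    else
        echo "revoked"
    fi
    rm -rf "$dir"
}
```

On a real server the same -Q query can be pointed at the deployed KRL file and a copy of the certificate in question.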

Short-lived certificates make KRLs largely unnecessary for the user authentication case, which is one of the reasons the short-lived approach has become the preferred pattern in modern tooling.

The CA Bootstrap Problem

Here is the part that most introductory SSH certificate guides gloss over: distributing the CA public key itself.

The certificate model only works if every server trusts your CA, and every client has the host CA in their known_hosts. Trusting the user CA on a new server takes one public key file and one line in sshd_config, but you still have to get them there. For the initial bootstrap, you need some other authentication mechanism, which usually means falling back to authorized_keys or a cloud provider mechanism (EC2 Instance Connect, GCP OS Login, instance metadata, etc.).
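For the server side of the bootstrap, a first-boot or configuration management step could look something like the sketch below. It assumes the CA public key has already been copied to the machine, and the function name is hypothetical. The point is idempotence: running it twice leaves a single directive.

```shell
#!/bin/sh
# Idempotently add a TrustedUserCAKeys directive to an sshd_config file.
# Usage: trust_user_ca CA_PUB_FILE SSHD_CONFIG
trust_user_ca() {
    ca_pub=$1
    config=$2
    [ -r "$ca_pub" ] || { echo "missing CA public key: $ca_pub" >&2; return 1; }
    # Do nothing if an uncommented directive is already present (any path).
    if grep -Eqi '^[[:space:]]*TrustedUserCAKeys[[:space:]]' "$config" 2>/dev/null; then
        return 0
    fi
    printf 'TrustedUserCAKeys %s\n' "$ca_pub" >> "$config"
}

# Example (sshd must be reloaded afterwards for the change to take effect):
#   trust_user_ca /etc/ssh/ca/user_ca.pub /etc/ssh/sshd_config
```

One caveat with appending: if the file ends in a Match block, the new line lands inside it and applies only to that match. Where your OpenSSH version supports Include, a separate drop-in file read before any Match block avoids the problem.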

For the client side, distributing the host CA key is more subtle. You can bake it into a base OS image, push it via configuration management, or serve it through a centrally managed ssh_config. None of these are hard, but they require deliberate infrastructure investment, and that investment is exactly what many teams haven’t made.

The CA private key itself needs to be protected with the same rigor you’d apply to any root credential. Compromise of the CA private key means an attacker can issue valid certificates for any principal on any server that trusts it. Hardware security modules are the right answer for production CAs, and all the serious tooling options below support them.

The Tooling Ecosystem

The operational complexity of running a CA, managing certificate issuance, and handling lifecycle events is what kept SSH certificate adoption low for years. Several tools have emerged that make this tractable.

smallstep/step-ca is probably the most developer-friendly option today. It’s an open-source CA server that handles certificate issuance over HTTPS, supports OIDC and OAuth flows for authenticating requests, and has a CLI (step) that integrates cleanly into developer workflows. A developer runs step ssh login and gets a short-lived certificate without touching any keys or talking to an ops team. The OIDC integration means you can tie issuance to your existing identity provider, so when someone’s account is disabled in your IdP, they simply stop getting certificates.

HashiCorp Vault has an SSH secrets engine that operates in two modes: OTP (one-time passwords, largely superseded) and CA mode. In CA mode, Vault acts as the signing CA and you request certificates through Vault’s API. The major advantage here is that Vault’s access control policies, audit logging, and lease system wrap certificate issuance in the same framework you’re already using for secrets management. If your organization already runs Vault, adding SSH certificate issuance is a relatively small configuration change.

Netflix BLESS (Bastion’s Lambda Ephemeral SSH Service) is the design that put short-lived SSH certificates on the map for large-scale deployments. Netflix built it as an AWS Lambda function: you authenticate to the Lambda (via MFA or similar), it signs your key with a short validity window (typically minutes to hours), and you use that certificate to reach production hosts through a bastion layer. BLESS is open-source but it is also very much a Netflix-shaped tool; the Lambda architecture and AWS-specific authentication mean most shops use it as a reference design rather than deploying it directly.

Teleport takes the most opinionated approach. It’s a full access plane that handles SSH, Kubernetes, databases, and web applications through a unified proxy layer. SSH certificates are the underlying mechanism, but developers don’t interact with certificate lifecycle directly; Teleport handles it transparently behind its own authentication flows. The trade-off is that Teleport is a significant infrastructure commitment. You’re not just adding CA infrastructure; you’re replacing your SSH access model with a new platform.

Short-Lived Certificates as a Security Philosophy

The conventional security model for credentials involves issuance, use, and eventual revocation. Revocation is the hard part: you need a mechanism to check whether a credential is still valid at the time of use, and that mechanism needs to be fast, reliable, and consistently consulted.

Short-lived certificates largely sidestep this. If your certificates are valid for four hours, you can usually do without a revocation list, because a certificate expires on its own before most incident response processes would have finished revoking it. A stolen key and certificate with three hours of validity left give an attacker a small, bounded window, and once the identity is disabled at the issuing end, no replacement certificate will be issued.

This is a meaningful philosophical shift. Instead of asking “is this credential still valid?”, you ask “was this credential issued recently enough to trust?” The answer comes from the certificate itself, with no external query required. This is what makes the short-lived approach scale well: no CRL distribution points, no OCSP responder, no central revocation database that becomes a single point of failure.

The consequence is that certificate issuance has to be fast and reliable. If your CA is down or your issuance flow is broken, developers can’t authenticate. You’ve traded a revocation availability problem for an issuance availability problem. For most organizations, the issuance problem is easier to solve: you can run redundant CA instances, cache certificates locally, and fail open in degraded modes in ways that are harder to do responsibly with revocation.
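One way to soften the dependency on the CA is to reuse a cached certificate until it is close to expiry, so a brief issuance outage goes unnoticed by anyone holding a fresh certificate. The sketch below reads the expiry from ssh-keygen -L output and decides whether to renew. The function names are hypothetical, request_new_certificate stands in for whatever issuance command you use, and the date parsing assumes GNU date.

```shell
#!/bin/sh
# Read `ssh-keygen -L` output on stdin and print the certificate's expiry
# timestamp (the value after "to" on the "Valid:" line). Prints nothing
# when there is no such line, e.g. for a certificate that never expires.
cert_expiry() {
    awk '$1 == "Valid:" && $2 == "from" && $4 == "to" { print $5; exit }'
}

# Succeed (exit 0) when fewer than MARGIN seconds remain before EXPIRY.
# Usage: needs_renewal EXPIRY MARGIN_SECONDS [NOW_EPOCH]
# Assumes GNU date, which accepts timestamps like 2024-01-15T17:00:00
# and interprets them in local time.
needs_renewal() {
    expiry_epoch=$(date -d "$1" +%s) || return 2
    now_epoch=${3:-$(date +%s)}
    [ $((expiry_epoch - now_epoch)) -lt "$2" ]
}

# Example: keep the cached certificate unless it expires within 15 minutes.
#   expiry=$(ssh-keygen -L -f ~/.ssh/alice_key-cert.pub | cert_expiry)
#   if [ -z "$expiry" ] || needs_renewal "$expiry" 900; then
#       request_new_certificate   # placeholder, not a real command
#   fi
```

The renewal margin is a judgment call: a wider margin tolerates longer CA outages but means certificates are replaced more often than their nominal lifetime suggests.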

Getting Started

If you’re evaluating this for a real deployment, the decision tree is roughly: if you already run Vault, start with the SSH secrets engine because the operational model is familiar and the integration surface is small. If you want something purpose-built and developer-friendly with OIDC support, step-ca is the right starting point. If you need a full access plane with audit logging across multiple protocols, look at Teleport with clear expectations about the commitment it represents.

For host certificates specifically, there’s no reason to wait for new tooling. You can add host certificate signing to any existing CA setup today, push the signed certificates through your configuration management layer, and distribute the host CA key in a known_hosts file that ships with your base image or dotfiles. The TOFU problem is solvable with a few days of focused work, and the security improvement is immediate.

The infrastructure has been ready for sixteen years. The tooling to make it operational has matured considerably in the last five. The main remaining barrier is organizational inertia, and that’s a problem with a known solution.