Why SSH Certificates Are Worth the Setup Overhead

Most teams run SSH the same way: generate a key pair, paste the public key into ~/.ssh/authorized_keys on every server, and move on. It works. For a team of three people and a handful of servers, it works fine indefinitely. The problems show up later, quietly, when someone leaves the company, a server gets rebuilt, or you’re trying to figure out which of the forty keys in an authorized_keys file is still actively used.

JP Mens’ writeup makes the case for SSH certificates as the structural fix for this. The case is solid, and the mechanics are well-documented in the OpenSSH manual. What I want to do here is go past the setup tutorial angle and explain the specific failure modes that certificates address, including the host certificate side that most people skip entirely.

The Key Distribution Problem

With raw public keys, trust is stored at the destination. Each server independently decides who can log in by maintaining its own authorized_keys file. If you want to add or remove access, you have to touch every server. If you’re using configuration management like Ansible or Puppet, you can automate this reasonably well, but you’re still fundamentally managing a distributed list of trusted keys.

Certificates invert this. Instead of distributing trust to every server, you establish a Certificate Authority (CA), sign user keys with it, and tell each server to trust anything signed by that CA. The authorized_keys file shrinks to nothing; the sshd_config entry that replaces it is a single line:

TrustedUserCAKeys /etc/ssh/trusted_user_ca_keys

That file contains the CA’s public key. New users get access by having their public key signed by the CA. Users who leave simply stop being issued new certificates, and any certificate they still hold lapses at its expiry date. The servers don’t need to change.
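To make the server-side step concrete, here is a minimal sketch of installing the CA public key and enabling the directive idempotently. The helper name install_user_ca and the example paths are my own assumptions, not anything OpenSSH ships; only the TrustedUserCAKeys keyword and sshd -t come from OpenSSH itself.

```shell
#!/bin/sh
# Sketch: install a user CA public key and point sshd at it.
# install_user_ca is a hypothetical helper, not part of OpenSSH.
# Usage: install_user_ca CA_PUB_FILE TRUSTED_KEYS_FILE SSHD_CONFIG
install_user_ca() {
    ca_pub=$1 trusted=$2 config=$3

    # Add each CA public key line unless that exact line is already present.
    touch "$trusted"
    while IFS= read -r line || [ -n "$line" ]; do
        [ -n "$line" ] || continue
        grep -Fxq -- "$line" "$trusted" || printf '%s\n' "$line" >> "$trusted"
    done < "$ca_pub"

    # Add the directive once; an existing TrustedUserCAKeys line is left alone.
    if ! grep -Eiq '^[[:space:]]*TrustedUserCAKeys[[:space:]]' "$config"; then
        printf 'TrustedUserCAKeys %s\n' "$trusted" >> "$config"
    fi
}

# Example (as root), then validate the config before reloading sshd.
# The service name varies by distribution (sshd or ssh):
#   install_user_ca ca_key.pub /etc/ssh/trusted_user_ca_keys /etc/ssh/sshd_config
#   sshd -t && systemctl reload sshd
```

One caveat with appending: if the config file ends in a Match block, the appended line is scoped to that block, so in that case the directive belongs above the first Match line instead.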

Creating the CA and Signing Keys

The CA is just an ordinary SSH key pair. Protect it well, because anything signed by it gets access.

# Create the CA key pair
ssh-keygen -t ed25519 -f ca_key -C "corp-ssh-ca-2026"

# Sign a user's public key
ssh-keygen -s ca_key \
  -I "alice@example.com" \
  -n alice \
  -V +52w \
  alice_ed25519.pub

The -I flag sets the key ID, a human-readable label that shows up in logs. The -n flag sets the principals, which are the usernames the certificate is valid for on the target systems. The -V +52w option sets an expiry of 52 weeks from now.

The result is a file called alice_ed25519-cert.pub. Alice doesn’t need a new private key; her existing key pair just gains a signed certificate alongside it. As long as the certificate sits next to the private key under that name, ssh picks it up automatically. When the client presents the certificate, the server validates the signature against the trusted CA key and checks that the principal matches the account being accessed.

You can inspect a certificate to see everything encoded in it:

ssh-keygen -L -f alice_ed25519-cert.pub

This outputs the serial, type, key ID, principals, validity window, critical options, and extensions. The critical options are particularly useful: force-command restricts the certificate to running a specific command, and source-address limits which IP ranges can use it. These are constraints you’d previously have to encode per-key in authorized_keys with command= and from= prefixes, scattered across every server.
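As a sketch of how those critical options get set at signing time: the -O clear, -O source-address= and -O force-command= options are real ssh-keygen flags, while the helper name sign_restricted, the one-day validity, and the example values in the usage comment are illustrative assumptions.

```shell
#!/bin/sh
# Sketch: sign a user key with critical options that restrict how the
# certificate may be used. sign_restricted is a hypothetical helper.
# SSH_KEYGEN can be overridden (e.g. SSH_KEYGEN=echo) to preview the command.
# Usage: sign_restricted CA_KEY KEY_ID PRINCIPAL SOURCE_CIDR COMMAND PUBKEY
sign_restricted() {
    # -O clear drops the default permissions (pty, forwarding, and so on)
    # before the two restrictions are added.
    "${SSH_KEYGEN:-ssh-keygen}" -s "$1" \
        -I "$2" \
        -n "$3" \
        -V +1d \
        -O clear \
        -O "source-address=$4" \
        -O "force-command=$5" \
        "$6"
}

# Example with made-up values for a backup job:
#   sign_restricted ca_key backup@example.com backup 10.0.0.0/8 \
#       /usr/local/bin/run-backup backup_ed25519.pub
```

Because the restrictions travel inside the certificate, every server that trusts the CA enforces them without any per-server configuration.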

Expiry as a Revocation Strategy

One of the most practical advantages of certificates is that they expire. With raw keys, removing access requires deleting the key from every authorized_keys file. If you miss one server, the key still works. Certificates solve this differently: once a certificate expires, the user must get a new one signed. If you don’t sign a new certificate, access lapses automatically.

For short-lived certificates, some organizations push this to an extreme: certificates valid for hours or a day, issued automatically at login time through a signing service. HashiCorp Vault’s SSH secrets engine does this, as do Teleport and Smallstep’s step-ca. The certificate expires before most incident response timelines would require manual revocation anyway.

For longer-lived certificates, OpenSSH has a Key Revocation List (KRL) mechanism. You maintain a KRL file and distribute it to servers via RevokedKeys in sshd_config. Revoking a certificate is a single command against the KRL, and then the KRL gets pushed out by your configuration management. It’s still a distribution problem, but a much simpler one than hunting down authorized_keys entries.
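A minimal sketch of that revocation command, assuming you still have a copy of the certificate to revoke. The flags (-k to generate a KRL, -u to update one, -s to name the CA public key, -Q to query) are ssh-keygen’s own; the helper name revoke_cert and the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: revoke a certificate by adding it to a KRL, creating the KRL on
# first use. revoke_cert is a hypothetical helper name.
# SSH_KEYGEN can be overridden (e.g. SSH_KEYGEN=echo) to preview the command.
# Usage: revoke_cert KRL_FILE CA_PUB_FILE CERT_FILE
revoke_cert() {
    krl=$1 ca_pub=$2 cert=$3
    if [ -s "$krl" ]; then
        # -u adds to the existing KRL instead of replacing it.
        "${SSH_KEYGEN:-ssh-keygen}" -k -u -f "$krl" -s "$ca_pub" "$cert"
    else
        "${SSH_KEYGEN:-ssh-keygen}" -k -f "$krl" -s "$ca_pub" "$cert"
    fi
}

# Example:
#   revoke_cert revoked.krl ca_key.pub alice_ed25519-cert.pub
# Check the result; ssh-keygen -Q exits non-zero if any listed key is revoked:
#   ssh-keygen -Q -f revoked.krl alice_ed25519-cert.pub
```

The resulting file is what RevokedKeys in sshd_config should point to on each server.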

The Host Certificate Side

Most people who encounter SSH certificates focus entirely on user authentication and ignore host certificates. This is a mistake, because host certificates solve a different but equally annoying problem.

When you SSH to a server for the first time, you get the well-known prompt:

The authenticity of host 'server.example.com (10.0.0.5)' can't be established.
ED25519 key fingerprint is SHA256:abc123...
Are you sure you want to continue connecting (yes/no/[fingerprint])?

This is Trust On First Use (TOFU). You’re being asked to blindly accept a fingerprint with no way to verify it other than out-of-band means. Most people type yes without checking anything. The host’s public key then gets stored in ~/.ssh/known_hosts, and future connections are validated against that stored key.

Host certificates flip this. The server’s host key gets signed by the CA, and clients are configured to trust any host key bearing a valid CA signature matching the expected hostname pattern:

# Sign the server's host key
ssh-keygen -s ca_key \
  -I "server.example.com" \
  -h \
  -n server.example.com,10.0.0.5 \
  -V +52w \
  /etc/ssh/ssh_host_ed25519_key.pub

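The -h flag creates a host certificate rather than a user certificate. The certificate is written next to the key as ssh_host_ed25519_key-cert.pub, and sshd only presents it once sshd_config points at it with a HostCertificate /etc/ssh/ssh_host_ed25519_key-cert.pub line. On the client side, the CA’s public key goes into ~/.ssh/known_hosts with a special prefix: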

@cert-authority *.example.com ssh-ed25519 AAAA...

Now when a client connects to any server matching *.example.com that presents a valid host certificate, the connection proceeds without the TOFU prompt. New servers get trusted automatically the moment they have a signed host certificate. Servers with expired or missing host certificates lose that automatic trust: the client falls back to ordinary host key checking against known_hosts, so set StrictHostKeyChecking yes on clients if you want a hard failure there rather than a return to the TOFU prompt.
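Building that known_hosts entry by hand is error-prone, so here is a small sketch that derives it from the CA public key file. The helper name trust_host_ca is hypothetical; the @cert-authority marker and the line layout are the known_hosts format itself.

```shell
#!/bin/sh
# Sketch: add a host CA to a known_hosts file as a @cert-authority entry.
# trust_host_ca is a hypothetical helper name.
# Usage: trust_host_ca CA_PUB_FILE HOST_PATTERN KNOWN_HOSTS_FILE
trust_host_ca() {
    ca_pub=$1 pattern=$2 known_hosts=$3
    ca_line=
    # The CA public key file holds one line: "<type> <base64> [comment]".
    IFS= read -r ca_line < "$ca_pub" || [ -n "$ca_line" ] || return 1
    entry="@cert-authority $pattern $ca_line"
    touch "$known_hosts"
    # Append only if this exact entry is not already there.
    grep -Fxq -- "$entry" "$known_hosts" || printf '%s\n' "$entry" >> "$known_hosts"
}

# Example for a single user:
#   trust_host_ca ca_key.pub '*.example.com' ~/.ssh/known_hosts
```

For a whole workstation or bastion, the same entry can go into the system-wide /etc/ssh/ssh_known_hosts instead of each user’s file.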

This is particularly useful for ephemeral infrastructure: autoscaling groups, container hosts, cloud instances that cycle frequently. Each new instance gets its host key signed at provisioning time, and clients trust it without any manual known_hosts management.
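A provisioning step along these lines might look like the sketch below, which signs every host key type collected from a new instance and prints the matching sshd_config lines. It assumes the public keys have been copied to the signing host (the CA private key should never land on the instance), and the helper name sign_all_host_keys, the 30-day validity, and the /etc/ssh install path are my own choices.

```shell
#!/bin/sh
# Sketch: sign all host public keys collected from one instance and print
# the HostCertificate lines its sshd_config will need.
# sign_all_host_keys is a hypothetical helper; KEY_DIR holds copies of the
# instance's /etc/ssh/ssh_host_*_key.pub files.
# Usage: sign_all_host_keys CA_KEY FQDN KEY_DIR
sign_all_host_keys() {
    ca_key=$1 fqdn=$2 key_dir=$3
    for pub in "$key_dir"/ssh_host_*_key.pub; do
        [ -e "$pub" ] || continue          # the glob matched nothing
        # Tool output goes to stderr so stdout carries only config lines.
        "${SSH_KEYGEN:-ssh-keygen}" -s "$ca_key" -I "$fqdn" -h \
            -n "$fqdn" -V +30d "$pub" >&2 || return 1
        # ssh-keygen writes <name>-cert.pub next to each public key.
        name=${pub##*/}
        printf 'HostCertificate /etc/ssh/%s-cert.pub\n' "${name%.pub}"
    done
}

# Example:
#   sign_all_host_keys ca_key web1.example.com ./collected/web1
```

The generated -cert.pub files then get copied back to /etc/ssh on the instance before sshd is reloaded.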

AuthorizedPrincipalsFile and Multi-Account Access

One subtle but important configuration option is AuthorizedPrincipalsFile. By default, a certificate principal must match the local username being accessed. If Alice’s certificate has the principal alice, she can log into the alice account. If you want her to also access deploy or root, you have two options: add those principals to her certificate, or use AuthorizedPrincipalsFile.

The principals file lets you specify, per local account, which certificate principals are allowed to access it:

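# /etc/ssh/auth_principals/root
alice
bob

Combined with AuthorizedPrincipalsFile /etc/ssh/auth_principals/%u in sshd_config, this decouples the certificate’s principals from the local account name. Each entry is matched against the principals in the certificate (the values passed to -n), not against the key ID set with -I. Once the option is set, every account that should accept certificate logins needs its own principals file, because a principal matching the username is no longer enough by itself. You can manage access roles on the server side without reissuing certificates.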
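To reason about who can reach which account, it helps to mimic the lookup sshd performs. The sketch below is only an approximation under the %u layout described above: the helper name principal_allowed is hypothetical, and real sshd also accepts per-line options in front of a principal, which this ignores.

```shell
#!/bin/sh
# Sketch: approximate sshd's lookup for
# "AuthorizedPrincipalsFile /etc/ssh/auth_principals/%u".
# principal_allowed is a hypothetical helper, useful for audits.
# Usage: principal_allowed PRINCIPALS_DIR LOCAL_ACCOUNT CERT_PRINCIPAL
principal_allowed() {
    file="$1/$2"
    [ -r "$file" ] || return 1          # no file for this account: deny
    # Drop comments and blank lines, then require an exact line match.
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$file" | grep -Fxq -- "$3"
}

# Example:
#   principal_allowed /etc/ssh/auth_principals root alice && echo "alice may log in as root"
```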

What This Costs You

None of this is free. The CA key needs to be protected carefully. If it’s compromised, an attacker can sign certificates granting access to your entire fleet for any validity period they choose. Many organizations keep the CA key offline or in an HSM and build a signing service with its own access controls in front of it.

You also need a process for certificate issuance. For small teams, a script that wraps ssh-keygen -s is enough. For larger organizations, you want a proper signing service with audit logging, rate limiting, and integration with your identity provider. The operational complexity has to live somewhere.
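For the small-team end of that spectrum, a wrapper could look roughly like this. Everything named here (issue_cert, CA_KEY, ALLOWED_FILE, AUDIT_LOG, the eight-hour validity, and the timestamp-based serial) is an assumption for illustration; the only real interface is ssh-keygen -s with its -I, -n, -z and -V options.

```shell
#!/bin/sh
# Sketch of a small-team issuance wrapper around "ssh-keygen -s".
# issue_cert, CA_KEY, ALLOWED_FILE and AUDIT_LOG are hypothetical names.
# ALLOWED_FILE holds one "user principal" pair per line.
# Usage: issue_cert USER PRINCIPAL PUBKEY
issue_cert() {
    user=$1 principal=$2 pubkey=$3

    # Refuse principals the user is not entitled to.
    if ! grep -Fxq -- "$user $principal" "${ALLOWED_FILE:-allowed_principals}"; then
        echo "denied: $user may not request principal $principal" >&2
        return 1
    fi

    # A serial lets a single certificate be revoked later via a KRL.
    # Seconds since the epoch is a simplification: it is only unique
    # if at most one certificate is issued per second.
    serial=$(date +%s)

    "${SSH_KEYGEN:-ssh-keygen}" -s "${CA_KEY:-ca_key}" \
        -I "$user" -n "$principal" -z "$serial" -V +8h "$pubkey" || return 1

    # Append an audit record of what was issued and when.
    printf '%s issued serial=%s id=%s principal=%s\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$serial" "$user" "$principal" \
        >> "${AUDIT_LOG:-issued.log}"
}

# Example:
#   issue_cert alice@example.com alice alice_ed25519.pub
```

Even at this size, the allowlist check and the audit line are the two pieces that a larger signing service would replace with identity-provider integration and proper logging.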

The tooling ecosystem has matured enough that you don’t have to build this from scratch. Vault’s SSH engine, Teleport, step-ca, and even simpler tools like Netflix’s BLESS (now somewhat dated) show different points in the complexity-control tradeoff space.

For teams already running a CA for TLS certificates, the cognitive overhead is lower than it looks: the concepts are the same, the tooling is different, and you get the same central trust management benefits you’re already accustomed to for HTTPS.

The Practical Threshold

Below a certain scale, authorized_keys management plus good discipline is fine. Above it, the discipline breaks down. Someone forgets to remove a key. A server gets provisioned without pulling the latest key list. A contractor’s key lingers for months after their engagement ends.

Certificates don’t require perfect discipline because the system enforces expiration and central trust by construction. The setup cost is real, but it’s largely a one-time cost. The operational benefit accumulates every time you add a server, onboard a user, or offboard one, because none of those actions require touching individual servers anymore.

That’s the actual case for SSH certificates: not that they’re more convenient to set up, but that they remove an entire class of ongoing operational work and replace it with a simpler, auditable model.