SSH Certificates Solve Two Problems You Probably Thought Were Separate
Source: lobsters
Most teams reach for SSH certificates after their authorized_keys management becomes painful enough. A few hundred servers, a few dozen engineers, one person who left six months ago whose key is still showing up in grep results across the fleet. At that point, certificates feel like the obvious fix for the user-authentication side of the problem.
What gets less attention is that certificates solve an equally annoying problem on the other side: the known_hosts file. These two problems are mirror images of each other, and SSH certificates collapse both of them with the same underlying mechanism.
JP Mens wrote about SSH certificates as a better SSH experience, and the core observation is right. But the full picture of why certificates are architecturally superior to the traditional model is worth unpacking in more detail.
The N×M Problem, Twice
Traditional SSH key management has a scaling problem on both sides of the connection.
On the server side: every machine maintains a list of authorized public keys for each user account. If you have N servers and M users, you have N×M entries to keep synchronized. Add a new engineer, and you touch N servers. Remove one, you touch N servers again. A config management system (Ansible, Chef, Puppet) makes this tractable, but it does not eliminate the fundamental O(N×M) nature of the state you are managing. Drift happens. A key added directly to a production box during an incident never makes it back into the source of truth.
On the client side: every user maintains a known_hosts file that maps server hostnames and IPs to their public host keys. Provision a new server, and either every engineer accepts a trust-on-first-use (TOFU) prompt, which leaves an opening for a man-in-the-middle on the first connection, or you push known_hosts entries to every workstation. Rebuild a server, and the host key changes; everyone gets the WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! message and has to scrub the old entry by hand (for example with ssh-keygen -R).
SSH certificates reduce both of these to O(1) per endpoint. The mechanism is the same in both directions: a Certificate Authority signs keys, and the trusting party only needs the CA’s public key, once.
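To put rough numbers on the difference, here is a small arithmetic sketch. The fleet size (300 servers, 40 engineers) is an illustrative assumption, and it counts one account per engineer on every server.

```shell
#!/bin/sh
# Illustrative fleet size; both numbers are assumptions.
servers=300
engineers=40

# Traditional model: one authorized_keys entry per (server, engineer) pair,
# plus one known_hosts entry per (engineer, server) pair.
authorized_keys_entries=$((servers * engineers))
known_hosts_entries=$((engineers * servers))

# Certificate model: each server holds one user-CA public key,
# and each workstation holds one @cert-authority line.
server_trust_anchors=$servers
client_trust_anchors=$engineers

echo "traditional:  $((authorized_keys_entries + known_hosts_entries)) entries"
echo "certificates: $((server_trust_anchors + client_trust_anchors)) entries"
```

The second number grows with N + M rather than N×M, which is what "O(1) per endpoint" means in practice.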
How the CA Model Works
The foundation is a CA keypair. Generate it once, protect the private key carefully, and distribute the public key to the systems that need to trust it.
ssh-keygen -t ed25519 -f /etc/ssh/user_ca_key -C "user-ca-2026"
To authorize a user to connect to servers, you sign their public key:
ssh-keygen -s /etc/ssh/user_ca_key \
-I "alice@example.com" \
-n alice,ops \
-V +8h \
-O no-port-forwarding \
alice_id_ed25519.pub
This produces alice_id_ed25519-cert.pub, which Alice presents alongside her private key when connecting. The -I flag sets a human-readable identity string (the key ID) that appears in sshd logs. The -n flag sets the principals, which by default are the Unix usernames this certificate is valid for on target hosts. The -V +8h sets an 8-hour validity window. The -O no-port-forwarding option clears the port-forwarding permission that user certificates otherwise carry by default.
On the server, a single line in sshd_config is all that is needed:
TrustedUserCAKeys /etc/ssh/user_ca_key.pub
Every server gets this line, pointing at the same CA public key file. That is the entire per-server configuration for user authentication. No authorized_keys files, no per-user entries, no synchronization job.
You can inspect any certificate with:
ssh-keygen -L -f alice_id_ed25519-cert.pub
The output shows the signing CA, the key ID, the serial number, the validity window, the principals, and the extensions. It is all readable without contacting any server.
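The whole signing flow can be tried without touching /etc/ssh or ~/.ssh. The following is a minimal sketch that works in a scratch directory; the directory, the "-scratch" key comment, and the empty passphrases are assumptions made for the demo, not recommendations for a real CA.

```shell
#!/bin/sh
set -eu
# Scratch directory so nothing under /etc/ssh or ~/.ssh is touched.
work=$(mktemp -d)

# Throwaway CA and user keypairs with empty passphrases (demo only).
ssh-keygen -q -t ed25519 -N "" -f "$work/user_ca_key" -C "user-ca-scratch"
ssh-keygen -q -t ed25519 -N "" -f "$work/alice_id_ed25519" -C "alice"

# Sign Alice's public key; this writes alice_id_ed25519-cert.pub next to it.
ssh-keygen -q -s "$work/user_ca_key" \
    -I "alice@example.com" \
    -n alice,ops \
    -V +8h \
    -O no-port-forwarding \
    "$work/alice_id_ed25519.pub"

# Dump the certificate fields for inspection.
cert_info=$(ssh-keygen -L -f "$work/alice_id_ed25519-cert.pub")
printf '%s\n' "$cert_info"
```

In the printed certificate, the extensions list should contain the default permissions minus permit-port-forwarding, which is the effect of the -O option.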
Principals and AuthorizedPrincipalsFile
Principals deserve more attention than they usually get. When sshd receives a certificate, it checks that the Unix username the client is trying to log in as matches one of the principals in the certificate. If Alice’s certificate lists alice,ops as principals, she can log in as the alice user or as the ops user on any server that trusts the CA.
This works well until you want to restrict which principals can reach specific servers. The AuthorizedPrincipalsFile directive handles this:
# sshd_config
AuthorizedPrincipalsFile /etc/ssh/auth_principals/%u
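Create /etc/ssh/auth_principals/root on a production server containing only prod-ops. Once this directive is set, sshd accepts a certificate for an account only if one of the certificate’s principals is listed in that account’s file. So even though Alice’s certificate lists ops as a principal, she cannot log in as root on that server, because ops is not in the principals file for root. On a development server, the same file might list ops, dev, and staging-team, one per line.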
This gives you per-server access control without any certificates needing to change. The CA issues certificates with broad principals; the servers enforce fine-grained restrictions locally. The two concerns stay separate.
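As a rough illustration of the matching rule, the sketch below writes principals files into scratch directories and mimics the lookup with a small helper. The helper cert_allowed is hypothetical and only approximates sshd's behavior: it ignores comments and per-line options that a real principals file may contain.

```shell
#!/bin/sh
set -eu
# Scratch stand-ins for /etc/ssh/auth_principals on two hosts.
prod_dir=$(mktemp -d)
dev_dir=$(mktemp -d)

# One principal per line; the file is named after the target account (%u).
printf '%s\n' prod-ops > "$prod_dir/root"
printf '%s\n' ops dev staging-team > "$dev_dir/root"

# Hypothetical helper that mimics the match: succeed if any certificate
# principal appears as a whole line in the account's principals file.
cert_allowed() {
    file=$1
    shift
    for principal in "$@"; do
        if grep -Fxq -- "$principal" "$file"; then
            return 0
        fi
    done
    return 1
}

# Alice's certificate carries the principals alice and ops.
if cert_allowed "$prod_dir/root" alice ops; then prod_login=allowed; else prod_login=denied; fi
if cert_allowed "$dev_dir/root" alice ops; then dev_login=allowed; else dev_login=denied; fi

echo "alice -> root@prod: $prod_login"
echo "alice -> root@dev:  $dev_login"
```

The same certificate yields different results on the two hosts, and the only thing that differs is a local file on each server.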
Host Certificates: The Overlooked Half
User certificates solve the authorized_keys problem. Host certificates solve the known_hosts problem, and they are set up almost identically.
You can use a separate CA for hosts, which is good practice:
ssh-keygen -t ed25519 -f /etc/ssh/host_ca_key -C "host-ca-2026"
Sign each server’s host key:
ssh-keygen -s /etc/ssh/host_ca_key \
-h \
-I "web-prod-01.example.com" \
-n "web-prod-01.example.com,web-prod-01,10.0.1.5" \
-V +52w \
/etc/ssh/ssh_host_ed25519_key.pub
The -h flag produces a host certificate rather than a user certificate. The principals here are the hostnames and IPs the certificate is valid for. Copy the resulting ssh_host_ed25519_key-cert.pub back to the server, add the following line to sshd_config, and reload sshd:
HostCertificate /etc/ssh/ssh_host_ed25519_key-cert.pub
On every client machine, add a single line to /etc/ssh/ssh_known_hosts (system-wide) or ~/.ssh/known_hosts (per user):
@cert-authority *.example.com ssh-ed25519 AAAA...base64... host-ca-2026
Now every engineer’s SSH client automatically trusts any server at *.example.com presenting a valid host certificate. No TOFU prompts. No per-server known_hosts entries. Provision a hundred new servers, sign their host keys, and every client trusts them immediately without any client-side changes.
Rebuilding a server no longer produces the host identification warning either, as long as you sign the new host key with the same CA and the same principals.
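The host side can be rehearsed the same way as the user side. This sketch signs a throwaway host key and builds the matching @cert-authority line in a scratch known_hosts file; the scratch paths and key comments are assumptions, and the freshly generated key stands in for a server's real /etc/ssh/ssh_host_ed25519_key.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)

# Throwaway host CA and a stand-in for the server's host key (demo only).
ssh-keygen -q -t ed25519 -N "" -f "$work/host_ca_key" -C "host-ca-scratch"
ssh-keygen -q -t ed25519 -N "" -f "$work/ssh_host_ed25519_key" -C "web-prod-01"

# -h marks the result as a host certificate; principals are names and IPs.
ssh-keygen -q -s "$work/host_ca_key" -h \
    -I "web-prod-01.example.com" \
    -n "web-prod-01.example.com,web-prod-01,10.0.1.5" \
    -V +52w \
    "$work/ssh_host_ed25519_key.pub"

# Build the client-side trust line from the CA public key.
printf '@cert-authority *.example.com %s\n' "$(cat "$work/host_ca_key.pub")" \
    > "$work/known_hosts"

host_cert_info=$(ssh-keygen -L -f "$work/ssh_host_ed25519_key-cert.pub")
printf '%s\n' "$host_cert_info"
cat "$work/known_hosts"
```

Pointing a client at the scratch file with -o UserKnownHostsFile=... is one way to try the trust line before editing a real known_hosts file.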
Short-Lived Certificates and Why Revocation Becomes Optional
The -V +8h validity window in the signing command is not just a convenience feature; it fundamentally changes the security model.
With traditional SSH keys, revocation means removing the key from every authorized_keys file it appears in. Even with good config management, there is a window between when you decide to revoke access and when every server has applied the change. SSH does offer a Key Revocation List (KRL) mechanism via the RevokedKeys directive, but it requires distributing an updated KRL file to every server, which brings you back to a distributed state management problem.
With 8-hour certificates, the worst case for a compromised credential is 8 hours. For many threat models, that is acceptable without any revocation infrastructure at all. Stop issuing new certificates to Alice, and her access expires on its own. The CA issuance service is the single choke point for access control.
This is the model Netflix published with BLESS in 2016: certificates valid for 2 minutes, issued by a Lambda function that validates a Duo MFA token first. At that TTL, revocation is essentially a solved problem by construction.
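Where a few hours of exposure is still too long, a KRL can be layered on top of short-lived certificates. The sketch below issues a certificate with a serial number, revokes it in a scratch KRL, and queries the result. The serial 1001 and the file names are arbitrary assumptions; distributing the KRL and setting RevokedKeys on each server is left out.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)

# Throwaway CA and user key (demo only).
ssh-keygen -q -t ed25519 -N "" -f "$work/user_ca_key" -C "user-ca-scratch"
ssh-keygen -q -t ed25519 -N "" -f "$work/alice" -C "alice"

# -z assigns a serial number so this certificate can be revoked individually.
ssh-keygen -q -s "$work/user_ca_key" -I "alice@example.com" -n alice \
    -V +8h -z 1001 "$work/alice.pub"

# Build a KRL that revokes this one certificate.
ssh-keygen -q -k -f "$work/revoked_keys" "$work/alice-cert.pub"

# -Q exits non-zero when any queried key or certificate is revoked.
if ssh-keygen -Q -f "$work/revoked_keys" "$work/alice-cert.pub"; then
    krl_status=ok
else
    krl_status=revoked
fi
echo "KRL verdict: $krl_status"
```

The trade-off described above still applies: the KRL only takes effect on servers that have received the updated file.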
The CA Tooling Ecosystem
A bare ssh-keygen -s invocation is sufficient for small teams. For anything larger, a managed CA service adds authentication, authorization, and audit logging around the signing operation.
HashiCorp Vault’s SSH Secrets Engine is the most widely deployed option. You configure a role that specifies allowed principals, maximum TTL, and permitted extensions. Engineers authenticate to Vault (via LDAP, OIDC, GitHub, or any other auth method) and then request a signed certificate:
vault write ssh/sign/engineers \
public_key=@$HOME/.ssh/id_ed25519.pub \
valid_principals="alice"
Vault records every signing request in its audit log, enforces the role’s TTL caps, and ties certificate issuance to the requester’s identity within whatever auth system you are using. Mapping a GitHub organization team to a set of principals takes a few lines of HCL.
Smallstep’s step-ca takes a similar approach but with an ACME-compatible CA that also issues TLS certificates, making it a single CA for both SSH and HTTPS if that is useful. It integrates with OIDC out of the box, so engineers sign in with Google or Okta and receive a short-lived certificate scoped to their identity.
Teleport goes further by embedding the CA into a bastion/proxy layer. Every SSH session goes through Teleport, which issues a fresh certificate per session and records a full session audit log. This adds operational overhead but gives you session recording and a single audit trail across all access.
For a small team or a home lab, none of this tooling is necessary. A simple shell script that calls ssh-keygen -s after checking an authorization file, served over an authenticated API, is sufficient. The CA private key on a YubiKey or an HSM-backed AWS KMS key handles the protection requirement without a full Vault deployment.
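A minimal sketch of such a signing script might look like the following. The authorization file format ("user" followed by comma-separated principals), the function name sign_for_user, and the scratch paths are all hypothetical; the authenticated API in front of it and the hardware-backed CA key are out of scope here.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)

# Throwaway CA and a requester's keypair (demo only).
ssh-keygen -q -t ed25519 -N "" -f "$work/user_ca_key" -C "user-ca-scratch"
ssh-keygen -q -t ed25519 -N "" -f "$work/alice" -C "alice"

# Hypothetical authorization file: "<user> <comma-separated principals>".
printf '%s\n' 'alice alice,ops' > "$work/authorized_users"

# Hypothetical wrapper: sign only if the user is listed, taking the
# principals from the file rather than from anything the requester supplies.
sign_for_user() {
    user=$1
    pubkey=$2
    principals=$(awk -v u="$user" '$1 == u { print $2; exit }' "$work/authorized_users")
    if [ -z "$principals" ]; then
        echo "refusing to sign: $user is not authorized" >&2
        return 1
    fi
    ssh-keygen -q -s "$work/user_ca_key" -I "$user@example.com" \
        -n "$principals" -V +8h "$pubkey"
}

sign_for_user alice "$work/alice.pub"

if sign_for_user mallory "$work/alice.pub" 2>/dev/null; then
    mallory_result=signed
else
    mallory_result=refused
fi
echo "mallory: $mallory_result"
```

Keeping the principals in the server-side file means a requester cannot ask for more access than the file grants, which is the same property the Vault role provides.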
Protecting the CA Key
The CA private key is the master credential. Compromise it, and an attacker can issue certificates valid for any principal. This is the central security trade-off of the certificate model: you have simplified distributed state management at the cost of creating a high-value centralized secret.
For production deployments, the CA private key should not live unprotected on a filesystem. Vault’s seal mechanism wraps it with a cloud KMS key or Shamir’s Secret Sharing. For smaller setups, a hardware security key works: OpenSSH supports PKCS#11 tokens and, since version 8.2, FIDO2 security keys. The signing operation runs on the hardware device, which never releases the private key material to the host.
Separate CAs per environment reduce blast radius. A development CA compromise does not expose production access. TrustedUserCAKeys on production servers references only the production user CA, and the @cert-authority entries that clients use for production hosts reference only the production host CA, so development certificates are simply invalid there regardless of what principals they list.
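The separation can be made visible by comparing fingerprints. This sketch creates two scratch CAs, issues a certificate from the development one, and checks which CA the certificate names as its signer. The file names are assumptions, and the comparison only illustrates the idea; sshd performs its own check against the keys listed in TrustedUserCAKeys.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)

# Two throwaway user CAs and one user key (demo only).
ssh-keygen -q -t ed25519 -N "" -f "$work/prod_ca" -C "prod-user-ca"
ssh-keygen -q -t ed25519 -N "" -f "$work/dev_ca" -C "dev-user-ca"
ssh-keygen -q -t ed25519 -N "" -f "$work/alice" -C "alice"

# Certificate issued by the development CA.
ssh-keygen -q -s "$work/dev_ca" -I "alice@example.com" -n alice,ops \
    -V +8h "$work/alice.pub"

# Fingerprint of the CA recorded inside the certificate.
cert_ca_fp=$(ssh-keygen -L -f "$work/alice-cert.pub" | awk '/Signing CA:/ { print $4 }')

# Fingerprints of each CA public key.
prod_fp=$(ssh-keygen -l -f "$work/prod_ca.pub" | awk '{ print $2 }')
dev_fp=$(ssh-keygen -l -f "$work/dev_ca.pub" | awk '{ print $2 }')

if [ "$cert_ca_fp" = "$prod_fp" ]; then
    signed_by=prod
elif [ "$cert_ca_fp" = "$dev_fp" ]; then
    signed_by=dev
else
    signed_by=unknown
fi
echo "certificate signed by: $signed_by CA"
```

A production server that lists only the production CA has no way to validate this certificate, whatever principals it carries.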
What the sshd_config Looks Like in Practice
A complete production sshd_config for the certificate model is not dramatically different from a traditional one:
# Trust certificates signed by the user CA
TrustedUserCAKeys /etc/ssh/user_ca_key.pub
# Restrict principals per-account
AuthorizedPrincipalsFile /etc/ssh/auth_principals/%u
# Present the signed host certificate
HostCertificate /etc/ssh/ssh_host_ed25519_key-cert.pub
# Revoke compromised keys/certs if needed
# RevokedKeys /etc/ssh/revoked_keys
# Disable password auth (you were doing this already)
PasswordAuthentication no
KbdInteractiveAuthentication no
The authorized_keys fallback still works alongside TrustedUserCAKeys. You can run both during a transition period, giving certificate-based access to users who have been migrated while keeping key-based access for users or service accounts not yet converted.
The Operational Difference
The practical difference between the two models shows up most clearly in two scenarios.
When you onboard a new engineer: in the traditional model, their public key needs to reach every server they should access, via config management or direct file edits. With certificates, they generate a keypair, present the public key to the CA service, get a certificate, and can connect immediately to any server their principals grant access to. The server configuration does not change.
When you offboard someone: in the traditional model, their public key needs to be removed from every server. With certificates and short TTLs, you revoke their access in the CA service and their certificate expires within hours. No server-side changes needed, no config management run required.
The logging improvement is also worth noting. The auth.log entry for a traditional key-based login shows a public key fingerprint. The entry for a certificate login shows the key ID string you set with -I during signing, which can be a human-readable identifier like alice@example.com or a structured string containing username, team, and issuance timestamp. Audit queries become readable.
Getting Started
The path from zero to a working certificate setup is shorter than most documentation makes it appear. Generate a CA keypair, add TrustedUserCAKeys to sshd_config on a test server, sign your own public key with a short validity window, and connect. The whole process takes about five minutes and requires nothing beyond the ssh-keygen binary you already have.
Host certificates take another five minutes. Generate a host CA keypair, sign the server’s existing host key, add HostCertificate to sshd_config, and add the @cert-authority line to your known_hosts. Reconnect and verify that no host key prompt appears.
Once the basic flow works, the CA management tooling and the issuance service are engineering problems with well-established solutions. The certificate format is stable and has been part of OpenSSH since version 5.4 in 2010; support in other SSH implementations varies, so check any non-OpenSSH clients you depend on. The tooling has matured considerably since then. There is no longer a good reason to manage SSH access any other way.