SSH certificates: the better SSH experience

(jpmens.net)

119 points | by jandeboevrie 7 hours ago

12 comments

thomashabets2 4 hours ago
Every couple of months someone re-discovers SSH certificates, and blogs about them.
I'm guilty of it too. My blog post from 15 years ago is nowhere near as good as OP's post, but if I thought me of 15 years ago lived up to my standards of today, I'd be really disappointed: https://blog.habets.se/2011/07/OpenSSH-certificates.html
[-]
- Stefan-H 1 hour ago
  I think the scary reality is most people conflate "keys" and "certificates". I have worked with security engineers that I need to remind that we do not use SSH certs, but rather key auth, and they have to think it through to make it click.
- papyDoctor 2 hours ago
  Another useful feature of SSH certificates is that you can sign a user’s public key to grant them access to a remote machine for a limited time and as a specific remote user.
- kaoD 4 hours ago
  I've known SSH certs for a while but never went through the effort of migrating away from keys. I'm very frustrated about manually managing my SSH keys across my different servers and devices though.
  I assume you gathered a lot of thoughts over these 15 years.
  Should I invest in making the switch?
  [-]
  - anyfoo 2 hours ago
    A big problem I have with ssh certs is that they are not universally supported. For me, there is always some device or daemon (for example tinyssh in the initramfs of my gaming pc so that I can unlock it remotely) that only works with “plain old ssh keys”. And if I have to distribute and sync my keys onto a few hosts anyway, it takes away the benefits.
  - dizhn 46 minutes ago
    I am keeping an eye on the new (and alpha) Authentik agent which will allow idp based ssh logins. There's also SSSD already supported but it requires glibc (due to needing NSS) meaning it's not available on Alpine.
  - thomashabets2 2 hours ago
    If your use case is such that you are frustrated about managing keys, host or user keys, then yes it does sound like SSH certs would help you. E.g. when you have many users, servers, or high enough cartesian product of the two.
    In environments where key management doesn't cause frustration, certs are not worth it.
    Not really more to it than that, from my point of view.
  - ibotty 3 hours ago
    Yes. Caveat: It might not really be worth it if all your infrastructure is managed by these newfangled infrastructure-as-code-things that are quick to roll out (OpenShift/OKD, Talos, etc.) and you have only one repo to change SSH keys (single cluster or single repo for all clusters).
    There are some serious security benefits for larger organizations but it does not sound as if you are part of one.
  - otabdeveloper4 2 hours ago
    You will have to manage your SSH CA certificates instead of your keys.
    The workflows around SSH CAs are extremely janky and insecure.
    With some creative use of `AuthorizedKeysCommand` you can make SSH key rotation painless and secure.
    With SSH certificates you have to go back to the "keys to the kingdom" antipattern and just hope for the best.
    [-]
    - jamiesonbecker 1 hour ago
      Exactly. We'd had discussions about building https://Userify.com (plug!) around SSH certificates, but elected to go with keys instead, because Userify delivers most of the good things around certificates without the jank and insecurity.
      It's not that certificates themselves are insecure, it's that the workflows (as the parent points out) are awful. We might still add some automation around that (and I think I saw some competitor tooling out there if you're committed to that path) but I personally feel like it's an answer to the wrong question.
- V-eHGsd_ 50 minutes ago
  oh man, I referred back to your blog post when I wrote the ssh certificate authority for $job ... ~10 years ago.
  Thanks for writing it!
longislandguido 18 minutes ago
This discussion is full of schizo solutions to "secure" SSH, most of which make no practical sense or have no technical basis.
There really needs to be a definitive best practices guide published by a trusted authority.
Tepix 3 hours ago
The author lists all the advantages of CA certificates, yet doesn't list the disadvantages. OTOH, all the many steps required to set it up make the disadvantages rather obvious.
Also, I've never had a security issue due to TOFU, have you?
[-]
- zamadatix 30 minutes ago
  If you have some form of access to set up the CA config on the box before connecting then you can use the same access channel to avoid needing to rely on TOFU for setting up the key access all the same.
  This can be anything from being part of the install script to customized deployment image to physical access to access via a host in virtualized scenarios.
  TOFU only really comes into play when the box is already set up and you have no other way to load things onto the box other than connecting via SSH to do so. But, again, that would be the same story if you were intending to go the certificate approach too.
- adrian_b 2 hours ago
  TOFU is convenient, but not necessary.
  Choosing to use TOFU is a distinct choice from the choice of using the keys generated by SSH, instead of using certificates.
  If you do not want to use TOFU, for extra security, you just have to pair the computers by copying between them the corresponding public keys through a secure channel, e.g. by using a USB memory.
  Using certificates does not add any simplification or any extra security.
  For real security, you still must pair the communicating computers by copying between them the corresponding certificates, through a secure channel, e.g. a USB memory.
  When you use for HTTPS the certificates that have come with your Internet browser, you trust that the installer package for the browser has come to that computer through a secure channel from the authority that has created the certificates. This is usually an assumption much more far fetched than the assumption that you can trust TOFU between computers under your control.
  Certificates may be useful in big organizations, if other functionality is needed beyond just establishing secure communication channels, e.g. if you want to use certificate revocation.
  In the list of "advantages" enumerated in the parent article, more than half of them are false, because if certificates are implemented correctly, completely equivalent actions must be executed when SSH keys without TOFU are used and when certificates are used.
  Perhaps the author meant by writing some of the "advantages" that the actions that supposedly are no longer needed with certificates are done by an administrator, not by the user. However that is also applicable with SSH. An administrator could install the certificates, so that no action is required from the user, but an administrator can also install the SSH public keys, so that no TOFU is ever needed from the user.
  Using certificates requires exactly the same steps as using keys generated by SSH (i.e. generating certificates and copying them between computers through secure channels, to pair the servers and the authorized users), but it may need additional steps, caused by the fact that certificates provide additional functionality.
  [-]
  - gkoz 2 hours ago
    Are you pairing computers by copying certificates to visit this site?
    [-]
    - _hyn3 1 hour ago
      Touché... a good point, but those are actually two different situations. With one, I'm accessing a website and trusting that the certificate is signed by someone I trust; so the trust in my browser certificates (which include certificates from hundreds of certificate authorities all over the world, any one of which could be compromised, robbed, or controlled by an adversarial person or even government) is extended to the site that I'm visiting. To say this is weak sauce rather understates how bad this actually is. (To paraphrase Churchill, this is the worst possible design, except for all the rest.)
      With the other, I'm logging into a server for the first time (and I could simply deploy the same trusted host key to all my ssh servers via an autoscaling configuration or whatever). I think it's debatable if TOFU is worse or better than your (granted clever) metaphor.
      (to those who'd recommend userify, yes - great for the client login issue and definitely increases security, but to parent's point, TOFU is still needed unless you want to distribute host pubkeys)
    - adrian_b 5 minutes ago
      Pairing is absolutely necessary for bidirectional authentication, where each party must verify the identity of the other end.
      To visit this site, there is no pairing, because the site does not know who I am.
      In order to verify the identity of the HN site, I must trust that the maintainers of the installation packages of the browsers that I use (Firefox, Vivaldi, Chromium) have ensured that the built-in certificates have reached me through a secure path. This actually requires much more trust than when someone answers "yes" to the SSH unknown host message.
      If I use certificates for accessing e.g. the network of my employer, then my work computer must be paired with some corporate server, i.e. a unique certificate has been generated for myself and it has been copied to some certificate authority server for signing and then to my computer, and also a certificate of the local certificate authority has been copied to my personal computer.
      While pairing is unavoidable for bidirectional authentication, it is not necessarily direct between the end points. Both end points must have been paired with at least one other computer but they need not have been paired between themselves previously if there exists some path through secure connections that have been originally created by pairing.
      When certificates are used, usually the pairings are not done directly between end points, but each computer must be paired with the server hosting the certificate authority.
      The term "pairing" is not used frequently, but it should have been preferred, because frequently the users do not understand which are the exact actions on which the security of their communications depend, which leads to various exploits.
      "Pairing" of 2 systems, e.g. A and B, means that some information must be transmitted through a secure channel from A to B and some other information must be transmitted through a secure channel from B to A. An alternative pairing method is to generate both pieces of information on one of the 2 systems and transmit both of them through a secure channel to the other. The information exchange channels must already be secure, because before pairing authentication is impossible.
      The pairing between a PC and the server hosting the certificate authority can be done in various ways, depending on where the PC certificate is generated. If the certificate is generated at the certificate authority, then both it and the root certificate must be copied through a secure channel to the PC. If the certificate is generated on the PC, it must be sent through a secure channel to the CA for signing, then it must be sent back also through a secure channel.
      In practice, administrators are not always careful enough for the channels through which certificates are copied to be really secure.
- akerl_ 2 hours ago
  > Also, I've never had a security issue due to TOFU, have you?
  This is a bit like suggesting you've never been in a car crash, so seat belts must not be worth considering.
  Do you feel that beyond the obvious and documented work in setting them up, there are disadvantages to using SSH certificates?
  [-]
  - adrian_b 2 hours ago
    Certificates provide extra features, like revocation.
    However, if you do not need the extra features provided by certificates, using SSH-generated keys is strictly equivalent to using certificates and it requires less work.
    TOFU is neither necessary nor recommended, it is just a convenience feature, to be used when security may be lax.
    The secure way to use SSH is to never use TOFU but to pair the user and the server by copying the public keys between the 2 computers through a secure channel, e.g. either by using a USB memory or by sending the public keys through already existing authenticated encrypted links that pass through other computers. (Such a link may be a HTTPS download link.)
    When using certificates, a completely identical procedure must be used. After certificates are generated, like also after SSH keys are generated, the certificates must be copied to the client computer and the server computer through secure channels.
    [-]
    - palata 1 hour ago
      > TOFU is neither necessary nor recommended
      Just to make it clear: this does not mean that it is fine to blindly accept the message on first use.
      The "secure way" implies copying the server's public key as well, which people generally don't do, right? Which is equivalent to verifying the fingerprint shown with the TOFU message, correct?
  - otabdeveloper4 2 hours ago
    Your ISP or telecom has to be compromised for TOFU to be relevant to anything. In practice that never happens.
    [-]
    - fc417fc802 54 minutes ago
      Not just your ISP. If an attacker slipped a device onto your LAN and also you happened to be sshing to a new box for the first time then TOFU poses a problem. But that's an awfully limited attack surface. It's similar to the difference between leaking a fax while it's sent versus leaking years old emails that are just sitting there on an internet accessible server.
      As for your ISP I think you should never rely on TOFU over the public internet. If you really don't want to do ssh certs it's easy enough to make the host key available securely via https.
linsomniac 3 hours ago
In our dev/stg environment we reinstall half our machines every morning (largely to test our machine setup automation), and SSH host certificates make that so much nicer than having to persist host keys or remove/replace them in known_hosts. Highly recommended.
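A minimal sketch of the client side of such a setup, assuming a hypothetical host CA public key and domain: a single `@cert-authority` line in known_hosts covers every reinstalled machine whose host certificate was signed by that CA, so nothing per-host needs to be persisted or removed.

```python
def cert_authority_line(host_pattern: str, ca_public_key: str) -> str:
    """Build a known_hosts line that trusts a host CA for matching hostnames."""
    fields = ca_public_key.split()
    if len(fields) < 2:
        raise ValueError("expected '<key-type> <base64-blob> [comment]'")
    key_type, blob = fields[0], fields[1]
    # known_hosts format: @cert-authority <host patterns> <key type> <base64 key>
    return f"@cert-authority {host_pattern} {key_type} {blob}"


# Hypothetical CA public key and domain, for illustration only.
EXAMPLE_CA_PUB = "ssh-ed25519 AAAAC3placeholderCAkeyblob host-ca"
print(cert_authority_line("*.dev.example.com", EXAMPLE_CA_PUB))
```

The printed line would be appended to `~/.ssh/known_hosts` (or the system-wide known hosts file) once per client.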
[-]
- grave88 2 hours ago
  [dead]
jamiesonbecker 58 minutes ago
SSH certs quietly hurt in prod. Short-lived creds + centralized CA just moves complexity upward without solving the core problem: user management.
The system shifts from many small local states to one highly coupled control point. That control point has to be correct and reachable all the time. When it isn’t, failures go wide instead of narrow.
Example: a few boxes get popped and start hammering the CA. Now what? Access is broken everywhere at once.
Common friction points:
```
     1. your signer that has to be up and correct all the time
     2. trust roots everywhere (and drifting)
     3. TTL tuning nonsense (too short = random lockouts, too long = what was the point)
     4. limited on-box state makes debugging harder than it should be
     5. failures tend to fan out instead of staying contained
```
Revocation is also kind of a lie. Just waiting for expiry and hoping that’s good enough.
What actually happens is people reintroduce state anyway: sidecars, caches, agents… because you need it.
We went the opposite direction:
```
     1. nodes pull over outbound HTTPS
     2. local authorized_keys is the source of truth locally
     3. users/roles are visible on the box
     4. drift fixes itself quickly
     5. no inbound ports, no CA signatures (WELL, not strictly true*!)
```
You still get central control, but operation and failure modes are local instead of "everyone is locked out right now."
That’s basically what we do at Userify (https://userify.com). Less elegant than certs, more survivable at 2am. Also actually handles authz, not just part of authn.
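A rough sketch of the general pull-and-converge idea described above (not Userify's actual implementation; every name here is hypothetical): the node fetches desired state, rewrites the local key file to match, and keeps the last known-good file if the control plane cannot be reached.

```python
from typing import Callable, Dict, List

DesiredState = Dict[str, List[str]]  # username -> list of public key lines


def render_authorized_keys(desired: DesiredState, user: str) -> str:
    """Render the authorized_keys content this user should have on the box."""
    keys = sorted(set(desired.get(user, [])))
    return "".join(key + "\n" for key in keys)


def converge(fetch: Callable[[], DesiredState], current: str, user: str) -> str:
    """Return the content authorized_keys should hold after one pull cycle."""
    try:
        desired = fetch()  # e.g. an outbound HTTPS request in a real agent
    except OSError:
        # Control plane unreachable: keep the last known-good local state,
        # so existing access keeps working.
        return current
    # Any drift (stale or hand-added keys) is overwritten by desired state.
    return render_authorized_keys(desired, user)


def fake_fetch() -> DesiredState:
    """Stand-in for the HTTPS pull; returns hypothetical desired state."""
    return {"alice": ["ssh-ed25519 AAAA-new alice@laptop"]}


print(converge(fake_fetch, "ssh-ed25519 AAAA-stale alice@old\n", "alice"))
```

A user missing from the desired state renders to an empty file, which is how removal would propagate in this sketch.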
And the part that usually gets hand-waved with SSH CAs:
```
     1. creating the user account
     2. managing sudo roles
     3. deciding what happens to home directories on removal
     4. cleanup vs retention for compliance/forensics
```
Those don’t go away - they're just not part of the certificate solution.
* (TLS still exists here, just at the transport layer using the system trust store. That channel delivers users, keys, and roles. The rest is handled explicitly instead of implied.)
[-]
- ngrilly 32 minutes ago
  How do you solve TOFU?
bobo56539 2 hours ago
With the recent wave of npm hacks stealing private keys, I wanted to limit keys' lifetimes.
I've set up a couple of yubikeys as SSH CAs on hosts I manage. I use them to create short lived certs (say 24h) at the start of the day. This way I only have to enter the yubikey pin once a day.
I could not find an easy way to limit maximum certificate lifetime in openssh, except for using the AuthorizedPrincipalsCommand, which feels very fragile.
Does anyone else have any experience with a similar setup? How do you limit cert max lifetime?
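One possible building block, as a sketch only: parse the `Valid:` line from `ssh-keygen -L -f <cert>` output and reject anything longer than a chosen maximum. The sample listing and the `from ... to ...` timestamp layout are assumptions about what that command prints, and how this check would be hooked into sshd is left open.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Assumed layout: "Valid: from 2025-01-01T09:00:00 to 2025-01-02T09:00:00"
_VALID_RE = re.compile(r"Valid:\s+from\s+(\S+)\s+to\s+(\S+)")


def cert_lifetime(listing: str) -> Optional[timedelta]:
    """Return the validity window, or None if it is unbounded or unparsable."""
    match = _VALID_RE.search(listing)
    if match is None:
        return None  # e.g. "Valid: forever"
    try:
        start, end = (datetime.fromisoformat(m) for m in match.groups())
    except ValueError:
        return None
    return end - start


def within_max_lifetime(listing: str, max_lifetime: timedelta) -> bool:
    """Accept only certificates with a bounded lifetime up to max_lifetime."""
    lifetime = cert_lifetime(listing)
    return lifetime is not None and timedelta(0) < lifetime <= max_lifetime


# Hypothetical listing, for illustration only.
SAMPLE = """\
id_ed25519-cert.pub:
        Type: ssh-ed25519-cert-v01@openssh.com user certificate
        Valid: from 2025-01-01T09:00:00 to 2025-01-02T09:00:00
"""
print(within_max_lifetime(SAMPLE, timedelta(hours=24)))
```

Treating unparsable or unbounded validity as a rejection keeps the check conservative.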
gunapologist99 1 hour ago
Anyone tried out Userify? It creates/removes ssh pubkeys locally so (like a CA) no authn server needs to be online. But unlike certs, active sessions and processes are terminated when the user access is revoked.
[-]
- jamiesonbecker 1 hour ago
  We're in the process of updating the experience to this century! ;)
  We've always taken the stance that crusty is better than vulnerable, but it turns out that not having a modern experience after 15 years is starting to feel like maybe we need to step up the features and shininess :)
sqbic 1 hour ago
I've had very good experiences with SSH Communications Security's (the guys who invented SSH) PrivX product to manage secure remote access, including SSH certificates and also cert-based Windows authentication. It supports other kinds of remote targets too, via webui or with native clients. Great product.
moviuro 1 hour ago
All those articles about SSH certificates fall short of explaining how the revocation list can/should be published.
Is that yet another problem that I need to solve with syncthing?
https://man.openbsd.org/ssh-keygen.1#KEY_REVOCATION_LISTS
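For context, a small sketch of what feeds a KRL, assuming revocation by certificate serial number and key ID: the function below renders a KRL specification file in the `serial:` / `id:` line format from the linked man page. The serials and IDs are made up, and publishing the resulting KRL to every server remains a separate problem.

```python
from typing import Iterable


def render_krl_spec(serials: Iterable[int], key_ids: Iterable[str]) -> str:
    """Render a KRL specification: one 'serial:' or 'id:' entry per line."""
    lines = [f"serial: {serial}" for serial in sorted(set(serials))]
    lines += [f"id: {key_id}" for key_id in sorted(set(key_ids))]
    return "".join(line + "\n" for line in lines)


# Hypothetical revoked certificates, for illustration only.
print(render_krl_spec([42, 7, 42], ["alice@laptop"]), end="")
```

The spec file would then be turned into a KRL with something like `ssh-keygen -k -f revoked_keys -s ca.pub spec_file`, and sshd pointed at the result via `RevokedKeys`.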
[-]
- blipvert 1 hour ago
  If you generate short lived certificates via an automated process/service then you don’t really need to manage a revocation list as they will have expired in short order.
  [-]
  - jamiesonbecker 1 hour ago
    But then you can't log in if your box goes offline for any reason.
    [-]
    - blipvert 1 hour ago
      Hmm. For user certs you can have the service sign them for, say, an hour. So long as you can ssh to your server in that time, there’s no need for any other interaction.
      Sure you need your signing service to be reasonably available, but that’s easily accomplished.
      Maybe I misunderstand?
      [-]
      - jamiesonbecker 44 minutes ago
        That works for authn in the happy path: short-lived cert, grab it, connect, done.
        Except for everything around that:
        * user lifecycle (create/remove/rename accounts)
        * authz (who gets sudo, what groups, per-host differences)
        * cleanup (what happens when someone leaves)
        * visibility (what state is this box actually in right now?)
        SSH certs don’t really touch any of that. They answer "can this key log in right now", not "what should exist on this machine".
        So in practice, something else ends up managing users, groups, sudoers, home dirs, etc. Now there are two systems that both have to be correct.
        On the availability point: "reasonably available" is doing a lot of work ;)
        Even with 1-hour certs:
        * new sessions depend on the signer
        * fleet-wide issues hit everything at once
        * incident response gets awkward if the signer is part of the blast radius
        The failure mode shifts from "a few boxes don't work" to "nobody can get in anywhere".
        The pull model just leans the other way:
        * nodes converge to desired state
        * access continues even if control plane hiccups
        * authn and authz live together on the box
        Both models can work - it’s more about which failure mode is tolerable to you.
        [-]
        - blipvert 34 minutes ago
          Well, yes, pick your poison.
          But for just getting access to role accounts then I find it a lot nicer than distributing public keys around.
          And for everything else, a periodic Ansible :-)
      - moviuro 34 minutes ago
        That sounds like a lot of extra steps. How do I validate the authenticity of a signing request? Should my signing machine be able to challenge the requester? (This means that the CA key is on a machine with network access!!)
        Replacing the distribution of a revocation list with short-lived certificates just creates other problems that are not easier to solve. (Also, 1h is bonkers, even letsencrypt doesn't do it)
jcalvinowens 2 hours ago
You can also address TOFU to some extent using SSHFP DNS records.
OpenSSH supports checking the DNSSEC signature in the client, in theory, but it's a configure option and I'm not sure if distros build with it.
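To make the record format concrete, here is a sketch that derives an SSHFP record (SHA-256 fingerprint type) from a public key line. The hostname and the all-zero key blob are placeholders; in practice `ssh-keygen -r <hostname>` emits these records, and the client checks them when `VerifyHostKeyDNS` is enabled.

```python
import base64
import hashlib
import struct

# SSHFP algorithm numbers: 1 = RSA, 2 = DSA, 3 = ECDSA, 4 = Ed25519
SSHFP_ALGORITHMS = {
    "ssh-rsa": 1,
    "ssh-dss": 2,
    "ecdsa-sha2-nistp256": 3,
    "ecdsa-sha2-nistp384": 3,
    "ecdsa-sha2-nistp521": 3,
    "ssh-ed25519": 4,
}
SHA256_FP_TYPE = 2  # fingerprint type 2 = SHA-256


def sshfp_record(hostname: str, public_key_line: str) -> str:
    """Build an SSHFP record from a '<type> <base64-blob> [comment]' line."""
    key_type, b64_blob = public_key_line.split()[:2]
    blob = base64.b64decode(b64_blob)
    # The fingerprint is the hash of the raw public key blob, in hex.
    digest = hashlib.sha256(blob).hexdigest()
    algorithm = SSHFP_ALGORITHMS[key_type]
    return f"{hostname} IN SSHFP {algorithm} {SHA256_FP_TYPE} {digest}"


def _ssh_string(data: bytes) -> bytes:
    """Encode bytes as an SSH wire-format string (4-byte length prefix)."""
    return struct.pack(">I", len(data)) + data


# Placeholder Ed25519 key (32 zero bytes), for illustration only.
fake_blob = _ssh_string(b"ssh-ed25519") + _ssh_string(bytes(32))
fake_line = "ssh-ed25519 " + base64.b64encode(fake_blob).decode() + " root@host"
print(sshfp_record("host.example.com.", fake_line))
```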
[-]
- fc417fc802 43 minutes ago
  Any idea if there's a standardized location, something like /.well-known/ssh?
- jsiepkes 2 hours ago
  On top of that you would need something to secure DNS. Like DNSSEC or at the very least use DNS over TLS or DNS over HTTPS. None of these are typically enabled by default.
  [-]
  - jcalvinowens 1 hour ago
    Anything that uses systemd-resolved is probably doing DNSSEC validation by default. It's becoming much more common.
    Additionally, as I mentioned, openssh itself has support for validating the DNSSEC signature even if your local resolver doesn't. I actually don't think it can use the standard resolver for SSHFP records at all, but I'm not sure.
Thom2000 3 hours ago
Sadly services such as Github don't support these so it's mostly good for internal infrastructure.
[-]
- lights0123 3 hours ago
  They do, for Enterprise customers only: https://docs.github.com/en/enterprise-cloud@latest/organizat...
  They've rolled their host key one time, so there's little reason for them to use it on the host side.
Serhii-Set 1 minute ago
[dead]