{% extends 'base.html' %}

{% block title %}About SSH{% endblock %}

{% block content %}
  <div class="row full-margin"><h1>Understanding the Secure Shell Protocol (SSH)</h1></div>
{% endblock %}

{% block subcontent %}
<div class="long-form">
<p>
  In order to use our service, you will have to use the Secure Shell protocol (SSH) to connect to your capsul.
</p>
<p>
  <a href="https://en.wikipedia.org/wiki/SSH_(Secure_Shell)">SSH</a> is a very old tool, created back when the internet was a different place, with different use cases and concerns.
  In many ways, the protocol has failed to evolve to meet the needs of our 21st century global internet. 
  Instead, the users of SSH (tech heads, sysadmins, etc) have had to evolve our processes to work around SSH's limitations.
</p>
<p>
  These days, we use SSH + public-key cryptography to establish secure connections to our servers. 
  If you are not familiar with the concept of public-key cryptography, cryptographic signatures, 
  or Diffie-Hellman key exchange, you may wish to see 
  <a href="https://en.wikipedia.org/wiki/Public-key_cryptography">the Wikipedia article</a> for a refresher.
</p>

<div class="row half-margin"><h1>Public Key Crypto and Key Exchange: The TL;DR</h1></div>

<p>
  Computers can generate <b>"key pairs"</b> which consist of a public key and a private key. Given a <b>key pair A</b>:
</p>
  <ol>
    <li>
      A computer which has access to <b>public key A</b> can encrypt data, 
      and then <b>ONLY</b> a computer which has access to <b>private key A</b> can decrypt & read it
    </li>
    <li>
      Likewise, a computer which has access to <b>private key A</b> can sign data 
      (you can think of this as encrypting it with the private key), 
      and any computer which has access to <b>public key A</b> can check that signature, 
      thus <b>PROVING</b> the message must have come from someone who possesses <b>private key A</b>
    </li>
  </ol>
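<p>
  To make those two properties a little more concrete, here is a toy sketch in Python. 
  It uses textbook RSA with tiny numbers that I picked purely for illustration 
  (the values of <span class="code">P</span>, <span class="code">Q</span> and the exponents are assumptions, not anything SSH uses), 
  so treat it as a demonstration of the idea and never as real cryptography.
</p>
<pre class="code">
# Toy RSA with tiny primes. Illustration only: real keys are thousands of bits
# long and real systems add padding. Never use numbers like these for security.
P, Q = 61, 53
N = P * Q                      # the modulus shared by both halves of the pair
PUBLIC_KEY_A = (17, N)         # (exponent, modulus): safe to hand to anyone
PRIVATE_KEY_A = (2753, N)      # (exponent, modulus): never leaves its owner


def apply_key(key, number):
    """Raise number to the key's exponent, modulo the key's modulus."""
    exponent, modulus = key
    return pow(number, exponent, modulus)


message = 65

# 1. Anyone with public key A can encrypt; only private key A can decrypt.
ciphertext = apply_key(PUBLIC_KEY_A, message)
decrypted = apply_key(PRIVATE_KEY_A, ciphertext)

# 2. Only private key A can produce this signature; anyone with public key A
#    can check it, which shows who the message came from.
signature = apply_key(PRIVATE_KEY_A, message)
recovered = apply_key(PUBLIC_KEY_A, signature)

print(decrypted == message, recovered == message)
</pre>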
<p>
  Key exchange is a process in which two computers, Computer A and Computer B (often referred to as Alice and Bob)
  both create key pairs, so you have <b>key pair A</b> and <b>key pair B</b>, for a total of 4 keys:
</p>
  <ol>
    <li><b>public key A</b></li>
    <li><b>private key A</b></li>
    <li><b>public key B</b></li>
    <li><b>private key B</b></li>
  </ol>
<p>
  In simplified terms, during a key exchange:
</p>
  <ol>
    <li><b>computer A</b> sends <b>computer B</b> its public key</li>
    <li><b>computer B</b> sends <b>computer A</b> its public key</li>
    <li><b>computer A</b> sends <b>computer B</b> 
      a message which is encrypted with <b>computer B</b>'s public key</li>
    <li><b>computer B</b> sends <b>computer A</b> 
      a message which is encrypted with <b>computer A</b>'s public key</li>
  </ol>
<p>
  The way this process is carried out allows A and B to communicate with each other securely, which is great. <br/><br/>

  <b><u>HOWEVER, there is a catch!!</u></b>
</p>

<p>
  When computers A and B are trying to establish a secure connection for the first time, 
  we assume that the way they communicate right now is NOT secure. That means that someone on the network 
  between A and B can read & modify
  all messages they send to each other! You might be able to see where this is heading... 
</p>
<p>
  When <b>computer A</b> sends its public key to <b>computer B</b>, 
  someone in the middle (let's call it <b>computer E, or Eve</b>) could record that message, save it, 
  and then replace it with a forged message to <b>computer B</b> containing <b>public key E</b>
  (from a key pair that <b>computer E</b> generated).

  If this happens, when <b>computer B</b> sends an encrypted message to <b>computer A</b>, 
  B thinks that A's public key is actually <b>public key E</b>, so it will use <b>public key E</b> to encrypt.
  And again, <b>computer E</b> in the middle can intercept the message, and they can decrypt it as well 
  because they have <b>private key E</b>.
  Finally, they can relay the same message to <b>computer A</b>, this time encrypted with <b>computer A</b>'s public key. 
  This is called a <a href="https://en.wikipedia.org/wiki/Man-in-the-middle_attack">Man In The Middle (MITM)</a> attack.
</p>
<p>
  Without some additional verification method, 
  <b><u>Computer A AND Computer B can both be duped and the connection is NOT really secure</u></b>.
</p>
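<p>
  Here is a rough sketch of that attack in Python, again using toy RSA numbers which I made up for illustration. 
  The <span class="code">KEYS</span> table and the <span class="code">apply_key</span> helper are hypothetical stand-ins; 
  the point is only to show that neither A nor B sees anything unusual.
</p>
<pre class="code">
# Toy RSA key pairs as (exponent, modulus). Tiny numbers, illustration only.
# B's own key pair is left out because this direction of the attack never uses it.
KEYS = {
    "A": {"public": (17, 3233), "private": (2753, 3233)},
    "E": {"public": (7, 143), "private": (103, 143)},
}


def apply_key(key, number):
    """Raise number to the key's exponent, modulo the key's modulus."""
    exponent, modulus = key
    return pow(number, exponent, modulus)


# Step 1: A sends its public key to B, but E swaps in public key E on the wire.
key_b_believes_is_a = KEYS["E"]["public"]

# Step 2: B encrypts a secret for "A" using the key it received.
secret = 42
sent_by_b = apply_key(key_b_believes_is_a, secret)

# Step 3: E intercepts the message and decrypts it with private key E.
read_by_e = apply_key(KEYS["E"]["private"], sent_by_b)

# Step 4: E re-encrypts with the real public key A and passes it along.
relayed_to_a = apply_key(KEYS["A"]["public"], read_by_e)

# Step 5: A decrypts as usual and has no idea anything went wrong.
read_by_a = apply_key(KEYS["A"]["private"], relayed_to_a)

print("E read:", read_by_e, "A read:", read_by_a)
</pre>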

<div class="row half-margin"><h1>Authenticating Public Keys: A Tale of Two Protocols</h1></div>

<p>
  Now that we have seen how key exchange works, 
  and we understand that in order to prevent MITM attacks, all participants have to have a way of knowing 
  whether a given public key is authentic or not, I can explain what I meant when I said:
</p>
<p>
  > [SSH] has failed to evolve to meet the needs of our 21st century global internet
</p>
<p>
  In order to explain this, let's first look at how a different, more modern protocol, 
  <a href="https://en.wikipedia.org/wiki/Transport_Layer_Security">Transport Layer Security (or TLS)</a> solved this problem. 
  TLS (still sometimes called by its old name, "Secure Sockets Layer", or SSL) was created to enable HTTPS, to allow 
  internet users to log into web sites securely and purchase things online by entering their credit card number.
  Of course, this required security that actually works; if someone could MITM attack the connection, they could easily
  steal tons of credit card numbers and passwords.
</p>
<p>
  In order to enable this, a standard called <a href="https://en.wikipedia.org/wiki/X.509">X.509</a> was adopted. 
  X.509 dictates the data format of certificates and keys (public keys and private keys), and it also defines 
  a simple and easy way to determine whether a given certificate (public key) is authentic. 
  X.509 introduced the concept of a Certificate Authority, or CA. 
  These CAs were supposed to be bank-like public institutions of power which everyone could trust. 
  The CA would create a key pair on an extremely secure computer, and then a CA Certificate (the public side of that key pair)
  would be distributed along with every copy of Windows, Mac OS, and Linux. Then folks who wanted to run a secure web server 
  could generate their OWN key pair for their web server, 
  and pay the CA to sign their web server's X.509 certificate (public key) with the highly protected CA private key. 
  Critically, issue date, expiration date, and the domain name of the web server, like foo.example.com, would have to be included 
  in the X.509 certificate along with the public key. 
  This way, when the user types https://foo.example.com into their web browser:
</p>
  <ol>
    <li>The web browser sends a TLS ClientHello request to the server</li>
    <li>
      The server responds with a ServerHello & ServerCertificate message
      <ul>
        <li>The ServerCertificate message contains the X.509 certificate for the web server at foo.example.com</li>
      </ul>
    </li>
    <li>The web browser inspects the X.509 certificate
      <ul>
        <li>
          Is the current date in between the issued date and expiry date of the certificate?
          If not, display an <a href="https://expired.badssl.com/">EXPIRED_CERTIFICATE error</a>.
        </li>
        <li>
          Does the domain name the user typed in, foo.example.com, match the domain name in the certificate? 
          If not, display a <a href="https://wrong.host.badssl.com/">BAD_CERT_DOMAIN error</a>.
        </li>
        <li>
          Does the certificate contain a valid CA signature? 
          (Can the signature on the certificate be verified using one of the CA Certificates included with the operating system?) 
          If not, display an <a href="https://untrusted-root.badssl.com/">UNKNOWN_ISSUER error</a>.
        </li>
      </ul>
    </li>
    <li>Assuming all the checks pass, the web browser trusts the certificate and connects</li>
  </ol>
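<p>
  The three checks the browser performs can be sketched in a few lines of Python. 
  This is a simplified model under some loud assumptions: the certificate is a plain dictionary, 
  <span class="code">check_certificate</span> and <span class="code">TRUSTED_ISSUERS</span> are names I invented, 
  and the CA check only compares issuer names where a real browser verifies a cryptographic signature.
</p>
<pre class="code">
from datetime import date

# Stand-in for the CA certificates bundled with the operating system.
# Assumption: a real browser verifies a cryptographic signature here;
# this sketch only compares issuer names.
TRUSTED_ISSUERS = {"Example Root CA"}


def check_certificate(cert, typed_domain, today):
    """Return the name of the first failed check, or None if all pass."""
    # Check 1: today must fall between the issue date and the expiry date.
    valid_days = range(cert["issued"].toordinal(), cert["expires"].toordinal() + 1)
    if today.toordinal() not in valid_days:
        return "EXPIRED_CERTIFICATE"
    # Check 2: the domain the user typed must match the one in the certificate.
    if typed_domain != cert["domain"]:
        return "BAD_CERT_DOMAIN"
    # Check 3: the certificate must be vouched for by a CA we already trust.
    if cert["issuer"] not in TRUSTED_ISSUERS:
        return "UNKNOWN_ISSUER"
    return None


cert = {
    "domain": "foo.example.com",
    "issued": date(2024, 1, 1),
    "expires": date(2024, 12, 31),
    "issuer": "Example Root CA",
}
print(check_certificate(cert, "foo.example.com", date(2024, 6, 1)))
print(check_certificate(cert, "bar.example.com", date(2024, 6, 1)))
print(check_certificate(cert, "foo.example.com", date(2025, 6, 1)))
</pre>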
<p>
  This system enabled the internet to grow and flourish:
  purchasing from a CA was the only way to get a valid X.509 certificate for a website, 
  and guaranteeing authenticity was in the CA's business interest. 
  The CAs kept their private keys behind razor wire and armed guards, and followed strict rules to ensure that only the right
  people got their certificates signed. 
  Only the CAs themselves or anyone who had enough power to force them to create a fraudulent certificate 
  would be able to execute MITM attacks.
</p>
<p>
  The TLS+X.509 Certificate Authority system works well for HTTPS and other application protocols, because:
</p>
  <ul>
    <li>Most internet users don't have the patience to manually verify the authenticity of digital certificates.</li>
    <li>Most internet users don't understand or care how it works; they just want to connect right now.</li>
    <li>Businesses and organizations that run websites are generally willing to jump through hoops and 
      subjugate themselves to authorities in order to offer a more secure application experience to their users.</li>
    <li>The centralization & problematic power dynamic which CAs represent 
      is easily swept under the rug; if it doesn't directly or noticeably impact the average person, who cares?</li>
  </ul>

<p>
  However, this would never fly with SSH. You have to understand, SSH does not come from Microsoft, it does not come from Apple, 
  in fact, it does not even come from Linux or GNU. 
  <a href="https://www.openssh.com/">OpenSSH, the SSH implementation almost everyone uses, comes from BSD</a>. 
  <a href="https://en.wikipedia.org/wiki/BSD">Berkeley Software Distribution</a>. Most people don't even know 
  what BSD is. It's <i>Deep Nerdcore</i> material. The people who maintain SSH are not playing around; they would never 
  allow themselves to be subjugated by so-called "Certificate Authorities".
  So, what are they doing instead? Where is SSH at? Well, back when it was created, computer security was easy: 
  a very minimal defense was enough to deter attackers. 
  In order to help prevent these MITM attacks, instead of something like X.509, SSH employs a policy called 
  <a href="https://en.wikipedia.org/wiki/Trust_on_first_use">Trust On First Use (TOFU)</a>. 
</p>

<p> 
  The SSH client application keeps a record of every server it has ever connected to 
  in a file <span class="code">~/.ssh/known_hosts</span>.
</p>

<p> 
  (The tilde <span class="code">~</span> here represents the user's home directory: 
  <span class="code">/home/username</span> on Linux, 
  <span class="code">C:\Users\username</span> on Windows, and 
  <span class="code">/Users/username</span> on macOS.) 
</p>

<p> 
  Also, note that as the <span class="code">.ssh</span> folder's name starts with a period, it is a "hidden" folder. 
  This just means that your operating system's Graphical User Interface (GUI) will not display it by default.
  All operating systems have a way to enable "Show Hidden Files" in the GUI; otherwise, you can always access it via the
  command line.
</p>

<p> 
  If the user asks the SSH client to connect to a server it has never seen before, 
  it will print a prompt like this to the terminal:
</p>

  <pre class="code">The authenticity of host 'fooserver.com (69.4.20.69)' can't be established.
ECDSA key fingerprint is SHA256:EXAMPLE1xY4JUVhYirOVlfuDFtgTbaiw3x29xYizEeU.
Are you sure you want to continue connecting (yes/no/[fingerprint])?</pre>

<p>
  Here, the SSH client is displaying the fingerprint (<a href="https://en.wikipedia.org/wiki/SHA-2">SHA256 hash</a>) 
  of the public key provided by the server at <span class="code">fooserver.com</span>. 
  Back in the day, when SSH was created, servers lived for months to years, not minutes, and they were installed by hand. 
  So it would have been perfectly reasonable to call the person installing the server on their 
  <a href="https://nokiamuseum.info/nokia-909/">Nokia 909</a>
  and ask them to log into it & read off the host key fingerprint over the phone.  
  After verifying that the fingerprints match in the phone call, the user would type <span class="code">yes</span> 
  to continue.
</p>
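<p>
  If you're curious where that fingerprint string comes from, here is a minimal sketch in Python. 
  As far as I know, the <span class="code">SHA256:</span> fingerprint is the SHA-256 hash of the decoded public key blob, 
  printed as base64 with the trailing padding stripped. The <span class="code">fingerprint</span> function name and the key material below are made up 
  for illustration; this is not a real host key.
</p>
<pre class="code">
import base64
import hashlib


def fingerprint(public_key_line):
    """Compute the SHA256 fingerprint of an OpenSSH-format public key line."""
    # A public key line looks like: key-type base64-blob optional-comment
    key_blob = base64.b64decode(public_key_line.split()[1])
    digest = hashlib.sha256(key_blob).digest()
    # The hash is printed as base64 with the trailing "=" padding removed.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


# Made-up key material, not a real host key.
fake_blob = base64.b64encode(b"not a real host key").decode("ascii")
example_line = "ssh-ed25519 " + fake_blob + " root@fooserver"
print(fingerprint(example_line))
</pre>
<p>
  On the server itself, running <span class="code">ssh-keygen -l -f</span> against one of the host's public key files 
  should print the same kind of string, which is what you would compare against.
</p>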

<p>
  After the SSH client connects to a server for the first time, it will record the server's hostname, IP address, and public key in the 
  <span class="code">~/.ssh/known_hosts</span> file. All subsequent connections will simply check the public key 
  the server presents against the public key it has recorded in the <span class="code">~/.ssh/known_hosts</span> file. 
  If the two public keys match, the connection will continue without prompting the user; however, if they don't match, 
  the SSH client will display a scary warning message:
</p>
<pre class="code">
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@       WARNING: POSSIBLE DNS SPOOFING DETECTED!          @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
The ECDSA host key for fooserver.com has changed,
and the key for the corresponding IP address 69.4.20.42
is unknown. This could either mean that
DNS SPOOFING is happening or the IP address for the host
and its host key have changed at the same time.
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ECDSA key sent by the remote host is
SHA256:EXAMPLEpDDefcNcIROtFpuTiHC1j3iNU74aaKFO03+0.
Please contact your system administrator.
Add correct host key in /root/.ssh/known_hosts to get rid of this message.
Offending ECDSA key in /root/.ssh/known_hosts:1
  remove with:
  ssh-keygen -f "/root/.ssh/known_hosts" -R "fooserver.com"
ECDSA host key for fooserver.com has changed and you have requested strict checking.
Host key verification failed.
</pre>
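<p>
  Boiled down, the decision the SSH client makes on every connection looks roughly like the sketch below. 
  This is my own simplified model, not the actual OpenSSH code: a Python dictionary stands in for the 
  <span class="code">~/.ssh/known_hosts</span> file, and <span class="code">check_host_key</span> and 
  <span class="code">HostKeyChanged</span> are hypothetical names.
</p>
<pre class="code">
class HostKeyChanged(Exception):
    """The server presented a different key than the one on record."""


def check_host_key(known_hosts, hostname, presented_key, user_says_yes):
    """Trust On First Use, with a dict standing in for the known_hosts file."""
    recorded_key = known_hosts.get(hostname)
    if recorded_key is None:
        # First use: nothing on record, so everything hinges on the user's answer.
        if not user_says_yes:
            return False
        known_hosts[hostname] = presented_key
        return True
    if recorded_key != presented_key:
        # This is where the scary warning gets printed.
        raise HostKeyChanged(hostname)
    # Key matches the record: continue without prompting.
    return True


known_hosts = {}
print(check_host_key(known_hosts, "fooserver.com", "key-1", user_says_yes=True))
print(check_host_key(known_hosts, "fooserver.com", "key-1", user_says_yes=False))
</pre>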

<p>
  This is why it's called <b>Trust On First Use</b>: 
  
  The SSH client assumes that when you type <span class="code">yes</span> in response to the prompt during your first connection, 
  you <b>really did</b> verify that the server's public key fingerprint matches.
  
  If you type <span class="code">yes</span> here without checking the server's host key somehow, you could add an attacker's public key to the trusted 
  list in your <span class="code">~/.ssh/known_hosts</span> file; if you type <span class="code">yes</span> blindly, 
  you are technically vulnerable to a man-in-the-middle attack. Such an attack could silently surveil your connection, 
  inject commands, even emulate / falsify the entire SSH session.

   Will anyone actually attack you like that? Probably not, because such an attack would be difficult to hide from someone who
   knows where to look. Personally, however, I'd rather not fuck around and find out. I'd rather find a way to prove to myself
   that my first SSH connection to a new server is secure, even if it's a potentially ephemeral virtual machine like a capsul.
</p>


<p>
  So what are technologists to do? Most cloud providers don't "provide" an easy way to get the SSH host public keys
  for instances that users create on their platform. For example, see this 
  <a href="https://serverfault.com/questions/941915/verify-authenticity-of-ssh-host-on-digital-ocean-droplet-freebsd">
    question posted by a frustrated user trying to secure their connection to a DigitalOcean droplet</a>.

  Besides using the provider's HTTPS-based console to log into the machine & directly read the public key, 
  providers also recommend using a "userdata script". 
  This script would run on boot & upload the machine's SSH public keys to an object storage system like <a href="https://www.backblaze.com/b2/cloud-storage.html">Backblaze B2</a> or 
  <del>Amazon S3</del><sup><a href="#ref_1">[1]</a></sup>, for an application to retrieve later. 
  As an example, I wrote a  
  <a href="https://git.sequentialread.com/forest/greenhouse/src/branch/master/backend.go#L1242-L1248">
    userdata script which does this</a>
  for my own automated VPS management code in  
  <a href="https://git.sequentialread.com/forest/greenhouse/">greenhouse</a>.
  Later in the process, greenhouse will 
  <a href="https://git.sequentialread.com/forest/greenhouse/src/branch/master/backend.go#L1267-L1277">
    download the public keys from the Object Storage provider 
    and add them to the ~/.ssh/known_hosts file</a>
  before finally 
  <a href="https://git.sequentialread.com/forest/greenhouse/src/branch/master/backend.go#L1297-L1313">
    invoking the ssh client against the cloud host</a>.
</p>

<p>
  Personally, I think it's disgusting and irresponsible to require users to go through that much work
  just to be able to connect to their instance securely. However, this practice appears to be an industry standard. 
  It's gross, but it's where we're at right now.
</p>

<p>
  So for <a href="https://capsul.org">capsul</a>, we obviously wanted to do better. 
  We wanted to make this kind of thing as easy as possible for the user, 
  so I'm proud to announce as of today, capsul SSH host key fingerprints will be displayed on the capsul detail page, 
  as well as the host's SSH public keys themselves in <span class="code">~/.ssh/known_hosts</span> format.
  Users can simply copy and paste these keys into their <span class="code">~/.ssh/known_hosts</span> file and connect 
  with confidence that they are not being MITM attacked. 
</p>
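<p>
  If you would rather script the copy and paste step, something like the following Python sketch could do it. 
  The <span class="code">add_known_host</span> helper is a hypothetical name of mine, and the key line in the comment is a placeholder; 
  the real line is whatever the capsul detail page shows for your capsul.
</p>
<pre class="code">
from pathlib import Path


def add_known_host(known_hosts_path, host_key_line):
    """Append a known_hosts line unless that exact line is already there."""
    path = Path(known_hosts_path)
    path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    existing_text = path.read_text() if path.exists() else ""
    if host_key_line in existing_text.splitlines():
        return False
    with path.open("a") as known_hosts_file:
        # Make sure we start on a fresh line if the file lacks a final newline.
        if existing_text and not existing_text.endswith("\n"):
            known_hosts_file.write("\n")
        known_hosts_file.write(host_key_line + "\n")
    return True


# Placeholder line: paste the real one from your capsul detail page instead.
# add_known_host(Path.home() / ".ssh" / "known_hosts",
#                "203.0.113.7 ssh-ed25519 AAAA...placeholder")
</pre>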

<div class="row half-margin"><h1>Why ssh more ssh</h1></div>

<p>
  SSH is a relatively low-level protocol; it should be kept simple and it should not depend on anything external. 
  It has to be this way, because oftentimes SSH is the first service that runs on a server, before any other 
  services or processes launch. The SSH server has to run no matter what, because it's what we're gonna depend on to
  log in there and fix everything else which is broken! Also, SSH has to work for all computers, not just the ones which 
  have internet access or are reachable publicly. 
  So, arguing that SSH should be wrapped in TLS or that SSH should use X.509 doesn't make much sense. 
</p>
<hr/>
<p>
  > ssh didn’t needed an upgrade. SSH is perfect
</p>
<hr/>
<p>
  Because of the case for absolute simplicity, I think that in a cloud-based use case
  it might even make sense to remove the TOFU and make the ssh client even less user friendly; requiring the 
  expected host key to be passed in on every command by default 
  would dramatically increase the security of real-world SSH usage.
  In order to make it more human-friendly again while keeping the security benefits,
  we can create a new layer of abstraction on top of SSH: regime-specific automation & wrapper scripts. 
</p>
<p>
  For example, when we build a JSON API for capsul, we could also provide a <span class="code">capsul-cli</span>
  application which contains an SSH wrapper that knows how to automatically grab & inject the authentic host keys and invoke ssh
  in a single command. 
</p>
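<p>
  To give a rough idea of what such a wrapper might look like, here is a sketch in Python. 
  Nothing here is the real <span class="code">capsul-cli</span>: the function names are hypothetical, and the host key line is 
  passed in as a parameter because the API that would supply it does not exist yet. 
  It leans on two ssh client options, <span class="code">UserKnownHostsFile</span> and 
  <span class="code">StrictHostKeyChecking</span>, so that ssh checks the server against the key we hand it and never shows the first-use prompt.
</p>
<pre class="code">
import os
import subprocess
import tempfile


def build_ssh_command(user, host, known_hosts_path):
    """Build an ssh command that checks the server against one known_hosts file."""
    return [
        "ssh",
        "-o", "UserKnownHostsFile=" + known_hosts_path,
        "-o", "StrictHostKeyChecking=yes",
        user + "@" + host,
    ]


def ssh_with_host_key(user, host, host_key_line):
    """Run ssh against a throwaway known_hosts file holding one trusted key."""
    with tempfile.TemporaryDirectory() as temp_dir:
        known_hosts_path = os.path.join(temp_dir, "known_hosts")
        with open(known_hosts_path, "w") as known_hosts_file:
            known_hosts_file.write(host_key_line + "\n")
        command = build_ssh_command(user, host, known_hosts_path)
        return subprocess.run(command).returncode
</pre>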

<p>
  Cheers and best wishes,<br/>
  &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Forest
</p>

<hr/>
<p>
  <sup id="ref_1">[1]</sup> <a href="https://www.doitwithoutdues.com/">fuck amazon</a>
</p>

</div>
{% endblock %}

{% block pagesource %}/templates/about-ssh.html{% endblock %}