Cryptography crash course
Modern cryptography and its applications are easy to understand at a high level once you know its three main mechanisms: symmetric encryption, asymmetric encryption, and hashing.
Symmetric cryptography
Let's start with symmetric-key cryptography (also called symmetric cryptography or simply symmetric encryption), as this is what most of us think of when talking about encryption. The term symmetric comes from the fact that the same key is used to both encrypt and decrypt data.
Let's say you have a piece of sensitive information you want to keep safe. You can encrypt this information using a cipher, which is an algorithm used to perform encryption and decryption. In addition to your data, you must provide a secret key, and the cipher will process your data and output it in encrypted form. Note: the key used in the diagram is a silly example; please avoid patterns like this.
The data provided as input is called plaintext, and the output of the cipher is called ciphertext. Don't be misled by these terms: they don't mean you can only encrypt text files. You can encrypt any file type you can think of.
If you have the ciphertext and the key, you can revert the process and get your data back. This process is known as decryption.
The most widely used symmetric-key cipher nowadays is AES, which stands for Advanced Encryption Standard.
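AES itself is too involved to write out here, and in real code you would call a vetted library (for example, the `cryptography` package's `AESGCM` class) rather than implementing it yourself. To show the symmetric idea on its own, the sketch below swaps AES for a much simpler cipher, the one-time pad, in which the same random key both encrypts and decrypts. It is an illustration, not a drop-in replacement: a one-time pad is only secure if the key is truly random, as long as the message, and never reused.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Combine each byte of data with the matching key byte using XOR."""
    if len(key) != len(data):
        raise ValueError("a one-time pad key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


def encrypt(plaintext: bytes, key: bytes) -> bytes:
    return xor_bytes(plaintext, key)


def decrypt(ciphertext: bytes, key: bytes) -> bytes:
    # XOR undoes itself, so decrypting is the same operation with the same key.
    return xor_bytes(ciphertext, key)


message = b"Meet me at noon"
key = secrets.token_bytes(len(message))  # random, secret, and used only once
ciphertext = encrypt(message, key)
recovered = decrypt(ciphertext, key)

print(ciphertext.hex())  # unreadable without the key
print(recovered)         # b'Meet me at noon'
```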
Don't lose or expose your key!
If you lose your key, you lose your data. You could try to brute-force the decryption, that is, try every possible key, but... for AES-256 it would take you roughly 13,689 trillion trillion trillion trillion years using 2 billion high-end computers.
This is how ransomware works: it encrypts your data and then asks you to pay for the key. If you don't pay, your data is pretty much gone.
Also, remember that anyone in possession of your key will be able to decrypt your data.
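The brute-force estimate above depends heavily on how many keys each computer can test per second, which isn't stated. The sketch below shows the arithmetic behind this kind of estimate under a hypothetical rate of 10^9 keys per second per machine; with that assumption the result won't match the quoted figure exactly, but it lands in the same absurd ballpark, which is the point.

```python
# Back-of-the-envelope estimate of an exhaustive AES-256 key search.
KEY_BITS = 256
MACHINES = 2_000_000_000              # 2 billion computers, as in the text
KEYS_PER_SECOND_PER_MACHINE = 10**9   # hypothetical rate; real figures vary widely
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_search(fraction: float = 1.0) -> float:
    """Years needed to try `fraction` of the whole key space."""
    keys = 2**KEY_BITS * fraction
    rate = MACHINES * KEYS_PER_SECOND_PER_MACHINE
    return keys / rate / SECONDS_PER_YEAR


print(f"whole key space:    {years_to_search():.3e} years")
print(f"on average (half):  {years_to_search(0.5):.3e} years")
```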
Ace, Beatrix, and Evo
Let's say Ace wants to send Beatrix a message over an insecure channel like the Internet, but Evo has successfully performed a man-in-the-middle attack and is eavesdropping on their communication.
If the message is sent as plaintext and Evo manages to intercept it, he will be able to read it. However, if Ace and Beatrix had agreed on a key beforehand, they could encrypt their messages using AES and Evo wouldn't be able to read them.
But what if they haven't exchanged a key beforehand and still need to keep their messages private while communicating over an insecure channel?
Asymmetric-key or public-key cryptography
This is where asymmetric cryptography comes in handy. It's called asymmetric-key cryptography, or simply asymmetric cryptography, because two complementary keys are involved: whatever you encrypt with one can only be decrypted with the other, and vice versa.
One is commonly referred to as the public key, and the other as the private key. As their names imply, the public key can be freely distributed, but the private key must remain unknown to everybody but its owner. This is why asymmetric cryptography is also commonly known as public-key cryptography.
Anyone who knows your public key can send you encrypted data that only you will be able to decrypt.
The opposite is also true: if you encrypt data with your private key, it can only be decrypted with your public key. This operation is known as signing and can be used as a form of authentication: since only you have access to your private key, only you could have produced that data, and anyone can verify your signature by decrypting it with your public key. (This "encrypt with the private key" description fits RSA; elliptic-curve schemes produce and verify signatures differently, but the idea is the same.)
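To make the two directions concrete, here is textbook RSA with tiny numbers (the classic p = 61, q = 53 example). It is a sketch for intuition only: real RSA keys use primes hundreds of digits long, and real systems add padding schemes (such as OAEP for encryption and PSS for signatures) that this toy leaves out.

```python
# Textbook RSA with tiny numbers: for intuition only, never for real use.
p, q = 61, 53              # two secret primes (real ones are hundreds of digits long)
n = p * q                  # modulus, part of both keys: 3233
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent, must share no factor with phi
d = pow(e, -1, phi)        # private exponent: modular inverse of e (Python 3.8+), 2753

public_key = (e, n)
private_key = (d, n)


def apply_key(key, m: int) -> int:
    """Raise m to the key's exponent modulo n (the same operation for both keys)."""
    exponent, modulus = key
    return pow(m, exponent, modulus)


message = 65  # a message encoded as an integer smaller than n

# Encryption: anyone can use the public key, only the private key undoes it.
ciphertext = apply_key(public_key, message)      # 2790
decrypted = apply_key(private_key, ciphertext)   # 65

# Signing: only the private key can produce it, anyone can check it with the public key.
signature = apply_key(private_key, message)
verified = apply_key(public_key, signature) == message

print(ciphertext, decrypted, verified)  # 2790 65 True
```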
Back to ABE - privacy
Back to our example. If Ace wants to send a private message to Beatrix, Beatrix can now send him her public key. She could even publish it in an online forum she and Ace both frequent, or attach it to all of her emails, which is a very common practice.
The key with the green background is Beatrix's public key. Evo can also intercept it, but it doesn't matter because, well... it's public after all.
Ace can now use Beatrix's public key to encrypt the message and send it to her (red padlock).
Since Beatrix has the matching private key (red), she can now decrypt the message.
Evo can also intercept this message, but he can't infer anything about Beatrix's private key from her public key or from Ace's ciphertext, so he can't decrypt the latter.
An extremely common use case for public-key cryptography is exchanging a key that will then be used for symmetric encryption, since symmetric encryption is far more efficient than asymmetric encryption, which makes a real difference when encrypting large amounts of data. The most common way to perform this key exchange is the Diffie-Hellman method.
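Diffie-Hellman works a little differently from encrypting with a public key: each party combines its own private value with the other party's public value, and both end up with the same shared secret, which (with properly sized parameters) Evo can't feasibly compute from the public values alone. The sketch below uses toy parameters (the Mersenne prime 2^127 - 1 and base 3, chosen here only for illustration) and turns the shared secret into a 256-bit symmetric key by hashing it with SHA-256, a simplification of the key-derivation functions used in practice. Real deployments rely on standardized, much larger groups or on elliptic curves.

```python
import hashlib
import secrets

# Toy public parameters, known to everyone (including Evo). Illustration only.
P = 2**127 - 1   # a Mersenne prime; far too small for real use
G = 3

# Each party picks a private value and publishes G**x mod P.
ace_secret = secrets.randbelow(P - 2) + 1
beatrix_secret = secrets.randbelow(P - 2) + 1
ace_public = pow(G, ace_secret, P)
beatrix_public = pow(G, beatrix_secret, P)

# Each side combines its own secret with the other side's public value.
ace_shared = pow(beatrix_public, ace_secret, P)
beatrix_shared = pow(ace_public, beatrix_secret, P)


def derive_key(shared: int) -> bytes:
    """Hash the shared number into a 32-byte key (e.g. for AES-256). Simplified KDF."""
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


ace_key = derive_key(ace_shared)
beatrix_key = derive_key(beatrix_shared)
print(ace_key == beatrix_key)  # True
```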
The two most common families of public-key cryptography used nowadays are the RSA algorithm and elliptic-curve cryptography (ECC).
Don't expose or lose your private key!
Malicious users in possession of your private key will be able to decrypt anything encrypted with your public key, which was most likely private data meant for your eyes only.
They could also impersonate you by "forging" your digital signature, which proves ownership of the private key that only you are supposed to have.
If you think your private key might have been compromised, generate a new pair and spread the word so that everybody knows they should use your new public key and not the old one.
If you happen to lose access to your private key, you will naturally no longer be able to decrypt anything that was encrypted with your public key, nor use your digital signature. This is why so many coins are lost forever in cryptocurrencies, and also why the first thing you're told to do when you create a software wallet or acquire a hardware wallet is to write down your seed phrase: a set of random words from which your private keys are derived. If you lose access to your wallet, you can recover it using the seed phrase. Needless to say, you should keep these words somewhere safe, as anyone who has them will be able to steal all your funds.
ABE - integrity
Ace and Beatrix can now keep their communication private thanks to encryption.
However, what if Evo still intercepts the message and tampers with it? Even worse, what if he knows the format of specific messages Ace and Beatrix exchange frequently? For example, he might know that some messages start with a bank account number, and he could alter a few bytes so that it decrypts to a different number, just to mess with Ace and Beatrix.
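Whether Evo can pull this off depends on the cipher. It is easy against ciphers that XOR a keystream into the data, such as the one-time pad or AES in counter (CTR) mode, because flipping a bit in the ciphertext flips the same bit in the decrypted plaintext. The sketch below uses a one-time pad as a stand-in and a made-up message format, and shows Evo changing the account number without ever learning the key.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


# Ace encrypts a message in a format Evo happens to know (hypothetical layout).
message = b"ACCOUNT:12345678"
key = secrets.token_bytes(len(message))  # known only to Ace and Beatrix
ciphertext = xor_bytes(message, key)

# Evo never sees the key. In this sketch he knows where the account number
# sits and what it currently says, and XORs in the difference he wants.
offset = len(b"ACCOUNT:")
known, wanted = b"12345678", b"87654321"
delta = xor_bytes(known, wanted)
end = offset + len(delta)
tampered = ciphertext[:offset] + xor_bytes(ciphertext[offset:end], delta) + ciphertext[end:]

# Beatrix decrypts with the correct key: no error, just a different account number.
received = xor_bytes(tampered, key)
print(received)  # b'ACCOUNT:87654321'
```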
Ace and Beatrix now need a way to check the integrity of their messages.
Hashing
This is where hashing can help.
A hash function takes a piece of data of any size as input and outputs a fixed-length value called a digest, hash value, hash code, or simply hash. This process is commonly known as hashing.
Hash functions are one-way functions; in other words, they are irreversible. This means you can't infer anything about the input from the output. This is why it is considered best practice to store password digests in databases instead of the passwords themselves (as always, there are pitfalls to be avoided, but we will discuss them in another post).
There are no keys involved in this case, although the input data is sometimes referred to as the key. The output only depends on the input, so the same input will always produce the same output.
The most widely used hash algorithm nowadays is SHA-2, which stands for Secure Hash Algorithm 2. SHA-2 output can be 224, 256, 384, or 512 bits long, and its variants are commonly referred to by their output length rather than the version number, e.g. SHA-256, which is the most widespread.
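Python's standard `hashlib` module implements SHA-2, so the properties above are easy to check. The sketch below shows that the output length doesn't depend on the input size, that the same input always gives the same digest, and the output sizes of the four SHA-2 variants.

```python
import hashlib

# SHA-256 output is always 32 bytes (256 bits), whatever the input size.
for data in (b"", b"hi", b"a much longer input " * 1000):
    digest = hashlib.sha256(data).digest()
    print(f"{len(data):>6} input bytes -> {len(digest) * 8} output bits")

# Hashing is deterministic: the same input always gives the same digest.
first = hashlib.sha256(b"hello").hexdigest()
second = hashlib.sha256(b"hello").hexdigest()
print(first == second)  # True

# Output sizes, in bits, of the four SHA-2 variants listed above.
variant_bits = {
    name: hashlib.new(name).digest_size * 8
    for name in ("sha224", "sha256", "sha384", "sha512")
}
print(variant_bits)  # {'sha224': 224, 'sha256': 256, 'sha384': 384, 'sha512': 512}
```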
An interesting property of hash functions is that any tiny change in the input will cause a dramatic change in the output.
Take a look at the SHA-256 digests of these two strings, which differ in only one byte (a single bit, in fact):
Input:   0000000000000000000000000000000000000000000000000000000000000000
SHA-256: 827d096d92f3deeaa0e8070d79f45beb176768e57a958a1cd325f5f4b754b048
Input:   0000000000000000000000000000000000000000000000000000000000000001
SHA-256: f774fbb3f4cc0777bc80b6e86e6bf5ab70e2875ecc4c8cf102840d801a9a74ab
This mechanism allows us to check if there's been any change in our data, which could be caused by an error, or by a malicious actor like Evo.
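The sketch below hashes the two inputs shown above with `hashlib` and counts how many bits differ. It assumes the inputs were hashed as plain ASCII strings with no trailing newline; if that assumption holds, the printed digests should match the ones above.

```python
import hashlib


def bits_changed(x: bytes, y: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(a ^ b).count("1") for a, b in zip(x, y))


input_a = ("0" * 64).encode("ascii")
input_b = ("0" * 63 + "1").encode("ascii")

digest_a = hashlib.sha256(input_a).digest()
digest_b = hashlib.sha256(input_b).digest()

print(digest_a.hex())
print(digest_b.hex())
# '0' is 0x30 and '1' is 0x31, so the inputs differ in exactly one bit...
print("input bits changed:", bits_changed(input_a, input_b))
# ...while typically around half of the 256 output bits change.
print("output bits changed:", bits_changed(digest_a, digest_b), "of", len(digest_a) * 8)
```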
Back to ABE - integrity
Now that we have hashing, Ace can provide proof of integrity to Beatrix.
But is it enough to compute the hash of the message and send it along? You might have already spotted the problem with this approach: Evo can simply alter the message and recompute the hash before forwarding it to Beatrix.
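A minimal sketch of this naive scheme, using `hashlib` and hypothetical `send`/`looks_intact` helpers: Beatrix's check passes for Evo's forged message just as it does for the original, because nothing ties the digest to Ace.

```python
import hashlib


def send(message: bytes):
    """Ace sends the message together with its SHA-256 digest (no signature)."""
    return message, hashlib.sha256(message).hexdigest()


def looks_intact(message: bytes, digest: str) -> bool:
    """Beatrix recomputes the digest and compares it with the one received."""
    return hashlib.sha256(message).hexdigest() == digest


original = send(b"Pay 100 to account 12345678")

# Evo swaps the message and recomputes the digest before forwarding it.
forged_message = b"Pay 100 to account 87654321"
forged = (forged_message, hashlib.sha256(forged_message).hexdigest())

print(looks_intact(*original))  # True
print(looks_intact(*forged))    # True: Beatrix can't tell the difference
```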
What Ace can do now, just as Beatrix did, is generate his own asymmetric key pair. Then, instead of just sending the digest of his message along, he can sign it, i.e. encrypt it with his private key. All Beatrix needs to verify this signature is Ace's public key.
You might also be thinking: anyone with access to Ace's public key, including Evo, can still read the hash from the signature. Indeed he could, and he could even recompute it for a modified message, but he can't sign it, because he doesn't have Ace's private key. In other words, he can't break this authentication mechanism.
Evo could now start flipping bits like a maniac before forwarding the message to Beatrix, but thanks to these mechanisms she would realize that the channel has been compromised and would warn Ace.
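Putting the pieces together, the sketch below hashes the message with SHA-256 and signs the digest with textbook RSA, then shows Beatrix's verification failing once the message has been altered. The key is a toy: its primes are well-known Mersenne primes, so anyone could derive the private key, and real signature schemes (RSA-PSS, ECDSA, Ed25519) add padding or work differently. The `sign` and `verify` helpers are hypothetical names for illustration.

```python
import hashlib

# Toy RSA key for Ace. The primes are well-known Mersenne primes, so anyone
# could factor N: this only illustrates the math, it is NOT a secure key.
P = 2**127 - 1
Q = 2**521 - 1
N = P * Q                           # about 648 bits, larger than a SHA-256 digest
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (Python 3.8+)


def digest_int(message: bytes) -> int:
    return int.from_bytes(hashlib.sha256(message).digest(), "big")


def sign(message: bytes) -> int:
    """Ace: hash the message, then 'encrypt' the digest with his private key."""
    return pow(digest_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Beatrix: recover the digest with Ace's public key and compare."""
    return pow(signature, E, N) == digest_int(message)


message = b"Pay 100 to account 12345678"
signature = sign(message)

tampered = b"Pay 100 to account 87654321"
print(verify(message, signature))   # True
print(verify(tampered, signature))  # False
```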
Ace could have also signed the message itself by encrypting it with his private key, but it's a far more common practice to sign a digest instead, and there are two main reasons for this:
- As discussed, it provides proof of integrity.
- It's far more efficient: computing a hash and then signing it is much faster than applying asymmetric cryptography to a file or message that could be massive compared to the short, fixed length of a hash.
Cryptography mechanisms and security properties - recap
- Thanks to symmetric and asymmetric encryption, Ace and Beatrix have achieved privacy.
- Thanks to hashing, they have achieved integrity.
- Thanks to public-key cryptography, in particular the signing and verifying operations, they have achieved authentication, i.e. the message is authentic and there's proof of its source or origin.
- They have also achieved non-repudiation (more precisely, non-repudiation of origin): Ace won't be able to deny that he was the author of the message.