In part one, we talked about symmetric and asymmetric encryption. In part two, we are going to talk about hash functions and digital signatures.
If you have read any previous text from the series, you can skip the next two introductory paragraphs and go to hash functions.
This series of texts aims to compose thoughts and deepen understanding of how people apply cryptography to cyber security. I have yet to find out where the research will lead. That said, wandering through the technologies spawns those happy “now I get it” moments, with a sense of admiration for the developers of crypto systems.
Cryptographic algorithms can be mathematically intensive and mind taxing. Thus, I will attempt to skim the surface enough to know what, when, and why, without going into details. This approach saves time and gives enough fundamental understanding to pinpoint weaknesses during penetration tests and to mitigate the risks that arise.
A hash function takes input data of any size and converts it to a fixed-size value. We call the returned value a hash, a message digest, or simply a digest. Hash functions are one-way functions, which means we cannot feasibly recover the original input from the digest.
Examples of hash functions: MD4 (insecure), MD5 (insecure), SHA1 (insecure), SHA256, SHA384, SHA512.
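To see the fixed-size property in practice, here is a minimal sketch using Python's standard hashlib module. The helper name digest_hex is my own and only serves the illustration.

```python
import hashlib


def digest_hex(algorithm: str, data: bytes) -> str:
    """Return the hexadecimal digest of data under the named algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


# Inputs of very different sizes all produce a 64-character SHA256 digest.
for message in (b"", b"hello", b"x" * 1_000_000):
    digest = digest_hex("sha256", message)
    print(len(message), len(digest), digest[:16] + "...")
```

Whatever the input length, SHA256 returns 256 bits (64 hexadecimal characters), and SHA512 returns 512 bits.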
A hash algorithm is like a check digit on a credit card. We compute the last digit of a credit card number from all the other digits on it. If we change one digit, the last one changes. A hash takes in the entire data and outputs a code of a certain length, typically written in hexadecimal, based on the contents of the data. It is like a fingerprint for that specific stream of bits.
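The check digit analogy can be made concrete. Payment card numbers commonly use the Luhn algorithm for their last digit. I am assuming that is the check meant here, and the function name below is my own. A rough sketch:

```python
def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn check digit for a string of digits (without the check digit)."""
    total = 0
    # Walk the payload from right to left; every second digit,
    # starting with the rightmost one, is doubled.
    for position, char in enumerate(reversed(payload)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return (10 - total % 10) % 10


print(luhn_check_digit("7992739871"))  # 3
print(luhn_check_digit("7992739872"))  # 1, one changed digit gives a different check digit
```

Unlike a cryptographic hash, a check digit only guards against typing mistakes. It is trivial to construct another number with the same check digit.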
Let’s say we want to transfer data from one computer to another and we must know if the data arrived intact. We could send the data multiple times and compare, but a hash can save the extra hassle. With a hash, we can detect unintentional modifications to data. A hash alone does not detect intentional modification, as an attacker can change the data, generate a new hash for it, and replace the original hash with his own.
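A small sketch of that transfer check, again with hashlib. The messages and helper names are made up for the illustration. It shows both the accidental corruption a hash catches and the substitution it cannot catch.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def arrived_intact(received: bytes, expected_digest: str) -> bool:
    """Recompute the digest on the receiving side and compare."""
    return sha256_hex(received) == expected_digest


original = b"quarterly report: revenue 1000"
published_digest = sha256_hex(original)

# Accidental corruption in transit (a zero became the letter O).
corrupted = b"quarterly report: revenue 1O00"
print(arrived_intact(original, published_digest))   # True
print(arrived_intact(corrupted, published_digest))  # False

# An attacker who can replace both the data and the digest goes unnoticed.
forged = b"quarterly report: revenue 9999"
forged_digest = sha256_hex(forged)
print(arrived_intact(forged, forged_digest))        # True
```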
Another use of hashes is to combine a pre-agreed shared secret with the message and hash the result. The standard construction for this is known as HMAC, or Hash-based Message Authentication Code. HMAC provides authentication and integrity, because only someone who holds the pre-shared key can produce a matching code for a given message.
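Python ships an hmac module in its standard library, so a minimal sketch is short. The key and message below are placeholders, and make_tag and verify_tag are names I chose for the example.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message with the shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)


shared_key = b"pre-agreed secret (placeholder)"
message = b"transfer 100 to alice"
tag = make_tag(shared_key, message)

print(verify_tag(shared_key, message, tag))                    # True
print(verify_tag(shared_key, b"transfer 900 to mallory", tag)) # False
print(verify_tag(b"wrong key", message, tag))                  # False
```

An attacker who changes the message cannot produce a valid tag without the key, which is what a bare hash was missing.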
For integrity checking, a hash algorithm needs to meet three requirements.
First, a hash algorithm needs to be fast. It should be able to churn through a massive file in a second or two. It should not be overly quick, or else guessing inputs by brute force becomes too cheap. If it is awfully slow, no one will wish to use it.
Second, a hash algorithm must be precise. Tweaking one bit in the file should produce a different hash, otherwise we could not trust the integrity checks.
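A quick way to watch this happen is to flip a single bit of the input and count how many bits of the digest change. This is only a sketch with helper names of my own. For a good hash function we would expect roughly half of the output bits to differ.

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    flipped = bytearray(data)
    flipped[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(flipped)


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


original = b"The quick brown fox jumps over the lazy dog"
tweaked = flip_bit(original, 0)

digest_a = hashlib.sha256(original).digest()
digest_b = hashlib.sha256(tweaked).digest()

print(differing_bits(original, tweaked))  # 1
print(differing_bits(digest_a, digest_b), "of", len(digest_a) * 8, "digest bits differ")
```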
Third, a hash algorithm needs to resist hash collisions and pre-image attacks. It should not be feasible to find an input that produces a given hash, or to find two inputs that produce the same hash. People send an immense bulk of data each day, and two pieces of that data can naturally have the same hash. That is tolerable, as the odds are tiny, and we can deal with that. The problem is, if we can artificially construct a hash collision, then we can fake data signatures. Hash collision avoidance is an arms race. For many years the MD5 algorithm and the early SHA algorithms (SHA-0 and SHA1) were considered reliable, but with faster computers and new tricks we can break them. Because of this, and because a fast hash makes guessing cheap, we should avoid using straightforward hash functions for hashing passwords.
When we enter a password into a site or operating system, it is unsafe practice to store it in a database as plain text, because if an attacker compromises the database, he compromises the password.
For many years, people used MD5 to store passwords, and others published tables of words beside their MD5 hashes. As a result, if we type an MD5 hash into a search engine, it can often find the word that was hashed.
Nowadays, as a defense, we mitigate the risk of lookup attacks by adding a salt, and we resist brute force attacks for a substantial time by key stretching. We salt a password and hash it with the aid of a password-based key derivation function.
A salt is a random value of a certain length which is combined with the original value before computing a hash. The salt should be random enough that the resulting hash will not exist in a pre-computed lookup table, such as a rainbow table. We keep the salt in plain text next to the hash value. It does not matter if an attacker can discover the salt, because it still invalidates a pre-computed lookup table.
We can also use a secret salt, called a pepper, which differs from a salt in that we do not store it alongside the password hash but keep it in a separate place, such as the application itself.
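The effect of a salt on a pre-computed table can be sketched as follows. Two simplifications are assumed here: the table is a plain dictionary rather than a real rainbow table, and the password is hashed with a single fast SHA256 only to isolate what the salt does. That is not how passwords should be stored.

```python
import hashlib
import secrets

COMMON_PASSWORDS = ["123456", "password", "qwerty", "letmein"]

# The attacker's pre-computed table: unsalted digest -> password.
LOOKUP_TABLE = {
    hashlib.sha256(word.encode()).hexdigest(): word for word in COMMON_PASSWORDS
}


def hash_unsalted(password: str) -> str:
    return hashlib.sha256(password.encode()).hexdigest()


def hash_salted(password: str, salt: bytes) -> str:
    # The salt is combined with the password before hashing.
    return hashlib.sha256(salt + password.encode()).hexdigest()


salt = secrets.token_bytes(16)  # stored in plain text next to the hash

print(LOOKUP_TABLE.get(hash_unsalted("password")))      # password
print(LOOKUP_TABLE.get(hash_salted("password", salt)))  # None
```

Two users with the same password but different salts also end up with different stored hashes, so cracking one does not reveal the other.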
By using salts and defeating the hazard of pre-computed lookup tables, we drive an attacker to go down the route of a brute force attack.
Powerful computers make brute force attacks a serious threat, so the idea is to slow down the hashing function to make brute force attacks super tedious. We call this technique key stretching. Key stretching basically means iteratively applying a cryptographic hash function or a block cipher to a key. Researchers also made some of these key derivation algorithms hardware intensive, to increase the resources it takes to test each possible key. This is the presently recommended way of storing passwords.
Examples of key-stretching algorithms include PBKDF2, bcrypt, and scrypt.
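Of these, PBKDF2 is available in Python's standard library as hashlib.pbkdf2_hmac, so salting and stretching can be sketched together. The function names, the 16-byte salt, and the iteration count are my assumptions for the illustration. The right work factor depends on your hardware and on current guidance.

```python
import hashlib
import hmac
import secrets

# Assumed work factor for the sketch; tune it for your own environment.
ITERATIONS = 600_000


def hash_password(password: str, iterations: int = ITERATIONS) -> tuple[bytes, int, bytes]:
    """Return (salt, iterations, derived_key) to store for this password."""
    salt = secrets.token_bytes(16)  # fresh random salt per password
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, derived


def verify_password(password: str, salt: bytes, iterations: int, expected: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected)
```

At registration we would call hash_password and store all three returned values. At login we pass the stored salt and iteration count back into verify_password. Every guess costs an attacker the same number of iterations.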
Let us move on from hashes to digital signatures. A hash value signed with the sender’s private key produces a digital signature.
Once we digitally sign something, it provides authentication, because the sender has signed it with a private key, which only he owns. It provides non-repudiation, because only the sender’s private key could have produced the signature, and it provides integrity, because we are hashing.
We can find digital signatures on drivers in operating systems and within certificates, where they confirm that the item is indeed from the person it claims to be from and that there have been no changes. When we have the digital signature, we can use the sender’s public key to verify it and reveal the hash. Then we can take the data, run it through the same hashing algorithm, and compare hashes to verify the data integrity.
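The sign-then-verify flow can be sketched with textbook RSA and nothing but the standard library. This is a toy under loud assumptions: the key is built from two small, publicly known Mersenne primes, there is no padding, and the hash is reduced modulo the key, so it must never be used for anything real. Real systems rely on a vetted library and a padded scheme. All names below are my own.

```python
import hashlib

# Toy key material: two well-known Mersenne primes. Far too small for real use.
P = 2**127 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest_as_int(data: bytes) -> int:
    """Hash the data and map the digest into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Sender: apply the private key to the hash of the data."""
    return pow(digest_as_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Receiver: recover the hash with the public key and compare it
    with a freshly computed hash of the received data."""
    return pow(signature, E, N) == digest_as_int(data)


signature = sign(b"driver v1.0")
print(verify(b"driver v1.0", signature))  # True
print(verify(b"driver v1.1", signature))  # False
```

Anyone holding the public values N and E can check the signature, but only the holder of D could have produced it.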
Hash functions provide an integrity check, which means detection of accidental modifications. On their own they provide neither confidentiality nor authentication. Some examples of hash functions are MD4, MD5, SHA, SHA1, SHA256, SHA384, SHA512. Today we should use SHA256 or above.
We can use hash functions for implementing message authentication and integrity with HMAC. We may also use them to store password hashes. Hashing passwords correctly is more complex than applying a straightforward hashing function and involves applying a salt coupled with a key derivation algorithm.
We call a hash value signed with the issuer’s private key a digital signature. It provides authentication, non-repudiation, and integrity. And if we also encrypt the data in addition to providing a digital signature, we additionally gain confidentiality.
That is it. In the next parts, we will follow up with TLS, then HTTPS, Certificate Authorities, and end with an overview of vulnerabilities and attacks on those systems.