Hash Functions

Hash Function

A hash function is any function that can be used to map data of arbitrary size to fixed-size values, called digest. In cryptography, we are interested in cryptographic hash function.
Cryptographic hash function has the following three properties:
  • Preimage resistance: No one should be able to reverse the hash function in order to recover the input given an output.
  • Second preimage resistance: Given an input and the digest it hashes to, no one should be able to find a different input that hashes to the same digest.
  • Collision resistance: No one should be able to produce two different inputs that hash to the same output.

Birthday Paradox

In probability theory, the birthday problem asks for the probability that, in a set of
randomly chosen people, at least two will share a birthday. The birthday paradox is that, counterintuitively, the probability of a shared birthday exceeds 50% in a group of only 23 people.
A birthday attack is a type of cryptographic attack that exploits the mathematics behind the birthday problem in probability theory. This attack can be used to abuse communication between two or more parties. The attack depends on the higher likelihood of collisions found between random attack attempts and a fixed degree of permutations (pigeonholes). With a birthday attack, it is possible to find a collision of a hash function in
2n=2n/2\sqrt{2^n} = 2^{n/2}
, with
being the classical preimage resistance security.
If our hash function generates random outputs of 256 bits, the space of all outputs is of size
. This means that collisions can be found with good probability after generating
digests (due to the birthday bound). This is the number we’re aiming for, and this is why hash functions at a minimum must provide 256-bit outputs.

Hash Functions in the Real World


Suppose that Alice knows some secret, call it
, and this secret can be revealed after a certain date
. She wants to show Bob that she actually knows the secret
but she can't reveal the content of
for now. In this case, a commitment scheme does the following things:
  • Alice computes
    H=Hash(S)H = Hash(S)
    , such as SHA256(M), and send
    to Bob.
  • After date
    , Alice sends Bob the secret
  • Bob computes
    and compares the result with
Commitments in cryptography generally try to achieve two properties:
  • Hiding: A commitment must hide the underlying value.
  • Binding: A commitment must hide a single value. In other words, if you commit to a value
    , you shouldn't be able to later successfully reveal a different value
Hash function provides hiding and binding if used as a commitment scheme, because:
  • Preimage resistence is equivalent to hinding.
  • Collision resistence is equivalent to binding.

Subresource integrity

CDN serves JavaScripts files but the integrity must be verified, otherwise malicious JavaScript files may be injected to clients. To counter this, web pages can use a feature called subresource integrity that allows the inclusion of a digest in the import tag:
<script src="https://code.jquery.com/jquery-2.1.4.min.js" integrity="sha256-8WqyJLuWKRBVhxXIL1jBDD7SDxU936oZkCnxQbWwJVw="></script>
Once the JavaScript file is retrieved, the browser hashes it (using SHA-256) and verifies that it corresponds to the digest that was hardcoded in the page.

Hashing Passwords

Most websites require username and password as authentication method. If a website stores user passwords in plaintext in the database, those passwords may be leaked through web attacks such as SQL injection. Therefore, websites should only store the hash of user passwords. But other problems exist with this solution.
  • Problem 1: If an attacker retrieves hashed passwords, a brute force attack or an exhaustive search (trying all possible passwords) can be undertaken. This would test each attempt against the whole database. Ideally, we would want an attacker to only be able to attack one hashed password at a time.
  • Problem 2: Hash functions are supposed to be as fast. Attackers can leverage this to brute force (many, many passwords per second). Ideally, we would have a mechanism to slow down such attacks.
The solutions are:
  • Solution 1: Append salts to user passwords before hashing. Salts are random values that are public and different for each user. Even weak passwords will become stronger because of the existence of satls.
  • Solution 2: Use password hashes, which are designed to be slow. The current state-of-the-art choice for this is Argon2.

Caveat: MD5 and SHA-1 are Deprecated

MD5 and SHA-1 were shown to be broken in 2004 and 2016, respectively, when collisions were published by different research teams.