HTTP is stateless. To make it stateful, websites often use session cookies. When Alice types her username and password and clicks "log in", the server verifies her identity and sends her a session cookie. In subsequent HTTP requests, Alice includes the session cookie to identify herself, so that she does not have to enter her username and password for each request. A session cookie (session ID) is a random string that acts as a key in a database. When the server receives Alice's session cookie, it looks up the session information corresponding to this ID in its temporary database.
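The lookup described above can be sketched as follows. This is a minimal illustration, not a production design: the in-memory `sessions` dictionary and the `log_in` and `identify` helpers are hypothetical names, and the password check is omitted.

```python
import secrets

# Hypothetical in-memory session store: session ID -> session information.
# A real deployment would use a database or cache shared by the servers.
sessions = {}

def log_in(username):
    """Create a session once the password check has succeeded (check omitted)."""
    session_id = secrets.token_hex(16)  # 16 random bytes, hex-encoded
    sessions[session_id] = {"username": username}
    return session_id  # sent to the browser as the session cookie

def identify(session_id):
    """Return the username tied to a session cookie, or None if it is unknown."""
    session = sessions.get(session_id)
    return session["username"] if session else None

cookie = log_in("alice")
print(identify(cookie))           # alice
print(identify("not-a-session"))  # None
```

Every server that handles Alice's requests needs access to `sessions`, which is the scaling concern raised next.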
There is nothing wrong with this approach, but in some cases, it might not scale well. If you have many servers, it could be annoying to have all the servers share the association between your users and the random strings. Instead, you could store more information on the browser side.
A Message Authentication Code (MAC) is a secret key algorithm that takes an input, like a hash function, but it also takes a secret key. It then produces a unique output called an authentication tag.
This process is deterministic; given the same secret key and the same message, a MAC produces the same authentication tag. Pictorially:
The interface of a message authentication code (MAC). The algorithm takes a secret key and a message, and deterministically produces a unique authentication tag. Without the key, it should be impossible to reproduce that authentication tag.
When the user logs in for the first time, you produce an authentication tag from your secret key and their username and have them store their username and the authentication tag in a cookie. Because they don't know the secret key, they won’t be able to forge a valid authentication tag for a different username.
To validate their cookie, you do the same: produce an authentication tag from your secret key and the username contained in the cookie, and check whether it matches the authentication tag contained in the cookie. If it matches, the cookie must have come from you, as you are the only one who could have produced a valid authentication tag (under your secret key).
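The two steps above (issuing and validating the cookie) might look like the sketch below. It assumes HMAC-SHA256 from Python's standard library as the MAC, which the text has not yet chosen, and a hypothetical `username.tag` cookie layout; `SECRET_KEY`, `make_cookie`, and `validate_cookie` are illustrative names.

```python
import hashlib
import hmac

# Hypothetical server-side key; in practice use a long, randomly generated one.
SECRET_KEY = b"server-side secret key"

def make_cookie(username):
    """Return 'username.tag', where the tag authenticates the username."""
    tag = hmac.new(SECRET_KEY, username.encode(), hashlib.sha256).hexdigest()
    return f"{username}.{tag}"

def validate_cookie(cookie):
    """Return the username if its tag is valid, otherwise None."""
    username, separator, tag = cookie.rpartition(".")
    if not separator:
        return None  # malformed cookie: no tag present
    expected = hmac.new(SECRET_KEY, username.encode(), hashlib.sha256).hexdigest()
    # Compare in constant time to avoid leaking how many characters matched.
    if hmac.compare_digest(tag.encode(), expected.encode()):
        return username
    return None

cookie = make_cookie("alice")
print(validate_cookie(cookie))  # alice

# Reusing Alice's tag with a different username does not validate.
forged = "bob." + cookie.rpartition(".")[2]
print(validate_cookie(forged))  # None
```

No server-side lookup table is needed here: any server holding `SECRET_KEY` can validate the cookie on its own.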
A MAC is like a private hash function that only you can compute because you know the key. In a sense, you can personalize a hash function with a key.
- MACs are resistant against forgery of authentication tags.
- An authentication tag needs to be of a minimum length to be secure.
- Messages can be replayed if authenticated naively.
- Verifying an authentication tag is prone to bugs.
Hash-based Message Authentication Code (HMAC) is a message authentication code that uses a hash function at its core. It is compatible with different hash functions, but it is mostly used in conjunction with SHA-2 (hashlib.sha256). In Python:
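import hashlib
import hmac

key = b'secret'
message = b'this_is_a_message'
# Compute HMAC-SHA256 of the message under the key.
h = hmac.new(key, message, hashlib.sha256)
hmac_hexdigest = h.hexdigest()  # authentication tag as a hex string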
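Verification is where bugs tend to creep in. One way to check a received tag, sketched here with a hypothetical `verify` helper, is to recompute the tag and compare it with `hmac.compare_digest`, which runs in constant time, rather than with `==`, which can return early at the first differing byte.

```python
import hashlib
import hmac

def verify(key, message, received_tag):
    """Recompute the tag and compare it to the received one in constant time."""
    expected_tag = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected_tag, received_tag)

key = b'secret'
message = b'this_is_a_message'
tag = hmac.new(key, message, hashlib.sha256).digest()

print(len(tag))                                      # 32 (bytes, i.e. 256 bits)
print(verify(key, message, tag))                     # True
print(verify(key, b'this_is_another_message', tag))  # False
```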
Note that this protocol is not perfect: it allows replays. If a message and its authentication tag are replayed at a later point in time, they will still be authentic, but you’ll have no way of detecting that it is an older message being resent to you.
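One common countermeasure, which is an assumption here and not something the text above prescribes, is to authenticate a counter together with the message and to reject any counter that is not larger than the last one accepted. The `send` function and `Receiver` class are hypothetical names for this sketch.

```python
import hashlib
import hmac

def send(key, counter, message):
    """Authenticate the counter together with the message."""
    # A fixed-width counter prefix keeps the encoding unambiguous.
    data = counter.to_bytes(8, "big") + message
    return counter, message, hmac.new(key, data, hashlib.sha256).digest()

class Receiver:
    def __init__(self, key):
        self.key = key
        self.last_counter = -1  # nothing accepted yet

    def accept(self, counter, message, tag):
        """Accept only authentic messages with a strictly increasing counter."""
        data = counter.to_bytes(8, "big") + message
        expected = hmac.new(self.key, data, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, tag):
            return False  # forged or corrupted
        if counter <= self.last_counter:
            return False  # authentic, but already seen: a replay
        self.last_counter = counter
        return True

key = b"shared secret key"  # hypothetical key shared by both sides
receiver = Receiver(key)
packet = send(key, 0, b"pay 10 dollars")
print(receiver.accept(*packet))  # True
print(receiver.accept(*packet))  # False (replayed)
```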
MACs are used in many places to ensure that communications between two machines or two users are not tampered with. This is necessary both when communications are in cleartext and when they are encrypted.
One particularity of MACs is that they are often designed to produce bytes that look random (like hash functions). You can use this property to generate random-looking numbers or to derive more keys from a single key.
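As a rough sketch of this idea, the code below derives several keys and a stream of pseudorandom bytes from one key by running HMAC-SHA256 over distinct inputs. The `derive_key` and `pseudorandom_bytes` helpers and the labels are hypothetical; real systems typically rely on a standardized construction such as HKDF rather than an ad hoc scheme like this one.

```python
import hashlib
import hmac

def derive_key(master_key, label):
    """Derive a 32-byte subkey by MACing a label under the master key."""
    return hmac.new(master_key, label, hashlib.sha256).digest()

def pseudorandom_bytes(key, length):
    """Produce `length` random-looking bytes by MACing successive counters."""
    output = b""
    counter = 0
    while len(output) < length:
        block = hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        output += block
        counter += 1
    return output[:length]

master_key = b"single master key"  # hypothetical; use a random key in practice
encryption_key = derive_key(master_key, b"encryption")
authentication_key = derive_key(master_key, b"authentication")

print(encryption_key != authentication_key)      # True
print(len(pseudorandom_bytes(master_key, 50)))   # 50
```

Because the MAC is deterministic, the same master key and label always yield the same subkey, so both sides of a connection can derive it independently.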
Pseudorandom Function (PRF)
Imagine the set of all functions that take a variable-length input and produce a random output of a fixed size. If we could pick a function at random from this set and use it as a MAC (without a key), it would be swell. We would just have to agree on which function (kind of like agreeing on a key). Unfortunately, we can't have such a set as it is way too large, but we can emulate picking such a random function by designing something close enough: we call such constructions Pseudorandom Functions (PRFs). HMAC and most practical MACs are such constructions. They are randomized by a key argument instead. Choosing a different key is like picking a random function.
To track your users' browser sessions, you can send them a random string (associated to their metadata) or send them the metadata directly, attached with an authentication tag so that they cannot modify it.
Programming languages usually expose data structures called hash tables (also called hashmaps, dictionaries, associative arrays, and so on) that make use of noncryptographic hash functions. If a service exposes this data structure in such a way that attackers can control the input of the noncryptographic hash function, this can lead to Denial of Service (DoS) attacks, meaning that an attacker can render the service unusable. To avoid this, the noncryptographic hash function is usually randomized at the start of the program.
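The randomization can be pictured as keying the bucket-selection function with a secret chosen at startup, so that an attacker cannot predict which inputs collide. The toy `RandomizedBuckets` class below is a hypothetical illustration; it uses keyed BLAKE2b from `hashlib` for convenience, whereas real hash tables use much faster keyed functions.

```python
import hashlib
import secrets

class RandomizedBuckets:
    """Toy hash set whose bucket choice depends on a per-instance random key."""

    def __init__(self, num_buckets=8):
        self.key = secrets.token_bytes(16)  # fresh secret, chosen at startup
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, item):
        # Keyed hash: without self.key, bucket indexes cannot be predicted.
        digest = hashlib.blake2b(item, key=self.key, digest_size=8).digest()
        return int.from_bytes(digest, "big") % len(self.buckets)

    def add(self, item):
        bucket = self.buckets[self._index(item)]
        if item not in bucket:
            bucket.append(item)

    def __contains__(self, item):
        return item in self.buckets[self._index(item)]

table = RandomizedBuckets()
table.add(b"alice")
print(b"alice" in table)  # True
print(b"bob" in table)    # False
```

Inputs crafted to collide under one key are unlikely to collide under another, so a precomputed set of colliding inputs stops working once the key changes.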