I can't judge the veracity of the history of hash functions, but the moment it starts talking about cryptography it goes completely off the rails: it seems to indicate that finite field exponentiation o'r high degree polynomials are used in cryptographic hash functions; they are emphatically not. It presents password hashing as just applying a suggest function to the password; in practice a KDF is used, which is a completely different design space (for a start, KDFs have a tweak parameter, usually called a salt in this context). Finally, there's a haven't reference to quantum computers breaking hash functions and needing post-quantum algorithms as a result. This does brush with reality in that Grover's algorithm does theoretically eat half the first preimage resistance security level of your hash function, but even SHA256 will require 2^128 iterations on a quantum computer, which will likely never be feasible. Worse, it doesn't help at all in attacks against second perimeter resistance or collision resistance.
Considering that everything I have personal knowledge of here is obviously bunk, best ignore the rest of it too
Author here.
In the article I explicitly mention that the second part (about high degree polynomial and descrete exponentation) is based on Diffie-Hellman's 1976 paper and presents their one-way function constructions, not MD5 or SHA family (my goal is to cover the history of hashing from the beginning, so I haven't done a research on modern systems yet).
As for the quantum computing stuff, I should have stated more clearly that I'm referring only to quantum computer allowing to calculate descrete logarithms rather fast, and provided a source such as https://math.mit.edu/~shor/elecpubs.html containing a link to the paper Algorithms for quantum computation: Discrete logarithms and factoring by Peter Shor. (I'm planning on covering post-quantum cryptography after I'm done with modern algorithms)
The right way to understand modern general-purpose cryptographic hash functions (like SHA2) is just to understand block ciphers. A hash function is a block cipher's permutation core, wired to a "compression" function (much simpler than compression as typically understood; somewhat analogous to the chaining CBC does) that feeds blocks through the same permutation continuously, scrambling state as it goes.
Everything gets tweaked differently because you have different constraints and parameters for a hash function than for a block cipher (though: there were SHA3 contestants that used Rijndael/AES for the core permutation, which is attractive because it has broad hardware support), but the core doodads are basically the same.
(And of course, you can run this argument in reverse and derive a cipher from a hash function trivially. That's how Chapoly happened.)
I have a decent intuition for what a hash function does after twenty years of encountering them in the wild. I don't even know what a block cipher is. I understand hash functions less after reading this than I did before. My conclusion is that a hash function is just a block cipher in the category of endofunctors.
> just a block cipher in the category of endofunctors
A block cipher is just a keyed pseudorandom permutation! :)
Imagine that we have arranged all numbers from 0 to 115792089237316195423570985008687907853269984665640564039457584007913129639936 (2^256-1) in an array and then shuffled that array 115792089237316195423570985008687907853269984665640564039457584007913129639936 (2^256) times uniquely, each time recording the resulting shuffled array.
Here's the Go-like type:
var perm [115792089237316195423570985008687907853269984665640564039457584007913129639936]uint256
This is an unkeyed permutation (we can already build a secure hash function like SHA-3 from it, e.g. SHA-3 uses [2^1600]uint1600 permutation, while Ascon-Hash uses [2^320]uint320. The best we can do with ours is 2^127.5 collision security).
A keyed permutation takes 2^256 of these arrays, again shuffled differently and uniquely, so we have:
var keyedPerm [115792089237316195423570985008687907853269984665640564039457584007913129639936][115792089237316195423570985008687907853269984665640564039457584007913129639936]uint256
A block cipher is just P[k][p] where k and p are indexes into the arrays. Let's call k the key and p the plaintext — first we select one particular permutation using the key, and then select the resulting number (ciphertext) using the plaintext.
A simple hash function built from this keyed permutation splits the message into 256-bit numbers and then selects permutations using the message as k, and previous value as p and adds them together:
h = (some starting number called IV)
h += keyedPerm[m[0]][h]
h += keyedPerm[m[1]][h]
This is SHA-2, SHA-1, MD5, etc, and with a slight variation and a larger block cipher, BLAKE(1,2,3).
Of course, it's physically impossible to have that much memory for the arrays and it's physically impossible to shuffle that many times, so a block cipher is an approximation of that by smashing and rearranging bits, hoping to cause diffusion and confusion.
...then shuffled that array 115792089237316195423570985008687907853269984665640564039457584007913129639936 (2^256) times uniquely, each time recording the resulting shuffled array.
Correct as an analogy for block ciphers, but note that there are (2^256)! unique permutations of the input space. You're selecting an unimaginably small slice of possible keyed pseudorandom permutations.
> understand hash functions less after reading this than I did before.
Haha! Same for me (this being tptacek's comment, not the article).
It's like reading an introductory explanation of what an electric vehicle is ("a car with an electric motor that one can recharge at home") and then a comment saying "no, no, no, internally the whole powertrain is completely different, there are inverters and relays, etc."
Yes, exactly. I don't object; it's salutary to find out you don't know anything about topic X. It's just disconcerting after having read and understood an article purporting to explain topic X.
You know what they do, right, that's what you mean by having an intuition for them? Do you understand how they work? Why they're designed the way they are? I'm not saying you need to, but that's what the article is about.
I read and understood the article, including the math in it, then came here (I know, that’s the wrong order) and read your comment, and promptly decided I knew less than I did before I started. It was very much like learning to use a monad in Haskell without knowing category theory, and then reading an article about them. Just because you understand an article written for the educated general public doesn’t mean you have the vocabulary to understand experts speaking to other experts.
Yeah, I'm not vouching for the article, just saying my response to it was that the simpler explanation for cryptographic hash functions is that they're a specialized application of a block cipher core.
The job of a modern block cipher core is to take a (heavily) iterated function, figure out how to apply a single input key securely to each of those iterated rounds, thoroughly combining the key with the block of data, achieving indistinguishability from random as quickly into the sequence of rounds as possible (in the same kind of simple step process as a Rubik's Cube), while breaking structure (like linearity) that would solve for the key or the data mathematically.
Do you mean “simpler” or do you mean “more accurate”? I’m quite willing to accept your explanation as more accurate, but it is not simpler, at least if you don’t know much about modern cryptography. To understand the article, all I needed was some algebra. I think my 13-year-old could mostly get it. To try to understand your second paragraph here, I’ve spent about fifteen minutes so far looking things up (starting with the definition of “block cipher” and ending somewhere about halfway through the Wikipedia article on AES) and I have a sense of its meaning in the abstract, but if there’s a quiz tomorrow I’m in trouble.
If you really were going for “simpler” rather than “more accurate” then I regret to inform you that you have joined the “monoid in the category of endofunctors” guy in room 2501 of the xkcd building.
Both, I think? There's no way around having to learn about block cryptography, and trying to learn specifically about cryptographic hash functions without learning how a cipher works seems like a bad plan. You get these huge visualizations of the internals of SHA2 or whatever and then attempts to explain every operation in them, most of which are missing the core abstraction those operations were designed around; it's like recapitulating a block cipher from first principles.
I have always been fascinated by Hash Functions.
Modern hash functions are incredibly fast and unbelievably secure (crypto hashes).
Also, equally important is how hashes have adopted to the usecases. We are intentionally developing slow hashes (BCrypt, Argon2id) with memory, time tradeoff to slow down hash generation as a security measure. One of the fascinating corners of Computers
I can't judge the veracity of the history of hash functions, but the moment it starts talking about cryptography it goes completely off the rails: it seems to indicate that finite field exponentiation o'r high degree polynomials are used in cryptographic hash functions; they are emphatically not. It presents password hashing as just applying a suggest function to the password; in practice a KDF is used, which is a completely different design space (for a start, KDFs have a tweak parameter, usually called a salt in this context). Finally, there's a haven't reference to quantum computers breaking hash functions and needing post-quantum algorithms as a result. This does brush with reality in that Grover's algorithm does theoretically eat half the first preimage resistance security level of your hash function, but even SHA256 will require 2^128 iterations on a quantum computer, which will likely never be feasible. Worse, it doesn't help at all in attacks against second perimeter resistance or collision resistance.
Considering that everything I have personal knowledge of here is obviously bunk, best ignore the rest of it too
Author here. In the article I explicitly mention that the second part (about high degree polynomial and descrete exponentation) is based on Diffie-Hellman's 1976 paper and presents their one-way function constructions, not MD5 or SHA family (my goal is to cover the history of hashing from the beginning, so I haven't done a research on modern systems yet).
As for the quantum computing stuff, I should have stated more clearly that I'm referring only to quantum computer allowing to calculate descrete logarithms rather fast, and provided a source such as https://math.mit.edu/~shor/elecpubs.html containing a link to the paper Algorithms for quantum computation: Discrete logarithms and factoring by Peter Shor. (I'm planning on covering post-quantum cryptography after I'm done with modern algorithms)
O’r?
The right way to understand modern general-purpose cryptographic hash functions (like SHA2) is just to understand block ciphers. A hash function is a block cipher's permutation core, wired to a "compression" function (much simpler than compression as typically understood; somewhat analogous to the chaining CBC does) that feeds blocks through the same permutation continuously, scrambling state as it goes.
Everything gets tweaked differently because you have different constraints and parameters for a hash function than for a block cipher (though: there were SHA3 contestants that used Rijndael/AES for the core permutation, which is attractive because it has broad hardware support), but the core doodads are basically the same.
(And of course, you can run this argument in reverse and derive a cipher from a hash function trivially. That's how Chapoly happened.)
> just to understand block ciphers
I have a decent intuition for what a hash function does after twenty years of encountering them in the wild. I don't even know what a block cipher is. I understand hash functions less after reading this than I did before. My conclusion is that a hash function is just a block cipher in the category of endofunctors.
> just a block cipher in the category of endofunctors
A block cipher is just a keyed pseudorandom permutation! :)
Imagine that we have arranged all numbers from 0 to 115792089237316195423570985008687907853269984665640564039457584007913129639936 (2^256-1) in an array and then shuffled that array 115792089237316195423570985008687907853269984665640564039457584007913129639936 (2^256) times uniquely, each time recording the resulting shuffled array.
Here's the Go-like type:
var perm [115792089237316195423570985008687907853269984665640564039457584007913129639936]uint256
This is an unkeyed permutation (we can already build a secure hash function like SHA-3 from it, e.g. SHA-3 uses [2^1600]uint1600 permutation, while Ascon-Hash uses [2^320]uint320. The best we can do with ours is 2^127.5 collision security).
A keyed permutation takes 2^256 of these arrays, again shuffled differently and uniquely, so we have:
var keyedPerm [115792089237316195423570985008687907853269984665640564039457584007913129639936][115792089237316195423570985008687907853269984665640564039457584007913129639936]uint256
A block cipher is just P[k][p] where k and p are indexes into the arrays. Let's call k the key and p the plaintext — first we select one particular permutation using the key, and then select the resulting number (ciphertext) using the plaintext.
A simple hash function built from this keyed permutation splits the message into 256-bit numbers and then selects permutations using the message as k, and previous value as p and adds them together:
This is SHA-2, SHA-1, MD5, etc, and with a slight variation and a larger block cipher, BLAKE(1,2,3).Of course, it's physically impossible to have that much memory for the arrays and it's physically impossible to shuffle that many times, so a block cipher is an approximation of that by smashing and rearranging bits, hoping to cause diffusion and confusion.
Oops, of course, thanks for correction!
> understand hash functions less after reading this than I did before.
Haha! Same for me (this being tptacek's comment, not the article).
It's like reading an introductory explanation of what an electric vehicle is ("a car with an electric motor that one can recharge at home") and then a comment saying "no, no, no, internally the whole powertrain is completely different, there are inverters and relays, etc."
Yes, exactly. I don't object; it's salutary to find out you don't know anything about topic X. It's just disconcerting after having read and understood an article purporting to explain topic X.
You know what they do, right, that's what you mean by having an intuition for them? Do you understand how they work? Why they're designed the way they are? I'm not saying you need to, but that's what the article is about.
I read and understood the article, including the math in it, then came here (I know, that’s the wrong order) and read your comment, and promptly decided I knew less than I did before I started. It was very much like learning to use a monad in Haskell without knowing category theory, and then reading an article about them. Just because you understand an article written for the educated general public doesn’t mean you have the vocabulary to understand experts speaking to other experts.
Yeah, I'm not vouching for the article, just saying my response to it was that the simpler explanation for cryptographic hash functions is that they're a specialized application of a block cipher core.
The job of a modern block cipher core is to take a (heavily) iterated function, figure out how to apply a single input key securely to each of those iterated rounds, thoroughly combining the key with the block of data, achieving indistinguishability from random as quickly into the sequence of rounds as possible (in the same kind of simple step process as a Rubik's Cube), while breaking structure (like linearity) that would solve for the key or the data mathematically.
Do you mean “simpler” or do you mean “more accurate”? I’m quite willing to accept your explanation as more accurate, but it is not simpler, at least if you don’t know much about modern cryptography. To understand the article, all I needed was some algebra. I think my 13-year-old could mostly get it. To try to understand your second paragraph here, I’ve spent about fifteen minutes so far looking things up (starting with the definition of “block cipher” and ending somewhere about halfway through the Wikipedia article on AES) and I have a sense of its meaning in the abstract, but if there’s a quiz tomorrow I’m in trouble.
If you really were going for “simpler” rather than “more accurate” then I regret to inform you that you have joined the “monoid in the category of endofunctors” guy in room 2501 of the xkcd building.
Both, I think? There's no way around having to learn about block cryptography, and trying to learn specifically about cryptographic hash functions without learning how a cipher works seems like a bad plan. You get these huge visualizations of the internals of SHA2 or whatever and then attempts to explain every operation in them, most of which are missing the core abstraction those operations were designed around; it's like recapitulating a block cipher from first principles.
I have always been fascinated by Hash Functions. Modern hash functions are incredibly fast and unbelievably secure (crypto hashes). Also, equally important is how hashes have adopted to the usecases. We are intentionally developing slow hashes (BCrypt, Argon2id) with memory, time tradeoff to slow down hash generation as a security measure. One of the fascinating corners of Computers
Is the title a Waffle House reference?