A history about random number generation isn't complete without mentioning George Marsaglias work.
He is responsible for multiply-with-carry, xorshift (the original version), KISS (a high quality generator predating the mersene twister) , the Ziggurat algorithm, diehard
Fun fact, one of the earliest methods for generating random mumbers, the middle square method, actually still passes all moderm statistical randomness test suites, if you hook up a weyl sequence to it: https://arxiv.org/abs/1704.00358
This, the middle square weyl sequence PRNG is my favoeite PRNG, because it's simple enough to implement from memory:
uint64_t x, weyl;
uint32_t msws(void) {
x = x * x + (weyl += CONSTANT);
return x = (x >> 32) | (x << 32);
}
You just take a number, square it, advace and add the weyl sequence to it amd finally swap the lower and upper bits, using the trucated result as the output.
The CONSTANT is pretty much arbitrary, it just needs to be odd and not too regular.
A good rule of thumb is to have no repeating or zero nibbles in each group of 4 bytes, e.g. 0xB5AD4ECEDA1CE2A9.
That paper doesn't mention how many rounds it passed on the statistical tests, just that they tested 25000 seeds.
They also don't definitely state a period but 2^64 with 192 or 384 bits of state is not that impressive.
Furthermore, your version here uses only 128 bits so it is not clear to me that it is equivalent to the ones presented in the paper.
msws32() from the paper is the exact code I wrote above.
The "s = 0xb5ad4eceda1ce2a9" is not part of the state, it's the CONSTANT.
I've tested msws32 it passes TestU01s BigCrush and didn't fail in >=1 TB of PractRand (I stopped after that).
A scaled down msws16 fails PractRand after 2 GB, a msws24() variant passes >=256 GB (I stopped after that).
It's certainly not as good as more state of the art PRNGs like PCG, xoshiro, romu, sfc64, tylo64, but it is very simple and has quite high quality output, much better than any similarly simple to construct PRNG I know of.
The renaming and the having the constant be a variable confused me when skimming for the parts that I was looking for.
So, the state is 128 or 256 for the versions presented and 64 for msws16.
I don't remember if running PractRand in word mode changes the way it reports results but either way failing at 2GB would mean it failed even before going through the whole Weyl sequence although the period itself isn't necessarily reduced.
I'm not sure if the middle-square is acting as a decent non-linear scrambler on the poor adder state or if both combined manage to hold 30 bits worth of state. Swapping the adder with an lcg or lfsr on msws16 would provide an answer.
PractRand has the benefit that we can look at where and how failure happens in these reduced versions so I think the criticism ultimately stands regarding the paper.
The Ziggurat algorithm is very important and widely used. There are some side channel vulnerabilities in differential privacy applications based on the details of this algorithm.
> Since nobody had figured out any downsides to PCG's yet, everyone shrugged and said "might as well just go with that then", and that is where, as of 2019, the art currently stands. The problem is solved, and life is good.
I wonder who "everyone" was, I'm not aware of many high-profile projects adopting PCG as a default. As of 2025, several high-profile runtimes (including all the major browsers) use xorshift variants [1]
It kind of doesn’t matter if there are users - there are people still stupidly using Mersenne Twister. The point is that PCG is better than xorshift and related in that family. That other high profile applications haven’t switched is besides the point that PCG is objectively better:
> O'Neill proposes testing PRNGs by applying statistical tests to their reduced-size variants and determining the minimum number of internal state bits required to pass.[7] TestU01's BigCrush examines enough data to detect a period of 235, so even an ideal generator requires 36 bits of state to pass it. Some very poor generators can pass if given a large enough state;[8] passing despite a small state is a measure of an algorithm's quality, and shows how large a safety margin exists between that lower limit and the state size used in practical applications.
PCG-RXS-M-XS (with 32-bit output) passes BigCrush with 36 bits of state (the minimum possible), PCG-XSH-RR (pcg32() above) requires 39, and PCG-XSH-RS (pcg32_fast() above) requires 49 bits of state. For comparison, xorshift*, one of the best of the alternatives, requires 40 bits of state,[5]: 19 and Mersenne twister fails despite 19937 bits of state.[9]
IMO there's plenty of reason to use Xoshiro over PCG. the quality differences between the best xoshiro and pcg differences are minimal (especially because most languages use a 256 bit state since it makes it easier to split/jump without worrying about duplicate streams), and Xoshiro generators tend to be easier to SIMD for when you need lots of random numbers.
Much like my beloved comb sort, I use xorshift because the implementation is small and it's Good Enough. God's Own 100 SLOC PRNG would have to be near-perfect and take three clock cycles to contemplate switching.
It takes up to 20000 CPU hours to break the seed from 512 output bits with an unknown state, increment and multiplier. (the multiplier is usually fixed constant)
To me this is completely unrelated to the quality of the PRNG, because security is explicitly a non-goal of the design. A general-purpose non-cryptographically secure PRNG is evaluated primarily on speed and uniformity of output. Any other qualities can certainly be interesting, but they're orthogonal to (how I would evaluate) quality.
Right: put differently, why would you bother to select among the insecure RNGs an RNG whose "seed" was "harder" to recover? What beneficial property would that provide your system?
CSPRNGs have all of the desirable properties for the output.
All else being equal, I don't think it is possible for a trivially reversible generator to have better statistical properties than a generator whose output behaves more like a CSPRNG.
It can definitely be good enough and or faster, though.
Right, I think defaulting to a CSPRNG is a pretty sane decision, and you'd know if you had need of a non-CSPRNG RNG. But what does that say about the choice between PCG and xorshiro?
Defaulting to a CSPRNG pre-seeded with system randomness is not a bad choice per se(especially given many users don't know they need one) but current ones are much slower than the RNGs we are discussing.
If you're going to provide a non-CS one for general simulation purposes, you probably want the one that is the closest to indistinguishable from random data as you can without compromising performance, though.
Some people will have more than enough with a traditional LCG(MC isn't even using RNGs anymore) but others may be using more of the output in semantically relevant ways where it won't work.
If Xoshiro's state can be trivially recovered from a short span of the output, there is a local bias right there that PractRand lets through but that your application could accidentally uncover.
The choice is: Are the performance gains enough to justify that risk?
Why does it matter if the state can be trivially recovered? What does that have to do with the applications in which these generators are actually used? If the word "risk" applies to your situation, you can't use either xorshiro or PCG.
This is too deep to reply but if a bit is dependent on the value of a bit a couple bytes back then it is not acting randomly.
It's not about security.
I hope you can agree that if every time there is a treasure chest to the left of a door, a pink rabbit spawns on the top left of the room, that's not acting very random-like.
I'm not taking a position on the perceived added value of PCG over Xoshiro.
PRNGs are not meant to be cryptographically secure. If you don't want recoverability by all means use SHA512 or a proper CSPRNG.
But saying PRNGs are bad because there is recoverability is like saying salt is bad because it isn't sweet. PRNGs are not meant for non-recoverability and salt isn't meant to be sweet.
It's not bad because "preventing seed recovery" isn't the job of an insecure RNG. If you care about seed recovery, you must use a secure generator. There aren't degrees of security here; PCG is insecure, and (say) the LRNG or CTR-DRBG are not.
I thought this was a proper article. It was a good read. Then I start looking around at the page and was like 'where the hell am I? This is a rust crate readme?!'
This was entertaining and informative, the best kind of info. But one puzzle remains - why did the author keep mentioning slide rules as a tool that would reveal the non-randomness of some number series ?
They're using slide rule users as a stand-in for serious mathematician as opposed to people who incidentally use mathematics. It makes some sense in historical context but becomes a bit anachronistic after the invention of electronic calculators.
Slide-rule is a sort of "trope". If you need to signal to readers or your book or watchers of your movie that some character has STEM education, you give them a slide-rule. There are other ways to do this, you can make them wear white coats. White coat is more popular though, if the author used white coats, I'm sure you'd be able to get it.
Is there a good text on random number generation that someone on HN can recommend? I've read about various generators, pseudorandom and truly random, but those have always been scattered across various places, and I'm wondering if there's a good solid unified text on all of them, in terms of families of them and their underlying ideas, and their advantages and disadvantages.
I love how this is written. A lot of things nowadays on this site, if only vaguely, make me think it was written in part by an LLM, but this didn’t fall into that category. Great read, bravo!
I've always wondered, if you started recording audio, can you treat the least significant bit as random? Perhaps as an alternative to a real hardware random number generator?
I gotta think there are going to be some periodics in there that will be toggling the LSB. Like some hum from some device far away will be at the right tiny amplitude to toggle it in a predictable way. Also the ADC hardware could concievably get stuck.
The whole system breaks because someone didn't set up their pulseaudio correctly?
and what if you need 1TB of random data? With 48kHz audio you would be waiting 5000 years haha. 1MB is still more than a day
That's how secure random number generators work. Those are suitable for almost all purposes except for simulations, where you're tapping the RNG so often that its performance really matters and demands more than the cycles/bytes of even optimized cryptography gets you.
> So, just using any old LCG wasn't good enough, you had to use one made by someone with a PhD in mathematics. Donald Knuth shook his fist at the world and shouted "Hah! I told you so!", published a book on how to do it Right that most people didn't read, and then went back into his Fortress of Solitude to write TeX.
Please tell me if I'm off-base here, but something I always thought about and have been toying with is the notion that "there's no true random in this universe."
From a particle physics perspective, as an observer in the electromagnetic spectrum, we're always observing through a reference frame based on the speed of light in relation to the observed object. Because it's always in reference to a constant, c, anything perceived at random can theoretically be measured if you had the position of the observer at the time of observation, right?
Any fan of determinism would need to tackle quantum physics and what seems like unavoidable randomness in it (and there are such theories, but they offer little hope of getting around the randomness from our point of view, since they hide the order from us). The typical example of a random phenomena in nature is radioactive decay. You can't predict when any given nucleus will decay, only the probability that it will happen (which gives the half-life).
How so? I also find randomness profound but not sure what you mean but not belonging in the materialized world. Particle decay/Radiation is a pretty random process I believe?
Some confusion? I was saying "time is not material".
In my conception time is made out of events, and the events are I suppose all material, and all have probabilities. So maybe time follows inevitably from matter. But I think it exists in its own right as a phenomenon that isn't material. There are such things. Knowledge is another one.
A history about random number generation isn't complete without mentioning George Marsaglias work.
He is responsible for multiply-with-carry, xorshift (the original version), KISS (a high quality generator predating the mersene twister) , the Ziggurat algorithm, diehard
Fun fact, one of the earliest methods for generating random mumbers, the middle square method, actually still passes all moderm statistical randomness test suites, if you hook up a weyl sequence to it: https://arxiv.org/abs/1704.00358
This, the middle square weyl sequence PRNG is my favoeite PRNG, because it's simple enough to implement from memory:
You just take a number, square it, advace and add the weyl sequence to it amd finally swap the lower and upper bits, using the trucated result as the output.The CONSTANT is pretty much arbitrary, it just needs to be odd and not too regular. A good rule of thumb is to have no repeating or zero nibbles in each group of 4 bytes, e.g. 0xB5AD4ECEDA1CE2A9.
That paper doesn't mention how many rounds it passed on the statistical tests, just that they tested 25000 seeds. They also don't definitely state a period but 2^64 with 192 or 384 bits of state is not that impressive. Furthermore, your version here uses only 128 bits so it is not clear to me that it is equivalent to the ones presented in the paper.
msws32() from the paper is the exact code I wrote above. The "s = 0xb5ad4eceda1ce2a9" is not part of the state, it's the CONSTANT.
I've tested msws32 it passes TestU01s BigCrush and didn't fail in >=1 TB of PractRand (I stopped after that). A scaled down msws16 fails PractRand after 2 GB, a msws24() variant passes >=256 GB (I stopped after that).
It's certainly not as good as more state of the art PRNGs like PCG, xoshiro, romu, sfc64, tylo64, but it is very simple and has quite high quality output, much better than any similarly simple to construct PRNG I know of.
Sorry for misrepresenting it a bit.
The renaming and the having the constant be a variable confused me when skimming for the parts that I was looking for.
So, the state is 128 or 256 for the versions presented and 64 for msws16.
I don't remember if running PractRand in word mode changes the way it reports results but either way failing at 2GB would mean it failed even before going through the whole Weyl sequence although the period itself isn't necessarily reduced.
I'm not sure if the middle-square is acting as a decent non-linear scrambler on the poor adder state or if both combined manage to hold 30 bits worth of state. Swapping the adder with an lcg or lfsr on msws16 would provide an answer.
PractRand has the benefit that we can look at where and how failure happens in these reduced versions so I think the criticism ultimately stands regarding the paper.
The Ziggurat algorithm is very important and widely used. There are some side channel vulnerabilities in differential privacy applications based on the details of this algorithm.
> Since nobody had figured out any downsides to PCG's yet, everyone shrugged and said "might as well just go with that then", and that is where, as of 2019, the art currently stands. The problem is solved, and life is good.
I wonder who "everyone" was, I'm not aware of many high-profile projects adopting PCG as a default. As of 2025, several high-profile runtimes (including all the major browsers) use xorshift variants [1]
Is there a list of users of PCG?
[1] See Adoption section in https://prng.di.unimi.it/
It kind of doesn’t matter if there are users - there are people still stupidly using Mersenne Twister. The point is that PCG is better than xorshift and related in that family. That other high profile applications haven’t switched is besides the point that PCG is objectively better:
> O'Neill proposes testing PRNGs by applying statistical tests to their reduced-size variants and determining the minimum number of internal state bits required to pass.[7] TestU01's BigCrush examines enough data to detect a period of 235, so even an ideal generator requires 36 bits of state to pass it. Some very poor generators can pass if given a large enough state;[8] passing despite a small state is a measure of an algorithm's quality, and shows how large a safety margin exists between that lower limit and the state size used in practical applications. PCG-RXS-M-XS (with 32-bit output) passes BigCrush with 36 bits of state (the minimum possible), PCG-XSH-RR (pcg32() above) requires 39, and PCG-XSH-RS (pcg32_fast() above) requires 49 bits of state. For comparison, xorshift*, one of the best of the alternatives, requires 40 bits of state,[5]: 19 and Mersenne twister fails despite 19937 bits of state.[9]
IMO there's plenty of reason to use Xoshiro over PCG. the quality differences between the best xoshiro and pcg differences are minimal (especially because most languages use a 256 bit state since it makes it easier to split/jump without worrying about duplicate streams), and Xoshiro generators tend to be easier to SIMD for when you need lots of random numbers.
There are SIMD versions of PCG and most variants you find online aren’t SIMD.
The default bit generator in NumPy is PCG-64 [1].
[1] https://numpy.org/doc/stable/reference/random/bit_generators...
Much like my beloved comb sort, I use xorshift because the implementation is small and it's Good Enough. God's Own 100 SLOC PRNG would have to be near-perfect and take three clock cycles to contemplate switching.
> nobody had figured out any downsides to PCG's yet
BTW, people have broken PCG already: https://hal.science/hal-02700791/file/main.pdf
It takes up to 20000 CPU hours to break the seed from 512 output bits with an unknown state, increment and multiplier. (the multiplier is usually fixed constant)
What does it mean to "break" PCG? It's not a secure random number generator.
Seed recovery. It's not meant to be cryptographically secure, but previously nobody had reversed it.
Showing that reversal takes that many CPU hours shows how good the PRNG quality is.
To me this is completely unrelated to the quality of the PRNG, because security is explicitly a non-goal of the design. A general-purpose non-cryptographically secure PRNG is evaluated primarily on speed and uniformity of output. Any other qualities can certainly be interesting, but they're orthogonal to (how I would evaluate) quality.
Right: put differently, why would you bother to select among the insecure RNGs an RNG whose "seed" was "harder" to recover? What beneficial property would that provide your system?
CSPRNGs have all of the desirable properties for the output.
All else being equal, I don't think it is possible for a trivially reversible generator to have better statistical properties than a generator whose output behaves more like a CSPRNG.
It can definitely be good enough and or faster, though.
Right, I think defaulting to a CSPRNG is a pretty sane decision, and you'd know if you had need of a non-CSPRNG RNG. But what does that say about the choice between PCG and xorshiro?
Defaulting to a CSPRNG pre-seeded with system randomness is not a bad choice per se(especially given many users don't know they need one) but current ones are much slower than the RNGs we are discussing.
If you're going to provide a non-CS one for general simulation purposes, you probably want the one that is the closest to indistinguishable from random data as you can without compromising performance, though.
Some people will have more than enough with a traditional LCG(MC isn't even using RNGs anymore) but others may be using more of the output in semantically relevant ways where it won't work.
If Xoshiro's state can be trivially recovered from a short span of the output, there is a local bias right there that PractRand lets through but that your application could accidentally uncover.
The choice is: Are the performance gains enough to justify that risk?
Why does it matter if the state can be trivially recovered? What does that have to do with the applications in which these generators are actually used? If the word "risk" applies to your situation, you can't use either xorshiro or PCG.
This is too deep to reply but if a bit is dependent on the value of a bit a couple bytes back then it is not acting randomly.
It's not about security.
I hope you can agree that if every time there is a treasure chest to the left of a door, a pink rabbit spawns on the top left of the room, that's not acting very random-like.
I'm not taking a position on the perceived added value of PCG over Xoshiro.
Any recoverability sounds very bad.
Why shouldn’t I just use eg sha512 on the previous hash and drop half the bits?
> Any recoverability sounds very bad.
PRNGs are not meant to be cryptographically secure. If you don't want recoverability by all means use SHA512 or a proper CSPRNG.
But saying PRNGs are bad because there is recoverability is like saying salt is bad because it isn't sweet. PRNGs are not meant for non-recoverability and salt isn't meant to be sweet.
It's not bad because "preventing seed recovery" isn't the job of an insecure RNG. If you care about seed recovery, you must use a secure generator. There aren't degrees of security here; PCG is insecure, and (say) the LRNG or CTR-DRBG are not.
I thought this was a proper article. It was a good read. Then I start looking around at the page and was like 'where the hell am I? This is a rust crate readme?!'
Talk about skipping a `1.0` release. Is anyone else scratching their heads about this? Here https://hg.sr.ht/~icefox/oorandom/rev/46ca789e6bb68cfc5d838f... the project jumps from a 0.1.0 release to 9.3.0.
"The version number is inversely proportional to the likelihood that I will ever come back and do something that would require incrementing it."
LOL
This was entertaining and informative, the best kind of info. But one puzzle remains - why did the author keep mentioning slide rules as a tool that would reveal the non-randomness of some number series ?
I don’t get that part.
They're using slide rule users as a stand-in for serious mathematician as opposed to people who incidentally use mathematics. It makes some sense in historical context but becomes a bit anachronistic after the invention of electronic calculators.
Slide-rule is a sort of "trope". If you need to signal to readers or your book or watchers of your movie that some character has STEM education, you give them a slide-rule. There are other ways to do this, you can make them wear white coats. White coat is more popular though, if the author used white coats, I'm sure you'd be able to get it.
I took it as being a tongue-in-cheek way of saying "mathematicians."
Is there a good text on random number generation that someone on HN can recommend? I've read about various generators, pseudorandom and truly random, but those have always been scattered across various places, and I'm wondering if there's a good solid unified text on all of them, in terms of families of them and their underlying ideas, and their advantages and disadvantages.
https://www.pcg-random.org/posts/bounded-rands.html
From the title, I expected to see a list of recently-generated random numbers.
I got a 27 yesterday.
I love how this is written. A lot of things nowadays on this site, if only vaguely, make me think it was written in part by an LLM, but this didn’t fall into that category. Great read, bravo!
Refreshing when technical writing has a sense of style.
Read it and gain a gnawing sense of unease at how "good" things might really be at present!
If you are so inclined, you can build your own random number generator in hardware, which uses a Geiger counter: https://www.instructables.com/Arduino-True-Random-Number-Gen...
I've always wondered, if you started recording audio, can you treat the least significant bit as random? Perhaps as an alternative to a real hardware random number generator?
Yeah it would be fun to play with...
I gotta think there are going to be some periodics in there that will be toggling the LSB. Like some hum from some device far away will be at the right tiny amplitude to toggle it in a predictable way. Also the ADC hardware could concievably get stuck.
The whole system breaks because someone didn't set up their pulseaudio correctly?
and what if you need 1TB of random data? With 48kHz audio you would be waiting 5000 years haha. 1MB is still more than a day
> and what if you need 1TB of random data? With 48kHz audio you would be waiting 5000 years haha. 1MB is still more than a day
I think you dropped the "k" in "kHz" in your calculations.
This is what I come to HN for.
I did not realize xorshift no longer as favored! Permuted Congruential Generators seems very cool. https://en.wikipedia.org/wiki/Permuted_congruential_generato...
i thought they were all built on the compression functions from secure hashes these days?
That's how secure random number generators work. Those are suitable for almost all purposes except for simulations, where you're tapping the RNG so often that its performance really matters and demands more than the cycles/bytes of even optimized cryptography gets you.
Plus with a simulation, you may want to replicate runs.
> So, just using any old LCG wasn't good enough, you had to use one made by someone with a PhD in mathematics. Donald Knuth shook his fist at the world and shouted "Hah! I told you so!", published a book on how to do it Right that most people didn't read, and then went back into his Fortress of Solitude to write TeX.
Haaaaaaaaaaaaaaa, I believe every word of this.
Xorshift, LCM and hardware rdrand work just fine. PCG and Weyl are overkill.
sure, a += b is overkill
Hardware rdrand, lol! Totally broken
Please tell me if I'm off-base here, but something I always thought about and have been toying with is the notion that "there's no true random in this universe."
From a particle physics perspective, as an observer in the electromagnetic spectrum, we're always observing through a reference frame based on the speed of light in relation to the observed object. Because it's always in reference to a constant, c, anything perceived at random can theoretically be measured if you had the position of the observer at the time of observation, right?
Am I way off-base here?
Any fan of determinism would need to tackle quantum physics and what seems like unavoidable randomness in it (and there are such theories, but they offer little hope of getting around the randomness from our point of view, since they hide the order from us). The typical example of a random phenomena in nature is radioactive decay. You can't predict when any given nucleus will decay, only the probability that it will happen (which gives the half-life).
random numbers are not exactly random.
natural random numbers are not (exactly) random or artifical generated random numbers are not (exactly) random?
I dunno ... his saucy language made this very difficult to read.
Randomness is far more profound than it appears to be. Probably it doesn't even belong to the real (materialized) world.
How so? I also find randomness profound but not sure what you mean but not belonging in the materialized world. Particle decay/Radiation is a pretty random process I believe?
Possibly connecting random events to time, which is not material.
The transformation of a particle into two more basic particles is absolutely material.
Some confusion? I was saying "time is not material".
In my conception time is made out of events, and the events are I suppose all material, and all have probabilities. So maybe time follows inevitably from matter. But I think it exists in its own right as a phenomenon that isn't material. There are such things. Knowledge is another one.
Noether's Theorem states that it is fundamental to our real world - or else the apparent, fundamental symmetries of our physics are wrong.