Randomness Testing Guide

(random.tastemaker.design)

42 points | by user070223 12 days ago

10 comments

seanhunter 2 days ago

I'm not sure whether it's still the case, but some twenty years ago, when I generated a random stream and wanted to test it, the state of the art was the "diehard" suite [1], which started as an implementation of the tests suggested by Knuth in TAOCP and was then expanded from there. The version I had was in C that had been autogenerated by the GNU Fortran compiler from a Fortran original, so the source code was even more impossible to understand than normal quant code. I understand it was superseded by "dieharder", which I think is a native C implementation.
Robert Brown has a page with a bunch of info about dieharder and statistical testing of random generators in general [2].
[1] https://en.wikipedia.org/wiki/Diehard_tests
[2] https://webhome.phy.duke.edu/~rgb/General/dieharder.php

- seanhunter 2 days ago
  
  Somewhat offtopic, but in case anyone is curious, my random stream was a little daemon process I wrote. The problem was that around 2000-ish, Linux had a weakness in /dev/random: it would read from things like keyboard timings and feed them as sources of entropy into a cryptographic sponge function that would write them into /dev/random and increase the entropy count. A long-running Linux server didn't have anyone typing on the keyboard, and the other entropy sources didn't generate that much entropy, so the server would eventually run out of entropy in the pool and then block when people tried to initiate SSL connections.
  So I wrote a little "additional entropy daemon" that would read things like CPU temperature fluctuations, signal noise on soundcards if installed, etc. (I forget them all exactly, but there were a few), "bleach them" so they had reasonable statistical properties (e.g. the soundcard one before bleaching was almost all zeros with just occasional spikes in it, so you want that to be normalized a bit), mix them together somewhat chaotically, and then feed them into the sponge function with a relatively low entropy estimate. This meant our servers wouldn't block. I used diehard to test the randomness of the sources I was using before and after the mixing.
  The bug got fixed and people generally got comfortable using /dev/urandom rather than /dev/random, so my little process moved on to live on a farm with other daemon processes that were retired from use. I don't even have the source code any more.
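The "bleaching" step described above is not specified further, so the following is only a sketch of one classic whitening technique, the von Neumann extractor, and not a reconstruction of what the daemon did. The sample data and the function name are made up for illustration. It assumes the input bits are independent; it removes bias but does nothing about correlation between bits.

```python
def von_neumann_debias(bits):
    """Remove bias from a stream of independent bits.

    Bits are consumed in non-overlapping pairs: (0, 1) emits 0,
    (1, 0) emits 1, and (0, 0) or (1, 1) emit nothing.
    """
    out = []
    for i in range(0, len(bits) - 1, 2):
        a, b = bits[i], bits[i + 1]
        if a != b:
            out.append(a)
    return out


# Hypothetical raw samples: mostly zeros with occasional spikes.
raw = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0]
whitened = von_neumann_debias(raw)
print(whitened)
```

The cost is throughput: a heavily skewed source discards most pairs, which fits the idea of crediting such a source with a low entropy estimate.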
- camel-cdr 2 days ago
  
  These days PractRand seems to be the best randomness test suite: https://pracrand.sourceforge.net/
apwheele 2 days ago

I don't know what NIST says, but for the tests that use Chi-square, you can look at the left tail. Basically tests that have very small Chi-square values are "too close" to the expected distribution.
This is how Fisher critiqued Mendel's experiments -- they were too perfect!
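As a rough illustration of looking at both tails, here is a sketch for the simplest case: a chi-square test on the counts of zeros and ones, which has 1 degree of freedom. In that case the CDF has the closed form erf(sqrt(x / 2)), so only the standard library is needed. The function names and the sample input are assumptions made for this example.

```python
import math


def bit_chi_square(bits):
    """Chi-square statistic (1 degree of freedom) for counts of 0s and 1s."""
    n = len(bits)
    ones = sum(bits)
    zeros = n - ones
    expected = n / 2
    return ((zeros - expected) ** 2 + (ones - expected) ** 2) / expected


def tail_probabilities(stat):
    """Return (left, right) tail probabilities for chi-square with 1 dof."""
    left = math.erf(math.sqrt(stat / 2))    # P(X <= stat): "too close"
    right = math.erfc(math.sqrt(stat / 2))  # P(X >= stat): "too far"
    return left, right


# Strictly alternating bits: exactly 500 ones in 1000 bits.
balanced = [0, 1] * 500
stat = bit_chi_square(balanced)
left, right = tail_probabilities(stat)
print(stat, left, right)
```

A test that only checks the right tail accepts this input happily, while the left tail is as small as it can get. Because the counts are discrete and this uses the continuous approximation, one small statistic on a single stream is weak evidence; the left tail becomes informative when many blocks or experiments are all suspiciously close to expectation, which was the shape of Fisher's argument.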
gammalost a day ago

Unfortunately the site is sloppy when explaining the subject.
For example:
> Let's say we have the following binary string. s=00000000000000000000 It is obviously not random since there are no ones in the string. Therefore, we must check that there are roughly an equal number of zeros and ones in the string.
guytv 2 days ago

If you take the UTF-8 binary for “hello world” and paste it there, it passes 4 out of 5 randomness tests.
Strange.
(0110100001100101011011000110110001101111001000000111011101101111011100100110110001100100)
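To see why a frequency-style test has little to object to here, this sketch rebuilds the 88-bit string and applies a monobit test in the form given in NIST SP 800-22. Whether the site computes its frequency test the same way is an assumption. The last step counts ones per bit position within each byte, which is not one of the site's tests, just a quick way to expose the structure of ASCII text.

```python
import math

# The UTF-8 bytes of "hello world" as a string of 88 bits.
bits = "".join(f"{byte:08b}" for byte in "hello world".encode("utf-8"))


def monobit_p_value(bitstring):
    """Monobit (frequency) test: map bits to +1/-1, sum, normalize."""
    n = len(bitstring)
    s = sum(1 if b == "1" else -1 for b in bitstring)
    return math.erfc(abs(s) / math.sqrt(n) / math.sqrt(2))


ones = bits.count("1")      # 45 of the 88 bits are ones
p = monobit_p_value(bits)   # about 0.83, far above a 0.01 cutoff

# Ones at each bit position (index 0 is the most significant bit),
# counted across the 11 bytes.
position_ones = [
    sum(bits[i + k] == "1" for i in range(0, len(bits), 8))
    for k in range(8)
]
print(ones, round(p, 2), position_ones)
```

The overall balance is nearly perfect, but the top bit of every byte is 0 and the third bit is 1 in all 11 bytes. A test that looks at the stream byte by byte would catch that; a short bit-level frequency count does not.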

- Antibabelic 2 days ago
  
  It is very easy for short strings to pass most of the tests.
  
  - dominicrose 2 days ago
    
    Yes, I tried with PHP and it failed the Block Frequency Test at a size of 8800, but it was fine at 880. Then I tried another random sequence of 8800 and it also failed the Autocorrelation Test.
- cnnlives9099 2 days ago
  
  It looks like there is some repetition in the binary representation to me. English language phrases in UTF-8 are not going to look random.
jap 2 days ago

There was originally a bug in the GAlib random number generator... if I remember correctly, the guy who identified it told me this was found (or demonstrated) by making a scatter plot of generated numbers and observing there was a pattern to the data.
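The scatter-plot idea can be sketched with a deliberately weak generator. The toy linear congruential generator below is an assumption chosen to make the pattern obvious; it is not the GAlib generator, whose details are not given here. The plot is drawn as ASCII so the sketch needs no plotting library.

```python
def lcg(seed, a=5, c=1, m=64):
    """Toy linear congruential generator with a tiny modulus."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


gen = lcg(seed=0)
values = [next(gen) for _ in range(64)]

# Lag-1 scatter plot: each point is (current value, next value).
pairs = list(zip(values, values[1:]))

# Bin the 64x64 value space into a 16x16 character grid.
size = 16
grid = [[" "] * size for _ in range(size)]
for x, y in pairs:
    grid[y * size // 64][x * size // 64] = "*"

# Print with the largest y at the top, like a conventional plot.
for row in reversed(grid):
    print("".join(row))
```

The generator visits all 64 values once per cycle, so a plain frequency count looks perfect, yet every point lies on y = (5x + 1) mod 64 and the plot should show a few parallel diagonal stripes instead of an even scatter. Real generators hide this better, often only in higher dimensions, but the diagnostic is the same.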