A simple way to pre-scramble a sequence
I have this problem where I need to try something a number of times, maybe 64 times, and each time with a different small seed, say a value between 0 and 1024. I’d like the seeds to be reasonably scrambled and I don’t want to see the same one more than once. I also need the same order every time, that’s what I mean by pre-scrambling – the scrambling isn’t dynamic, it’s hardcoded. How do I do this in a simple way without resorting to say just hard coding a list of numbers?
What I’d really like to do is use something like a random number generator but with pre-determined set parameters that I know will generate the numbers I want. The xorshift generator seems like a good candidate because it’s really simple but the quality is decent. It requires 4 parameters, x, y, z, and w, and then you can generate a sequence of values in the following way, where limit is the size of the set you want to generate:
t = x ^ (x << 11) & 0xFFFFFFFF
x = y
y = z
z = w
w = (w ^ (w >> 19) ^ t ^ (t >> 8)) & 0xFFFFFFFF
return w % limit
This is what I’d like to use – I just need some parameters. And it turns out that there are parameters that will make it behave like I want. Here’s an example,
0x31f12daf 0x6dd97342 0xda4daacd 0x41235c71: 85/256 85 (33.2% 1x)
This means: with these parameters x y z w and a limit of 256 you get 85 different values (that’s the 85 in “85/256”) if you execute the xorshift code 85 times (that’s the second 85), which means that you’re covering 33.2% of the possible values and executing the loop 1x per unique generated value. Here are some seeds for different limits that are powers of 2, that generate a pretty big set of unique values,
0x2735e8f4 0x9fcb0c3e 0xd0c18064 0x0f6b0093: 26/32 26 (81.2% 1x)
0x1ff079d6 0x54248059 0xc968e911 0x38c15c75: 39/64 39 (60.9% 1x)
0xf8efc789 0x88a40b37 0x86f01aa6 0x15384f52: 59/128 59 (46.1% 1x)
0x31f12daf 0x6dd97342 0xda4daacd 0x41235c71: 85/256 85 (33.2% 1x)
0xfd8193d0 0xc8b69d8f 0xa42b9d1c 0x7e69d727: 133/512 133 (26% 1x)
0x4617b049 0xf7a49270 0xa88b568c 0x32dff77c: 174/1024 174 (17% 1x)
As an example of how to use one of these, here’s how you’d use the first line, the one that generates values with a limit of 32, in python:
def generate():
x = 0x2735e8f4
y = 0x9fcb0c3e
z = 0xd0c18064
w = 0x0f6b0093
for i in range(0, 26):
t = (x ^ (x << 11)) & 0xFFFFFFFF
x = y
y = z
z = w
w = (w ^ (w >> 19) ^ t ^ (t >> 8)) & 0xFFFFFFFF
yield w % 32
print list(generate())
This prints
[2, 18, 9, 30, 12, 0, 19, 17, 7, 5, 14, 16, 6, 24, 15, 4, 3,
31, 10, 29, 8, 23, 11, 22, 1, 26]
which, and you can check if you want, are 26 different values between 0 and 32.
If you can tolerate some duplicates you can get better coverage. If you can only tolerate a few, say about one duplicate in 16 values, you can get,
0xf852d2f8 0xaf137287 0x6a0d4795 0x326c8a41: 28/32 29 (87.5% 1x)
0xa8683488 0x361c1383 0xbc95298a 0xe4c57646: 46/64 49 (71.9% 1.1x)
0xca30a753 0x317aa350 0x1bc8bee2 0xce4830bf: 79/128 87 (61.7% 1.1x)
0xe079cdb3 0x7cb29574 0x45eebac8 0xeb8dcefc: 132/256 147 (51.6% 1.1x)
0xf41e4441 0x3345d823 0x608ff840 0x554f5cdd: 238/512 269 (46.5% 1.1x)
0x201ceef2 0x7f49ce17 0x0bd10615 0x139bfa7b: 423/1024 485 (41.3% 1.1x)
So for example in the 256 case, if you generate 147 values with the given seed you’ll hit 132 different values between 0 and 256, and the rest will be duplicates. That’s a lot better coverage than the 85 you could hit if you didn’t allow them.
If you can tolerate even more duplicates, say you don’t mind hitting roughly one on average per unique value, you can get close to full coverage
0xa9bb0026 0xe0178ff2 0x443fd052 0x9fe0d459: 32/32 41 (100% 1.3x)
0x4c19f4c7 0x1d253873 0x5c1f6f17 0x812944fc: 64/64 106 (100% 1.7x)
0x854fbffe 0x5aa965d5 0x06ff8c3e 0xd5057351: 127/128 252 (99.2% 2x)
0xeb03a458 0x26ae990e 0x6227aaee 0x192b3cb7: 242/256 496 (94.5% 2x)
0x97bbc402 0x20470416 0x09ce179b 0xbf276b7d: 469/512 980 (91.6% 2.1x)
0x6d36254c 0x19c5f46a 0x1614da1a 0xcfadbfc8: 915/1024 1939 (89.4% 2.1x)
As an example, running the same python code from above but now with parameters from this set you get,
[19, 27, 10, 12, 18, 16, 13, 15, 24, 4, 23, 11, 28, 30, 26,
19, 24, 1, 12, 15, 29, 2, 19, 8, 9, 20, 6, 5, 17, 3, 14, 31,
22, 25, 18, 27, 8, 20, 0, 7, 21]
In this list of length 41 all numbers between 0 and 31 occur, but 8, 12, 15, 18, 20, 24, and 27 are there twice and 19 is there three times.
Finally, if you don’t care how many times you try as long as you generate all the numbers that’s possible too. At this point you probably want to keep track of which value you’ve seen, maybe with a bit vector, and just filter the raw values you generate.
0xa9bb0026 0xe0178ff2 0x443fd052 0x9fe0d459: 32/32 41 (100% 1.3x)
0x4c19f4c7 0x1d253873 0x5c1f6f17 0x812944fc: 64/64 106 (100% 1.7x)
0xf3bd4634 0xb1442411 0xd536236f 0x3beae2fb: 128/128 291 (100% 2.3x)
0x1c608f31 0x34e54bbd 0xa799c7a8 0x78af6664: 256/256 729 (100% 2.8x)
0x10bace53 0x88bc0781 0xd815fe20 0xc77bcbec: 512/512 1786 (100% 3.5x)
0xb6c93670 0xb89f9de6 0x3ba3713c 0x78bcabc7: 1024/1024 4341 (100% 4.2x)
If you look closely you’ll notice that the parameters are the same as before for some limits, that’s because we already had perfect coverage before so there’s nothing to improve. And as the set grows you can see that the number of times you have to try grows, so at 1024 you need to loop 4341 times before you’ve seen all values. And note also that the duplicates aren’t evenly distributed, the further you get the more work you have to do to get a value you haven’t seen before.
Of limited use I guess but if this is the kind of thing you need, pretty convenient. I’ve dumped the code I used to generate these in a gist.