DEV Community

Zachary Patten
Zachary Patten

Posted on

Generating Unique Random Data: A Practical Example Of Algorithm Analysis

I ran across a practical example of algorithm analysis when I wrote a method that generates unique random values inside a provided range. Figured I would share it...

The code in this article was included in my https://github.com/ZacharyPatten/Towel project. Please check it out if you want to see more code like it. :)

Note: all code examples are in C#.

Introduction

When you want to generate unique random numbers, there are two approaches you can take: 1) you can track the remaining possible values of generation or 2) you can track the values that have already been randomly selected. For variables/parameters, let count be the number of values to generate, minValue be the inclusive lower bound of generation, and maxValue be the exclusive upper bound of generation.

Algorithm #1/A

After a bit of googling, option #1 seems to be more common. You often see people throw all the possible values into an array/list and remove a randomly generated index. Here is an implementation of that algorithm:

public static IEnumerable<int> NextUniqueA(this Random random, int count, int minValue, int maxValue) { // Algorithm A: Θ(range + count) int pool = maxValue - minValue; int[] array = new int[pool]; for (int i = 0, j = minValue; j < maxValue; i++, j++) // Θ(range) array[i] = j; for (int i = 0; i < count; i++) // Θ(count) { int rollIndex = random.Next(0, pool); int roll = array[rollIndex]; array[rollIndex] = array[--pool]; yield return roll; } } 
Enter fullscreen mode Exit fullscreen mode

Algorithm #2/B

Now, here is what option #2 can look like. For this implementation, I use a linked list to store all the values that have already been generated in sorted order:

public static IEnumerable<int> NextUniqueB(this Random random, int count, int minValue, int maxValue) { // Algorithm B: O(.5*count^2), Ω(count), ε(.5*count^2) Node head = null; for (int i = 0; i < count; i++) // Θ(count) { int roll = random.Next(minValue, maxValue - i); Node node = head; Node previous = null; while (!(node is null)) // O(count / 2), Ω(0), ε(count / 2) { if (node.Value > roll) break; roll++; previous = node; node = node.Next; } yield return roll; if (previous is null) head = new Node() { Value = roll, Next = head, }; else previous.Next = new Node() { Value = roll, Next = previous.Next }; } } internal class Node { internal int Value; internal Node Next; } 
Enter fullscreen mode Exit fullscreen mode

Analyzing Algorithm Complexities

These two implementations have different runtime algorithm complexities. The #1/A option has a runtime complexity of Θ(range + count) while the #2/B option has a runtime complexity of O(.5*count^2), Ω(count), ε(.5*count^2) where O is the worst case, Θ is every case, Ω is the best case, and ε is the expected average between the worst and the best cases. Long story short, when the count is relatively low you want to use algorithm #2/B. When the count is relatively high you want to use algorithm #1/A. So if you want to make a faster all-around general purpose method, you need to combine the two approaches. But how do you do that? You need to find the intersection of the two runtime complexities of each method.

If you treat both algorithm complexities as linear functions and do some simple math to find the intesection, you will find that they intersect at about √(maxValue - minValue) the square root of the range of random generation relative to count. Now that you know the intersection point, you can combine the two algorithms into a single function that is optimized for more use cases:

Combination of #1/A and #2/B

public static IEnumerable<int> NextUnique(this Random random, int count, int minValue, int maxValue) { if (count > Math.Sqrt(maxValue - minValue)) { // Algorithm A } else { // Algorithm B } } 
Enter fullscreen mode Exit fullscreen mode

Benchmarks

Here are some benchmarks so you can compare theory with practical results. Note that for these benchmarks I used a random generation range of 0..10000. So minValue = 0 and maxValue = 10000. From the theory, I knew that the intersection of the runtime algorithms is roughly √(maxValue - minValue) = √(100000 - 0) = √(100000) = 100. So when count is less than ~100 the #2/B algorithm should be faster, but when count is greater than ~100 algorithm #1/A should be faster.

Benchmarks Source Code

Note: This code uses the BenchmarkDotNet nuget package. It also uses the Plots exporter to generate the graphs, which requires R be installed.

using BenchmarkDotNet.Attributes; using BenchmarkDotNet.Running; using System; using System.Linq; using System.Collections.Generic; using BenchmarkDotNet.Exporters; using BenchmarkDotNet.Configs; namespace UniqueRandomGeneration { class Program { static void Main(string[] args) => BenchmarkRunner.Run<Benchmarks>(); } [RPlotExporter] [Config(typeof(Config))] public class Benchmarks { public class Config : ManualConfig { public Config() => Add(RPlotExporter.Default); } const int MinValue = 0; const int MaxValue = 10000; [Params(25, 50, 75, 100, 125, 150, 175, 200, 225, 250)] public int Count; readonly Random random = new Random(); [Benchmark] public void RandomUniqueA() => random.NextUniqueA(Count, MinValue, MaxValue).ToArray(); [Benchmark] public void RandomUniqueB() => random.NextUniqueB(Count, MinValue, MaxValue).ToArray(); [Benchmark] public void RandomUnique() => random.NextUnique(Count, MinValue, MaxValue).ToArray(); } public static class Extensions { internal class Node { internal int Value; internal Node Next; } public static IEnumerable<int> NextUnique(this Random random, int count, int minValue, int maxValue) { if (maxValue < minValue) throw new ArgumentOutOfRangeException(nameof(minValue) + " > " + nameof(minValue)); if (count < 0) throw new ArgumentOutOfRangeException(nameof(count) + " < 0"); if (maxValue - minValue < count) throw new ArgumentOutOfRangeException(nameof(count) + " is larger than " + nameof(maxValue) + " - " + nameof(minValue) + "."); if (count > Math.Sqrt(maxValue - minValue)) { // Algorithm A: Θ(range + count) int pool = maxValue - minValue; int[] array = new int[pool]; for (int i = 0, j = minValue; j < maxValue; i++, j++) // Θ(range) array[i] = j; for (int i = 0; i < count; i++) // Θ(count) { int rollIndex = random.Next(0, pool); int roll = array[rollIndex]; array[rollIndex] = array[--pool]; yield return roll; } } else { // Algorithm B: O(.5*count^2), Ω(count), ε(.5*count^2) Node head = null; for (int i = 0; i < count; i++) // Θ(count) { int roll = random.Next(minValue, maxValue - i); Node node = head; Node previous = null; while (!(node is null)) // O(count / 2), Ω(0), ε(count / 2) { if (node.Value > roll) break; roll++; previous = node; node = node.Next; } yield return roll; if (previous is null) head = new Node() { Value = roll, Next = head, }; else previous.Next = new Node() { Value = roll, Next = previous.Next }; } } } public static IEnumerable<int> NextUniqueA(this Random random, int count, int minValue, int maxValue) { if (maxValue < minValue) throw new ArgumentOutOfRangeException(nameof(minValue) + " > " + nameof(minValue)); if (count < 0) throw new ArgumentOutOfRangeException(nameof(count) + " < 0"); if (maxValue - minValue < count) throw new ArgumentOutOfRangeException(nameof(count) + " is larger than " + nameof(maxValue) + " - " + nameof(minValue) + "."); // Algorithm A: Θ(range + count) int pool = maxValue - minValue; int[] array = new int[pool]; for (int i = 0, j = minValue; j < maxValue; i++, j++) // Θ(range) array[i] = j; for (int i = 0; i < count; i++) // Θ(count) { int rollIndex = random.Next(0, pool); int roll = array[rollIndex]; array[rollIndex] = array[--pool]; yield return roll; } } public static IEnumerable<int> NextUniqueB(this Random random, int count, int minValue, int maxValue) { if (maxValue < minValue) throw new ArgumentOutOfRangeException(nameof(minValue) + " > " + nameof(minValue)); if (count < 0) throw new ArgumentOutOfRangeException(nameof(count) + " < 0"); if (maxValue - minValue < count) throw new ArgumentOutOfRangeException(nameof(count) + " is larger than " + nameof(maxValue) + " - " + nameof(minValue) + "."); // Algorithm B: O(.5*count^2), Ω(count), ε(.5*count^2) Node head = null; for (int i = 0; i < count; i++) // Θ(count) { int roll = random.Next(minValue, maxValue - i); Node node = head; Node previous = null; while (!(node is null)) // O(count / 2), Ω(0), ε(count / 2) { if (node.Value > roll) break; roll++; previous = node; node = node.Next; } yield return roll; if (previous is null) head = new Node() { Value = roll, Next = head, }; else previous.Next = new Node() { Value = roll, Next = previous.Next }; } } } } 
Enter fullscreen mode Exit fullscreen mode

Benchmarks Results Table

 BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18363 Intel Core i7-4790K CPU 4.00GHz (Haswell), 1 CPU, 8 logical and 4 physical cores .NET Core SDK=3.1.100 [Host] : .NET Core 3.1.0 (CoreCLR 4.700.19.56402, CoreFX 4.700.19.56404), X64 RyuJIT DefaultJob : .NET Core 3.1.0 (CoreCLR 4.700.19.56402, CoreFX 4.700.19.56404), X64 RyuJIT 
Enter fullscreen mode Exit fullscreen mode
Method Count Mean Error StdDev
RandomUniqueA 25 7.406 us 0.1576 us 0.1876 us
RandomUniqueB 25 1.307 us 0.0256 us 0.0414 us
RandomUnique 25 1.300 us 0.0046 us 0.0043 us
RandomUniqueA 50 7.761 us 0.0263 us 0.0246 us
RandomUniqueB 50 2.893 us 0.0084 us 0.0079 us
RandomUnique 50 3.022 us 0.0755 us 0.0707 us
RandomUniqueA 75 8.387 us 0.1317 us 0.1232 us
RandomUniqueB 75 5.180 us 0.1070 us 0.1001 us
RandomUnique 75 5.298 us 0.0634 us 0.0593 us
RandomUniqueA 100 9.151 us 0.1800 us 0.2340 us
RandomUniqueB 100 8.013 us 0.0994 us 0.0930 us
RandomUnique 100 8.246 us 0.1205 us 0.1127 us
RandomUniqueA 125 9.335 us 0.1533 us 0.1434 us
RandomUniqueB 125 11.291 us 0.1093 us 0.1023 us
RandomUnique 125 9.394 us 0.1773 us 0.1658 us
RandomUniqueA 150 9.509 us 0.0323 us 0.0302 us
RandomUniqueB 150 14.486 us 0.0653 us 0.0610 us
RandomUnique 150 9.355 us 0.0428 us 0.0400 us
RandomUniqueA 175 9.972 us 0.0319 us 0.0298 us
RandomUniqueB 175 18.778 us 0.0792 us 0.0741 us
RandomUnique 175 9.789 us 0.0593 us 0.0554 us
RandomUniqueA 200 10.270 us 0.0297 us 0.0278 us
RandomUniqueB 200 23.384 us 0.0623 us 0.0520 us
RandomUnique 200 10.112 us 0.0499 us 0.0466 us
RandomUniqueA 225 10.724 us 0.1754 us 0.1465 us
RandomUniqueB 225 28.748 us 0.0872 us 0.0816 us
RandomUnique 225 10.612 us 0.0430 us 0.0381 us
RandomUniqueA 250 11.144 us 0.0518 us 0.0484 us
RandomUniqueB 250 34.320 us 0.1249 us 0.1107 us
RandomUnique 250 10.891 us 0.0826 us 0.0772 us

Benchmarks Results Bar Graph

Bar Graph Edited


Probability/Validity Testing

Here is some validity testing to hopefully make sure I am not full of shit. In this code, I generate 5 values in the 0..26 range. I do this 1000000 times, and I calculate the mean that each values was randomly selected. All the values should be selected an average of 5 / 26 = ‭0.1923...‬ times.

Validity Testing Source Code

using System; using System.Linq; using System.Collections.Generic; namespace ValidityTesting { class Program { static void Main() { Console.WriteLine(nameof(Extensions.NextUnique)); PrintPriorities(CalculateProbabilities(Extensions.NextUnique)); Console.WriteLine(); Console.WriteLine(nameof(Extensions.NextUniqueA)); PrintPriorities(CalculateProbabilities(Extensions.NextUniqueA)); Console.WriteLine(); Console.WriteLine(nameof(Extensions.NextUniqueB)); PrintPriorities(CalculateProbabilities(Extensions.NextUniqueB)); Console.WriteLine(); } internal static void PrintPriorities(Dictionary<int, double> results) { foreach (var keyVal in results.OrderBy(x => x.Key)) Console.WriteLine(keyVal.Key + ": " + keyVal.Value); } internal static Dictionary<int, double> CalculateProbabilities(Func<Random, int, int, int, IEnumerable<int>> algorithm) { Random random = new Random(); const int size = 5; const int min = 0; const int max = 26; const int iterations = 1000000; var counts = new Dictionary<int, int>(); for (int i = min; i < max; i++) counts.Add(i, 0); for (int i = 0; i < iterations; i++) foreach (int j in algorithm(random, size, min, max)) counts[j]++; var results = new Dictionary<int, double>(); foreach (var keyVal in counts) results.Add(keyVal.Key, keyVal.Value / (double)iterations); return results; } } public static class Extensions { internal class Node { internal int Value; internal Node Next; } public static IEnumerable<int> NextUnique(this Random random, int count, int minValue, int maxValue) { if (maxValue < minValue) throw new ArgumentOutOfRangeException(nameof(minValue) + " > " + nameof(minValue)); if (count < 0) throw new ArgumentOutOfRangeException(nameof(count) + " < 0"); if (maxValue - minValue < count) throw new ArgumentOutOfRangeException(nameof(count) + " is larger than " + nameof(maxValue) + " - " + nameof(minValue) + "."); if (count > Math.Sqrt(maxValue - minValue)) { // Algorithm A: Θ(range + count) int pool = maxValue - minValue; int[] array = new int[pool]; for (int i = 0, j = minValue; j < maxValue; i++, j++) // Θ(range) array[i] = j; for (int i = 0; i < count; i++) // Θ(count) { int rollIndex = random.Next(0, pool); int roll = array[rollIndex]; array[rollIndex] = array[--pool]; yield return roll; } } else { // Algorithm B: O(.5*count^2), Ω(count), ε(.5*count^2) Node head = null; for (int i = 0; i < count; i++) // Θ(count) { int roll = random.Next(minValue, maxValue - i); Node node = head; Node previous = null; while (!(node is null)) // O(count / 2), Ω(0), ε(count / 2) { if (node.Value > roll) break; roll++; previous = node; node = node.Next; } yield return roll; if (previous is null) head = new Node() { Value = roll, Next = head, }; else previous.Next = new Node() { Value = roll, Next = previous.Next }; } } } public static IEnumerable<int> NextUniqueA(this Random random, int count, int minValue, int maxValue) { if (maxValue < minValue) throw new ArgumentOutOfRangeException(nameof(minValue) + " > " + nameof(minValue)); if (count < 0) throw new ArgumentOutOfRangeException(nameof(count) + " < 0"); if (maxValue - minValue < count) throw new ArgumentOutOfRangeException(nameof(count) + " is larger than " + nameof(maxValue) + " - " + nameof(minValue) + "."); // Algorithm A: Θ(range + count) int pool = maxValue - minValue; int[] array = new int[pool]; for (int i = 0, j = minValue; j < maxValue; i++, j++) // Θ(range) array[i] = j; for (int i = 0; i < count; i++) // Θ(count) { int rollIndex = random.Next(0, pool); int roll = array[rollIndex]; array[rollIndex] = array[--pool]; yield return roll; } } public static IEnumerable<int> NextUniqueB(this Random random, int count, int minValue, int maxValue) { if (maxValue < minValue) throw new ArgumentOutOfRangeException(nameof(minValue) + " > " + nameof(minValue)); if (count < 0) throw new ArgumentOutOfRangeException(nameof(count) + " < 0"); if (maxValue - minValue < count) throw new ArgumentOutOfRangeException(nameof(count) + " is larger than " + nameof(maxValue) + " - " + nameof(minValue) + "."); // Algorithm B: O(.5*count^2), Ω(count), ε(.5*count^2) Node head = null; for (int i = 0; i < count; i++) // Θ(count) { int roll = random.Next(minValue, maxValue - i); Node node = head; Node previous = null; while (!(node is null)) // O(count / 2), Ω(0), ε(count / 2) { if (node.Value > roll) break; roll++; previous = node; node = node.Next; } yield return roll; if (previous is null) head = new Node() { Value = roll, Next = head, }; else previous.Next = new Node() { Value = roll, Next = previous.Next }; } } } } 
Enter fullscreen mode Exit fullscreen mode

Validity Testing Console Output

NextUnique 0: 0.192031 1: 0.19208 2: 0.192785 3: 0.192287 4: 0.192593 5: 0.192686 6: 0.191937 7: 0.191998 8: 0.19244 9: 0.192112 10: 0.19212 11: 0.192707 12: 0.192877 13: 0.192086 14: 0.192277 15: 0.191907 16: 0.192366 17: 0.192386 18: 0.192423 19: 0.19229 20: 0.192017 21: 0.191901 22: 0.19285 23: 0.19282 24: 0.191902 25: 0.192122 NextUniqueA 0: 0.192705 1: 0.191956 2: 0.192313 3: 0.192613 4: 0.193418 5: 0.191957 6: 0.192116 7: 0.192217 8: 0.192773 9: 0.191862 10: 0.19233 11: 0.192145 12: 0.192044 13: 0.192749 14: 0.192097 15: 0.19224 16: 0.192526 17: 0.19225 18: 0.191653 19: 0.192546 20: 0.192156 21: 0.191686 22: 0.192908 23: 0.19174 24: 0.192605 25: 0.192395 NextUniqueB 0: 0.192408 1: 0.191776 2: 0.191809 3: 0.192623 4: 0.192399 5: 0.192451 6: 0.192135 7: 0.1931 8: 0.192594 9: 0.19232 10: 0.192281 11: 0.191723 12: 0.192358 13: 0.19216 14: 0.192697 15: 0.192975 16: 0.192357 17: 0.192507 18: 0.191768 19: 0.191874 20: 0.192803 21: 0.191934 22: 0.192075 23: 0.192704 24: 0.192155 25: 0.192014 
Enter fullscreen mode Exit fullscreen mode

Top comments (0)