Reservoir Sampling
Algorithm to randomly choose k samples from an INFINITE list or STREAM of data, or a list of size N, where N is UNKNOWN or large enough that the list DOESN'T FIT into main memory.

Algorithms and Data Structures: TheAlgorist.com

System Design: DistributedComputing.dev

Low Level Design: LowLevelDesign.io

Frontend Engineering: FrontendEngineering.io
Reservoir sampling is a family of randomized algorithms for choosing a simple random sample, without replacement, of k items from a population of unknown size n in a single pass over the items. The size of the population n is not known to the algorithm and is typically too large for all n items to fit into main memory.
Algorithm to randomly select k samples from n elements:
 Take an array reservoir[] of length k and initialize it with the first k elements from the given array input[] of size n.

for index i := k to (n  1) repeat:

Generate a random number. Let's say the random number is m.
If m > k : do nothing
else: swap reservoir[m] with input[i].

Generate a random number. Let's say the random number is m.
 return reservoir array.
This is a Premium content.
Please subscribe to Algorithms course to access the code.
We could also implement the algorithm recursively as shown below.
Suppose we have an algorithm that can pull a random set of k elements from an array of size (n  1). How can you use this algorithm to pull a random set of k elements from an array of size n ?
We can first pull a random set of size k, say reservoir[0...(k  1)], from the first (n  1) elements. Then, we just need to decide if array[n] should be inserted into our subset (which would require pulling out a random element from it, to keep the size k). An easy way to do this is to pick a random number m from 0 through n. If m < k, then replace reservoir[m] with array[n].
This is a Premium content.
Please subscribe to Algorithms course to access the code.
Related Chapter:
Instructor:
If you have any feedback, please use this form: https://thealgorists.com/Feedback.