1.Introduction.
Reservoir Sampling is an online algorithm designed for picking up small amount of sample with uniform probability. To be specific, given a sample of size K with N items processed so far, the chance for any item to be selected is K/N. When the next item comes in, current sample has a chance to survive K/N*N/(N+1)=K/(N+1) while the new item has chance K/(N+1) to be selected.
The mathematical prove is as follows:
Reservoir sample algorithm works as follows:
- Create an array samples[0..k-1] and copy first k items of stream[] to it.
 
- When a new data comes in (with index greater than k - 1), generate a random number from 0 to i where i is index of current item in stream[].
 
- Generated another random number j with in range 0 to k-1, replace samples[j] with the new data.
 
The code for this aglroithm is also straightforward.
import random  
from typing import List,Generator  
class ReservoirSampling:  
    def __init__(self, capacity: int) -> None:  
        self._capacity:int = capacity  
        self._samples:List[int] = []  
        self._pointer:int = 0  
    def reservoir_sampling(self) -> Generator[List[int], int, None]:  
        while True:  
            next_val:int = yield self._samples  
            if len(self._samples) < self._capacity:  
                self._samples.append(next_val)  
            else:  
                rand_pos: int = random.randint(0,self._pointer)  # [0, M - 1]  
                # probability = K / M  
                if rand_pos < self._capacity: # belonging to [0, K - 1]  
                    self._samples[rand_pos] = next_val  
            self._pointer += 1   
if __name__ == '__main__':
    rs:ReservoirSampling = ReservoirSampling(5)  
    rs_generator = rs.reservoir_sampling()  
    print("init list:{}".format(next(rs_generator)))  
    for x in range(10):  
        print(rs_generator.send(x)) 
Result:
init list:[]
[0]
[0, 1]
[0, 1, 2]
[0, 1, 2, 3]
[0, 1, 2, 3, 4]
[0, 1, 5, 3, 4]
[6, 1, 5, 3, 4]
[6, 7, 5, 3, 4]
[6, 7, 5, 3, 8]
[6, 7, 9, 3, 8]