.NET Mutterings: Random Sampling extension method

Wednesday, October 01, 2008

Random Sampling extension method

Had reason recently to select a random sample of data from a stream of elements. The amount of samples I needed to take was finite but what was unknown was the number of elements in the input stream. I managed to find various implementations of an IEnumerable extension method on the net, but all the ones I found would cache the entire stream before selecting the sample. This clearly isn't scalable as the input stream increases in size.

After a look around I found this blog article that describes how to implement Reservoir sampling without the need to keep all the items. This then allowed me to write a Random Sample extension method to IEnumerable<T> to do what I needed.

public static IEnumerable<T> RandomSample<T>(this IEnumerable<T> rows, int nSamples)
{
Random rnd = new Random();

T[] sample = new T[nSamples];

int nSamplesTaken = 0;

          foreach (T item in rows)
          {
              if (nSamplesTaken < sample.Length)
              {
                  sample[nSamplesTaken] = item;
              }
              else
              {
                  // As the amount of samples increases the probability
                  // of including a value gets less..due to the fact
                  // that it has a greater chance of surviving if it gets
                  // placed into the sample over earlier selections

                  if (rnd.Next(nSamplesTaken) < nSamples)
                  {
                      sample[rnd.Next(nSamples)] = item;
                  }
              }

nSamplesTaken++;
}

          if (nSamplesTaken >= nSamples)
          {
              return sample;
          }
          else
          {
              return sample.Take(nSamplesTaken);
          }
      }

1 comment:

D. Patrick Caldwell said...: Hey, I enjoyed your blog post. I've been writing a lot of extension methods lately myself. Sometimes you really have to be thankful to be a C# developer, you know?

Also, I was wondering if you've tried any syntax highlighters? I use one from google code and I have a pretty easy one liner you can add to your template. If you'd like to use it, feel free.; 7:14 pm

.NET Mutterings

Wednesday, October 01, 2008

Random Sampling extension method

1 comment:

Blog Archive

About Me