Saturday, January 14, 2012

Shuffle in linq (part 2)

There are many times when we need to randomly sort a list or array.
The simplest idea is to use Random.Next().

Here is the code:
public static class ShuffleExtensions
{
    public static IEnumerable<tsource>
           RandomShuffle<tsource>(this IEnumerable<tsource> source)
    {
        var rand = new Random();
        return source.Select(t => new {
                Index = rand.Next(),
                Value = t })
            .OrderBy(p => p.Index)
            .Select(p => p.Value);
    }
}

The main problem with using Random.Next() is that it is not really random. Every number will be the first in the sequence based on the supplied seed. If you call it twice with the same seed (i.e. within one tick) you will get the same number.
The distribution of random numbers over your range will be very poor and a chi-square statistical test won't do too well either to test your set of numbers for a large number of iterations.
A better approach may be using System.Guid.NewGuid() function call. This returns a new GUID for each item in the array. Since GUID's are unique and non-repeating, this guarantees each item has a unique id. LINQ OrderBy will then sort the array by the list of GUID's returned.

Here is the code:
public static class ShuffleExtensions
{
    public static IEnumerable<tsource>
           RandomShuffle<tsource>(this IEnumerable<tsource> source)
    {
        return source.Select(t => new {
                Index = System.Guid.NewGuid(),
                Value = t })
            .OrderBy(p => p.Index)
            .Select(p => p.Value);
    }
}

You can write a simple test for verify the implementations.

Here is the code:
public static void TestRandomShuffle()
{
    // create and populate the original list with 1000 elements
    var l = new List<int>(100);
    for (var i = 0; i < 100; i++)
        l.Add(i);

    var shuffled = l.RandomShuffle().ToArray();
    for (var i = 0; i < 100; i++) 
        Debug.Write(i + ":" + shuffled[i].ToString() + ",");
}

As a final notice, remember that John von Neumann said:
Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin.
:)

Submit this story to DotNetKicks

1 comment:

  1. The extension method doesn't force execution. Thus, although the solution will randomize code, it will also randomize it differently in each iteration. Not a bad idea, if intended, but dangerous to use in Framework development, as well, I can see a developer scratching his head why the order of this list keeps changing if s/he is not aware of the use of this solution in a previous chain of linq, a few calls deep.

    ReplyDelete