Andy's observations as he continues to attempt to know all that is .NET...

Thursday, January 31, 2008

Safe Event snippet


 

This week I'm teaching the course I co-wrote with Kevin Jones, "Code Smarter with Design Patterns in .NET". As part of the course we introduce the observer pattern, and whilst the first part of the deck deals with the basic principles, we soon move on to the preferred way of doing it in .NET using delegates and events.

When using events in a multi-threaded environment care needs to be taken: whilst registering and unregistering handlers is thread safe, raising the event is not.


 

Typically we see code like this


 

public class Subject
{
    public event EventHandler<EventArgs> Alert;

    // ...

    protected virtual void OnAlert(EventArgs args)
    {
        if (Alert != null)
        {
            Alert(this, args);
        }
    }
}


 

Here a check is being made to make sure the delegate chain has at least one subscriber. This works in a single-threaded environment, but in a multi-threaded environment there is a race condition between the null check and the firing of the event: the last subscriber can unregister in between, leaving Alert null at the moment it is invoked.


 

One way around this problem is to take a local copy of the delegate reference, and null-check and invoke that copy; since registering and unregistering handlers builds a new invocation list each time, the copy cannot be changed underneath you. However, there is a simpler approach, and that is to use the null object pattern.
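For reference, the local-copy version looks something like this (a minimal sketch, not code from the original post):

protected virtual void OnAlert(EventArgs args)
{
    // Take a local copy of the delegate reference. Add/remove on the event
    // builds a new invocation list each time, so this copy cannot become
    // null between the check and the invoke.
    EventHandler<EventArgs> handler = Alert;
    if (handler != null)
    {
        handler(this, args);
    }
}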


 

public event EventHandler<EventArgs> Alert = delegate { };


 

This simply creates a delegate instance at the head of the delegate chain which is never removed. It is then safe to simply do


 

    Alert(this, args);


 

without checking for null, since it can never be null. OK, there is a small overhead, but it certainly makes the logic simpler and less prone to occasional hard-to-find race conditions. Remembering to do this could be a pain, so I wrote a simple snippet to do it... So now I simply type safeevent TAB TAB and I always get an event with a no-op handler already attached...another great use of anonymous methods...
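The snippet itself isn't reproduced in this post; the sketch below shows roughly what such a snippet could look like, modelled on the lscope snippet further down this page (the title and literal names are my assumptions; the safeevent shortcut is the one mentioned above):

<CodeSnippet Format="1.0.0"
             xmlns="http://schemas.microsoft.com/VisualStudio/2005/CodeSnippet">
  <Header>
    <Title>Safe Event</Title>
    <Shortcut>safeevent</Shortcut>
    <Description>Declares an event pre-initialised with an empty anonymous method so it can be raised without a null check</Description>
    <SnippetTypes>
      <SnippetType>Expansion</SnippetType>
    </SnippetTypes>
  </Header>
  <Snippet>
    <Declarations>
      <Literal>
        <ID>name</ID>
        <Default>MyEvent</Default>
      </Literal>
      <Literal>
        <ID>args</ID>
        <Default>EventArgs</Default>
      </Literal>
    </Declarations>
    <Code Language="CSharp">
      <![CDATA[public event EventHandler<$args$> $name$ = delegate { };$end$]]>
    </Code>
  </Snippet>
</CodeSnippet>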


 


 

Friday, January 25, 2008

Not so much killer app, but killer content

It used to be the case that a computer/OS would succeed if it had the killer app, the app everyone needed. The spreadsheet was one of the first of that kind; now, in the new media-streaming age, perhaps it's killer content that will help push the adoption of new technology. A student told me last week that Microsoft has done a deal so that MSN will be the official online home of the 2008 Olympics. The site will provide hours and hours of online coverage, brought to you by Silverlight. Could there be any better vehicle for getting the Silverlight plugin onto every home and office PC?


 


Monday, January 21, 2008

The potential danger of too much access to too much data

The UK government is set to build a series of large-scale databases containing children's data. The charity Action on Rights for Children has put together a three-part video detailing some of their concerns about the scheme.


 

http://www.youtube.com/watch?v=LRQr2VrtX-0

 
http://www.youtube.com/watch?v=KH-1IumXZbI


http://www.youtube.com/watch?v=IyXCSg-lRkA




 

Friday, January 18, 2008

Ensure local transaction snippet

Whilst teaching DevelopMentor's ENET 3 class this week I had an idea for a new snippet. It came to me whilst teaching the deck on System.Transactions: I was in the middle of demonstrating how you need to be really careful not to cause a transaction to be promoted to the DTC (see previous blog article). Two strategies I suggest are:


 

  1. Stop the DTC: net stop MSDTC
  2. Add a Debug.Assert statement inside your TransactionScope block to ensure the current transaction is not a distributed transaction, by checking that the DistributedIdentifier is an empty GUID (see the sketch below).
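In code, strategy 2 amounts to something like this (a minimal sketch of the check; the snippet further below automates it):

// Requires: using System.Diagnostics; using System.Transactions;
using (TransactionScope scope = new TransactionScope())
{
    // ... transactional work against a single connection ...

    // A purely local transaction has an empty DistributedIdentifier; if the
    // transaction has been promoted to the DTC this assert fires in debug builds.
    Debug.Assert(Transaction.Current.TransactionInformation.DistributedIdentifier == Guid.Empty,
        "Unexpected!! Transaction is now distributed");

    scope.Complete();
}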


 

The first option is reasonable if you never need the DTC, but if you have bits of code that do need the DTC it's clearly not an option. The second approach is a bit tedious, but if you use a snippet it's not all that bad. Below is the snippet code.


 


 

<CodeSnippet Format="1.0.0"
             xmlns="http://schemas.microsoft.com/VisualStudio/2005/CodeSnippet">
  <Header>
    <Title>Local Transaction</Title>
    <Author>Andrew Clymer</Author>
    <Shortcut>lscope</Shortcut>
    <Description>Creates a transaction scope and ensures at the end of the block that the transaction is only a local transaction</Description>
    <SnippetTypes>
      <SnippetType>SurroundsWith</SnippetType>
      <SnippetType>Expansion</SnippetType>
    </SnippetTypes>
  </Header>
  <Snippet>
    <Declarations>
      <Literal>
        <ID>scope</ID>
        <Default>scope</Default>
      </Literal>
    </Declarations>
    <Code Language="CSharp">
      <![CDATA[
using (TransactionScope $scope$ = new TransactionScope())
{
    $selected$
    $end$

    Debug.Assert(Transaction.Current.TransactionInformation.DistributedIdentifier == Guid.Empty,
        "Unexpected!! Transaction is now distributed");
}
      ]]>
    </Code>
  </Snippet>
</CodeSnippet>

Sunday, January 13, 2008

Pfx, Parallel Extensions take advantage of multiple cores, but be careful...

Microsoft has now made available a parallel library for .NET. It is still at an early stage, and its aim is to make it easier to write applications that take advantage of multiple cores. Pfx really has two layers: one offering a relatively high level of abstraction, providing Parallel.For, Parallel.Do and parallel LINQ, and a lower-level layer allowing programmers to schedule individual Tasks.

This has been a long time coming; .NET support for concurrency is pretty primitive even in .NET 3.0. Simple things like kicking off N tasks and waiting for them all to complete require a reasonable amount of code, but on the surface Pfx looks to assist here. I spent a couple of days over Christmas looking at it and understanding how it works.
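For comparison, the pre-Pfx version of "run N pieces of work and wait for them all" typically looks something like this (a minimal sketch, not from the original post; RunAllAndWait is a made-up helper name):

// Requires: using System; using System.Threading;
private static void RunAllAndWait(int taskCount, Action work)
{
    int remaining = taskCount;

    using (ManualResetEvent allDone = new ManualResetEvent(false))
    {
        for (int i = 0; i < taskCount; i++)
        {
            ThreadPool.QueueUserWorkItem(delegate
            {
                work();

                // The last work item to finish releases the waiting caller.
                if (Interlocked.Decrement(ref remaining) == 0)
                {
                    allDone.Set();
                }
            });
        }

        allDone.WaitOne();
    }
}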


 

So the first task was to find some code I wanted to parallelise. I frequently use a piece of logic I stole from the book "How Long Is a Piece of String?", which uses many iterations to work out the value of Pi, as a good async demo, so I thought I would have a go at parallelising that piece of code.


 


private const int N_ITERATIONS = 1000000000;

private static double CalcPi()
{
    double pi = 1.0;
    double multiply = -1;

    for (int nIteration = 3; nIteration < N_ITERATIONS; nIteration += 2)
    {
        pi += multiply * (1.0 / (double)nIteration);
        multiply *= -1;
    }

    return pi * 4.0;
}

Looking at the algorithm it doesn't look that complicated: all I need to do is make each core work on a different part of the for loop, combine the results from each core and multiply by 4.0. In fact I've done this using the basic BeginInvoke functionality before, so I know it scales...
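For context, that BeginInvoke approach looks roughly like this (a minimal sketch, not the original code), splitting the series in two and computing each half on a separate thread:

private delegate double PiRange(int start, int end);

private static double CalcPiRange(int start, int end)
{
    double pi = 0.0;

    // The series alternates sign; terms where n % 4 == 3 are subtracted.
    double multiply = (start % 4 == 3) ? -1.0 : 1.0;

    for (int n = start; n < end; n += 2)
    {
        pi += multiply * (1.0 / (double)n);
        multiply *= -1;
    }

    return pi;
}

private static double CalcPiWithBeginInvoke()
{
    PiRange range = CalcPiRange;
    int mid = N_ITERATIONS / 2 + 1;   // odd split point keeps the signs aligned

    // The second half runs on a thread pool thread while this thread does the first half.
    IAsyncResult secondHalf = range.BeginInvoke(mid, N_ITERATIONS, null, null);
    double pi = 1.0 + CalcPiRange(3, mid);
    pi += range.EndInvoke(secondHalf);

    return pi * 4.0;
}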


 

So I first opted for the high-level approach offered by Pfx, Parallel.For, which is stated to allow me to spread loop iterations across multiple cores. In other words, nIteration in my case could be 3 on one core and 5 on another at the same time.


 

private static double CalcPiSimpleParallel()
{
    double pi = 1.0;

    object sync = new object();

    //for (int nIteration = 3; nIteration < 1000000000; nIteration += 2)

    Parallel.For<double>(0, N_ITERATIONS / 2, 1, () => { return 0; }, (nIteration, localPi) =>
    {
        double multiply = 1;

        // 3 5 7 9 11
        if (nIteration % 2 == 0)
        {
            multiply = -1;
        }

        localPi.ThreadLocalState += multiply * (1.0 / (3.0 + (double)nIteration * 2.0));
    },
    (localPi) => { lock (sync) { pi += localPi; } });

    return pi * 4.0;
}


 


 

Unfortunately this did not achieve the results I had hoped for; in fact it took twice as long. Bizarre... OK, perhaps not so surprising since it is using delegates. Using Reflector you can see that Parallel.For is built on the underlying Task class provided by Pfx.


 

A Task is a piece of work that you give to a task manager, which queues it for an available thread. The first thing to mention is that tasks do not run on thread pool threads; a task manager has its own pool of threads which it controls based on the number of available cores and whether the tasks currently running are blocked.


 

Task task = Task.Create((o) =>
{
    Console.WriteLine("Task Running..");
});

task.Wait();


 

The code above creates a new task, which will be run on one of the default task manager's threads, and the main thread will wait for its completion.


 

Parallel.For uses a special variant of Task called a replicating task. When you create this kind of task, the task manager effectively creates not a single instance of the task but as many instances as there are cores. Each of these task clones is then expected to work in harmony to achieve the overall task. A typical implementation might therefore have an outer loop looking for a new piece of work to do; when there is no more work, the task simply ends.

In the case of Parallel.For you could expect the outer loop to simply perform an interlocked increment on the loop counter until the required value is reached, and inside the loop to invoke the required piece of work. Here lies the problem in my example: the piece of work executing inside the for loop is extremely trivial and doesn't take many cycles, so the overhead of invoking a delegate for each iteration has a large impact on the overall performance.

So the first word of warning when using Pfx: you need to make sure that the piece of work inside a given task is of a reasonable size in order for it to actually scale. OK, so it's not as trivial as it first looks to parallelise the CalcPi method. Not put off, I refactored the code so the algorithm has an inner loop, thus increasing the amount of work for each Parallel.For iteration. To start with I simply broke it down into ten blocks, thinking each block would now be a reasonable amount of work and we would only be paying the overhead of a delegate invocation 10 times, as opposed to N_ITERATIONS/2 times.


private static double CalcPiDualLoop()
{
    double pi = 1.0;

    object sync = new object();
    int stride = N_ITERATIONS / 2 / 10;

    //for (int nIteration = 3; nIteration < 1000000000; nIteration += 2)

    Parallel.For<double>(0, N_ITERATIONS / 2, stride, () => { return 0; }, (nIteration, localPi) =>
    {
        double multiply = 1;

        // 3 5 7 9 11
        if (nIteration % 2 == 0)
        {
            multiply = -1;
        }

        double piVal = 0;

        for (int val = nIteration; val < nIteration + stride; val++)
        {
            piVal += multiply * (1.0 / (3.0 + (double)val * 2.0));
            multiply *= -1;
        }

        localPi.ThreadLocalState += piVal;
    },
    (localPi) => { lock (sync) { pi += localPi; } });

    return pi * 4.0;
}


 

But only partial luck: I got less than a 10% speed-up, not what I was expecting. So I had another look at the implementation of Parallel.For to understand why, and digging into it you realise that each task created by Parallel.For doesn't just take the next value in the loop to work on, but takes the next eight values. What this means in my case, on a 2-core CPU, is that the first task to run takes the first 8 iterations of the loop, and the second task gets just 2. The second task completes its work and, since there is no more, ends, leaving the first task to complete its remaining 6 iterations all on one core, and leaving my other core cold.

To make sure I wasn't seeing things, I made the number of Parallel.For iterations a multiple of 8 * the number of cores, and hey presto, it took just over half the time.


 

I can see that taking chunks of work at a time reduces contention for work allocation, but I'm struggling to see why this is necessary, since the contention will not be that great, especially as each iteration already needs to be of a reasonable size for Parallel.For to work efficiently. I really hope they address this issue when they finally release the library, since having to know how many cores you are running on, plus the magic number 8, just seems rather odd and goes against what I believe they were trying to achieve.


 

I also implemented the method using the lower-level Task API, and that scales beautifully...


 

private static double ParallelCalcPi()
{
    object sync = new object();
    int nIteration = 3;
    double pi = 1;
    int stride = N_ITERATIONS / 4;

    Task task = Task.Create((o) =>
    {
        Console.WriteLine("Running..{0}", Thread.CurrentThread.ManagedThreadId);

        double localPi = 0.0;
        double multiply = -1;

        long end;
        while ((end = Interlocked.Add(ref nIteration, stride)) <= N_ITERATIONS + 3)
        {
            //Console.WriteLine("Next Chunk {0} from {1} to {2}", Thread.CurrentThread.ManagedThreadId,
            //    end - stride, end);

            for (int val = (int)(end - stride); val < end; val += 2)
            {
                localPi += multiply * (1.0 / (double)val);
                multiply *= -1;
            }
        }

        lock (sync) { pi += localPi; }

    }, TaskCreationOptions.SelfReplicating);

    task.Wait();

    return pi * 4.0;
}


 

First impressions are that the Task library seems OK, but Parallel.For seems ill thought out. I have also done some experiments with Parallel.Do, which allows me to kick off multiple tasks by supplying a series of Action delegates as parameters; Parallel.Do then blocks waiting for them to complete. Unfortunately no timeout parameter is present, which means it waits forever, so if an action goes into a rogue state and never completes, the thread that invoked Parallel.Do will hang, which is not desirable... So my second request is that they add a timeout parameter.
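For reference, a Parallel.Do call looks roughly like this (a minimal sketch based on the description above; the actions are just placeholders):

// Each Action is run as a task and Parallel.Do blocks until all have completed;
// as noted above, there is no overload taking a timeout.
Parallel.Do(
    () => Console.WriteLine("First action on thread {0}", Thread.CurrentThread.ManagedThreadId),
    () => Console.WriteLine("Second action on thread {0}", Thread.CurrentThread.ManagedThreadId));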


 

About Me

I'm a freelance consultant for .NET-based technology. My last real job was at Cisco Systems, where I was a lead architect for Cisco's identity solutions. I arrived at Cisco via acquisition and prior to that worked in small startups. The startup culture is what appeals to me, and that's why I finally left Cisco after seven years.....I now fill my time through a combination of consultancy and teaching for DevelopMentor...and working on insane startups that nobody with an ounce of sense would look twice at...