Synchronization Complexity in the .NET Framework, Part 2
Well it seems like an eternity ago but at last I'm writing the followup to my initial question about synchronization complexity.
I'd like to start with this link to a summary of the synchronization costs of nearly all of the framework. And I say nearly all because I noticed that at least three methods had a synchronization cost that overflowed a 64 bit signed integer. Assumming (and I think this is safe to do) that they only spilled into the negative regions, those three entries are
System.Windows.Forms.DataGridView.ProcessLeftKey(System.Windows.Forms.Keys)
count 10,823,468,784,974,500,000
complexity 20.0
System.Windows.Forms.DataGridView.ProcessRightKey(System.Windows.Forms.Keys)
count 10,823,468,784,974,500,000
complexity 20.0
System.Windows.Forms.Design.DataGridViewDesigner.Initialize(System.ComponentModel.IComponent)
count 9,230,518,549,563,500,000
complexity 20.0
Those are in fact the answers to the problem I posted in the original article and if you patch up those three lines in the .txt file I provided you'll have all the costs for the framework.
But what does it mean?
Well first let me remind you what the formula is. If the count is 0 then the complexity is 0. If the count is non-zero then the complexity is 1+log(count) rounded to 1 digit. So a complexity of 20 means that there were 10^19 calls to Monitor.Enter in the complete calltree of the method in question. 10^19 is a huge huge number!
OK, so what does that mean?
Well I'll tell you first what it doesn't mean. It doesn't mean that if you call that method that you can expect 10^19 calls to Monitor.Enter. Remember this is looking at the complete call graph and on any given invocation you certainly don't take both sides of every branch, ever branch of every switch and so on. You'll get some narrow slice of that calltree.
OK great, so the static count isn't the actual observed count if you call it, so what can we learn from this? Well two things.
- For methods with large complexity the reason that the complexity is so large is that the method has a very deep/wide call tree. These costs accumulate pretty quickly in those cases and so the metric provides a way of seeing just how high level the method is from a synchronization prespective or an allocation perspective.
- For methods with small complexities, the call graph is probably not that busy and the reported complexity is much more likely to be very close to the actual number of sychronizations or allocations you can expect to observe. Certainly if the cost is zero there will be no synchronizations*.
So what can you do with this? Well there are two realizations that I hope you agree with.
- For these very high level methods you can basically have no idea whatsoever what the actual cost you are going to pay is without doing experiments in context. The amount of variability could be vast (10^19 demonstrated). So clearly you must have a lot of tolerance in the context in which you use such methods, or else plenty of experimenting.
- Secondly, by comparing the complexity (either kind) of a method you have just implemented to either guidelines (like GetHashCode should have a nil complexity) or to observed patterns (like FooBar methods all have a cost of 4 but yours has a cost of 12) you can make better decisions.
Food for thought.
*For the synchronization counts I disregarded the one-time synchronizations present in the String class because they totally skew the truth and they do happen exactly once. One-time costs are the bane of static analysis
Comments
- Anonymous
March 22, 2007
The comment has been removed - Anonymous
March 28, 2007
perhaps you could alter your static analysis to look at switch and if/else constructs as sub trees and do one/several of a) treat them as independent and sum them (as now) b) treat them as equally likely and take the average c) treat them as parallel and take the worst case d) Or get very clever on the heuristics if one is vastly more expensive than the other(s) and try to determine if the more expensive one looks like a fixed one off cost (say if it sets a static boolean or assignes to a null checked guard static variable) and drop it (perhaps noting it seperately) I like d but the c is a simple solution that would give more meaningful numbers... Matt