What is a collection?

Admittedly, we blew it in the first version of the framework with System.Collections.ICollection, which is next to useless. But we fixed it up pretty well when generics came along in .NET Framework 2.0: System.Collections.Generic.ICollection<T> lets you Add and Remove elements, enumerate them, Count them and check for membership.

Obviously from then on, everyone would implement ICollection<T> every time they make a collection, right? Not so. Here is how we used LINQ to learn about what collections really are, and how that made us change our language design in C# 3.0.

Collection initializers

With LINQ, the **L**anguage **IN**tegrated **Q**uery framework that we're shipping in Orcas, we're enabling a more expression-oriented style of programming. For instance, it should be possible to create and initialize an object within one expression. For collections, initialization typically amounts to adding an initial set of elements. Hence collection initializers in C# 3.0 look like this:

  new MyNames { "Luke Hoban", "Karen Liu", "Charlie Calvert" }

The meaning of this new syntax is simply to create an instance of MyNames using its no-arg constructor (constructor arguments can be supplied if necessary) and call its Add method with each of the strings.
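To make that expansion concrete, here is a minimal sketch. It assumes MyNames simply derives from List<string> (the post does not show its definition), and the InitializerDemo class and its method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

// Assumption: MyNames is not defined in the post; deriving from List<string>
// is just the shortest way to give it a no-arg constructor and Add(string).
public class MyNames : List<string> { }

public static class InitializerDemo
{
    // The collection initializer form from the text.
    public static MyNames WithInitializer()
    {
        return new MyNames { "Luke Hoban", "Karen Liu", "Charlie Calvert" };
    }

    // Roughly what the initializer means: construct, then call Add per element.
    public static MyNames Expanded()
    {
        MyNames names = new MyNames();
        names.Add("Luke Hoban");
        names.Add("Karen Liu");
        names.Add("Charlie Calvert");
        return names;
    }
}
```

Both methods should produce collections with the same three elements in the same order.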

So what types do we allow collection initializers on? Easy: collection types. What are those? Obvious: types that implement ICollection<T>. This is a nice and easy design - ICollection<T> ensures that you have an Add method so obviously that is the one that gets called for each element in the collection initializer. It is strongly typed, too - the initializer can contain only elements of the appropriate element type. In the above new expression, MyNames would be a class that implements ICollection<string> and everything works smoothly from there.

There's just one problem: Nobody implements ICollection<T>!

LINQ to LINQ

Well, nobody is a strong word. But we did an extensive study of our own framework classes, and found only a few that did. How? Using LINQ of course. The following query does the trick:

  from name in assemblyNames
  select Assembly.LoadWithPartialName(name) into a
  from c in a.GetTypes()
  where c.IsPublic &&
     c.GetConstructors().Any(m => m.IsPublic) &&
     GetInterfaceTypes(c).Contains(typeof(ICollection<>))
  select c.FullName;

Let’s go through this query a little bit and see what it does. For each name in a list of assemblyNames that we pre-baked for the purpose, load up the corresponding assembly:

  from name in assemblyNames
  select Assembly.LoadWithPartialName(name)

One at a time, put the reflection objects representing these assemblies into a, and for each assembly a run through the types c defined in there:

  from c in a.GetTypes()

Filter through, keeping each type only if it

a) IsPublic

b) has Any constructor that IsPublic

c) implements ICollection<T> for some T:

  where c.IsPublic &&
     c.GetConstructors().Any(m => m.IsPublic) &&
     GetInterfaceTypes(c).Contains(typeof(ICollection<>))

For those that pass this test, select out their full name:

  select c.FullName;

Nothing to it, really.
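For readers who want to try something similar, here is a self-contained sketch of the query. The post does not show GetInterfaceTypes, so the version below is an assumption: it maps each generic interface to its open definition, so that ICollection<string> compares equal to typeof(ICollection<>). The CollectionSurvey class name is also made up, and the method takes already-loaded assemblies rather than the pre-baked list of names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class CollectionSurvey
{
    // Hypothetical stand-in for the helper used in the post's query.
    // Type.GetInterfaces() returns constructed interfaces, so generic ones
    // are replaced by their open generic type definition before comparing.
    public static IEnumerable<Type> GetInterfaceTypes(Type type)
    {
        return type.GetInterfaces()
                   .Select(i => i.IsGenericType ? i.GetGenericTypeDefinition() : i);
    }

    // Names of public types with a public constructor that implement
    // ICollection<T> for some T, across the given assemblies.
    public static IEnumerable<string> FindGenericCollections(IEnumerable<Assembly> assemblies)
    {
        return from a in assemblies
               from c in a.GetTypes()
               where c.IsPublic &&
                     c.GetConstructors().Any(m => m.IsPublic) &&
                     GetInterfaceTypes(c).Contains(typeof(ICollection<>))
               select c.FullName;
    }
}
```

A call such as CollectionSurvey.FindGenericCollections(new[] { typeof(List<>).Assembly }) scans the core library only. The exact list of names will differ between framework versions.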

What is a collection?

What did we find then? Only 14 of our own (public) classes (with public constructors) implement ICollection<T>! Obviously there are a lot more collections in the framework, so it was clear that we needed some other way of telling whether something is a collection class. LINQ to the rescue once more: With modified versions of the query it was easy to establish that among our public classes with public constructors there are:

· 189 that have a public Add method and implement System.Collections.IEnumerable

· 42 that have a public Add method but do not implement System.Collections.IEnumerable

If you look at the classes returned by these two queries, you realize that there are essentially two fundamentally different meanings of the name “Add”:

a) Insert the argument into a collection, or

b) Return the arithmetic sum of the argument and the receiver.

People are actually very good at (directly or indirectly) implementing the nongeneric IEnumerable interface when writing collection classes, so that turns out to be a pretty reliable indicator of whether an Add method is the first or the second kind. Thus for our purposes the operational answer to the headline question becomes:

A collection is a type that implements IEnumerable and has a public Add method
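As a rough illustration, that operational definition can be written as a reflection check. This is only a sketch: the CollectionTest class and LooksLikeCollection method are hypothetical names, and the C# compiler performs its own check rather than calling anything like this.

```csharp
using System;
using System.Collections;
using System.Linq;
using System.Reflection;

public static class CollectionTest
{
    // True if the type implements the nongeneric IEnumerable
    // and has at least one public instance method named Add.
    public static bool LooksLikeCollection(Type type)
    {
        bool isEnumerable = typeof(IEnumerable).IsAssignableFrom(type);
        bool hasPublicAdd = type
            .GetMethods(BindingFlags.Public | BindingFlags.Instance)
            .Any(m => m.Name == "Add");
        return isEnumerable && hasPublicAdd;
    }
}
```

Under this check List<int> and Dictionary<string, int> qualify, Queue<int> does not (it is enumerable but has Enqueue rather than Add), and decimal does not (its Add is arithmetic, and it is not enumerable).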

Which Add to call?

We ain’t done yet, though. Further LINQ queries over the 189 collection types identified above show:

· 28 collection types have more than one Add method

· 30 collection types have no Add method with just one argument

So, given that our collection initializers are supposed to call “the” Add method, which one should they call? It seems that there will be some value in collection initializers allowing you to:

a) choose which overload to call

b) call Add methods with more than one argument

Our resolution to this is to refine our understanding of collection initializers a little bit. The list you provide is not a “list of elements to add”, but a “list of sets of arguments to Add methods”. If an entry in the list consists of multiple arguments to an Add method, these are enclosed in { curly braces }. This is actually immensely useful. For example, it allows you to Add key/value pairs to a dictionary, something we have had a number of requests for as a separate feature.

The initializer list does not have to be homogeneous; we do separate overload resolution against Add methods for each entry in the list.

So given a collection class

public class Plurals : IDictionary<string,string> {
  public void Add(string singular, string plural); // implements IDictionary<string,string>.Add
  public void Add(string singular); // appends an "s" to the singular form
  public void Add(KeyValuePair<string,string> pair); // implements ICollection<KeyValuePair<string,string>>.Add
  …
}

We can write the following collection initializer:

Plurals myPlurals = new Plurals { "collection", { "query", "queries" }, new KeyValuePair<string,string>("child", "children") };

which would make use of all the different Add methods on our collection class.
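Here is a compilable sketch of the same idea. To keep it short, this version of Plurals derives from Dictionary<string,string> instead of implementing IDictionary<string,string> by hand; that is a simplification for illustration, not the class from the text. PluralsDemo is a made-up name.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in for the Plurals class: Add(string, string) is inherited
// from Dictionary<string, string>, and the other two overloads are added here.
public class Plurals : Dictionary<string, string>
{
    // Appends an "s" to the singular form.
    public void Add(string singular)
    {
        Add(singular, singular + "s");
    }

    public void Add(KeyValuePair<string, string> pair)
    {
        Add(pair.Key, pair.Value);
    }
}

public static class PluralsDemo
{
    public static Plurals Build()
    {
        // Each entry is resolved against the Add overloads separately:
        // one argument, two arguments in braces, and a KeyValuePair.
        return new Plurals
        {
            "collection",
            { "query", "queries" },
            new KeyValuePair<string, string>("child", "children")
        };
    }
}
```

After Build(), the dictionary should map "collection" to "collections", "query" to "queries" and "child" to "children".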

Is this right?

The resulting language design is a “pattern based” approach. We rely on users using a particular name for their methods in a way that is not checked by the compiler when they write it. If they go and change the name of Add to AddPair in one assembly, the compiler won’t complain about that, but instead about a collection initializer sitting somewhere else suddenly missing an overload to call.

Here I think it is instructive to look at our history. We already have pattern-based syntax in C# - the foreach pattern. Though not everybody realizes it, you can actually write a class that does not implement IEnumerable and have foreach work over it, as long as it has a public GetEnumerator method whose return type provides MoveNext and Current. What happens, though, is that people overwhelmingly choose to have the compiler help them get it right by implementing the IEnumerable interface. In the same way we fully expect people to recognize the additional benefit of implementing ICollection<T> in the future – not only can your collection be initialized, but the compiler checks it too. So while we are currently in a situation where very few classes implement ICollection<T>, this is likely to change over time, and with the new tweaks to our collection initializer design we hope to have ensured that the feature adds value both now and in that future.
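To illustrate the foreach pattern mentioned above, here is a small sketch of a class that implements no interfaces at all yet still works with foreach. Countdown, its nested Enumerator and ForeachPatternDemo are hypothetical names invented for this example.

```csharp
using System;

// Implements neither IEnumerable nor IEnumerable<T>; foreach only needs a
// public GetEnumerator() whose result has MoveNext() and Current.
public class Countdown
{
    private readonly int start;

    public Countdown(int start) { this.start = start; }

    public Enumerator GetEnumerator() { return new Enumerator(start); }

    public class Enumerator
    {
        private int next;
        private int current;

        public Enumerator(int start) { next = start; }

        public int Current { get { return current; } }

        // Yields start, start - 1, ..., 1 and then stops.
        public bool MoveNext()
        {
            if (next <= 0) return false;
            current = next--;
            return true;
        }
    }
}

public static class ForeachPatternDemo
{
    public static int Sum(Countdown countdown)
    {
        int total = 0;
        foreach (int n in countdown) total += n;
        return total;
    }
}
```

If GetEnumerator were renamed, the compiler would flag the foreach in Sum rather than the Countdown class itself, which is the same trade-off described above for Add.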

Comments

  • Anonymous
    October 17, 2006
    The C# team needs your help debugging the new Visual Studio 2005 Service Pack 1 Beta. I've written about

  • Anonymous
    October 17, 2006
    You've been kicked (a good thing) - Trackback from DotNetKicks.com

  • Anonymous
    October 17, 2006
    Another advantage to examining the interface method if possible is the ability for non-English developers to construct friendlier native language methods for their objects.

  • Anonymous
    October 22, 2006
    Welcome to the eighth installment of Community Convergence. This week let's focus on two C# Wikis available

  • Anonymous
    October 27, 2006
    >> If an entry in the list consists of multiple arguments to an Add method, these are enclosed in { curly braces }. To me, curly brackets have always said one of two things in c#: array contents and code blocks. So it's a bit confusing to me why you chose curly brackets to indicate parameter tuples being sent to a method, when clearly the logical choice would have been the simple parenthesis: Plurals myPlurals = new Plurals { “collection”, ( “query”, “queries” ), new KeyValuePair(“child”, “children”) };

  • Anonymous
    October 31, 2006
    Which tools do I need to play with LINQ?

  • Anonymous
    October 31, 2006
    You could make it a lot cleaner by implementing a KeyValuePair operator, that creates a KeyValuePair-instance, like in Smalltalk: var pair = key -> value; // same as: var pair = new KeyValuePair(key, value); Then you can remove the curly braces from your initializer: var p = new Plurals { "mouse" -> "mice", "child" -> "children", "fish" -> "fish" };

  • Anonymous
    October 31, 2006
    >> parentheses are also already heavily used; for grouping, invoking, casting etc. You mention invoking and this is exactly the situation here - you're invoking an overload. You can say that the pure mathematical definition of a function dictates that the above pair is a tuple and that the language doesn't have them, but the fact remains that this is a method call and not a type initialization (I'm talking about only the query/queries part) which would make the use of curly brackets inconsistent with the rest of the language (which uses curly brackets for initialization: anonymous types, collection initializers, array initializers, anonymous methods, etc.).

  • Anonymous
    November 02, 2006
    When implementing ICollection<T> I have also been adding AddNew(T t).  This takes the passed item and makes a new copy to put in the array.  It would seem this is preferable for a construction call as I would assume the braced items could be changed later.  If the literals were added with the normal Add() then the compiler should complain when the program attempts to change a read only value.

  • Anonymous
    November 03, 2006
    Curious, why not just inherit your collection class (or the abstract version of your collection, if so inclined) from List<T> - and get all this IEnumerable related cool stuff for basically free?

  • Anonymous
    November 06, 2006
    When you identified the 'collections' in your framework that did not implement ICollection<T>... why didn't you fix that? Is it because inheriting and implementing ICollection<T> is too costly/breaking?

  • Anonymous
    November 06, 2006
    Interesting article. [syntax niggle] [question w obvious answer] Hope this comment proved my cleverness!

  • Anonymous
    November 14, 2006
    Mads Torgersen (have I got it right? you never wrote your full name) wrote last month in his weblog about

  • Anonymous
    November 16, 2006
    Why not create new interfaces ICollector<T> and ICollector defining only an Add method? Existing collection interfaces (including ICollection<T>) and classes (including Queue) implementing some kind of add-functionality could be changed to implement ICollector<T> or ICollector. This would not break existing code. What is a collection? The answer could be: It is a class implementing ICollector<T> or ICollector. An IPairCollector<T> interface could be used in the same way.

  • Anonymous
    November 19, 2006
    This is really nice, but it needs more examples.

  • Anonymous
    November 20, 2006
    Are you sure that GetInterfaceTypes walks up the inheritance tree? Because if it doesn't and I implement IList<T>, then even though IList<T> implements ICollection<T>, my class won't show in that query. I can tell you that we EXTENSIVELY use generic collections here.

  • Anonymous
    November 22, 2006
    Here are some examples for my previous e-mail:

    namespace Test.Normal {
        // Sample interfaces and classes without special handling of add-functionality
        interface ICollection<T> {
            void Add(T item);
            bool Remove(T item);
        }
        class NomalCollection<T> : ICollection<T> {
            public void Add(T item) { }
            public bool Remove(T item) { return true; }
        }
        class NormalQueue<T> {
            public void Enqueue(T item) { }
            public T Dequeue() { return default(T); }
        }
    }

    namespace Test.Enhanced {
        // Sample interfaces and classes with special handling of add-functionality introducing ICollector<T>
        interface ICollector<T> {
            void Add(T item);
        }
        interface ICollection<T> : ICollector<T> {
            bool Remove(T item);
        }
        class EnhancedCollection<T> : ICollection<T> {
            public void Add(T item) { }
            public bool Remove(T item) { return true; }
        }
        class EnhancedQueue<T> : ICollector<T> {
            public void Enqueue(T item) { }
            public T Dequeue() { return default(T); }
            void ICollector<T>.Add(T item) { Enqueue(item); }
        }
    }

    using System;
    namespace Test {
        class Program {
            static bool IsICollector(object o)
            {
                // The test "o is ICollector<>" with empty generic type parameter is not possible.
                // This method does it using Reflection.
                Type[] interfaces;
                Type iCollectorType;
                if (o == null)
                    return false;
                interfaces = o.GetType().GetInterfaces();
                iCollectorType = typeof(Enhanced.ICollector<>);
                foreach (Type t in interfaces) {
                    if (t.IsGenericType && t.GetGenericTypeDefinition() == iCollectorType)
                        return true;
                }
                return false;
            }

            static void Main(string[] args)
            {
                Normal.ICollection<int> normColl1 = new Normal.NomalCollection<int>();
                Normal.NomalCollection<int> normColl2 = new Normal.NomalCollection<int>();
                Enhanced.ICollection<int> enhColl1 = new Enhanced.EnhancedCollection<int>();
                Enhanced.EnhancedCollection<int> enhColl2 = new Enhanced.EnhancedCollection<int>();
                normColl1.Add(1);   normColl2.Add(2);
                enhColl1.Add(3);    enhColl2.Add(4);
                Console.WriteLine("normColl1 IS ICollector: {0}", IsICollector(normColl1)); // ==> false
                Console.WriteLine("normColl2 IS ICollector: {0}", IsICollector(normColl2)); // ==> false
                Console.WriteLine("enhColl1 IS ICollector: {0}", IsICollector(enhColl1)); // ==> true
                Console.WriteLine("enhColl2 IS ICollector: {0}", IsICollector(enhColl2)); // ==> true

                Normal.NormalQueue<int> normQueue = new Normal.NormalQueue<int>();
                Enhanced.ICollector<int> enhQueue1 = new Enhanced.EnhancedQueue<int>();
                Enhanced.EnhancedQueue<int> enhQueue2 = new Enhanced.EnhancedQueue<int>();
                normQueue.Enqueue(1);
                ((Enhanced.EnhancedQueue<int>)enhQueue1).Enqueue(2);   enhQueue1.Add(3);
                enhQueue2.Enqueue(4);
                ((Enhanced.ICollector<int>)enhQueue2).Add(5);
                Console.WriteLine("normQueue IS ICollector: {0}", IsICollector(normQueue)); // ==> false
                Console.WriteLine("enhQueue1 IS ICollector: {0}", IsICollector(enhQueue1)); // ==> true
                Console.WriteLine("enhQueue2 IS ICollector: {0}", IsICollector(enhQueue2)); // ==> true
                Console.ReadLine();
            }
        }
    }

  • Anonymous
    November 22, 2006
    How about using attributes for this? Then you could use an [CollectionAddMethod] and [CollectionRemoveMethod] for example instead of looking for an actual implementation of an ICollection...

  • Anonymous
    November 24, 2006
    Would an Add extension method (new feature in C# 3.0) that was in scope also be accepted by the compiler for collection initialization? For example (bear with me - I'm new to this stuff), assume there exists a Queue<> class which is IEnumerable<> but not ICollection<> and has an Enqueue method but no Add method. If the following extension method was defined: public static void Add<T>(this Queue<T> queue, T item) {  queue.Enqueue(item); } would that work? I know in this case it seems an unnecessary method call because Add just wraps Enqueue, but taking it further, it would allow any number of custom Add methods to be added to any collection type and participate in the collection initialization syntax.

  • Anonymous
    November 26, 2006
    I don't know how many of you have kept the default Visual Studio setting, namely the one that opens

  • Anonymous
    November 27, 2006
    I like Andres Cartin's suggestion for Attributes. Feels more flexible.

  • Anonymous
    November 27, 2006
    With curly braces...hmmm looks like PERL!

  • Anonymous
    November 28, 2006
    About the collection initializer: Couldn't the same thing be accomplished with a constructor like this: public Collection(params T[] items) and not have to introduce new syntax? Collection<String> myStrings =    new Collection<String>("One", "Two", "Three");

  • Anonymous
    November 30, 2006
    Andres Cartin's suggestion is great !

  • Anonymous
    December 01, 2006
    It looks like Microsoft has taken a big step backwards with LINQ. The syntax/semantics are counter-intuitive and makes readability a great challenge.

  • Anonymous
    December 03, 2006
    Querying function names for the correct function to call is a bad idea.  I think you should just fix the collections you have found with this query with one of the suggested methods above.  What this article does do is show the power of LINQ, and how its power can be used with reflection for analysis of source code.  I think developers are more interested in this power than saving a few lines of code when initializing a collection. --Falooley

  • Anonymous
    December 03, 2006
    there might be a bit of stupidity in every revolutionary thing but I as a database freak for the last 10-12 years am thinking that LINQ is the worst I've seen from MS. it is referring to object data (fields) as if they are relational data (columns). this comes to confuse the already confused c# developer about everything he is supposed to do with a database. well, their RELATIONAL database. we've got tons of rubbish sql from our web developers anyway and this now will make things even worse!

  • Anonymous
    December 07, 2006
    Recently I came across a blog post by Mads Torgersen, the project manager for the C# language team at

  • Anonymous
    January 23, 2007
    Thanks again for all your comments. Many of you suggest that we introduce some mechanism (new interface, attributes, special methods etc) to better mark the fact that something "is a collection" and how it should be initialized with a collection initializer. This does not address the problem we set out to solve, which is to make collection initialization possible on a number of existing classes without modification. The only suggestion I've seen that does that is the one about allowing collection initializers to make use of Add methods added as extension methods. We will look at that, but as it is, we are currently finishing up the last bits of coding for Orcas, and can hardly fit any more changes in. Heck, we gotta ship at some point. Luckily this is the kind of limitation that we can lift in the next version if the need is there; we are not designing ourselves out of it.

  • Anonymous
    January 23, 2007
    Welcome to the nineteenth Community Convergence. I'm Charlie Calvert, the C# Community PM, and this is

  • Anonymous
    January 25, 2007
    Why do you Microsoft guys go public with your language design issues just as you go into lockdown? Surely it would be more meaningful to engage the community at a point where their input can make a difference.

  • Anonymous
    January 25, 2007
    LINQ and C# 3.0 have been public since PDC in september 2005, with active forums, blogs, website etc. We got a ton of useful input from the community which has made quite a difference. Welcome to the game :-)

  • Anonymous
    June 21, 2007
    Sorry for being late. My personal experience has been that ICollection and the like are not fine-granular enough. The interfaces should be more fine-granular. My rule of thumb is: every interface in the system should have at most one member. Works amazingly since 2003. Interfaces should more use interface inheritance to provide more flexibility in distilling out certain facets of the functionality. I totally agree with Oli that a special interface is needed, with the only Add(T item) method. I don't really like the proposal of Andres Cartin about attributes, this should be solved with interfaces - interfaces are there and can handle it well. My own little library includes the IFillable<T> interface with the only method Add(T item). The name (Fillable) isn't probably the best, ICollector might be an option, but the idea served me very well all the time. There are a lot of scenarios where the only thing that we want to do with a parameter is to add an item to it. Almost everywhere, where we use yield return to return a number of items, we can use a parameter of type IFillable. The only difference is that it won't be lazy anymore. Some examples from my own "base class library": public interface IFillable<T> { void Add(T item); } public interface ISet<T> { bool Contains(T item); } public interface IClearable { void Clear(); } public class ListSet<T> : List<T>, ISet<T>, IFillable<T>, IClearable { } public class Set<T> : Dictionary<T, Object>, IEnumerable<T>, ISet<T>, IFillable<T>, IClearable { ... }

  • Anonymous
    September 25, 2007
    How sad it is that Microsoft doesn't consider an XML document a collection! For example, it is not possible to use a For/Each block to iterate through each element in an XML document using VB or C#. It is possible to do with XSL. I've submitted a case (SRZ070705000503) about Microsoft's misconception that everything between the start and end tags of an element is a single element when it is not. That is only true for elements with text content; not xml content (descendants). This case has been escalated for weeks now so I assume that there is some validity to my claim. XML would be SO MUCH MORE USEFUL if each element (node) could be dealt with individually without affecting the descendants of the element. A good example would be to delete or move an element leaving the descendants as descendants of the deleted or moved element. Consider an XML document with elements that have an attribute called Approved which can be True or False and you want to end up with an XML document that consists only of the elements with Approved=True using an XPath predicate. Currently, this can only be done by converting the XML to a NodeList which loses the hierarchical structure. I've recommended that the XMLNode object in the MSXMLDOM get a new property called ExcludeDescendants. That property would be False by default in existing applications that don't have the ExcludeDescendants property so that existing applications would work as they do now. New instances of controls that would be affected such as the XMLDataSource control would have the ability to set that property to True so that individual elements could be manipulated the same way that they can be using XSL. If Microsoft were to include an XML document as a collection, that would open up many new potential uses for XML which currently can't be done or require dozens of lines of code to do. There are plenty of name/value pair collections, now it's time to provide developers with hierarchical collections! David

  • Anonymous
    November 25, 2007
    I couldn't find GetInterfaceTypes, but GetInterfaces seems to do the trick: from name in assemblyNames  select Assembly.LoadWithPartialName(name) into a  from c in a.GetTypes()  where c.IsPublic &&     c.GetConstructors().Any(m => m.IsPublic) &&     c.GetInterfaces().Contains(typeof(ICollection<>))  select c.FullName However, it's interesting that typeof(Dictionary<,>.KeyCollection).GetInterfaces().Contains(typeof(ICollection<>)) returns False, even though KeyCollection implements ICollection<>. Why would that be?

  • Anonymous
    January 06, 2008
    File this in my learn something new every day bucket. I received an email from Steve Maine after he read
