Condividi tramite


Why all the love for lists?

One of the things that I have noticed when participating in interviews with potential candidates is that most candidates have a favorite data structure.  Presumably this is the data structure that they feel the most comfortable with.  The problem is that no matter what coding question I ask them, they see the answer through the lenses of their favorite data structure.  I have met programmers who are in love with hash tables, others who have a certain fondness for B-trees, and quite a few who adore arrays.  My feeling is that this happens not because they understand the data structure itself better than other data structures (honestly, I can't believe that the guy who loves b-trees doesn't understand what arrays are), but it is because each data structure encourages a certain way of thinking about problems.  So really the candidates are just thinking about the question in a certain way that is expressed on the outside as using some particular data structure.

One of the first things that a newcomer to functional programming will notice is that lists are all the rage.  Where most programmers who are accustomed to imperative style would naturally use an array, a variable, or a mutable object, a functional programmer will often use a list.  Why is this?

If we consider the insight that it is not the data structure itself but rather the mode of thinking that encourages the usage of the data structure then we can make some progress on understanding this.  Lists are the simplest recursive data structure.  A list is just a node consisting of a value and the next node in the list.  The definition of a list is defined in terms of itself.

interface IList<T>
{
IList<T> Next { get; }
T Value { get; }
}

But notice how similar this way of thinking is to recursive functions.  For example a very common function defined over lists is Length.  This function computes the length of a given list.

public static int Length<T>(this IList<T> list)
{
if (list == null)
return 0;
return 1 + Length(list.Next);
}

Length is a function that is defined in terms of itself.  Like the definition of list, Length has a value (either the 0 or the 1 +) and it has a next (the recursive call to Length(list.Next)).  So one reason why lists are so popular in functional programming is that they encourage a recursive way of thinking about problems.

In C# 3.0, lists are not so commonly used as IEnumerable<T>.  In fact, IEnumerable<T> is used so much that it sticks out in the same way the usage of lists in functional programming languages sticks out.  The primary Linq provider that will be shipped is Linq to Objects which requires that the source of a query implement IEnumerable<T>.

In some senses, IEnumerable<T> is not so very different from the definition of IList<T> shown above.  Both of them represent sequences of elements where only the current value and the next element are available at a given time.  In fact, it is relatively easy to define one in terms of the other.

Here is a definition of IEnumerable<T> in terms of IList<T>.

public static IEnumerable<T> GetEnumerator<T>(this IList<T> list)
{
for (var current = list; current != null; current = current.Next)
yield return current.Value;
}

And here is a corresponding definition of IList<T> in terms of IEnumerable<T> (take from this post).

class LazyList<T> : IList<T>
{
IList<T> next;
T value;
IEnumerator<T> enumerator;

  LazyList(T value, IEnumerator<T> enumerator)
{
this.value = value;
this.enumerator = enumerator;
}

  public static IList<T> Create(IEnumerable<T> enumerable)
{
return Create(enumerable.GetEnumerator());
}

  public static IList<T> Create(IEnumerator<T> enumerator)
{
if (enumerator.MoveNext())
return new LazyList<T>(enumerator.Current, enumerator);
return null;
}

  public IList<T> Next
{
get
{
if (enumerator != null)
{
next = Create(enumerator);
enumerator = null;
}
return next;
}
}

  public T Value { get { return value; } }
}

But it should be noted that there is a critical difference between both of these definitions and the following definition of List<T>.

class List<T> : IList<T>
{
T value;
IList<T> next;

  public List(T value, IList<T> next)
{
this.value = value;
this.next = next;
}

  public IList<T> Next { get { return next; } }
public T Value { get { return value; } }

  public static IList<T> Empty { get { return null; } }
}

Did you notice what it is?  The previous definitions were lazy but this definition is not.  The iterator did not describe the data itself but rather a process for sequencing data that will be executed possibly sometime in the future.  The LazyList also does not actually realize an entire list.  It describes a way to construct the next element of a list on demand.  But the definition of List does not defer anything to the future.  Now the question is, does it matter?

Comments

  • Anonymous
    February 12, 2007
    I guess it matters in the sense that the lazy and non lazy implementations have differing impacts on memory pressure. It's one of those classic space / time trade offs. All though having done some perf timing in C# 2 of 'LINQ-like' lazy evaluation pipelines I have to say I'm not so sure that 'time' is such a worry - the performance of 'yield return' is plenty good enough. Besides I really like the fact that lazy lists can be 'live' in the sense that elements can be conjured as needed and thus reflect the most recent state of some variable - rather than a snapshot in time as is the case with a pre-populated list.

  • Anonymous
    February 12, 2007
    I guess the real question to ask, are developers mature enough to adopt such technologies without adding complexity to the code? mostly what we search is a declarative style of programming. Functional programmming helps quite much, specially in the concept of writing what we want to do Not how to do it , but on the other hand its excessive usage might add a lot of unreadability !

  • Anonymous
    February 12, 2007
    The comment has been removed

  • Anonymous
    February 12, 2007
    No one claimed that microsoft invented functional programming Sriyan! what microsoft offering is a mixed enviorement. having a mixed enviorement(functional and imperative) can add a lot of power for developers...especially those more familyar with C driven syntax (c#, java) ,who find the syntax of scheme or Haskel is quite bizard. Not forgetting the static typing at compile time ;)

  • Anonymous
    February 12, 2007
    Sweet JEE5, have you heard of Lisp?

  • Anonymous
    February 13, 2007
    Syrian: I surely didn't mean to offend you by talking about functional programming.  I don't think we are ripping it off at all and quite frankly I'm not sure what ripping it off means.  So I'll assume that you are not happy because C# allows some things that functional programmers are used to like lambdas and closures. C# is not a pure functional language and as MadsT (http://blogs.msdn.com/madst) pointed out in his blog post, it does not have all of the facilities that normally are included in FP.  That being said, it does allow some hybrid paradigm stuff. Btw, out of curiousity what marketing names did we use? Codeless: Yes, I have heard of Lisp and use it regularly.  Although I think Sadek's point about the syntax still stands since as most lispers are quick to point out, all the rest is just syntactic sugar and many people crave it incessantly.

  • Anonymous
    February 19, 2007
    Welcome to the twenty-first Community Convergence. I'm Charlie Calvert, the C# Community PM, and this

  • Anonymous
    February 19, 2007
    Welcome to the twenty-first Community Convergence. I'm Charlie Calvert, the C# Community PM, and this

  • Anonymous
    February 19, 2007
    I think it has a lot to do with whether someone has evolved past mere structural/syntactical programming and are truly performing object-oriented programming like aspect-orientation or interface-orientation. I don't think it's necessarily what the person is most comfortable with; it may be a particular data structure best maps to how they think about data. For me, "list" is an implementation.  One of the things I like about the .NET framework is it's ability to facilitate interface-orientation.  I can view a collection as such, that fact that it may be implemented as a list is largely irrelevant. Arguing "list" is better than another data structure signals to me the person has missed the point--especially in the general case.  "Premature optimization", to the extent they've prematurely rejected an implementation without testing/evaluating it, comes to mind. The difference between what a collection contains and what IEnumerable provides functionally, are different.  IEnumerable has no physical limitations on size, it's free to enumerate forever without error--which provides it the ability to perform lazy evaluation or even transient evaluation (where only the "current" element really exists, or "exists" for the duration of the call).  A collection cannot do that when it has attributes like Length, or indexers...

  • Anonymous
    February 19, 2007
    Peter: I agree completely about why people choose certain data structures.  That is exactly what my first paragraph points out.  That a choice of a certain data structure often reflects a mode of thinking about the problem. Yes, that is true that a list may be viewed as a collection.  However, it is very relevant that it is a list sometimes.  Different collections have different design tradeoffs.  Unfortunately, sometimes we need to know more than just that it is a collection. I don't think anyone said that lists are better than another data structure.  Only that lists map well to certain types of thinking. Talking about the differences between IEnumerable and a collection is somewhat of a philosophical debate.  It is certainly the case that any enumeration will have some length (possibly infinite - countable or uncountable) and I don't think that a collection necessarily needs indexers (what is the nth element of a set?).  Furthermore, can't any enumeration of an IEnumerable be considered a collection of elements.

  • Anonymous
    February 19, 2007
    @Wes: Yes, we agree on several of points (my comments were in response to everything as well as for clarity). I don't think I agree with your classification of enumeration though.  You don't need to have any data in order to begin enumeration with IEnumerable.  Have a look at my example at http://blogs.msdn.com/cdndevs/archive/2007/02/13/announcement-new-visual-studio-talk-show-podcast.aspx -- which enumerates keystrokes.  When the enumeration ends there is a known length, but during the enumeration length is, and cannot be, known.  Yes, a pedantic example. For the most part, yes, IEnumerable/enumeration is used on sets of data that is either material (has been allocated for, has known/calculable length at start of enumeration) or is ephemeral (lazy eval or calculated).  But, that doesn't have to be the case.  And that's the beauty of IEnumerable, it is that far abstracted from the implementation of the "set".  Yes, it could be philosophical, it would depend on whether you view the set as that being enumerated or the set that has been enumerated. The implementation can, and sometimes, must be a consideration when used.  In terms of enumeration, I think it's best when it's not a consideration.  A particular IEnumerator could actually switch implementation details; enumerating elements of a list at some point and enumeration elements of an array at another. Sure, often you have to couple your code to the type of collection by making use of properties like Length; but, too often programmers expect/force coupling to things based upon implementation. (If I'm making sense, I guess I'm saying enumeration of a collection should be viewed as separate)

  • Anonymous
    February 19, 2007
    Peter: Good points.  Yes, often I represent input buffers as an IEnumerable because of the lazy evaluation aspect. I am glad you are a strong proponent of separation of interface and implementation.  That is a very good thing. Now, before I dive down again into IEnumerable, I want to point out that my demonstration of a one-to-one translation between IEnumerable<T> and IList<T> (not the framework's IList<T>) was not to convince people that lists are good or better or to encourage people necessarily to use lists.  It was simply to show that they are isomorphic.  There is straightforward mapping from one structure to the other and vice-versa.  This in turn explains why things that are normally defined on lists in other languages (map, filter, fold) are defined on IEnumerable in C#.  It also demonstrates that they are largely the same concept. Now that being said... Whether the length is calculable at the start of an enumeration is not always important.  In the Linq to Objects API there is a method called Count.  This method does a few interesting things like check to see if the underlying object implements ICollection in which case it returns the count. If it doesn't then it enumerates the sequence and returns the realized length.  Of course the result of the Count method may be a program that doesn't halt. I personally like to think about IEnumerable<T> as a projection of something else into the "enumerable" domain.  For example the enumeration of the elements of a set is not the set but it is a projection of the set into the enumerable domain.  The same is true for lists and so on.

  • Anonymous
    March 23, 2007
    The logical steps needed to solve a task versus implementation style/method  has been an issue in many of the projects I've been on.  Many of the approaches simply add extra complexity without increasing reliability.  This hits hard given that our production systems follow the historical rule of having 3/4ths of the cost after they are first put into production.  We've mitigated the post-ship costs by applying linear code to linear tasks, recursive code to recursive tasks, and OO/modular code to modular tasks.  Our team's found that applying the wrong code approach to a task type (linear, recursive, OO/modular) makes the resultant source code much more expensive to test, debug and maintain.  Our main emphasis on structure is at the application level   since our applications are built to meet specific business process needs and must be easily changed to meet business process changes.  This leads to spending development effort for architecture, algorithms, data structures at the application level and leaving individual code modules as mostly self-contained black boxes  We assume that the self-contained modules are written with simple and maintainable code even if that code is using less than state of the art language features (e.g., custom iterators).  This helps us greatly because we need production systems that we can put an average capability hypothetical new team member with 2 years development experience on without an extensive learning period or extensive coaching.  All encompassing application level frameworks, extended language features, use of too many framework APIs, and use of too many patterns are also avoided.  The 2 years of experience rule seems to be about right for setting the desired code complexity, architecture complexity and data structure complexity for maximum maintainability and production system life-span length.

  • Anonymous
    March 23, 2007
    Sounds like given what you want to achieve that you are doing the right thing.  I definitely agree that it works best to use the right tool for the right job.

  • Anonymous
    April 23, 2009
    "It was simply to show that they are isomorphic" The danger is that many less experienced readers will fail to see that. When things essential to computing seem to be so equivalent there should hardly be any need to stress the similarity. Instead, I make it my mission to stress the difference. It is the form vs. semantics distinction that should be made, and is often overlooked by programmers. Even pointing out the (factual) similarity risks affirming a whole bunch of programmers in their belief that the concepts are interchangeable. I believe the very similarity is the trap. Now, if I were elitist, I might want to leave my juniors in the belief that IEnumerable is just a glorified IList or vice versa. There is little use to point out to a pedestrian that essentially a Ferrari does many of the same things as a Skoda, in very much the same way. He can see that. Repeating it only confirms the assumption that are no differences in intention.