Condividi tramite


Tip 12 - How to choose an Inheritance Strategy

What strategies does the Entity Framework support?

The Entity Framework supports 3 primary inheritance strategies:

Table Per Hierarchy (TPH):

In TPH, all data for a type hierarchy is stored in one table, and there is a discriminator column that is used to establish the type of a particular row (i.e. 'C' for Car or 'B' for Boat).

Columns for Properties that are not shared across all types need to be nullable, even if they aren't nullable properties. Which means the database is not completely enforcing your null-ability constraints.

A TPH table might look something like this:

ID Description Discriminator HorsePower KeelLength
1 Porsche 911 C 600 NULL
2 Yacht - KZ7 B NULL 2.2

Table Per Type (TPT):

In TPT properties in a base type are stored in a shared table. For example a Vehicle table:

ID Description
1 Porsche 911
2 Yacht - KZ7

And then there is a table for each sub type, with columns for just newly declared properties and the key, so the tables can be joined. So there would be a Car table:

ID HorsePower
1 600

and Boat table:

ID KeelLength
2 2.2

Table Per Concrete Class (TPC):

In Table per Concrete Class there is a table for each class, and each of those tables has a column for every property of that type, i.e. a Car table:

ID Description HorsePower
1 Porsche 911 600

And a Boat table:

ID Description KeelLength
2 Yacht - KZ7 2.2

Which strategy is the Best?

Trick question! In isolation of your requirements there is no 'Best' strategy.

But...

While the EF runtime supports TPC, the designer doesn't, and using TPC in the EF forces you to avoid associations in your base type. Because of these issues we generally discourage the use of TPC with the Entity Framework.

This means in most situations the question comes down to TPH or TPT?

Here are some of the things you might want to consider when making your decision:

Concern Winner Reasoning
Performance TPH Table Per Hierarchy is generally better performing, because no joins are necessary, everything is in one table. The decision becomes even more clear cut once the inheritance hierarchy gets wide or deep.
Flexibility TPT Table Per Type is often used by ISVs, as it allows customizations without modifying the 'base' table. I.e. new subtypes can be added simply by creating new tables for those sub-types.
Database Validation TPT TPH requires columns in derived types to be NULLABLE in the database, so that other derived types can be stored in the same table. Because of this it is possible to create rows in the table that are not valid according to the conceptual model. I.e. the column is NULLABLE but the combination of a particular column being NULL and a particular discriminator or type is not valid. This means the database is not enforcing the conceptual model for you anymore. This is fine if all access to the database is via the EF, but if anything else is used, you can end up with 'dirty' data.
Aesthetics TPT This one is completely subjective, but TPT feels more Object Oriented to me :)
Storage Space TPT Ward Bell pointed this one out: If your inheritance hierarchy has lots of types, then using TPH will result in lots of empty cells. If your database can handle 'sparse' columns well this probably isn't a real concern.

As you can see once you know what it is you are looking for it should be a pretty easy task to choose a strategy. Most of the time the recommendation is TPH because generally performance trumps these other concerns.

But every situation is different, and the key is to understand exactly what you value, and then make the decision accordingly.

Let me know if you have any questions.

Comments

  • Anonymous
    April 14, 2009
    PingBack from http://bogdanbrinzarea.wordpress.com/2009/04/14/summaries-15042009/

  • Anonymous
    April 15, 2009
    Readers interested in another detailed comparison of inheritance strategies (not linked to EF though) might also have a look at:http://www.xcalia.com/xdn/resources/articles/InheritanceStrategy/PickingTheRightInheritanceStrategy.jsp-- Eric Samson.

  • Anonymous
    April 15, 2009
    Excellent summary.I would add that TPH performance advantage increases with number of derived types (not just inheritance depth) .. if you do polymorphic queries (e.g., get all vehicles starting with letter 'A').TPT is sometimes thought superior when the number of distinct properties (horsepower, keellenght) leads to wide, sparse table. Don't know how often this is a practical objection.Finally, this discussion is not EF specific; the same considerations and tradeoffs apply to other "ORMs".

  • Anonymous
    April 15, 2009
    Hopefully if you're reading this you've noticed that I've started a series of Tips recently. The Tips

  • Anonymous
    April 15, 2009
    Thanks Ward. I'll feed those thoughts in.

  • Anonymous
    April 15, 2009
    Thanks for the link Eric!

  • Anonymous
    April 16, 2009
    Ref'd this tip and similar passages of "NHibernate In Action" on my blog, http://tinyurl.com/d4vry9, where I also urge restraint in using inheritance at all. Keep the series going, Alex!

  • Anonymous
    April 20, 2009
    Please help. I am apparently missing something. What about a simple, traditiona, relational data model, like Pubs or Northwind? Which are these most naturally mapped to in your naming conventions? I would guess "it can be actually be coded in any way, TPT/TPH/TPC, but it most naturally fits TPH, where the term 'descriminator' is esentially a 'foreging key'? I am confused. Can you explain a bit more, in terms of Pubs or Northwind?

  • Anonymous
    June 10, 2009
    There are currently two ways to get Entity Framework apps up and running, we call these Database First

  • Anonymous
    June 14, 2009
      There are currently two ways to get Entity Framework apps up and running, we call these Database

  • Anonymous
    October 14, 2009
    We are re-inventing the wheel all over again - your Table Per Type (TPT) pattern actually does not represent per ce a table per type.

  • Anonymous
    November 17, 2009
    My take is slightly different. This is an excellent summary, but as a DBA I can't think of a reason to recommend TPH to anyone, even on performance grounds. More details:http://sqlblog.com/blogs/merrill_aldrich/archive/2009/11/17/trick-question-part-quattro.aspx

  • Anonymous
    May 08, 2010
    thanks -- this was very helpful