How WCF Data Services Changes in Guest OS 1.4 Affect Windows Azure Table Clients
The release of Guest OS 1.4 includes an update to .NET 3.5 SP1 that contains bug fixes to WCF Data Services. We have received feedback about backward compatibility issues in Windows Azure Tables caused by this WCF Data Services update. This post goes over some of the breaking changes you will see when moving from .NET 3.5/.NET 3.5 SP1 to this update to .NET 3.5 SP1. We hope it also helps when you upgrade your application from .NET 3.5 SP1 to .NET 4.0, since the same breaking changes are present in .NET 4.0.
Issue #1 - PartitionKey/RowKey ordering in Single Entity query
Before .NET 4.0 and the update to WCF Data Services in .NET 3.5 SP1, a query for a single entity that does not exist in the storage service did not throw an exception if the RowKey match preceded the PartitionKey match in the expression. Take for example the following LINQ query:
var q = from entity in context.CreateQuery<MyEntity>("MyTable")
        where entity.RowKey == "Bar" && entity.PartitionKey == "Foo"
        select entity;
This would generate the following Uri:
https://myaccount.table.core.windows.net/MyTable?$filter=PartitionKey eq 'Foo' and RowKey eq 'Bar'
Because this Uri uses $filter, a query for a non-existent entity returned an empty result set instead of raising an exception. The bug is that the above query should have produced the following Uri:
https://myaccount.table.core.windows.net/MyTable(PartitionKey='Foo',RowKey='Bar')
This Uri format always results in a DataServiceQueryException with error code “ResourceNotFound” when the entity does not exist.
With the update, a single-entity query that filters on RowKey before PartitionKey now produces the above Uri, which addresses a single entity (i.e. $filter is not used). An exception is therefore raised if the entity is not present on the server, irrespective of the order in which the keys are specified in the LINQ query.
Note, the following LINQ query:
var q = from entity in context.CreateQuery<MyEntity>("MyTable")
        where entity.PartitionKey == "Foo" && entity.RowKey == "Bar"
        select entity;
has always resulted in the following query Uri:
https://myaccount.table.core.windows.net/MyTable(PartitionKey='Foo',RowKey='Bar')
which always results in a DataServiceQueryException with error code “ResourceNotFound” when the entity does not exist. This has always been the case and has not changed.
Any dependency on the behavior that an empty set will be returned when an entity is not found will break your application because the new behavior is to raise an exception even when RowKey precedes PartitionKey in the LINQ query.
The suggestion from WCF Data Services team for this breaking change is:
In the update to .NET 3.5 SP1 (available in the Guest OS 1.4 release) and in .NET 4.0, the context provides a new flag, IgnoreResourceNotFoundException, to control this. Set it as follows to ignore these exceptions:
context.IgnoreResourceNotFoundException = true;
Always catch exceptions, and then ignore “Resource Not Found” exceptions if required by your application logic.
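A minimal sketch of the second suggestion, assuming the System.Data.Services.Client types that ship with WCF Data Services. The helper name FirstOrNull is hypothetical and not part of any library. It treats an HTTP 404 on a query as "no entity" and rethrows everything else, so the calling code does not need to match on the exception message.

```csharp
using System;
using System.Data.Services.Client;
using System.Linq;
using System.Net;

public static class SingleEntityQuery
{
    // Hypothetical helper: returns the first entity produced by the query,
    // or null when the service answers 404 (ResourceNotFound).
    public static T FirstOrNull<T>(IQueryable<T> query) where T : class
    {
        try
        {
            // Enumerating the query sends the request to the service.
            return query.AsEnumerable().FirstOrDefault();
        }
        catch (DataServiceQueryException e)
        {
            // Response carries the HTTP status code of the failed request.
            if (e.Response != null &&
                e.Response.StatusCode == (int)HttpStatusCode.NotFound)
            {
                return null;
            }

            // Any other failure is not a missing entity, so let it propagate.
            throw;
        }
    }
}
```

With a helper like this, a point query behaves the same way regardless of the order of the key predicates: a missing entity yields null on both the old and the updated library.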
Issue #2 - Uri double escaping that impacts queries
Previous versions of the WCF Data Services library did not escape certain characters when forming the Uri. This allowed some entities to be inserted but not retrieved or deleted. A blog post covered the characters that had problems, and the workaround was for the application to encode/escape them before use. The updated .NET 3.5 SP1 (used in Guest OS 1.4) and .NET 4.0 fix this issue by escaping these values in the library. As a result, existing applications that already escape their values now break, because the values are escaped twice.
The resolution is to review your application for values that it escapes itself, and to remove that escaping so that the values work with the updated WCF Data Services release.
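To illustrate why pre-escaped values break, here is a small sketch that uses Uri.EscapeDataString from the base class library as a stand-in for the escaping the updated client library performs. The exact escaping routine inside WCF Data Services is an assumption here; the point is only what happens when any percent-encoding is applied twice.

```csharp
using System;

public static class DoubleEscapeDemo
{
    // Stand-in for the escaping the updated client library applies to key
    // values. Uri.EscapeDataString is used for illustration only.
    public static string EscapeOnce(string key)
    {
        return Uri.EscapeDataString(key);
    }

    public static void Main()
    {
        string raw = "foo@bar.com";

        // The application passes the raw value and the library escapes it.
        string once = EscapeOnce(raw);

        // The application still pre-escapes, and the library escapes again.
        string twice = EscapeOnce(once);

        Console.WriteLine(once);   // foo%40bar.com
        Console.WriteLine(twice);  // foo%2540bar.com
    }
}
```

The doubly escaped value decodes to the literal text foo%40bar.com rather than foo@bar.com, so the request addresses a different key than the one that was stored.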
Issue #3 - Uri escaping that impacts AttachTo
The DataServiceContext tracks entities by their address, both when entities are returned in a query result and when AddObject/AttachTo is invoked. The address is basically the Uri that contains the PartitionKey and RowKey. For example, an entity with PartitionKey=foo and RowKey=bar is tracked using
https://myaccount.table.core.windows.net/MyTable(PartitionKey='foo',RowKey='bar').
Before the Uri escaping fix mentioned above, the client library (pre .NET 3.5 SP1 update and pre .NET 4.0) was inconsistent in how it built this address when the address contained a special character that needs to be escaped. For an entity it received from the server, it used the appropriately escaped address, but when AddObject/AttachTo was invoked, it used an un-escaped address. This mismatch causes entities with the same key to be tracked twice.
For example, for an entity with PartitionKey = 'jai@com' and RowKey = '', the address used to search the list of tracked entities at the time of AttachTo and AddObject depends on the library version.
The .NET 3.5 SP1 update and .NET 4.0 use the same address that is used to track the entity:
https://myaccount.table.core.windows.net/Emails(PartitionKey='jai%40com',RowKey='')
Versions before .NET 4.0 and the .NET 3.5 SP1 update, however, use a different (un-escaped) address:
https://myaccount.table.core.windows.net/Emails(PartitionKey='jai\@com',RowKey='')
So let us use an example to see exactly where the inconsistency is:
New Client Library (update to .NET 3.5 SP1 and .NET 4.0):
When an entity is returned from the server as the result of a query, the server returns an ID that is escaped:
https://myaccount.table.core.windows.net/Emails(PartitionKey='jai%40com',RowKey='')
and the WCF Data Services client library tracks the entity using this ID.
Then assume an AddObject/AttachTo is invoked for an object with the same key, so the WCF Data Service Client library uses the escaped URI to try to add/attach the object:
https://myaccount.table.core.windows.net/Emails(PartitionKey='jai%40com',RowKey='')
This results in an InvalidOperationException being thrown with the message “Context is already tracking a different entity with the same resource Uri”. This is the intended behavior: the object is already tracked in the context, so the program should not be able to add/attach another object with the same key.
Old Client Library:
Now let us look at the same example using the client library before the update. When an entity is returned from the server as the result of a query, the server returns an ID that is escaped:
https://myaccount.table.core.windows.net/Emails(PartitionKey='jai%40com',RowKey='')
and the WCF Data Services client library tracks the entity using this ID.
Then, when AddObject/AttachTo is invoked, the WCF Data Services client library does not escape the key and uses
https://myaccount.table.core.windows.net/Emails(PartitionKey='jai\@com',RowKey='')
to track the newly added object, which causes the inconsistency. The client library should have escaped the keys so that it could tell it was already tracking an object with that address, which is what the updated client library now does.
For the old client library, this can lead to strange behavior, since two instances that represent a single server entity will be tracked in a single context (one tracked via a query result and the other via either AttachTo or AddObject):
- If both instances are unconditionally updated, the user may inadvertently lose some changes.
- Consider a scenario where a table is used as a lookup table. An application may query all entities from this lookup table, with the context tracking these entities using IDs that are appropriately escaped. The application may then rely on the “Context is already tracking…” exception when it adds a new entity. However, the bug can cause the context to track the new entity using an un-escaped URI, so the collision is not detected during AddObject and the context tracks two instances that represent the same key. When SaveChanges is invoked, the request fails because the entity already exists on the server, and the server correctly returns “Conflict”. The application may not be expecting this, since it expected the conflict to be detected when AddObject was invoked rather than at SaveChanges.
- If conditional update is used on both instances, only the first update processed by the server will succeed; the second will fail because of the ETag check. The order in which the entities are added to the context (via query or AddObject/AttachTo) will determine the order of requests dispatched to the server.
However, if the address does not contain special characters, then AddObject and AttachTo would throw InvalidOperationException with message “Context is already tracking a different entity with the same resource Uri”, and everything would work fine in the old client library.
This bug has been fixed in the .NET 3.5 SP1 update and in .NET 4.0: the context now escapes the Uri even when AddObject/AttachTo is invoked, so it recreates the same address and correctly throws the InvalidOperationException mentioned above.
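The mismatch can be pictured with a toy model that uses only the base class library. This is not how DataServiceContext is implemented; the class TrackingModel and its methods are hypothetical, and Uri.EscapeDataString stands in for the library's escaping. The model keeps a set of identity strings and reports whether an attach collides with the identity recorded from a query result.

```csharp
using System;
using System.Collections.Generic;

public static class TrackingModel
{
    // Builds the identity string for an entity. Passing escape=false mimics
    // the old AddObject/AttachTo path, which left the keys un-escaped.
    public static string Identity(string table, string pk, string rk, bool escape)
    {
        if (escape)
        {
            pk = Uri.EscapeDataString(pk);
            rk = Uri.EscapeDataString(rk);
        }
        return string.Format(
            "https://myaccount.table.core.windows.net/{0}(PartitionKey='{1}',RowKey='{2}')",
            table, pk, rk);
    }

    // Returns true when attaching collides with the identity that was
    // recorded (always escaped) for the entity returned by a query.
    public static bool AttachCollides(string pk, string rk, bool attachEscapes)
    {
        var tracked = new HashSet<string>(StringComparer.Ordinal);
        tracked.Add(Identity("Emails", pk, rk, true));
        return !tracked.Add(Identity("Emails", pk, rk, attachEscapes));
    }

    public static void Main()
    {
        // Old behavior, key with a special character: duplicate goes unnoticed.
        Console.WriteLine(AttachCollides("foo@bar.com", "", false)); // False
        // Updated behavior: the duplicate is detected.
        Console.WriteLine(AttachCollides("foo@bar.com", "", true));  // True
        // Old behavior, no special characters: the duplicate is detected.
        Console.WriteLine(AttachCollides("foo", "bar", false));      // True
    }
}
```

In the real library a detected collision surfaces as the InvalidOperationException described above rather than as a boolean.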
Let us go over this using a code example to show how the issue could occur:
TableServiceContext context = tableClient.GetDataServiceContext();

// For simplicity we have ignored the code that uses CloudTableQuery to
// handle continuation tokens.
var q = from entity in context.CreateQuery<MyEntity>("Emails") select entity;

// Let us assume entityInTable is an already existing entity in the table, retrieved
// using the above query, which will now be tracked by the context.
var entityInTable = q.FirstOrDefault();

// Now let us create a new instance but with the same PartitionKey and RowKey.
var someEntity = new MyEntity
{
    PartitionKey = entityInTable.PartitionKey,
    RowKey = entityInTable.RowKey
};

try
{
    // NOTE: Depending on the WCF release and key values, AttachTo may throw
    // an InvalidOperationException with the message:
    // "The context is already tracking a different entity with the same resource Uri."
    // CASE 1> Pre .NET 3.5 SP1 update => Depending on the key value, an exception may be thrown.
    //         If the key contains a character that needs to be encoded,
    //         then an exception is NOT thrown; otherwise, an exception is
    //         always thrown.
    // CASE 2> .NET 3.5 SP1 update and .NET 4.0 => An exception is always thrown.
    //
    // Example of a value: If PartitionKey = 'foo@bar.com' and RowKey = '' then
    // the entity is tracked as:
    // https://myaccount.table.core.windows.net/Emails(PartitionKey='foo%40bar.com',RowKey='')
    // However, when attaching a new object in CASE 1, the id is not escaped, so
    // the duplicate is not detected and no exception is thrown,
    // leading to strange behavior if the application unconditionally updates both instances.
    context.AttachTo("Emails", someEntity, "*");
}
catch (InvalidOperationException e)
{
    // Check if the message is "The context is already tracking a different entity with the
    // same resource Uri." and handle this case as required by your application.
}

context.UpdateObject(someEntity);
context.SaveChanges();
The resolution is to upgrade the WCF Data Services library. However, after upgrading, you should ensure that your code handles exceptions. This is one of the recommended best practices.
You can also check whether an entity is already tracked by using a key equality check rather than an instance equality check before attaching/adding a new object instance. In the example below, the first LINQ query finds the tracked entity, but the second one does not, because it compares references and the new instance is not the same object. If we attach only when the first LINQ query finds no entity, we will never have duplicates. Also, remember that the WCF Data Services team recommends using a new context instance for every logical operation, which should reduce the chances of tracking duplicate entities.
Example:
// Create a new instance and let entityInTable represent an entity retrieved via a query
var someEntity = new MyEntity
{
    PartitionKey = entityInTable.PartitionKey,
    RowKey = entityInTable.RowKey
};

// This will find the tracked entity instance since we are looking for key equality. If
// trackedEntityKeySearch is not null, it means entity is tracked so do not invoke AddObject/AttachTo
var trackedEntityKeySearch = (from e in context.Entities
                              where ((TableServiceEntity)e.Entity).PartitionKey == someEntity.PartitionKey
                                    && ((TableServiceEntity)e.Entity).RowKey == someEntity.RowKey
                              select ((TableServiceEntity)e.Entity)).FirstOrDefault<TableServiceEntity>();

// NOTE: This will not find the tracked entity even if it is tracked since it is not the same
// object instance. So the above query is preferred to see if a particular entity is being tracked
var trackedEntityReferenceSearch = (from e in context.Entities
                                    where e.Entity == someEntity
                                    select ((TableServiceEntity)e.Entity)).FirstOrDefault<TableServiceEntity>();
We apologize for any inconvenience this has caused and hope this helps you make as smooth a transition as possible to the .NET 3.5 SP1 update or .NET 4.0. We would like to end by reiterating a couple of best practices that the WCF Data Services team recommends:
- Always handle exceptions in AddObject, UpdateObject, AttachTo and in Queries
- DataServiceContext is not thread safe. It is recommended to create a new context for every logical operation
We will have more on best practices in the near future.
Jai Haridas
Comments
- Anonymous
September 19, 2010
Please update the AzureTable Whitepaper for handling exceptions... considering your closing statements I'd like to see some more guidance on how to implement error handling in CRUD operations. Please see this expanded question on this topic: "Clean way to catch errors from Azure Table (other than string match?)" stackoverflow.com/.../clean-way-to-catch-errors-from-azure-table-other-than-string-match