Deleting versions from a SharePoint list item

One of the cool things about Office 2010 is that we made a decision to use Office technology wherever possible.  One of the results of this decision is that the Office website (www.office.com or office.microsoft.com) is built on SharePoint 2010.  I recently had some folks from our perf team come to me for help with a strategy for dealing with a rapidly expanding SharePoint database.  You see, we keep our website content -- such as help articles, sample templates, clip art, video, etc. -- in a SharePoint database, where it is authored, tested, localized, and finally propagated to the live site.

One of the design decisions made by SharePoint was to make a complete copy of a document each time a new version is checked in.  This includes making a copy of the file contents, even if the only changes were made to metadata columns.  This simplifies some operations, but one of the side effects is that large files tend to eat up database space quickly.  In order to keep the disk requirements manageable, we needed to be able to delete some of the items in the version history of certain documents.

Right now you are probably thinking, "Doesn't SharePoint already have a version retention policy feature, where you can specify how much data you want to keep?"  Yes, it does.  And I'm sure it works as-is for a lot of different types of installations.  But we have some additional requirements that aren't supported in that model.  For one, we need quite a bit of flexibility as to how many versions we keep -- maybe for a help article we want to keep 20 versions, but for a video we only want to keep two.  Or maybe certain authors want to keep a long version history, while other authors don't mind if there are just a couple of revisions in the history.  Another problem is that certain versions are important to keep, no matter how old they are.  For example, the version that is currently published on the live site is always important.  The version that has been sent out for localization should definitely not get deleted.  And so on.

After some brainstorming we came up with this:  we need 1) an infrastructure that allows users and maintenance utilities to select and identify which list items need to have some of their versions removed, and 2) a method that will remove the given versions from the list item.  The first part is beyond the scope of this blog post. ;-)  But the second part is why I am writing the post in the first place.

So what would this method look like?  Well, it would take an SPListItem as its main parameter for sure, since our intent is to remove versions from them.  We also need to keep a minimum number of versions, so we can pass that in as an int.  So far we have something like:

RemoveVersions(SPListItem item, int minVersions)

Okay.  But we also need to let the caller specify any versions they want to make sure we don't remove.  The .NET Framework has several different classes that enable you to store a collection of items.  We want to be able to look up versions in the collection, so that narrows it down, but there are still several.  List<T> is a straightforward one.  Dictionary<TKey,TValue>.KeyCollection contains the keys in a dictionary.  Or there's one of my favorites, HashSet<T>.  The point is that we don't want to limit our callers to one particular class, because they will likely have their list in a different collection type and would need to convert it.

Luckily, all the classes I mentioned implement the ICollection<T> interface.  One of the methods on this interface is Contains().  So if we declare the caller's collection of versions to save as ICollection<string>, they can store it in any of these collection types, and we can call Contains() to see if the version we want to delete is off limits.
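To make the ICollection<string> point concrete, here is a minimal sketch that runs without SharePoint.  The IsProtected helper and the SavedVersionsDemo class are hypothetical names invented for this illustration; they only mirror the Contains() check described above.

```csharp
using System;
using System.Collections.Generic;

static class SavedVersionsDemo
{
    // Hypothetical helper: mirrors the "is this version off limits?" check.
    // A null collection is treated as "nothing is protected".
    internal static bool IsProtected(string versionLabel, ICollection<string> savedVersions)
    {
        return savedVersions != null && savedVersions.Contains(versionLabel);
    }

    static void Main()
    {
        var asList = new List<string> { "1.0", "4.0" };
        var asSet = new HashSet<string> { "1.0", "4.0" };
        var asDictionary = new Dictionary<string, string>
        {
            { "1.0", "published" },
            { "4.0", "sent for localization" }
        };

        // All three collection types convert implicitly to ICollection<string>.
        Console.WriteLine(IsProtected("4.0", asList));             // True
        Console.WriteLine(IsProtected("4.0", asSet));              // True
        Console.WriteLine(IsProtected("4.0", asDictionary.Keys));  // True
        Console.WriteLine(IsProtected("2.0", asSet));              // False
        Console.WriteLine(IsProtected("2.0", null));               // False
    }
}
```

One consideration when picking a collection: Contains() on a List<T> is a linear scan, while HashSet<T> and dictionary keys use hashing, which matters only if the saved list gets long.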

Finally, what about the return type of the method?  Technically we don't need to return anything.  If we encounter some sort of error we would throw an exception.  If we don't throw an exception, the caller can assume we did our job.  But it might be nice to tell them a little bit about what we did.  How about returning an int that specifies how many versions we deleted?

internal static int RemoveVersions(SPListItem item, int minVersions, ICollection<string> savedVersions)

Note that I used "internal static" as qualifiers.  You may need to modify that depending on your needs.  So the shell of our method looks something like this: 

        internal static int RemoveVersions(SPListItem item, int minVersions, ICollection<string> savedVersions)
        {
            //  Homework for the reader: validate the input arguments.
            //  if item is null, throw an ArgumentNullException
            //  if minVersions < 0 throw an ArgumentOutOfRangeException

            int deletedCount = 0;

            //  ...

            return deletedCount;
        }
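
For anyone who wants a head start on the homework, the two checks named in the comments could be sketched as below.  This is only one possible shape: the ArgumentChecks class and Validate method are hypothetical names, and item is typed as object purely so the sketch compiles without the SharePoint assemblies.

```csharp
using System;

static class ArgumentChecks
{
    // Assumption: in the real method, 'item' would be the SPListItem parameter.
    internal static void Validate(object item, int minVersions)
    {
        if (item == null)
        {
            throw new ArgumentNullException("item");
        }

        if (minVersions < 0)
        {
            throw new ArgumentOutOfRangeException(
                "minVersions", minVersions, "minVersions must not be negative.");
        }
    }

    static void Main()
    {
        Validate(new object(), 0);   // valid input: returns without throwing

        try { Validate(null, 2); }
        catch (ArgumentNullException e) { Console.WriteLine(e.ParamName); }        // item

        try { Validate(new object(), -1); }
        catch (ArgumentOutOfRangeException e) { Console.WriteLine(e.ParamName); }  // minVersions
    }
}
```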
  

 Now we just need to figure out what to put in that little section in the middle. ;-)

Looking at the SharePoint API, there are really only two candidates for deleting versions.  One of them is SPListItemVersion, accessed from the SPListItemVersionCollection in SPListItem.Versions, and the other is SPFileVersion, accessed from the SPFileVersionCollection in SPListItem.File.Versions.  Both of them track information about the version history of the document.  Rather than spend a lot of discussion on which one is better, let me just say that SPFileVersion is the right choice if you need to access the actual file contents of a previous version.  Otherwise SPListItemVersion is the better choice.  One of the reasons I went with SPListItemVersion is that it is also available for items in SharePoint lists, whereas SPFileVersion is only available in libraries.  Most of your big data storage usage is in libraries anyway, so that's not a big win, but users can store attachments along with list items, so you can potentially get some large list items.

But the biggest reason I like SPListItem.Versions is that it is an ordered collection.  Versions[0] is always the newest version, and Versions[Versions.Count - 1] is always the oldest version.  This makes looking for "old" versions easy, since they will all be at the higher indices in the collection.  In fact, it makes the problem of skipping a minimum number of versions trivial.  When we iterate through the collection, instead of starting at index 0, we can just start at index "minVersions".  And voilà, we have just saved all those versions.  How about I just show you the loop I used, and then explain it?

            int i = minVersions;    // start looking for old versions after skipping minVersions

            while (i < item.Versions.Count)
            {
                SPListItemVersion itemVersion = item.Versions[i];
                string versionLabel = itemVersion.VersionLabel;

                if (!itemVersion.IsCurrentVersion &&    // Not "current" according to SharePoint (e.g. last-published major version, moderated version)
                    (savedVersions == null || !savedVersions.Contains(versionLabel)))  // not one of our "saved" versions
                {
                    itemVersion.Delete();
                    ++deletedCount;
                }
                else
                {
                    ++i;
                }
            }

 

The first observation is that the pattern doesn't follow a standard for loop, where we bump up the index on each pass through the loop.  There's a simple, but not obvious, explanation for this.  Let's say we get to item.Versions[10], and find a version we want to delete.  We call SPListItemVersion.Delete(), and item #10 is deleted from the SharePoint database.  What isn't immediately apparent is that it is also deleted from the Versions collection -- immediately!  So the next time we reference item.Versions[10], we don't get a stale copy of the version we just deleted.  We actually get the next older version!  Because of this behavior, we don't want to increment the index if we delete a version -- otherwise we would skip a version.  The way the loop would usually execute, then, is that we start looking for old versions somewhere in the middle of the list of versions.  We delete each old version.  This causes all the older versions to move up a slot in the version collection, and Versions.Count to be decremented by one.  Eventually our index will reach Versions.Count and the loop ends, either because we skipped versions and incremented the index, or because we deleted versions, which decremented Versions.Count.
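
The same index behavior can be reproduced without SharePoint.  In the sketch below a plain List<string> stands in for item.Versions (newest first), and RemoveAt() plays the role of Delete(); the ShrinkingLoopDemo class and Prune method are hypothetical names, and the IsCurrentVersion check is deliberately left out to keep the focus on the index.

```csharp
using System;
using System.Collections.Generic;

static class ShrinkingLoopDemo
{
    // Removes every entry at index >= minVersions unless its label is saved.
    // Assumption: 'versions' is ordered newest first, like item.Versions.
    internal static int Prune(List<string> versions, int minVersions, ICollection<string> savedVersions)
    {
        int deletedCount = 0;
        int i = minVersions;

        while (i < versions.Count)
        {
            if (savedVersions == null || !savedVersions.Contains(versions[i]))
            {
                versions.RemoveAt(i);   // older entries slide into slot i...
                ++deletedCount;         // ...so i stays where it is
            }
            else
            {
                ++i;                    // kept this one; step past it
            }
        }

        return deletedCount;
    }

    static void Main()
    {
        var versions = new List<string> { "6.0", "5.0", "4.0", "3.0", "2.0", "1.0" };
        var saved = new HashSet<string> { "3.0" };

        int deleted = Prune(versions, 2, saved);

        Console.WriteLine(deleted);                       // 3
        Console.WriteLine(string.Join(", ", versions));   // 6.0, 5.0, 3.0
    }
}
```

Had the loop incremented i after every pass, "2.0" would have slid into the slot just vacated and been skipped.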

The other interesting point is the inclusion of (!itemVersion.IsCurrentVersion).  "What does this even mean?  Didn't you just say that Versions[0] is the current version?  Why do you need this check, then?" you might be saying. ;-)  It turns out that the latest version isn't the only "current" version.  Let's say for example that you have enabled major/minor versioning on your library/list, and your latest version is 4.3.  Your current version according to SharePoint is the last major version, which is 4.0.  So if you didn't have this check in your loop, and you were only keeping two versions, you might try to delete version 4.0.  SharePoint will throw an exception and tell you that you can't delete the current version.  (Of course this didn't actually happen to me.)  Apparently there are a few other "current" versions, such as the last moderated version on certain list types.  Suffice it to say, though, that if SharePoint thinks it is a current version 1) the IsCurrentVersion property will be true, and 2) you can't delete it.
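
Here is a small model of the 4.3 / 4.0 scenario.  FakeVersion is a hypothetical stand-in for SPListItemVersion, and which entries are flagged as current is hand-set to match the description above rather than computed by SharePoint, so treat it as an illustration of the filter, not of SharePoint's own rules.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for SPListItemVersion; only the two members used here.
sealed class FakeVersion
{
    public string VersionLabel;
    public bool IsCurrentVersion;
}

static class CurrentVersionDemo
{
    // Newest first: skip the first minVersions entries, then skip anything flagged current.
    internal static List<string> LabelsToDelete(IList<FakeVersion> versions, int minVersions)
    {
        return versions.Skip(minVersions)
                       .Where(v => !v.IsCurrentVersion)
                       .Select(v => v.VersionLabel)
                       .ToList();
    }

    static void Main()
    {
        // Assumed flags: the latest draft (4.3) and the last major version (4.0) are "current".
        var history = new List<FakeVersion>
        {
            new FakeVersion { VersionLabel = "4.3", IsCurrentVersion = true },
            new FakeVersion { VersionLabel = "4.2", IsCurrentVersion = false },
            new FakeVersion { VersionLabel = "4.1", IsCurrentVersion = false },
            new FakeVersion { VersionLabel = "4.0", IsCurrentVersion = true },
            new FakeVersion { VersionLabel = "3.0", IsCurrentVersion = false },
        };

        // Keeping two versions: 4.0 sits past index 2 but survives because of its flag.
        Console.WriteLine(string.Join(", ", LabelsToDelete(history, 2)));   // 4.1, 3.0
    }
}
```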

Well, I hope that has been fun and informative.  I'll paste the full method below for easy reference.

- Paul

        // Requires: using System.Collections.Generic; using Microsoft.SharePoint;

        /// <summary>
        /// Removes unneeded versions from a SharePoint list item
        /// </summary>
        /// <param name="item">The SPListItem that needs some versions removed</param>
        /// <param name="minVersions">The minimum number of versions to keep</param>
        /// <param name="savedVersions">A collection of important version labels (or null)</param>
        /// <returns>The number of versions deleted</returns>
        internal static int RemoveVersions(SPListItem item, int minVersions, ICollection<string> savedVersions)
        {
            //  Homework for the reader: validate the input arguments.
            //  if item is null, throw an ArgumentNullException
            //  if minVersions < 0 throw an ArgumentOutOfRangeException

            int deletedCount = 0;
            int i = minVersions;    // start looking for old versions after skipping minVersions

            while (i < item.Versions.Count)
            {
                SPListItemVersion itemVersion = item.Versions[i];
                string versionLabel = itemVersion.VersionLabel;

                if (!itemVersion.IsCurrentVersion &&    // Not "current" according to SharePoint (e.g. last-published major version, moderated version)
                    (savedVersions == null || !savedVersions.Contains(versionLabel)))  // not one of our "saved" versions
                {
                    itemVersion.Delete();   // also removes the entry from item.Versions, so i is not advanced
                    ++deletedCount;
                }
                else
                {
                    ++i;
                }
            }

            return deletedCount;
        }

Comments

  • Anonymous
    January 04, 2012
    Correction: SPItem.File.Version[index].IsCurrentVersion

  • Anonymous
    June 14, 2012
    This is excellent info.  I've spent 3 days trying to figure out how to delete all previous file versions from a library.  If there is a current draft and a current published version, we can only keep the current draft, and we have to preserve the original Modified and Modified By values.  I haven't found a way.  Am I reading your blog to mean there is no way?  I've tried deleting the current major version when there is a current minor version; the operations won’t allow deleting anything marked “IsCurrentVersion”, and I can’t change IsCurrentVersion to False.  I've tried publishing any minors to major and then writing back the old Modified and Modified By values; it won't let me because these are read-only properties.  If I change the library settings to make those properties not read-only, the SystemUpdate fails.  Would you have any other suggestion on how to delete all previous file versions, including any previous major versions (while preserving Modified & Modified By)?  We are about to convert tens of thousands of existing files in SharePoint to new content types.  The new content types use the term store (taxonomy fields).  But only the latest file version will be updated to the new content type, so we don't want to leave any old content types/old metadata in place (meaning we need to delete ALL old versions).  Appreciate any help.