次の方法で共有


Custom transforms: determining end of rowset

With SQL 2008 CTPs starting to get into the hands of customers and ISVs, we've found that some custom transforms and script transforms use incorrect code to check for the end of rowset (incoming data), which could cause problems when you move the code to SQL 2008. Unfortunately, some Microsoft docs helped to spead this error, so let me clarify what I mean and the correct usage pattern.

Most often the EndOfRowset property of SSIS buffer is being often used incorrectly. The property exists on PipelineBuffer object and ScriptBuffer object, so this applies to custom transforms and script transforms.

The buffer.EndOfRowset property is an indicator that the current buffer is the very last (for particular input), and the component will not see any more buffers (on this input).

Note that:
1) Value of this property does not depend on whether you’ve iterated over rows in that buffer,
2) The value should not be used to check whether the buffer contains any rows.

The last point is the most important here. In SQL 2005, the data flow engine sends an additional empty buffer with EndOfRowset indicator at the end. So the buffer [in SQL 2005] either contains some rows, or contains EndOfRowset indicator. But don't rely on either of this.

In SQL 2008, this dummy empty buffer was deemed too expensive from performance point of view, and the data flow engine would not send an additional empty buffer anymore. Instead it simply sets EndOfRowset flag on the last buffer. Thus you get a series of buffers that contain the data, and at the very end a buffer that contains both the data and the EndOfRowset indicator.

In some samples and forum posts I've seen incorrect usage of EndOfRowset property that relies on SQL 2005 behavior (exclusive OR: either end-of-rowset or some data), i.e. the code does not check the incoming rows if it gets end of rowset. In other cases the code assumes it should receive empty buffer, and uses row count as end-of-rowset indicator. If you have a code that uses incorrect patterns, e.g.:

 /* warning - incorrect code */
if (!buffer.EndOfRowset)
{
  while(buffer.NextRow())
  {
    // do something with the row
  }
}
else
{
  // do something at the end
}

or

 /* warning - another incorrect code */
if (buffer.RowCount != 0)
{
  while(buffer.NextRow())
  {
    // do something with the row
  }
}
else
{
  // do something at the end
} 

Please change it to:

 while (buffer.NextRow())
{
  // do something with the row
}

if (buffer.EndOfRowset)
{
  // do something at the end
}

Otherwise you risk missing contents of last buffer in SQL 2008 (first incorrect sample) or fail to detect end of data (second incorrect sample). The last version works correctly both in SQL 2005 and in SQL 2008.

Update: see this followup blog
https://blogs.msdn.com/michen/archive/2008/10/19/does-buffer-nextrow-skips-the-first-row-in-a-buffer.aspx

Comments

  • Anonymous
    August 31, 2007
    The comment has been removed

  • Anonymous
    September 25, 2007
    Thats going to bust my custom code but I guess its not to difficult to code.

  • Anonymous
    October 18, 2008
    I got a follow up question to my old post regarding enumerating rows in SSIS buffer , that suggested