Office Space: Tips and Tricks for Scripting Microsoft Office Applications
Welcome to Office Space, the new column that offers tips and tricks for scripting Microsoft® Office applications. We’ll post new tips every Tuesday and Thursday; to see an archive of previous tips, visit the Office Space Archive. And if you have particular questions about Microsoft Office scripting, feel free to send them to scripter@microsoft.com (in English, if possible). We can’t promise to answer all the questions we receive, but we’ll do our best.
Using the Text to Columns Feature in Microsoft Excel
One thing we Microsoft Scripting Guys have always tried to do is “tell it like it is.” We try not to go overboard gushing about how wonderful Microsoft products are; in fact, much to the chagrin of our fellow employees, we will actually point out flaws and foibles in those products. That’s not due to any hidden agenda on our part, that’s simply because we want to help you better understand how our products work. Or – on rare occasions – how they don’t work.
Of course, there is one exception to this rule: Microsoft Excel. Now, it’s not that we think Excel is 100% perfect, and it’s not that the Excel team is paying us to keep quiet. (Although Excel team, if you’re out there, we’d be willing to consider such an offer.) It’s just that Excel is a really nice piece of software, and the more we use Excel the more cool features we stumble upon. For example, consider this spreadsheet:
As you can see, we have both city names and state names in column A. This is something you’ll see all the time in Excel: a column will hold both first and last names for a person (Myer, Ken) or a user name and a department (Ken Myer, Finance). In general, that’s not very good spreadsheet design. Why not? Well, suppose our sample spreadsheet had hundreds of these entries and we wanted to sort them by state. Good luck, huh? With the city names and the state names in the same column sorting by state is nearly impossible. What we should have done is set our spreadsheet up like this, with city and state names in separate columns:
Unfortunately, though, we didn’t do that. So now we’re stuck, right?
Wrong! As it turns out, Excel has a slick little feature called Text to Columns that can take a column of data – like City, State – and convert it to two columns of data (City and State). To show you how that works, try running this little script:
Set objExcel = CreateObject("Excel.Application")
objExcel.Visible = True
Set objWorkbook = objExcel.Workbooks.Add()
Set objWorksheet = objWorkbook.Worksheets(1)
objExcel.Cells(1, 1).Value = "Olympia, WA"
objExcel.Cells(2, 1).Value = "Salem, OR"
objExcel.Cells(3, 1).Value = "Boise, ID"
objExcel.Cells(4, 1).Value = "Sacramento, CA"
Set objRange = objExcel.Range("A1").EntireColumn
Set objRange2 = objExcel.Range("B1")
objRange.TextToColumns objRange2,,,,,,TRUE
What’d we tell you? When you run the script, you should get a spreadsheet that looks like this:
Cool, huh? We now have cities in column B and states in column C. To get a nice, finished spreadsheet all we have to do is delete the now-superfluous column A, something we can do programmatically by tacking these two lines of code on to the end of our script:
Set objRange = objExcel.Range("A1")
objRange.EntireColumn.Delete
Let’s talk about how we managed to pull this off. The script begins by creating an instance of the Excel.Application object, and then sets the Visible property to True. (If we didn’t do that, Excel would run in an invisible window and we’d never see it.) After that we add a workbook and a blank worksheet. All that happens in these four lines of code:
Set objExcel = CreateObject("Excel.Application")
objExcel.Visible = True
Set objWorkbook = objExcel.Workbooks.Add()
Set objWorksheet = objWorkbook.Worksheets(1)
Next we simply add some data to cells A1 through A4:
objExcel.Cells(1, 1).Value = "Olympia, WA"
objExcel.Cells(2, 1).Value = "Salem, OR"
objExcel.Cells(3, 1).Value = "Boise, ID"
objExcel.Cells(4, 1).Value = "Sacramento, CA"
That gives us a sample spreadsheet to work with. Now it’s time to get down to business.
We begin by creating two instances of the Excel Range object. The first instance (with the object reference objRange) encompasses all the cells in column A; we do that by specifying cell A1 as the start point and then using the EntireColumn property to select all of column A. Here’s the line of code that creates this first range:
Set objRange = objExcel.Range("A1").EntireColumn
This first range represents the column where the existing data lives. Our second range object reference – cleverly titled objRange2 (you think this is bad, you should hear what we named our kids!) – indicates the spot in the spreadsheet where we want to place the converted data. Note that we have to specify only a single cell for this range; Excel will automatically extend the range as needed to find a home for all the converted data:
Set objRange2 = objExcel.Range("B1")
Now all we have to do is call the TextToColumns method and pass the method the appropriate parameters. The method call looks a little odd, but we’ll explain it in just a second:
objRange.TextToColumns objRange2,,,,,,TRUE
So what’s with all the commas? Well, before you ask, we didn’t fall asleep at the keyboard (not this time anyway). Instead, like many Office methods TextToColumns has a number of parameters that must be passed in order. If you want to use the default values for a parameter – as we do here – you don’t need to type a value for that parameter but you do need to type a comma. In our sample code TRUE is the seventh parameter, and the only way to ensure that Excel sees it as the seventh parameter is to include all the blank parameters and commas. If we did this:
objRange.TextToColumns objRange2,TRUE
Then TRUE would be seen as the second parameter and our script would no longer work as expected. Thus we have a seemingly-endless stream of commas, but all for good reason.
We won’t take the time to go through all the possible parameters for the TextToColumns method; there are way too many of them for that. Suffice to say that TextToColumns not only gives you lots of control over how you split your columns – you can specify a comma as the delimiter, you can specify a tab as the delimiter, etc. – but it also gives you the ability to pre-format the columns after they’ve been split. For more information, see the Microsoft Excel VBA Language Reference on MSDN.
For now we’ll just explain how our one line of code works. To begin with, notice that we call the method on column A (objRange), the column that has the initial set of data. We pass the destination range (objRange2) as the first parameter, and then we accept the default settings for the following items:
DataType. We’re using the default value of delimited; we could, alternatively, specify a fixed-width data type.
TextQualifier. A text qualifier would be something like double quotes around each string, e.g. “Olympia, WA”. That’s irrelevant for us, so we just leave this parameter blank.
ConsecutiveDelimiter. This parameter tells Excel whether it should treat consecutive delimiters – such as ,, – as a single delimiter. Again, this doesn’t matter to us, so we leave it blank.
Tab. This tells Excel whether or not to use the tab as the delimiter. We want to use the comma as the delimiter, so we leave this at the default value of FALSE.
Semicolon. Ditto for the semicolon.
That leaves us with parameter no. 7: Comma. We want Excel to use the comma as the delimiter, so we set the value of this parameter to TRUE. There are additional parameters that follow Comma but we’ve elected to go with the default values for each of those; because of that, we’re free to leave those “trailing” parameters out of the method call.
So once again Excel comes through in the clutch, proving that Microsoft Excel represents the pinnacle of human endeavor and achievement. (Excel team, please make the check out to The Scripting Guys. Thanks.)