PowerShell script for finding Microsoft Office legacy files

Referenced documents:
[MS-CFB]: Compound File Binary File Format
[MS-OLEPS]: Object Linking and Embedding (OLE) Property Set Data Structures
Windows PowerShell Cookbook, 3rd edition, by Lee Holmes

NOTE: Questions and comments are welcome. However, please DO NOT post a comment using the comment tool at the end of this post. Instead, post a new thread in the Open Specifications Forum: Office File Formats at
https://social.msdn.microsoft.com/Forums/en-US/os_binaryfile/threads

 

#########################
# WHAT THE SCRIPT DOES
#########################

This post complements the post “Determining Office Binary File Format Types” by JCurry (Josh). That post describes in detail how to find PIDSI_APPNAME, i.e., the name of the application that created the file. For Office legacy files it can be “Microsoft Office Word”, “Microsoft Excel”, or “Microsoft Office PowerPoint”. The PowerShell (PS) script in this post prints more: selected fields from the header and the properties from the “Summary Information” property set. This should be adequate for the requests in recent cases on the Open Specifications Forum / Office File Formats, but more annotation can be added if there is demand for it.

The OfficeLegacyFilter.ps1 PS script (see the attached file) starts with a comment block that contains the disclaimer and the name, version, and usage of the script. The script has one parameter, which can be a directory or a file name. If the parameter is a directory, all the files in that directory and its subdirectories are checked recursively.

All checks are made on the content of the file; the file name extension (sometimes called the file type) is not used. If a check fails, the file is simply skipped, which means that if no Office legacy file is in scope, the script returns nothing. This makes the script easier to use in a pipeline.

The checks start with the Header Signature, continue with the Minor and Major versions, and so on, until the First Directory Sector Location is reached. Currently we are interested in only one directory entry, the one named “Summary Information”. At offset 0x74 of that directory entry is the Starting Sector Location; see [MS-CFB] v20130118 / 2.6.1 Compound File Directory Entry. From that value we can navigate to [MS-OLEPS] v20130118 / 2.21 PropertySetStream and to 2.20 PropertySet, where you’ll find NumProperties. All the properties are printed out with their values. All these steps can be easily followed in the script.
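
To illustrate the first of these checks, here is a minimal sketch. It is not an excerpt from OfficeLegacyFilter.ps1: the function name Read-CfbHeader and the names of the returned properties are hypothetical, and the field offsets are the ones given in [MS-CFB] 2.2 Compound File Header. It assumes PowerShell 3.0 or later and a little-endian machine.

```powershell
# Hypothetical helper for illustration; not taken from OfficeLegacyFilter.ps1.
function Read-CfbHeader {
    param([byte[]] $Bytes)

    # The compound file header needs at least the first 512 bytes.
    if ($null -eq $Bytes -or $Bytes.Length -lt 512) { return $null }

    # Header Signature at offset 0x00; any mismatch means "not a compound file".
    $signature = 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1
    for ($i = 0; $i -lt $signature.Length; $i++) {
        if ($Bytes[$i] -ne $signature[$i]) { return $null }
    }

    # BitConverter uses the machine byte order; little-endian is assumed here.
    $sectorShift    = [int] [System.BitConverter]::ToUInt16($Bytes, 0x1E)
    $firstDirSector = [long] [System.BitConverter]::ToUInt32($Bytes, 0x30)

    [pscustomobject] @{
        MinorVersion         = [System.BitConverter]::ToUInt16($Bytes, 0x18)
        MajorVersion         = [System.BitConverter]::ToUInt16($Bytes, 0x1A)
        SectorSize           = 1 -shl $sectorShift
        FirstDirectorySector = $firstDirSector
        # Sector N starts at file offset (N + 1) * sector size.
        FirstDirectoryOffset = ($firstDirSector + 1) * (1 -shl $sectorShift)
    }
}
```

It could be called as Read-CfbHeader ([System.IO.File]::ReadAllBytes('C:\Test\a\y\Acronyms.xls')). Returning $null for anything that is not a compound file mirrors the "skip the file silently" behavior described above. Reading the whole file into memory is a simplification for the sketch; only the first 512 bytes are needed for the header.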

#########################
# HOW TO RUN THE SCRIPT
#########################

If your machine is not set up for running PowerShell scripts, the OfficeLegacyFilter.ps1 script will not run, because script execution is disabled by default. To see your current execution policy setting, run:

Get-ExecutionPolicy

If the policy is restrictive, the restriction can be lifted by running:

Set-ExecutionPolicy Unrestricted

If you try to run the script now, you will probably be prompted for permission, and then the script will run.
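
If you would rather not change the policy for the whole machine, the change can be limited to the current session. The lines below are a sketch of that variant, not something the original script requires; RemoteSigned is just one possible choice of level.

```powershell
# List the policy configured at every scope
# (MachinePolicy, UserPolicy, Process, CurrentUser, LocalMachine).
Get-ExecutionPolicy -List

# Relax the policy for the current PowerShell session only;
# the setting is discarded when the session ends.
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope Process -Force
```

With RemoteSigned, a script that was downloaded from the Internet may still be blocked until it is signed or unblocked (for example, with Unblock-File).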

If you want to run the script in a more secure way, set the execution policy to the desired level. If you already have a code-signing certificate, you can use it. To check whether you have one, do the following.
You can go to the Cert: drive and look around the certificates:

cd cert: # Go to the Cert: drive and look around

You can find all code-signing certificates

dir cert: -Recurse -CodeSigningCert

If you have at least one, use, e.g., the first one

$cert = @(dir cert: -Recurse -CodeSigningCert)[0]

Set-AuthenticodeSignature <ps1 file> $cert

At the end of the <ps1 file> you should see a comment block with the following structure:

# SIG # Begin signature block
# <64 base64 digits>
.......................
# <64 base64 digits>
# <base64 digits>
# SIG # End signature block

and running the script should not be a problem.
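
A quick way to confirm that the signature block was appended is to look for its two marker lines. The sketch below does only that; the function name Test-SignatureBlock is hypothetical, and the check says nothing about whether the signature is valid.

```powershell
# Hypothetical helper: reports whether the lines of a script contain a
# "Begin signature block" marker followed later by an "End signature block" marker.
function Test-SignatureBlock {
    param([string[]] $Lines)

    $begin = -1
    $end   = -1
    for ($i = 0; $i -lt $Lines.Count; $i++) {
        $line = $Lines[$i].Trim()
        if ($line -eq '# SIG # Begin signature block') { $begin = $i }
        elseif ($line -eq '# SIG # End signature block') { $end = $i }
    }

    # Both markers must be present, in the right order.
    return ($begin -ge 0 -and $end -gt $begin)
}
```

For example: Test-SignatureBlock (Get-Content .\OfficeLegacyFilter.ps1). To check the signature itself rather than just the presence of the block, Get-AuthenticodeSignature .\OfficeLegacyFilter.ps1 reports a Status value, which should be Valid for a correctly signed and trusted script.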

If you don’t have a code-signing certificate, you can make one. The steps for creating a self-signed certificate are described in many places. You need the certificate creation utility, makecert.exe. You can read about it here:
https://msdn.microsoft.com/en-us/library/windows/desktop/aa386968(v=vs.85).aspx

With makecert.exe, you first generate a local certificate authority and then create a self-signed certificate by using that local certificate authority.
You can read about this in, e.g.,
Program: Create a Self-Signed Certificate
Holmes, Lee (2012-12-21). Windows PowerShell Cookbook: The Complete Guide to Scripting Microsoft's Command Shell (p. 522). O'Reilly Media. Kindle Edition.

You should be able to see the properties of the newly created signing certificate by running:

dir cert: -Recurse -CodeSigningCert | Format-List *

 

#########################
# EXAMPLE
#########################

After creating the three test files, I changed the extension of two of them to txt. The extension can be anything, because the script does not use it.

 

PS C:\Projects\PS\scripts> C:\Projects\PS\scripts\OfficeLegacyFilter.ps1 'C:\Test\a'

C:\Test\a\x\Autonumbering.txt
    000000 Header Signature . . . . : D0 CF 11 E0 A1 B1 1A E1
    000018 MinorVersion . . . . . . : 62
    00001A MajorVersion . . . . . . : 3
    009E94 01 CODEPAGE_PROPERTY_IDENTIFIER: 1252
    009E9C 02 PIDSI_TITLE . . . . . : PowerPoint Presentation
    009EBC 04 PIDSI_AUTHOR . . . . . : Vilmos Foltenyi
    009EBC 0A PIDSI_EDITTIME . . . . : 00:02:53.4790000
    009ED4 08 PIDSI_LASTAUTHOR . . . : Vilmos Foltenyi
    009EEC 09 PIDSI_REVNUMBER . . . . : 1  
    009EF8 12 PIDSI_APPNAME . . . . : Microsoft Office PowerPoint
    009F28 0C PIDSI_CREATE_DTM . . . : Tuesday, 12/20/2011 3:33:05 PM
    009F34 0D PIDSI_LASTSAVE_DTM . . . : Thursday, 1/24/2013 12:05:04 AM
    009F40 0F PIDSI_WORDCOUNT . . . . : 3
    009F48 11 PIDSI_THUMBNAIL size format : 57736 FFFFFFFF

C:\Test\a\x\glow test.txt
    000000 Header Signature . . . . : D0 CF 11 E0 A1 B1 1A E1
    000018 MinorVersion . . . . . . : 62
    00001A MajorVersion . . . . . . : 3
    0118A4 01 CODEPAGE_PROPERTY_IDENTIFIER: 1252
    0118AC 04 PIDSI_AUTHOR . . . . . : Tim
    0118AC 0A PIDSI_EDITTIME . . . . : 00:00:00
    0118B8 07 PIDSI_TEMPLATE . . . . : Normal.dotm
    0118CC 08 PIDSI_LASTAUTHOR . . . : Tim
    0118D8 09 PIDSI_REVNUMBER . . . . : 2  
    0118E4 12 PIDSI_APPNAME . . . . : Microsoft Office Word  
    011910 0C PIDSI_CREATE_DTM . . . : Wednesday, 1/16/2013 8:33:00 AM
    01191C 0D PIDSI_LASTSAVE_DTM . . . : Wednesday, 1/16/2013 8:33:00 AM
    011928 0E PIDSI_PAGECOUNT . . . . : 1
    011930 0F PIDSI_WORDCOUNT . . . . : 24
    011938 10 PIDSI_CHARCOUNT . . . . : 137
    011940 13 PIDSI_DOC_SECURITY . . . : 0

C:\Test\a\y\Acronyms.xls
    000000 Header Signature . . . . : D0 CF 11 E0 A1 B1 1A E1
    000018 MinorVersion . . . . . . : 62
    00001A MajorVersion . . . . . . : 3
    009474 01 CODEPAGE_PROPERTY_IDENTIFIER: 1252
    00947C 04 PIDSI_AUTHOR . . . . . :    
    009488 08 PIDSI_LASTAUTHOR . . . :    
    009494 12 PIDSI_APPNAME . . . . : Microsoft Excel
    0094AC 0C PIDSI_CREATE_DTM . . . : Friday, 9/15/2006 5:00:00 PM
    0094B8 0D PIDSI_LASTSAVE_DTM . . . : Thursday, 2/14/2013 11:49:09 AM
    0094C4 13 PIDSI_DOC_SECURITY . . . : 0
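
The date and time properties in the listings above (PIDSI_CREATE_DTM, PIDSI_LASTSAVE_DTM) are stored as 8-byte FILETIME values, i.e., counts of 100-nanosecond intervals since January 1, 1601 (UTC). PIDSI_EDITTIME uses the same storage, but, as the listings suggest, it is read as a duration rather than as a point in time. The following sketch shows one way such a value could be decoded. The function name ConvertFrom-FileTimeBytes and its parameters are hypothetical and not taken from OfficeLegacyFilter.ps1; a little-endian machine is assumed, and whether the script prints UTC or local time is not stated in this post.

```powershell
# Hypothetical helper: decodes an 8-byte FILETIME found at $Offset in $Bytes.
function ConvertFrom-FileTimeBytes {
    param(
        [byte[]] $Bytes,
        [int]    $Offset = 0,
        [switch] $AsDuration   # treat the value as elapsed time (e.g. PIDSI_EDITTIME)
    )

    # FILETIME is a 64-bit count of 100-nanosecond intervals,
    # which is the same unit as a .NET tick.
    $ticks = [System.BitConverter]::ToInt64($Bytes, $Offset)

    if ($AsDuration) {
        return [TimeSpan]::FromTicks($ticks)
    }
    # Interpreted as a point in time, counted from 1601-01-01 UTC.
    return [DateTime]::FromFileTimeUtc($ticks)
}
```

For example, a stored value of 1734790000 decoded with -AsDuration gives 00:02:53.4790000, which is the PIDSI_EDITTIME shown for Autonumbering.txt.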

 

OfficeLegacyFilter.ps1