How to: Query for the Largest File or Files in a Directory Tree (LINQ)
This example shows five queries related to file size in bytes:
How to retrieve the size in bytes of the largest file.
How to retrieve the size in bytes of the smallest file.
How to retrieve the FileInfo object for the largest or smallest file from one or more folders under a specified root folder.
How to retrieve a sequence such as the 10 largest files.
How to order files into groups based on their file size in bytes, ignoring files that are less than a specified size.
Example
The following example contains five separate queries that show how to query and group files, depending on their file size in bytes. You can easily modify these examples to base the query on some other property of the FileInfo object.
Module QueryBySize
Sub Main()
' Change the drive\path if necessary
Dim root As String = "C:\Program Files\Microsoft Visual Studio 9.0"
'Take a snapshot of the folder contents
Dim dir As New System.IO.DirectoryInfo(root)
Dim fileList = dir.GetFiles("*.*", System.IO.SearchOption.AllDirectories)
' Return the size of the largest file
Dim maxSize = Aggregate aFile In fileList Into Max(GetFileLength(aFile))
Console.WriteLine("The length of the largest file under {0} is {1}", _
root, maxSize)
' Return the FileInfo object of the largest file
' by sorting and selecting from the beginning of the list
Dim filesByLengDesc = From file In fileList _
Let filelength = GetFileLength(file) _
Where filelength > 0 _
Order By filelength Descending _
Select file
Dim longestFile = filesByLengDesc.First
Console.WriteLine("The largest file under {0} is {1} with a length of {2} bytes", _
root, longestFile.FullName, longestFile.Length)
Dim smallestFile = filesByLengDesc.Last
Console.WriteLine("The smallest file under {0} is {1} with a length of {2} bytes", _
root, smallestFile.FullName, smallestFile.Length)
' Return the FileInfos for the 10 largest files
' Based on a previous query, but nothing is executed
' until the For Each statement below.
Dim tenLargest = From file In filesByLengDesc Take 10
Console.WriteLine("The 10 largest files under {0} are:", root)
For Each fi As System.IO.FileInfo In tenLargest
Console.WriteLine("{0}: {1} bytes", fi.FullName, fi.Length)
Next
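' Group files according to their size,
' leaving out the ones under 200,000 bytes.
' The \ operator performs integer division, so each key is a whole
' number of 100,000-byte units (the / operator would return a Double).
Dim sizeGroups = From file As System.IO.FileInfo In fileList _
Let sizeInBytes = GetFileLength(file) _
Where sizeInBytes > 0 _
Let groupLength = sizeInBytes \ 100000 _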
Group file By groupLength Into fileGroup = Group _
Where groupLength >= 2 _
Order By groupLength Descending
For Each group In sizeGroups
' Use & for string concatenation; + would attempt numeric addition here.
Console.WriteLine(group.groupLength.ToString() & "00000")
For Each item As System.IO.FileInfo In group.fileGroup
Console.WriteLine(" {0}: {1}", item.Name, item.Length)
Next
Next
' Keep the console window open in debug mode
Console.WriteLine("Press any key to exit.")
Console.ReadKey()
End Sub
' This method is used to catch the possible exception
' that can be raised when accessing the FileInfo.Length property.
' In this particular case, it is safe to ignore the exception.
Function GetFileLength(ByVal fi As System.IO.FileInfo) As Long
Dim retval As Long
Try
retval = fi.Length
Catch ex As System.IO.FileNotFoundException
' If a file is no longer present,
' just return zero bytes.
retval = 0
End Try
Return retval
End Function
End Module
using System;
using System.Collections.Generic;
using System.Linq;
class QueryBySize
{
static void Main(string[] args)
{
QueryFilesBySize();
Console.WriteLine("Press any key to exit");
Console.ReadKey();
}
private static void QueryFilesBySize()
{
string startFolder = @"c:\program files\Microsoft Visual Studio 9.0\";
// Take a snapshot of the file system.
System.IO.DirectoryInfo dir = new System.IO.DirectoryInfo(startFolder);
// This method assumes that the application has discovery permissions
// for all folders under the specified path.
IEnumerable<System.IO.FileInfo> fileList = dir.GetFiles("*.*", System.IO.SearchOption.AllDirectories);
//Return the size of the largest file
long maxSize =
(from file in fileList
let len = GetFileLength(file)
select len)
.Max();
Console.WriteLine("The length of the largest file under {0} is {1}",
startFolder, maxSize);
// Return the FileInfo object for the largest file
// by sorting and selecting from beginning of list
System.IO.FileInfo longestFile =
(from file in fileList
let len = GetFileLength(file)
where len > 0
orderby len descending
select file)
.First();
Console.WriteLine("The largest file under {0} is {1} with a length of {2} bytes",
startFolder, longestFile.FullName, longestFile.Length);
//Return the FileInfo of the smallest file
System.IO.FileInfo smallestFile =
(from file in fileList
let len = GetFileLength(file)
where len > 0
orderby len ascending
select file).First();
Console.WriteLine("The smallest file under {0} is {1} with a length of {2} bytes",
startFolder, smallestFile.FullName, smallestFile.Length);
//Return the FileInfos for the 10 largest files
// queryTenLargest is an IEnumerable<System.IO.FileInfo>
var queryTenLargest =
(from file in fileList
let len = GetFileLength(file)
orderby len descending
select file).Take(10);
Console.WriteLine("The 10 largest files under {0} are:", startFolder);
foreach (var v in queryTenLargest)
{
Console.WriteLine("{0}: {1} bytes", v.FullName, v.Length);
}
// Group the files according to their size, leaving out
// files that are less than 200000 bytes.
var querySizeGroups =
from file in fileList
let len = GetFileLength(file)
where len > 0
group file by (len / 100000) into fileGroup
where fileGroup.Key >= 2
orderby fileGroup.Key descending
select fileGroup;
foreach (var filegroup in querySizeGroups)
{
Console.WriteLine(filegroup.Key.ToString() + "00000");
foreach (var item in filegroup)
{
Console.WriteLine("\t{0}: {1}", item.Name, item.Length);
}
}
}
// This method is used to swallow the possible exception
// that can be raised when accessing the FileInfo.Length property.
// In this particular case, it is safe to swallow the exception.
static long GetFileLength(System.IO.FileInfo fi)
{
long retval;
try
{
retval = fi.Length;
}
catch (System.IO.FileNotFoundException)
{
// If a file is no longer present,
// just return zero bytes.
retval = 0;
}
return retval;
}
}
To return one or more complete FileInfo objects, the query must first examine each object in the data source and sort the objects by the value of their Length property. It can then return the single object or the sequence of objects with the greatest lengths. Use First to return the first element in a list. Use Take to return the first n elements. Specify a descending sort order to put the largest elements at the start of the list, or an ascending sort order to put the smallest elements first.
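The sort-then-select pattern can be tried without touching the file system. The following is a minimal sketch that works on plain long values standing in for file lengths; the class name SizeQuerySketch and its two helper methods are hypothetical and are not part of the example above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SizeQuerySketch
{
    // Hypothetical helper: sorts descending so the largest values come
    // first, then keeps only the first n of them.
    public static List<long> Largest(IEnumerable<long> sizes, int n)
    {
        return sizes.OrderByDescending(s => s).Take(n).ToList();
    }

    // Hypothetical helper: drops zero-length entries, sorts ascending so
    // the smallest value comes first, then returns that first element.
    // First throws InvalidOperationException if no value is greater than zero.
    public static long SmallestNonEmpty(IEnumerable<long> sizes)
    {
        return sizes.Where(s => s > 0).OrderBy(s => s).First();
    }
}
```

Calling ToList in Largest forces the query to run immediately. Without it, the query would be deferred and would run again each time the result is enumerated.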
The query calls a separate method to obtain the file size in bytes so that it can consume the exception that is raised if a file was deleted on another thread after the FileInfo object was created in the call to GetFiles. Even though the FileInfo object has already been created, the exception can occur because a FileInfo object tries to refresh its Length property with the most current size in bytes the first time the property is accessed. Putting this operation in a try-catch block outside the query follows the rule of avoiding operations in queries that can cause side effects. In general, take great care when you consume exceptions, to make sure that the application is not left in an unknown state.
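The deleted-file case can be reproduced in isolation. The following is a minimal sketch, assuming the process can write to the temporary folder; the class name FileLengthSketch and the methods SafeLength and LengthAfterDelete are hypothetical names chosen for illustration, with SafeLength mirroring the GetFileLength method in the example.

```csharp
using System;
using System.IO;

static class FileLengthSketch
{
    // Hypothetical helper mirroring GetFileLength: returns 0 when the file
    // is no longer present at the time Length is first read.
    public static long SafeLength(FileInfo fi)
    {
        try
        {
            return fi.Length;
        }
        catch (FileNotFoundException)
        {
            return 0;
        }
    }

    // Creates a temporary file, wraps it in a FileInfo, deletes the file,
    // and only then reads the length through the helper.
    public static long LengthAfterDelete()
    {
        string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        File.WriteAllBytes(path, new byte[5]);
        FileInfo fi = new FileInfo(path); // the constructor does not read the size
        File.Delete(path);
        return SafeLength(fi);
    }
}
```

The helper catches only FileNotFoundException, as the example does. Other failures, such as a permissions error, still propagate to the caller.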
Compiling the Code
Create a Visual Studio project that targets the .NET Framework version 3.5 or later. By default, the project has a reference to System.Core.dll and a using directive (C#) or Imports statement (Visual Basic) for the System.Linq namespace.
Copy this code into your project.
Press F5 to compile and run the program.
Press any key to exit the console window.
Robust Programming
For intensive query operations over the contents of multiple types of documents and files, consider using the Windows Desktop Search engine.