How to: Join Content from Dissimilar Files (LINQ)
This example shows how to join data from two comma-delimited files that share a common value that is used as a matching key. This technique can be useful if you have to combine data from two spreadsheets, or from a spreadsheet and from a file that has another format, into a new file. You can modify the example to work with any kind of structured text.
To create the data files
Copy the following lines into a file that is named scores.csv and save it to the same folder as your project file. For example, if your project file is in Visual Studio 2010\Projects, save scores.csv to that folder. The file represents spreadsheet data. Column 1 is the student's ID, and columns 2 through 5 are test scores.
111, 97, 92, 81, 60 112, 75, 84, 91, 39 113, 88, 94, 65, 91 114, 97, 89, 85, 82 115, 35, 72, 91, 70 116, 99, 86, 90, 94 117, 93, 92, 80, 87 118, 92, 90, 83, 78 119, 68, 79, 88, 92 120, 99, 82, 81, 79 121, 96, 85, 91, 60 122, 94, 92, 91, 91
Copy the following lines into a file that is named names.csv and save it to the same folder as your project file. For example, if your project file is in Visual Studio 2010\Projects, save names.csv to that folder. The file represents a spreadsheet that contains the student's last name, first name, and student ID.
Omelchenko,Svetlana,111 O'Donnell,Claire,112 Mortensen,Sven,113 Garcia,Cesar,114 Garcia,Debra,115 Fakhouri,Fadi,116 Feng,Hanying,117 Garcia,Hugo,118 Tucker,Lance,119 Adams,Terry,120 Zabokritski,Eugene,121 Tucker,Michael,122
Example
Class JoinStrings
Shared Sub Main()
' Join content from spreadsheet files that contain
' related information. names.csv contains the student name
' plus an ID number. scores.csv contains the ID and a
' set of four test scores. The following query joins
' the scores to the student names by using ID as a
' matching key.
Dim names As String() = System.IO.File.ReadAllLines("../../../names.csv")
Dim scores As String() = System.IO.File.ReadAllLines("../../../scores.csv")
' Name: Last[0], First[1], ID[2], Grade Level[3]
' Omelchenko, Svetlana, 111, 2
' Score: StudentID[0], Exam1[1] Exam2[2], Exam3[3], Exam4[4]
' 111, 97, 92, 81, 60
' This query joins two dissimilar spreadsheets based on common ID value.
' Multiple from clauses are used instead of a join clause
' in order to store results of id.Split.
Dim scoreQuery1 = From name In names
Let n = name.Split(New Char() {","})
From id In scores
Let n2 = id.Split(New Char() {","})
Where n(2) = n2(0)
Select n(0) & "," & n(1) & "," & n2(0) & "," & n2(1) & "," &
n2(2) & "," & n2(3)
' Pass a query variable to a Sub and execute it there.
' The query itself is unchanged.
OutputQueryResults(scoreQuery1, "Merge two spreadsheets:")
' Keep console window open in debug mode.
Console.WriteLine("Press any key to exit.")
Console.ReadKey()
End Sub
Shared Sub OutputQueryResults(ByVal query As IEnumerable(Of String), ByVal message As String)
Console.WriteLine(System.Environment.NewLine & message)
For Each item As String In query
Console.WriteLine(item)
Next
Console.WriteLine(query.Count & " total names in list")
End Sub
End Class
' Output:
'Merge two spreadsheets:
'Adams,Terry,120, 99, 82, 81
'Fakhouri,Fadi,116, 99, 86, 90
'Feng,Hanying,117, 93, 92, 80
'Garcia,Cesar,114, 97, 89, 85
'Garcia,Debra,115, 35, 72, 91
'Garcia,Hugo,118, 92, 90, 83
'Mortensen,Sven,113, 88, 94, 65
'O'Donnell,Claire,112, 75, 84, 91
'Omelchenko,Svetlana,111, 97, 92, 81
'Tucker,Lance,119, 68, 79, 88
'Tucker,Michael,122, 94, 92, 91
'Zabokritski,Eugene,121, 96, 85, 91
'12 total names in list
class JoinStrings
{
static void Main()
{
// Join content from dissimilar files that contain
// related information. File names.csv contains the student
// name plus an ID number. File scores.csv contains the ID
// and a set of four test scores. The following query joins
// the scores to the student names by using ID as a
// matching key.
string[] names = System.IO.File.ReadAllLines(@"../../../names.csv");
string[] scores = System.IO.File.ReadAllLines(@"../../../scores.csv");
// Name: Last[0], First[1], ID[2]
// Omelchenko, Svetlana, 11
// Score: StudentID[0], Exam1[1] Exam2[2], Exam3[3], Exam4[4]
// 111, 97, 92, 81, 60
// This query joins two dissimilar spreadsheets based on common ID value.
// Multiple from clauses are used instead of a join clause
// in order to store results of id.Split.
IEnumerable<string> scoreQuery1 =
from name in names
let nameFields = name.Split(',')
from id in scores
let scoreFields = id.Split(',')
where nameFields[2] == scoreFields[0]
select nameFields[0] + "," + scoreFields[1] + "," + scoreFields[2]
+ "," + scoreFields[3] + "," + scoreFields[4];
// Pass a query variable to a method and execute it
// in the method. The query itself is unchanged.
OutputQueryResults(scoreQuery1, "Merge two spreadsheets:");
// Keep console window open in debug mode.
Console.WriteLine("Press any key to exit");
Console.ReadKey();
}
static void OutputQueryResults(IEnumerable<string> query, string message)
{
Console.WriteLine(System.Environment.NewLine + message);
foreach (string item in query)
{
Console.WriteLine(item);
}
Console.WriteLine("{0} total names in list", query.Count());
}
}
/* Output:
Merge two spreadsheets:
Adams, 99, 82, 81, 79
Fakhouri, 99, 86, 90, 94
Feng, 93, 92, 80, 87
Garcia, 97, 89, 85, 82
Garcia, 35, 72, 91, 70
Garcia, 92, 90, 83, 78
Mortensen, 88, 94, 65, 91
O'Donnell, 75, 84, 91, 39
Omelchenko, 97, 92, 81, 60
Tucker, 68, 79, 88, 92
Tucker, 94, 92, 91, 91
Zabokritski, 96, 85, 91, 60
12 total names in list
*/
Compiling the Code
Create a Visual Studio project that targets the .NET Framework version 3.5 or a later version. By default, the project has a reference to System.Core.dll and a using directive (C#) or Imports statement (Visual Basic) for the System.Linq namespace. In C# projects, add a using directive for the System.IO namespace.
Copy this code into your project.
Press F5 to compile and run the program.
Press any key to exit the console window.