How to: Perform Custom Join Operations (C# Programming Guide)
This example shows how to perform join operations that are not possible with the join clause. In a query expression, the join clause is limited to, and optimized for, equijoins, which are by far the most common type of join operation. When performing an equijoin, you will probably always get the best performance by using the join clause.
However, the join clause cannot be used in the following cases:
When the join is predicated on an expression of inequality (a non-equijoin).
When the join is predicated on more than one expression of equality or inequality.
When you have to introduce a temporary range variable for the right side (inner) sequence before the join operation.
To perform joins that are not equijoins, you can use multiple from clauses to introduce each data source independently. You then apply a predicate expression in a where clause to the range variable for each source. The expression also can take the form of a method call.
Note
Do not confuse this kind of custom join operation with the use of multiple from clauses to access inner collections. For more information, see join clause (C# Reference).
Example
The first method in the following example shows a simple cross join. Cross joins must be used with caution because they can produce very large result sets. However, they can be useful in some scenarios for creating source sequences against which additional queries are run.
The second method produces a sequence of all the products whose category ID is listed in the category list on the left side. Note the use of the let clause and the Contains method to create a temporary array. It also is possible to create the array before the query and eliminate the first from clause.
class CustomJoins
{
#region Data
class Product
{
public string Name { get; set; }
public int CategoryID { get; set; }
}
class Category
{
public string Name { get; set; }
public int ID { get; set; }
}
// Specify the first data source.
List<Category> categories = new List<Category>()
{
new Category(){Name="Beverages", ID=001},
new Category(){ Name="Condiments", ID=002},
new Category(){ Name="Vegetables", ID=003},
};
// Specify the second data source.
List<Product> products = new List<Product>()
{
new Product{Name="Tea", CategoryID=001},
new Product{Name="Mustard", CategoryID=002},
new Product{Name="Pickles", CategoryID=002},
new Product{Name="Carrots", CategoryID=003},
new Product{Name="Bok Choy", CategoryID=003},
new Product{Name="Peaches", CategoryID=005},
new Product{Name="Melons", CategoryID=005},
new Product{Name="Ice Cream", CategoryID=007},
new Product{Name="Mackerel", CategoryID=012},
};
#endregion
static void Main()
{
CustomJoins app = new CustomJoins();
app.CrossJoin();
app.NonEquijoin();
Console.WriteLine("Press any key to exit.");
Console.ReadKey();
}
void CrossJoin()
{
var crossJoinQuery =
from c in categories
from p in products
select new { c.ID, p.Name };
Console.WriteLine("Cross Join Query:");
foreach (var v in crossJoinQuery)
{
Console.WriteLine("{0,-5}{1}", v.ID, v.Name);
}
}
void NonEquijoin()
{
var nonEquijoinQuery =
from p in products
let catIds = from c in categories
select c.ID
where catIds.Contains(p.CategoryID) == true
select new { Product = p.Name, CategoryID = p.CategoryID };
Console.WriteLine("Non-equijoin query:");
foreach (var v in nonEquijoinQuery)
{
Console.WriteLine("{0,-5}{1}", v.CategoryID, v.Product);
}
}
}
/* Output:
Cross Join Query:
1 Tea
1 Mustard
1 Pickles
1 Carrots
1 Bok Choy
1 Peaches
1 Melons
1 Ice Cream
1 Mackerel
2 Tea
2 Mustard
2 Pickles
2 Carrots
2 Bok Choy
2 Peaches
2 Melons
2 Ice Cream
2 Mackerel
3 Tea
3 Mustard
3 Pickles
3 Carrots
3 Bok Choy
3 Peaches
3 Melons
3 Ice Cream
3 Mackerel
Non-equijoin query:
1 Tea
2 Mustard
2 Pickles
3 Carrots
3 Bok Choy
Press any key to exit.
*/
In the following example, the query must join two sequences based on matching keys that, in the case of the inner (right side) sequence, cannot be obtained prior to the join clause itself. If this join were performed with a join clause, then the Split method would have to be called for each element. The use of multiple from clauses enables the query to avoid the overhead of the repeated method call. However, since join is optimized, in this particular case it might still be faster than using multiple from clauses. The results will vary depending primarily on how expensive the method call is.
class MergeTwoCSVFiles
{
static void Main()
{
// See section Compiling the Code for information about the data files.
string[] names = System.IO.File.ReadAllLines(@"../../../names.csv");
string[] scores = System.IO.File.ReadAllLines(@"../../../scores.csv");
// Merge the data sources using a named type.
// You could use var instead of an explicit type for the query.
IEnumerable<Student> queryNamesScores =
// Split each line in the data files into an array of strings.
from name in names
let x = name.Split(',')
from score in scores
let s = score.Split(',')
// Look for matching IDs from the two data files.
where x[2] == s[0]
// If the IDs match, build a Student object.
select new Student()
{
FirstName = x[0],
LastName = x[1],
ID = Convert.ToInt32(x[2]),
ExamScores = (from scoreAsText in s.Skip(1)
select Convert.ToInt32(scoreAsText)).
ToList()
};
// Optional. Store the newly created student objects in memory
// for faster access in future queries
List<Student> students = queryNamesScores.ToList();
foreach (var student in students)
{
Console.WriteLine("The average score of {0} {1} is {2}.",
student.FirstName, student.LastName, student.ExamScores.Average());
}
//Keep console window open in debug mode
Console.WriteLine("Press any key to exit.");
Console.ReadKey();
}
}
class Student
{
public string FirstName { get; set; }
public string LastName { get; set; }
public int ID { get; set; }
public List<int> ExamScores { get; set; }
}
/* Output:
The average score of Omelchenko Svetlana is 82.5.
The average score of O'Donnell Claire is 72.25.
The average score of Mortensen Sven is 84.5.
The average score of Garcia Cesar is 88.25.
The average score of Garcia Debra is 67.
The average score of Fakhouri Fadi is 92.25.
The average score of Feng Hanying is 88.
The average score of Garcia Hugo is 85.75.
The average score of Tucker Lance is 81.75.
The average score of Adams Terry is 85.25.
The average score of Zabokritski Eugene is 83.
The average score of Tucker Michael is 92.
*/
Compiling the Code
Create a Visual Studio console application project that targets .NET Framework 3.5 or later. By default, the project has a reference to System.Core.dll and a using directive for the System.Linq namespace.
Replace the Program class with the code in the previous example.
Follow the instructions in How to: Join Content from Dissimilar Files (LINQ) to set up the data files, scores.csv and names.csv.
Press F5 to compile and run the program.
Press any key to exit the console window.
See Also
Tasks
How to: Order the Results of a Join Clause (C# Programming Guide)