This is a favourite of mine, simply because it always used to trip me up. I put this down to learning SQL before Linq, so I expect Linq to behave the same way as SQL. As you’ll see below, when it comes to Group By they behave a little differently.
Contents
Linq Group By Example
private class Record { public string Name { get; set; } public int Score { get; set; } } static void Main() { var scores = new List<Record>() { new Record() { Name = "Bill", Score = 2 }, new Record() { Name = "Ted", Score = 9 }, new Record() { Name = "Bill", Score = 1 }, new Record() { Name = "Ted", Score = 8 }, new Record() { Name = "Bill", Score = 9 }, new Record() { Name = "Ted", Score = 5 }, }; var groupings = scores.GroupBy(x => x.Name); foreach (var grouping in groupings) { Console.WriteLine(grouping.Key); foreach (var record in grouping) { Console.WriteLine($" {record.Score}"); } } }
Which outputs:
Peter 2 1 9 Ralph 9 8 5
First we create a struct (Record) to hold our sample data and initialise list of Records with some data.
The actual linq group by statement is then done on line 19 and results in an object of type: IEnumerable<IGrouping<string, Record>> (which I called groupings). I then use a foreach (line 21) to iterate over each of these groupings, printing it’s name and each of it’s values (with another foreach on line 24).
What is Linq Group By used for?
Linq Group By is for grouping a set of results by a certain attribute. In the example above we’re grouping Score by the name on the score. But equally you could group accounting figures by month, sales figures by widget, people by age – the list is endless.
Quite often you’ll want to then summarise the data using an aggregation function, but as we’ll see below – with Linq Group By, you don’t have to!
What is an IGrouping?
The full definition of IGrouping can be found in the microsoft docs, and I would agree with them that an IGrouping represents a collection of objects that have a common key. But what actually is it?
Since .NET Core is now open source, we can see for ourselves by looking at the relevant source code:
public interface IGrouping<out TKey, out TElement> : IEnumerable<TElement> { TKey Key { get; } }
This shows us that an IGrouping is an IEnumerable with an additional Key property.
Does this make intuitive sense? I think so: the Group By statement divides an IEnumerable into smaller IEnumerables (IGroupings) and labels each of these with the key that they all share.
In the example above, we ended up with two IGroupings, each one holding the set of Records that shared a Name (so three Records for Bill and three for Ted). We could have worked out which group was which by looking at the first element in each and checking it’s Name, but the IGrouping’s key property was handy because meant we didn’t have to look into the records – the key is right there:
var grouping = scores.GroupBy(x => x.Name); foreach (var group in grouping) { // Without using key property Console.WriteLine(group.First().Name); // Using IGroupings's key property Console.WriteLine(group.Key); }
Common Use Case Examples
Usually when you’re grouping by something, you’re aiming to aggregate the results. This is so common, that in SQL you can’t group by without the aggregation step. The following examples show some different ways to aggregate the results of a Linq Group By:
Linq Group By Count
var scores = new List<Record>() { new Record() { Name = "Bill", Score = 2 }, new Record() { Name = "Ted", Score = 9 }, new Record() { Name = "Bill", Score = 1 }, new Record() { Name = "Ted", Score = 8 }, new Record() { Name = "Bill", Score = 9 }, new Record() { Name = "Ted", Score = 5 }, }; var grouping = scores.GroupBy(x => x.Name); foreach (var group in grouping) { Console.WriteLine( $"{group.Key}: {group.Count()}"); }
Here we use the same Group By to get our IEnumerable of IGroupings, but this time instead of printing the score from each Record, we use Linq Count to count the number of Records in the IGrouping instead. The above code outputs:
Bill: 3 Ted: 3
Linq Group By Sum
var scores = new List<Record>() { new Record() { Name = "Bill", Score = 2 }, new Record() { Name = "Ted", Score = 9 }, new Record() { Name = "Bill", Score = 1 }, new Record() { Name = "Ted", Score = 8 }, new Record() { Name = "Bill", Score = 9 }, new Record() { Name = "Ted", Score = 5 }, }; var grouping = scores.GroupBy(x => x.Name); foreach (var group in grouping) { Console.WriteLine( $"{group.Key}: {group.Sum(x => x.Score)}"); }
We use the same Group By statement as before, but now we print the sum of the Score of all the records in the IGrouping. This results in:
Bill: 12 Ted: 22
Linq Group By Average
var scores = new List<Record>() { new Record() { Name = "Bill", Score = 2 }, new Record() { Name = "Ted", Score = 9 }, new Record() { Name = "Bill", Score = 1 }, new Record() { Name = "Ted", Score = 8 }, new Record() { Name = "Bill", Score = 9 }, new Record() { Name = "Ted", Score = 5 }, }; var grouping = scores.GroupBy(x => x.Name); foreach (var group in grouping) { Console.WriteLine( $"{group.Key}: {group.Average(x => x.Score)}"); }
You might be starting to see a theme here. As before, we use Group By to get a set of IGroupings, then when print the average (mean) of the scores of all Records in the IGrouping. This results in:
Bill: 4 Ted: 7.33333333333333
Linq Group By Min
var scores = new List<Record>() { new Record() { Name = "Bill", Score = 2 }, new Record() { Name = "Ted", Score = 9 }, new Record() { Name = "Bill", Score = 1 }, new Record() { Name = "Ted", Score = 8 }, new Record() { Name = "Bill", Score = 9 }, new Record() { Name = "Ted", Score = 5 }, }; var grouping = scores.GroupBy(x => x.Name); foreach (var group in grouping) { Console.WriteLine( $"{group.Key}: {group.Min(x => x.Score)}"); }
As above, we use a Linq Group By to get a set (IEnumerable) of IGroupings, then for each IGrouping we print the minimum Score from all it’s Records. This results in:
Bill: 1 Ted: 5
Linq Group By Max
var scores = new List<Record>() { new Record() { Name = "Bill", Score = 2 }, new Record() { Name = "Ted", Score = 9 }, new Record() { Name = "Bill", Score = 1 }, new Record() { Name = "Ted", Score = 8 }, new Record() { Name = "Bill", Score = 9 }, new Record() { Name = "Ted", Score = 5 }, }; var grouping = scores.GroupBy(x => x.Name); foreach (var group in grouping) { Console.WriteLine( $"{group.Key}: {group.Max(x => x.Score)}"); }
As above, we use a Linq Group By to get a set (IEnumerable) of IGroupings, then for each IGrouping we print the max Score from all it’s Records using Linq Max. This results in:
Bill: 9 Ted: 9
C# Linq Group By Contains
var scores = new List<Record>() { new Record() { Name = "Bill", Score = 2 }, new Record() { Name = "Ted", Score = 9 }, new Record() { Name = "Bill", Score = 1 }, new Record() { Name = "Ted", Score = 8 }, new Record() { Name = "Bill", Score = 9 }, new Record() { Name = "Ted", Score = 5 }, }; var grouping = scores.GroupBy(x => x.Name); foreach (var group in grouping) { Console.Write($"Did {group.Key} ever score an 8? "); Console.WriteLine(group.Select(x => x.Score).Contains(8) ? "Yes" : "No"); }
This uses a slightly different pattern: We use the same Group By as before to get a set of IGroupings, we then use Select to convert each IGrouping to a set of scores, then we use Linq Contains on these scores to check if there’s an 8 in there. This code outputs:
Did Bill ever score an 8? No Did Ted ever score an 8? Yes
Linq Group By Join (the Linq equivalent of SQL’s String_Agg)
If you’ve ever used SQL’s String_Agg function, then you might go looking for it’s equivalent in Linq. The String_Agg aggregation function takes a set of values and combines them into a string, using a supplied separator. Here’s an example of doing exactly that with Linq Group By:
var scores = new List<Record>() { new Record() { Name = "Bill", Score = 2 }, new Record() { Name = "Ted", Score = 9 }, new Record() { Name = "Bill", Score = 1 }, new Record() { Name = "Ted", Score = 8 }, new Record() { Name = "Bill", Score = 9 }, new Record() { Name = "Ted", Score = 5 }, }; var grouping = scores.GroupBy(x => x.Name); foreach (var group in grouping) { Console.WriteLine( $"{group.Key}: {string.Join(", ", group.Select(x => x.Score))}"); }
This again uses a slightly different pattern to those above, in that you don’t need a Linq aggregation function to achieve the required result. Instead we can use string.Join. This code outputs:
Bill: 2, 1, 9 Ted: 9, 8, 5
Linq Group By vs SQL Group By
I’ve touched on this before in the Dictionary Shorthand post, but while Linq Group By is very similar to SQL Group By, the main difference is that SQL Group By combines the aggregation step with the Grouping, while with Linq, the Group By and the Aggregation are separate.
In SQL this mean you’re forced to explain how you want the results to be aggregates, with Linq, you don’t have to.
With Linq you can easily write your own aggregation code (see the String_Agg section above for an example), while you’re pretty limited in this regard with SQL.
I used to prefer SQL’s approach, since it’s what I learnt first and I find it simpler. The more I use Linq Group By however, the more I appreciate the flexibility and power you get from keeping the aggregation separate, even if it does take some getting used to.
Conclusion
Linq Group By is an incredibly powerful way to analyse a group of results and to summarise data (when combined with a Linq aggregation). This type of summarising has always been a core part of SQL, and it’s inclusion in the C# language is, at least by this developer, very much appreciated!