Linq Archives - C# Sage

July 25, 2020 / October 23, 2020

C# Linq Except: How to Get Items Not In Another List

Overview

When writing C# code we often want to do set-based operations on Lists, Dictionaries or other IEnumerables. Here I’ll walk you through Except, a Linq extension method that you can use to get objects from one List that don’t exist in another. I’ll also explain how you can use the same approach on dictionaries, and I’ll touch on other similar methods like Union, Concat and Intersect.

Except

To get the items from one list (A) that are not in another list (B) you can use the Linq Except method like this:

var a = new List<int>() { 1, 2, 3, 4, 5 };
var b = new List<int>() { 2, 4, 9, 16, 25 };

var aNotB = a.Except(b);

aNotB.ToList().ForEach(x => Console.WriteLine(x));

Which will produce the following results:

1
3
5

Except is a Linq extension method, so to use it you must first import System.Linq at the top of your file, like this:

using System.Linq;

It’s worth bearing in mind that the result of A.Except(B) will be an IEnumerable. If you want the result as a list, call .ToList() as I’ve done above.
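One practical consequence of getting an IEnumerable back is that the query is only evaluated when you enumerate it. Here’s a minimal sketch of that (the DeferredExceptDemo class and its values are made up for illustration): a query created before list B changes still sees the change, while a ToList() snapshot does not.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExceptDemo
{
    // Returns the lazily evaluated result and the snapshot, after b has been modified.
    public static (List<int> Lazy, List<int> Snapshot) Run()
    {
        var a = new List<int> { 1, 2, 3 };
        var b = new List<int> { 2 };

        IEnumerable<int> lazy = a.Except(b);       // nothing is evaluated yet
        List<int> snapshot = a.Except(b).ToList(); // evaluated immediately

        b.Add(3); // change b after both queries were created

        // Enumerating the lazy query now sees the updated b.
        return (lazy.ToList(), snapshot);
    }

    public static void Main()
    {
        var (lazy, snapshot) = Run();
        Console.WriteLine(string.Join(", ", lazy));     // 1
        Console.WriteLine(string.Join(", ", snapshot)); // 1, 3
    }
}
```

If you need a result that won’t change underneath you, calling .ToList() straight away is the simplest option.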

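Linq Except is roughly the C# equivalent of using “Where X Not In” in SQL, although Except also removes duplicates from its result, which makes it closer still to SQL’s EXCEPT operator.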
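To make the SQL comparison concrete, here’s a small sketch that puts Except next to a Where/Contains filter, which is the more literal translation of “Not In”. The ExceptVersusWhere class and the sample values are my own for illustration. The thing to notice is that Except treats list A as a set, so a repeated value in A only appears once in the result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExceptVersusWhere
{
    // Set difference: the distinct elements of a that are not in b.
    public static List<int> WithExcept(List<int> a, List<int> b) =>
        a.Except(b).ToList();

    // Plain filter: every element of a that is not in b, duplicates kept.
    public static List<int> WithWhere(List<int> a, List<int> b)
    {
        var lookup = new HashSet<int>(b); // fast membership checks
        return a.Where(x => !lookup.Contains(x)).ToList();
    }

    public static void Main()
    {
        var a = new List<int> { 1, 1, 2, 3 };
        var b = new List<int> { 2 };

        Console.WriteLine(string.Join(", ", WithExcept(a, b))); // 1, 3
        Console.WriteLine(string.Join(", ", WithWhere(a, b)));  // 1, 1, 3
    }
}
```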

Except: Example Use Case

There are all sorts of reasons you might want the values from one list that aren’t in another, for instance:

Say you have a shopping list, but you’ve just been to the store and bought some of the items. You might want to get everything on the original shopping list that you didn’t just purchase:

var shoppingList = new List<string>()
{
    "apples",
    "beans",
    "pasta",
    "butter",
    "rice",
    "flour",
    "eggs"
};

var itemsBought = new List<string>()
{
    "chocolate",
    "butter",
    "flour",
    "eggs",
    "icing sugar"
};

var newShoppingList = shoppingList.Except(itemsBought);

Console.WriteLine("You still need to buy the following items:");
foreach (var item in newShoppingList)
{
    Console.WriteLine($"  {item}");
}

Which outputs:

You still need to buy the following items:
  apples
  beans
  pasta
  rice

Using Except with Dictionaries (i.e. Get Items Not In Another Dictionary)

Linq Except works for any IEnumerable, so it’s pretty easy to use it to check for items in one dictionary that are not in another dictionary by calling var aExceptB = a.Except(b); as follows:

var a = new Dictionary<int, int>()
{
    { 0, 0 },
    { 1, 1 },
    { 2, 4 },
    { 3, 9 },
    { 4, 16 }
};

var b = new Dictionary<int, int>()
{
    { 0, 0 },
    { 1, 2 },
    { 2, 4 },
    { 3, 6 }
};

var aExceptB = a.Except(b);

aExceptB.ToList().ForEach(x => Console.WriteLine(x));

/* this code outputs:
[1, 1]
[3, 9]
[4, 16]
*/

By default, this will consider items in the dictionary to match if (and only if) both their keys and values are equal. E.g. [apples, 6] will match [apples, 6] but won’t match [apples, 4].

Although this is the default behaviour of a C# Dictionary Except, you often just want all the items from Dictionary A where the item’s key doesn’t exist in Dictionary B. You can do this kind of dictionary except as follows:

How to use except with a dictionary just comparing keys (i.e. Using a non-default IEqualityComparer)

One caveat when using Except on dictionaries is that for two items in a dictionary to be considered equal, both the key and the value must be equal. But what if you don’t care what the value is, you just want to compare using keys?

This is where it’s helpful to be able to specify what it means for two things to be equal. To do this we need to implement IEqualityComparer:

private class KeyComparer<T1, T2> : IEqualityComparer<KeyValuePair<T1, T2>>
    where T1 : IComparable
{
    public bool Equals(KeyValuePair<T1, T2> x, KeyValuePair<T1, T2> y)
    {
        //Check whether the keys are equal
        return x.Key.Equals(y.Key);
    }

    // GetHashCode() must return the same value for equal objects.
    public int GetHashCode(KeyValuePair<T1, T2> kVPair)
    {
        return kVPair.Key.GetHashCode();
    }
}

static void Main()
{
    var a = new Dictionary<int, int>()
    {
        { 0, 0 },
        { 1, 1 },
        { 2, 4 },
        { 3, 9 },
        { 4, 16 }
    };

    var b = new Dictionary<int, int>()
    {
        { 0, 0 },
        { 1, 2 },
        { 2, 4 },
        { 3, 6 }
    };

    var aExceptB = a.Except(b, new KeyComparer<int, int>());

    aExceptB.ToList().ForEach(x => Console.WriteLine(x));

    /* this code outputs:
    [4, 16]
    */
}

The important bit here is return x.Key.Equals(y.Key); this says that if element X has the same key as element Y, then consider them to be equal. This means that { 3, 9 } in dictionary A can match with { 3, 6 } in dictionary B because the keys of both match, hence it’s excluded from the output.
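If writing a comparer feels like a lot of ceremony, a key-only except can also be sketched with Where and ContainsKey. This is an alternative to the comparer above, not a replacement for it, and the ExceptByKey helper name is hypothetical. It has the side benefit of handing back a Dictionary rather than an IEnumerable of KeyValuePairs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeyOnlyExcept
{
    // Hypothetical helper: the pairs from a whose key does not exist in b.
    public static Dictionary<TKey, TValue> ExceptByKey<TKey, TValue>(
        Dictionary<TKey, TValue> a, Dictionary<TKey, TValue> b)
    {
        return a.Where(pair => !b.ContainsKey(pair.Key))
                .ToDictionary(pair => pair.Key, pair => pair.Value);
    }

    public static void Main()
    {
        var a = new Dictionary<int, int> { { 0, 0 }, { 1, 1 }, { 2, 4 }, { 3, 9 }, { 4, 16 } };
        var b = new Dictionary<int, int> { { 0, 0 }, { 1, 2 }, { 2, 4 }, { 3, 6 } };

        foreach (var pair in ExceptByKey(a, b))
        {
            Console.WriteLine($"{pair.Key}: {pair.Value}"); // 4: 16
        }
    }
}
```

On .NET 6 or later, a.ExceptBy(b.Keys, pair => pair.Key) should give the same pairs without a helper, though I’ve kept the sketch to methods available on older frameworks.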

Other similar functions

Union

If you want any element that is in either list A or list B, then Union might be the way to go:

var a = new List<int>() { 1, 2, 3, 4 };
var b = new List<int>() { 3, 4, 5, 6 };

var aUnionB = a.Union(b);

aUnionB.ToList().ForEach(x => Console.WriteLine(x));

/* this code outputs:
1
2
3
4
5
6
*/

However, Union implicitly does a Distinct on the results, so you won’t get repeated elements just because they’re in both lists. Moreover – I tried it with two 1s in list A and got the same results output – so be warned, this will remove duplicates from your lists.

If you want to keep the duplicates, consider Concat:

Concat

This is a lot like Union, but it will keep the duplicated results. A.Concat(B) will give you list A concatenated with (joined on to) list B, i.e. you’ll get all the elements in A and all the elements in B, and any values in both will be duplicated. For example:

var a = new List<int>() { 1, 2, 3, 4 };
var b = new List<int>() { 3, 4, 5, 6 };

var results = a.Concat(b);

results.ToList().ForEach(x => Console.WriteLine(x));

/* this code outputs:
1
2
3
4
3
4
5
6
*/

You can see from the output that list B has simply been stuck on the end of list A.
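To see how Union and Concat relate, here’s a short sketch with a duplicate inside list A as well as a value shared by both lists. The UnionVersusConcat class and the UnionViaConcat helper are made up for this example. For in-memory lists like these, Union gives the same result as Concat followed by Distinct.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UnionVersusConcat
{
    // Hypothetical helper: builds a union by hand from Concat and Distinct.
    public static List<int> UnionViaConcat(List<int> a, List<int> b) =>
        a.Concat(b).Distinct().ToList();

    public static void Main()
    {
        var a = new List<int> { 1, 1, 2 };
        var b = new List<int> { 2, 3 };

        Console.WriteLine(string.Join(", ", a.Union(b)));          // 1, 2, 3
        Console.WriteLine(string.Join(", ", a.Concat(b)));         // 1, 1, 2, 2, 3
        Console.WriteLine(string.Join(", ", UnionViaConcat(a, b))); // 1, 2, 3
    }
}
```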

Intersect

So you want all the items that occur in both list A and list B? You’re looking for the Linq Intersect method. It can be used as follows:

var a = new List<int>() { 1, 2, 3, 3, 4 };
var b = new List<int>() { 3, 4, 5, 6 };

var aIntersectB = a.Intersect(b);

aIntersectB.ToList().ForEach(x => Console.WriteLine(x));

/* this code outputs:
3
4
*/

As you can see from the given results, this also does an implicit Distinct, so you will lose any duplicated values in your lists.

Not Intersect?

What if you want any items from list A that aren’t in B and any items from list B that aren’t in A (the symmetric difference)? Well, there isn’t a dedicated Linq method exactly for that… but, you can build it up yourself with the Union and Except methods:

var a = new List<int>() { 1, 2, 3, 4 };
var b = new List<int>() { 3, 4, 5, 6 };

var results = a.Except(b).Union(b.Except(a));

results.ToList().ForEach(x => Console.WriteLine(x));

/* this code outputs:
1
2
5
6
*/
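Outside of Linq, HashSet<T> has a SymmetricExceptWith method that does this in one call. The sketch below compares it with the Union/Except approach; the SymmetricDifference class and its method names are my own. Note that SymmetricExceptWith modifies the set in place, and that a HashSet doesn’t promise any particular enumeration order, so I sort the result before returning it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SymmetricDifference
{
    // The Union/Except approach from the article.
    public static List<int> WithLinq(List<int> a, List<int> b) =>
        a.Except(b).Union(b.Except(a)).ToList();

    // Alternative using HashSet<T>; sorted because set order is not guaranteed.
    public static List<int> WithHashSet(List<int> a, List<int> b)
    {
        var set = new HashSet<int>(a);
        set.SymmetricExceptWith(b); // modifies set in place
        return set.OrderBy(x => x).ToList();
    }

    public static void Main()
    {
        var a = new List<int> { 1, 2, 3, 4 };
        var b = new List<int> { 3, 4, 5, 6 };

        Console.WriteLine(string.Join(", ", WithLinq(a, b)));    // 1, 2, 5, 6
        Console.WriteLine(string.Join(", ", WithHashSet(a, b))); // 1, 2, 5, 6
    }
}
```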

Distinct

I’ve mentioned distinct a few times and it’s worth a mention, but it’s slightly different to the methods so far discussed. So far we’ve looked at methods that take two IEnumerables and return an IEnumerable. Distinct is different in that it takes a single IEnumerable and returns an IEnumerable.

Distinct is used to return the distinct elements from a list, i.e. at most one copy of every element in a list. It can be used as follows:

var a = new List<int>() { 1, 2, 3, 3, 4, 1 };

var results = a.Distinct();

results.ToList().ForEach(x => Console.WriteLine(x));

/* this code outputs:
1
2
3
4
*/
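Like Except, Distinct has an overload that accepts an IEqualityComparer, so you can decide what counts as “the same” element. Here’s a minimal sketch using the built-in StringComparer.OrdinalIgnoreCase; the DistinctWithComparer class and the sample strings are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctWithComparer
{
    // Treats strings that differ only by case as duplicates.
    public static List<string> IgnoreCase(IEnumerable<string> items) =>
        items.Distinct(StringComparer.OrdinalIgnoreCase).ToList();

    public static void Main()
    {
        var items = new List<string> { "Apples", "apples", "Beans", "BEANS", "rice" };

        // Default comparison is case sensitive, so all five survive.
        Console.WriteLine(items.Distinct().Count()); // 5

        // With the case-insensitive comparer only three remain.
        Console.WriteLine(string.Join(", ", IgnoreCase(items))); // Apples, Beans, rice
    }
}
```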

Do These Linq Methods Preserve the Order of Elements?

I wasn’t sure, so I tried it, and yes, at least for in-memory Lists these Linq methods do preserve the order of the items, keeping the first occurrence of any duplicate:

var a = new List<int>() { 4, 3, 2, 4, 1 };
var b = new List<int>() { 3, 4, 5, 6 };

var results = a.Union(b);

results.ToList().ForEach(x => Console.WriteLine(x));

/* this code outputs:
4
3
2
1
5
6
*/
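The same check can be repeated for the other methods. Here’s a sketch using the same two lists, with a hypothetical OrderCheck class and Show helper for printing. In each case the results follow the order of list A (and then list B for Concat).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderCheck
{
    // Hypothetical helper: formats a sequence on one line.
    public static string Show(IEnumerable<int> items) => string.Join(", ", items);

    public static void Main()
    {
        var a = new List<int> { 4, 3, 2, 4, 1 };
        var b = new List<int> { 3, 4, 5, 6 };

        Console.WriteLine(Show(a.Except(b)));    // 2, 1
        Console.WriteLine(Show(a.Intersect(b))); // 4, 3
        Console.WriteLine(Show(a.Distinct()));   // 4, 3, 2, 1
        Console.WriteLine(Show(a.Concat(b)));    // 4, 3, 2, 4, 1, 3, 4, 5, 6
    }
}
```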

Conclusion

Hopefully this has given you a good introduction to performing set based operations in C# with Linq. In particular using Except to find the items in one list that aren’t in another.

If you’ve spotted a mistake, or if you think there’s something I could add to make this more useful, or if you just want to say “hi” then please leave a message in the comments.

Until next time, have a nice day.

July 19, 2020 / March 6, 2021

C# Linq Group By

This is a favourite of mine, simply because it always used to trip me up. I put this down to learning SQL before Linq, so I expect Linq to behave the same way as SQL. As you’ll see below, when it comes to Group By they behave a little differently.

Linq Group By Example

private class Record
{
    public string Name { get; set; }
    public int Score { get; set; }
}

static void Main()
{
    var scores = new List<Record>()
    {
        new Record() { Name = "Bill", Score = 2 },
        new Record() { Name = "Ted", Score = 9 },
        new Record() { Name = "Bill", Score = 1 },
        new Record() { Name = "Ted", Score = 8 },
        new Record() { Name = "Bill", Score = 9 },
        new Record() { Name = "Ted", Score = 5 },
    };

    var groupings = scores.GroupBy(x => x.Name);

    foreach (var grouping in groupings)
    {
        Console.WriteLine(grouping.Key);
        foreach (var record in grouping)
        {
            Console.WriteLine($"  {record.Score}");
        }
    }
}

Which outputs:

Bill
  2
  1
  9
Ted
  9
  8
  5

First we create a class (Record) to hold our sample data and initialise a list of Records with some data.

The actual Linq group by is the scores.GroupBy(x => x.Name) call, which results in an object of type IEnumerable<IGrouping<string, Record>> (which I called groupings). I then use a foreach to iterate over each of these groupings, printing its key and each of its values (with another, nested foreach).

What is Linq Group By used for?

Linq Group By is for grouping a set of results by a certain attribute. In the example above we’re grouping Score by the name on the score. But equally you could group accounting figures by month, sales figures by widget, people by age – the list is endless.

Quite often you’ll want to then summarise the data using an aggregation function, but as we’ll see below – with Linq Group By, you don’t have to!

What is an IGrouping?

The full definition of IGrouping can be found in the Microsoft docs, and I would agree with them that an IGrouping represents a collection of objects that have a common key. But what actually is it?

Since .NET Core is now open source, we can see for ourselves by looking at the relevant source code:

public interface IGrouping<out TKey, out TElement> : IEnumerable<TElement>
{
    TKey Key { get; }
}

This shows us that an IGrouping is an IEnumerable with an additional Key property.

Does this make intuitive sense? I think so: the Group By statement divides an IEnumerable into smaller IEnumerables (IGroupings) and labels each of these with the key that they all share.

In the example above, we ended up with two IGroupings, each one holding the set of Records that shared a Name (so three Records for Bill and three for Ted). We could have worked out which group was which by looking at the first element in each and checking its Name, but the IGrouping’s Key property was handy because it meant we didn’t have to look into the records – the key is right there:

var grouping = scores.GroupBy(x => x.Name);

foreach (var group in grouping)
{
    // Without using key property
    Console.WriteLine(group.First().Name);

    // Using IGrouping's Key property
    Console.WriteLine(group.Key);
}
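Because each IGrouping is just a keyed IEnumerable, it converts naturally into a dictionary entry. The sketch below shows one way to do that; the GroupingToDictionary class and ScoresByName helper are hypothetical names, and Record is the same shape as in the example above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Record
{
    public string Name { get; set; }
    public int Score { get; set; }
}

public static class GroupingToDictionary
{
    // The grouping's Key becomes the dictionary key;
    // its elements (plain Records) become the list of scores.
    public static Dictionary<string, List<int>> ScoresByName(IEnumerable<Record> scores) =>
        scores.GroupBy(x => x.Name)
              .ToDictionary(g => g.Key, g => g.Select(r => r.Score).ToList());

    public static void Main()
    {
        var scores = new List<Record>
        {
            new Record { Name = "Bill", Score = 2 },
            new Record { Name = "Ted", Score = 9 },
            new Record { Name = "Bill", Score = 1 },
            new Record { Name = "Ted", Score = 8 },
        };

        var byName = ScoresByName(scores);
        Console.WriteLine(string.Join(", ", byName["Bill"])); // 2, 1
        Console.WriteLine(string.Join(", ", byName["Ted"]));  // 9, 8
    }
}
```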

Common Use Case Examples

Usually when you’re grouping by something, you’re aiming to aggregate the results. This is so common, that in SQL you can’t group by without the aggregation step. The following examples show some different ways to aggregate the results of a Linq Group By:

Linq Group By Count

var scores = new List<Record>()
{
    new Record() { Name = "Bill", Score = 2 },
    new Record() { Name = "Ted", Score = 9 },
    new Record() { Name = "Bill", Score = 1 },
    new Record() { Name = "Ted", Score = 8 },
    new Record() { Name = "Bill", Score = 9 },
    new Record() { Name = "Ted", Score = 5 },
};

var grouping = scores.GroupBy(x => x.Name);

foreach (var group in grouping)
{
    Console.WriteLine(
        $"{group.Key}: {group.Count()}");
}

Here we use the same Group By to get our IEnumerable of IGroupings, but this time instead of printing the score from each Record, we use Linq Count to count the number of Records in the IGrouping instead. The above code outputs:

Bill: 3
Ted: 3

Linq Group By Sum

var scores = new List<Record>()
{
    new Record() { Name = "Bill", Score = 2 },
    new Record() { Name = "Ted", Score = 9 },
    new Record() { Name = "Bill", Score = 1 },
    new Record() { Name = "Ted", Score = 8 },
    new Record() { Name = "Bill", Score = 9 },
    new Record() { Name = "Ted", Score = 5 },
};

var grouping = scores.GroupBy(x => x.Name);

foreach (var group in grouping)
{
    Console.WriteLine(
        $"{group.Key}: {group.Sum(x => x.Score)}");
}

We use the same Group By statement as before, but now we print the sum of the Score of all the records in the IGrouping. This results in:

Bill: 12
Ted: 22

Linq Group By Average

var scores = new List<Record>()
{
    new Record() { Name = "Bill", Score = 2 },
    new Record() { Name = "Ted", Score = 9 },
    new Record() { Name = "Bill", Score = 1 },
    new Record() { Name = "Ted", Score = 8 },
    new Record() { Name = "Bill", Score = 9 },
    new Record() { Name = "Ted", Score = 5 },
};

var grouping = scores.GroupBy(x => x.Name);

foreach (var group in grouping)
{
    Console.WriteLine(
        $"{group.Key}: {group.Average(x => x.Score)}");
}

You might be starting to see a theme here. As before, we use Group By to get a set of IGroupings, then we print the average (mean) of the scores of all Records in the IGrouping. This results in:

Bill: 4
Ted: 7.33333333333333

Linq Group By Min

var scores = new List<Record>()
{
    new Record() { Name = "Bill", Score = 2 },
    new Record() { Name = "Ted", Score = 9 },
    new Record() { Name = "Bill", Score = 1 },
    new Record() { Name = "Ted", Score = 8 },
    new Record() { Name = "Bill", Score = 9 },
    new Record() { Name = "Ted", Score = 5 },
};

var grouping = scores.GroupBy(x => x.Name);

foreach (var group in grouping)
{
    Console.WriteLine(
        $"{group.Key}: {group.Min(x => x.Score)}");
}

As above, we use a Linq Group By to get a set (IEnumerable) of IGroupings, then for each IGrouping we print the minimum Score from all its Records. This results in:

Bill: 1
Ted: 5

Linq Group By Max

var scores = new List<Record>()
{
    new Record() { Name = "Bill", Score = 2 },
    new Record() { Name = "Ted", Score = 9 },
    new Record() { Name = "Bill", Score = 1 },
    new Record() { Name = "Ted", Score = 8 },
    new Record() { Name = "Bill", Score = 9 },
    new Record() { Name = "Ted", Score = 5 },
};

var grouping = scores.GroupBy(x => x.Name);

foreach (var group in grouping)
{
    Console.WriteLine(
        $"{group.Key}: {group.Max(x => x.Score)}");
}

As above, we use a Linq Group By to get a set (IEnumerable) of IGroupings, then for each IGrouping we print the max Score from all its Records using Linq Max. This results in:

Bill: 9
Ted: 9
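The count, sum and max examples above each print one figure per group, but nothing stops you computing several at once. Here’s a sketch that projects each IGrouping into a summary with Select, much like listing several aggregates in one SQL query. The GroupSummary class, the Summarise helper and the tuple field names are all made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Record
{
    public string Name { get; set; }
    public int Score { get; set; }
}

public static class GroupSummary
{
    // Hypothetical helper: one summary tuple per name.
    public static List<(string Name, int Count, int Total, int Best)> Summarise(
        IEnumerable<Record> scores) =>
        scores.GroupBy(x => x.Name)
              .Select(g => (Name: g.Key,
                            Count: g.Count(),
                            Total: g.Sum(x => x.Score),
                            Best: g.Max(x => x.Score)))
              .ToList();

    public static void Main()
    {
        var scores = new List<Record>
        {
            new Record { Name = "Bill", Score = 2 },
            new Record { Name = "Ted", Score = 9 },
            new Record { Name = "Bill", Score = 1 },
            new Record { Name = "Ted", Score = 8 },
            new Record { Name = "Bill", Score = 9 },
            new Record { Name = "Ted", Score = 5 },
        };

        foreach (var row in Summarise(scores))
        {
            // Bill: count 3, total 12, best 9
            // Ted: count 3, total 22, best 9
            Console.WriteLine($"{row.Name}: count {row.Count}, total {row.Total}, best {row.Best}");
        }
    }
}
```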

C# Linq Group By Contains

var scores = new List<Record>()
{
    new Record() { Name = "Bill", Score = 2 },
    new Record() { Name = "Ted", Score = 9 },
    new Record() { Name = "Bill", Score = 1 },
    new Record() { Name = "Ted", Score = 8 },
    new Record() { Name = "Bill", Score = 9 },
    new Record() { Name = "Ted", Score = 5 },
};

var grouping = scores.GroupBy(x => x.Name);

foreach (var group in grouping)
{
    Console.Write($"Did {group.Key} ever score an 8? ");
    Console.WriteLine(group.Select(x => x.Score).Contains(8) ? "Yes" : "No");
}

This uses a slightly different pattern: We use the same Group By as before to get a set of IGroupings, we then use Select to convert each IGrouping to a set of scores, then we use Linq Contains on these scores to check if there’s an 8 in there. This code outputs:

Did Bill ever score an 8? No
Did Ted ever score an 8? Yes

Linq Group By Join (the Linq equivalent of SQL’s String_Agg)

If you’ve ever used SQL’s String_Agg function, then you might go looking for its equivalent in Linq. The String_Agg aggregation function takes a set of values and combines them into a string, using a supplied separator. Here’s an example of doing exactly that with Linq Group By:

var scores = new List<Record>()
{
    new Record() { Name = "Bill", Score = 2 },
    new Record() { Name = "Ted", Score = 9 },
    new Record() { Name = "Bill", Score = 1 },
    new Record() { Name = "Ted", Score = 8 },
    new Record() { Name = "Bill", Score = 9 },
    new Record() { Name = "Ted", Score = 5 },
};

var grouping = scores.GroupBy(x => x.Name);

foreach (var group in grouping)
{
    Console.WriteLine(
        $"{group.Key}: {string.Join(", ", group.Select(x => x.Score))}");
}

This again uses a slightly different pattern to those above, in that you don’t need a Linq aggregation function to achieve the required result. Instead we can use string.Join. This code outputs:

Bill: 2, 1, 9
Ted: 9, 8, 5

Linq Group By vs SQL Group By

I’ve touched on this before in the Dictionary Shorthand post, but while Linq Group By is very similar to SQL Group By, the main difference is that SQL Group By combines the aggregation step with the Grouping, while with Linq, the Group By and the Aggregation are separate.

In SQL this means you’re forced to explain how you want the results to be aggregated. With Linq, you don’t have to.

With Linq you can easily write your own aggregation code (see the String_Agg section above for an example), while you’re pretty limited in this regard with SQL.
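As another illustration of writing your own aggregation, here’s a sketch of a median per group. Median is not a built-in Linq method; the Median helper and MedianByGroup class below are my own, and Record is the same shape used throughout this post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Record
{
    public string Name { get; set; }
    public int Score { get; set; }
}

public static class MedianByGroup
{
    // Hypothetical custom aggregate: the middle value,
    // or the mean of the two middle values for an even count.
    public static double Median(IEnumerable<int> values)
    {
        var sorted = values.OrderBy(v => v).ToList();
        if (sorted.Count == 0)
        {
            throw new InvalidOperationException("Sequence contains no elements");
        }

        int mid = sorted.Count / 2;
        return sorted.Count % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void Main()
    {
        var scores = new List<Record>
        {
            new Record { Name = "Bill", Score = 2 },
            new Record { Name = "Ted", Score = 9 },
            new Record { Name = "Bill", Score = 1 },
            new Record { Name = "Ted", Score = 8 },
            new Record { Name = "Bill", Score = 9 },
            new Record { Name = "Ted", Score = 5 },
        };

        foreach (var group in scores.GroupBy(x => x.Name))
        {
            // Bill: 2
            // Ted: 8
            Console.WriteLine($"{group.Key}: {Median(group.Select(x => x.Score))}");
        }
    }
}
```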

I used to prefer SQL’s approach, since it’s what I learnt first and I find it simpler. The more I use Linq Group By however, the more I appreciate the flexibility and power you get from keeping the aggregation separate, even if it does take some getting used to.

Conclusion

Linq Group By is an incredibly powerful way to analyse a group of results and to summarise data (when combined with a Linq aggregation). This type of summarising has always been a core part of SQL, and its inclusion in the C# language is, at least by this developer, very much appreciated!