LINQ has been part of .NET since version 3.5. It simplifies many tasks by letting us express data processing in a more declarative manner. You have probably had an occasion to use the Except method. But do you know exactly how it works?

On MSDN we can find the following:

public static IEnumerable<TSource> Except<TSource>(
  this IEnumerable<TSource> first,
  IEnumerable<TSource> second
)

Produces the set difference of two sequences by using the default equality comparer to compare values.

Given the above, can you guess what will be written to the console when the following code is executed?

var invalidChars = Path.GetInvalidFileNameChars();
var fileName = "Availability";
var sanitizedFileName = new string(fileName.Except(invalidChars).ToArray());
Console.WriteLine(sanitizedFileName);

If you initially thought that the output would be Availability, you are wrong. That is what I thought at first, too. The actual output is Availbty. There is a subtle point in the MSDN description of Except: it treats the sequences as sets. That means the output contains no elements that appear in the second sequence, which is no surprise. But it also means the output contains no duplicates: every element of the first sequence that appears more than once is returned only once, at the position of its first occurrence.
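To make the set semantics concrete, here is a minimal sketch that compares Except with an explicit Distinct followed by a filter. The class name ExceptDemo and its helper methods are made up for this illustration. The sketch relies only on the fact that Path.GetInvalidFileNameChars() contains no letters on any platform.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical demo class, not part of the original post.
public static class ExceptDemo
{
    // The approach from the snippet above: Except treats both inputs as sets.
    public static string ViaExcept(string fileName)
    {
        var invalidChars = Path.GetInvalidFileNameChars();
        return new string(fileName.Except(invalidChars).ToArray());
    }

    // The same result spelled out: drop duplicates first, then filter.
    public static string ViaDistinct(string fileName)
    {
        var invalidChars = Path.GetInvalidFileNameChars();
        return new string(fileName
            .Distinct()
            .Where(c => !invalidChars.Contains(c))
            .ToArray());
    }

    public static void Main()
    {
        // Repeated 'a', 'i' and 'l' are kept only at their first occurrence.
        Console.WriteLine(ViaExcept("Availability"));   // Availbty
        Console.WriteLine(ViaDistinct("Availability")); // Availbty
    }
}
```

Both calls print the same text, which shows that Except behaves like Distinct combined with a filter rather than like a plain filter.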

To get the behavior we actually want, we need a method that produces the difference between two sequences while keeping the duplicates from the first one, so we can write:

public static IEnumerable<TSource> Without<TSource>(
    this IEnumerable<TSource> first, 
    IEnumerable<TSource> second)
{
    if(first == null)
    {
        throw new ArgumentNullException("first");
    }
    if(second == null)
    {
        throw new ArgumentNullException("second");
    }
    return WithoutIterator(first, second);
}

public static IEnumerable<TSource> WithoutIterator<TSource>(
    IEnumerable<TSource> first,
    IEnumerable<TSource> second)
{
    var withoutElements = new HashSet<TSource>(second);
    foreach(var element in first)
    {
        if(!withoutElements.Contains(element))
        {
            yield return element;
        }
    }
}
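As a quick check, the following sketch runs Without and Except side by side on the same input. It repeats a compact copy of the extension method so that it compiles on its own. The class names SequenceExtensions and WithoutDemo, and the sample input "Avail/ability", are assumptions made for this illustration. The '/' character is used because it is reported as invalid on both Windows and Unix-like systems.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical container class for the extension method from the post.
public static class SequenceExtensions
{
    public static IEnumerable<TSource> Without<TSource>(
        this IEnumerable<TSource> first,
        IEnumerable<TSource> second)
    {
        if (first == null) throw new ArgumentNullException("first");
        if (second == null) throw new ArgumentNullException("second");
        return WithoutIterator(first, second);
    }

    private static IEnumerable<TSource> WithoutIterator<TSource>(
        IEnumerable<TSource> first,
        IEnumerable<TSource> second)
    {
        // Only the second sequence is turned into a set;
        // duplicates in the first sequence pass through untouched.
        var withoutElements = new HashSet<TSource>(second);
        foreach (var element in first)
        {
            if (!withoutElements.Contains(element))
            {
                yield return element;
            }
        }
    }
}

// Hypothetical demo class, not part of the original post.
public static class WithoutDemo
{
    public static string Sanitize(string fileName)
    {
        var invalidChars = Path.GetInvalidFileNameChars();
        return new string(fileName.Without(invalidChars).ToArray());
    }

    public static void Main()
    {
        var invalidChars = Path.GetInvalidFileNameChars();
        var fileName = "Avail/ability";

        Console.WriteLine(Sanitize(fileName)); // Availability
        Console.WriteLine(new string(fileName.Except(invalidChars).ToArray())); // Availbty
    }
}
```

Because only the second sequence is materialized into a HashSet, the method stays lazy over the first sequence and preserves both its order and its duplicates.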

Summary

The name of the Except method is a little misleading. Reading the file name sanitizing example, I bet most of us assume that only the invalidChars will be removed from fileName. It is good to know otherwise, to avoid confusion later. And if we do not want to treat sequences as sets, we can always resort to a method like Without, shown above.




Published

14 December 2015

Tags