C# String Replace – How to Replace Parts of String in C#

Introduction

It’s common to have data in one format, but want to change it to another. At a low level, this can often come down to doing string replacements – whether it’s tabs to commas for delimited files or more complicated munges, it’s a useful trick to have up your sleeve.

Simplest Replace

It doesn’t get much simpler than using the String.Replace() method:

var text = "Hello World";
var text2 = text.Replace("World", "Everyone");
Console.WriteLine($"{text} -> {text2}");
// This code outputs:
// Hello World -> Hello Everyone

Case Insensitive Search and Replace

By default, String.Replace is case sensitive; it cares about capitalization:

var text = "Hello World";
var text2 = text.Replace("hello", "Hi");
Console.WriteLine($"{text} -> {text2}");
// This code outputs:
// Hello World -> Hello World

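The replace attempt fails because “Hello” does not match “hello” (it’s case sensitive). If you’re after a case insensitive replace, pass a StringComparison value as a third argument (this overload is available in .NET Core 2.0 and later, but not in .NET Framework):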

var text = "Hello World";
var s = StringComparison.CurrentCultureIgnoreCase;
var text2 = text.Replace("hello", "Yo", s);
Console.WriteLine($"{text} -> {text2}");
// This code outputs:
// Hello World -> Yo World

Replacing a Single Character in a String

String.Replace also has another overload, allowing the replacement of a single character.

var text = "Hello World";
var text2 = text.Replace('e', 'a');
Console.WriteLine($"{text} -> {text2}");
// This code outputs:
// Hello World -> Hallo World

There’s a gotcha though: you can’t replace a char with a string or a string with a char. The types of oldValue and newValue have to match. If you try to mix the types you’ll end up with an error:

var text = "Hello World";
var text2 = text.Replace('e', "abc");
Console.WriteLine($"{text} -> {text2}");
// This code fails with a compile time error:
// Compilation error (line 3, col 31): Argument 2: cannot convert from 'string' to 'char'

Replace the First Instance of a String

By default, String.Replace does a global replace; it will replace all instances of the old value with the new value:

var text = "The ants go marching two by two...";
var text2 = text.Replace("two", "three");
Console.WriteLine(text);
Console.WriteLine(text2);
// This code outputs:
// The ants go marching two by two...
// The ants go marching three by three...

What if we don’t want all instances replaced? What if we just want to replace the first instance, then stop?

If you only want to replace the first instance of a string, then I’m afraid you need to step out of the comfort zone of String.Replace. You can either roll your own method, or you can use Regex.Replace:

Replace First Instance – Roll Your Own

The following code snippet has been influenced heavily by this Stack Overflow answer:

static string ReplaceFirst(string text, string oldValue, string newValue)
{
  int pos = text.IndexOf(oldValue);
  return pos < 0
    ? text // a negative value indicates oldValue was not found
    : text.Substring(0, pos) // Everything up to the oldValue
      + newValue
      + text.Substring(pos + oldValue.Length); // Everything after the oldValue
}

It can then be used as follows:

var text = "The ants go marching two by two...";
var text2 = Program.ReplaceFirst(text, "two", "three");
Console.WriteLine(text);
Console.WriteLine(text2);
// This code outputs:
// The ants go marching two by two...
// The ants go marching three by two...
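
If you also need the first-instance replace to ignore case, one option is to pass a StringComparison through to IndexOf. The following is only a sketch: the class name ReplaceFirstSketch and the extra comparison parameter are my own additions, and I’m assuming an ordinal comparison (such as OrdinalIgnoreCase) so that the matched text is always the same length as oldValue.

```csharp
using System;

public static class ReplaceFirstSketch
{
    // Hypothetical variant of ReplaceFirst that takes a StringComparison.
    // Assumes an ordinal comparison, so the match has the same length as oldValue.
    public static string ReplaceFirst(
        string text, string oldValue, string newValue, StringComparison comparison)
    {
        if (string.IsNullOrEmpty(oldValue))
        {
            return text; // nothing sensible to search for
        }

        int pos = text.IndexOf(oldValue, comparison);
        return pos < 0
            ? text // oldValue was not found
            : text.Substring(0, pos) + newValue + text.Substring(pos + oldValue.Length);
    }
}
```

Calling ReplaceFirstSketch.ReplaceFirst("Two by two", "TWO", "three", StringComparison.OrdinalIgnoreCase) replaces only the leading “Two”.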

Replace First Instance – Using Regex Replace

By using the Regex class, we can achieve a one time replacement without writing our own code:

using System.Text.RegularExpressions;
var text = "Hello Hello Hello";
var regex = new Regex("Hello");
var text2 = regex.Replace(text, "Hi", 1);
Console.WriteLine($"{text} -> {text2}");
// This code outputs
// Hello Hello Hello -> Hi Hello Hello

In the above example we passed a 1 as the last argument to regex.Replace. This argument is actually a count, and we can use it to specify the maximum number of replacements to make:

var text = "Hello Hello Hello";
var regex = new Regex("Hello");
var text2 = regex.Replace(text, "Yo", 2);
Console.WriteLine($"{text} -> {text2}");
// This code outputs
// Hello Hello Hello -> Yo Yo Hello

Regular Expression Replace

We’ve seen above how the Regex.Replace method can be used to replace a single, or a specified number of, occurrences of a pattern, but that’s just the tip of the iceberg when it comes to the power of regular expressions. They really deserve their own post, but for now I’ll give you a few examples to whet your appetite:

Case Insensitive Replace using Regex.Replace

var text = "Hello world";
var text2 = Regex.Replace(text, "WORLD", "Everyone", RegexOptions.IgnoreCase);
Console.WriteLine($"{text} -> {text2}");
// This code outputs
// Hello world -> Hello Everyone

Wildcard Replace Using Regex.Replace

var text = "Hello Hallo Hullo";
var text2 = Regex.Replace(text, "H.llo", "Yo");
Console.WriteLine($"{text} -> {text2}");
// This code outputs
// Hello Hallo Hullo -> Yo Yo Yo
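
The wildcard example works because “.” is special in a pattern, which also means that plain search text containing characters like “.” can match more than you intended. A minimal sketch of one way round this, assuming you want the search text treated literally, is to run it through Regex.Escape first. The class and method names here are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class RegexReplaceSketch
{
    // Hypothetical helper: replaces up to `count` occurrences of oldValue,
    // treating oldValue as literal text rather than as a pattern.
    public static string ReplaceLiteral(string text, string oldValue, string newValue, int count)
    {
        var regex = new Regex(Regex.Escape(oldValue));

        // "$" is special in replacement strings, so double it to keep it literal.
        var literalReplacement = newValue.Replace("$", "$$");

        return regex.Replace(text, literalReplacement, count);
    }
}
```

With this helper, searching "125 and 1.5" for "1.5" only touches the real “1.5”; the unescaped pattern would have matched “125” first.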

String Replace in Place

Note that in C# strings are immutable. This means that any search and replace (whether using built in methods, regex methods, or rolling your own code) is going to return a new string with the requested substitutions. Now imagine your input string takes up 100MB of memory, and you’re running 10 or 20 replacements on it. Each time you do a replace you’re allocating memory for the new copy of the string. It’s easy to see how you could quickly use up significant memory resources, so be careful when you’re dealing with large strings!

Thankfully, the StringBuilder class was designed with this situation in mind. It behaves a lot like a string, but it’s actually a mutable sequence of characters. That means we can modify it in place without allocating new memory for a copy of the original string:

In Place Replace using StringBuilder.Replace

using System.Text;
var text = new StringBuilder("Hello Hello Hello");
Console.WriteLine($"Before: {text}");
text.Replace("Hello", "Ho");
Console.WriteLine($"After: {text}");
// This code outputs
// Before: Hello Hello Hello
// After: Ho Ho Ho

It’s worth looking at the docs to see what can be done with StringBuilder.Replace. It’s not as fully featured as String.Replace or Regex.Replace, but if you’re dealing with large strings and memory is an issue, then it’s a great tool to be aware of.
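
To give a flavour of what else StringBuilder.Replace can do, here’s a small sketch. It relies on two things: Replace returns the same StringBuilder (so calls can be chained), and there is an overload that restricts the replacement to a range of characters. The class and method names are mine, purely for illustration.

```csharp
using System.Text;

public static class StringBuilderSketch
{
    // Runs several replacements on one buffer, chaining the calls.
    public static string TabsToCommas(string input)
    {
        var builder = new StringBuilder(input);
        builder.Replace("\r\n", "\n") // string overload
               .Replace('\t', ',');   // char overload
        return builder.ToString();
    }

    // Replaces oldValue only within the first `length` characters.
    public static string ReplaceAtStart(string input, string oldValue, string newValue, int length)
    {
        var builder = new StringBuilder(input);
        builder.Replace(oldValue, newValue, 0, length);
        return builder.ToString();
    }
}
```

For example, ReplaceAtStart("Hello Hello Hello", "Hello", "Ho", 5) only looks at the first five characters, so just the first “Hello” changes.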

Conclusion

We’ve seen how to do simple replacements with String.Replace, how to replace the first (or first few!) instances of a pattern in a couple of ways, we’ve looked at case insensitive searches, touched on regular expressions and even considered ways to limit memory usage when running a search and replace on very large strings.

I really hope you’ve learned something from this deep dive into string replacing in C#. As always, if you feel I’ve missed anything, or just want to say hi, let me know in the comments!

C# Return – How to leave a function/method and return values

If you’re a seasoned programmer you’ll know that the C# return statement is used to exit a method, optionally passing a value (a return value) back to the calling function.

But since you’re here, I’m guessing you’re new enough to C# (or programming in general) to benefit from a deeper dive into what a function is, what we mean by returning from a function and how we go about returning values. So, let’s get into it!

Example 1 – Simple Return

public void ReturnExample()
{
  Console.WriteLine("Hello World");
  return;
}

This example function just writes “Hello World” to the console then exits. The interesting part is the return statement: it ends the function, and it’s where a value would be returned (see the returning a value example below). This method is declared as returning void, i.e. it doesn’t return anything, so we put nothing after the return keyword.

Example 2 – Implied Return

The docs tell us that “If the method is a void type, the return statement can be omitted.”; this means that we can leave a function (i.e. return from a function) without explicitly using the return statement, but only in a function that returns void (i.e. doesn’t return anything).

public void ImpliedReturnExample()
{
  Console.WriteLine("Hello World");
}

Example 3 – Returning a value

public int GetRandomNumber()
{
  return 4; // Chosen by fair dice roll
}

Besides the blatant homage to xkcd, this example shows a function which is called GetRandomNumber and we declare that it will return an int. The body of the function (the bit between the two curly braces) is a single line which returns an integer, the integer ‘4’ to be exact.

What happens if you declare a function as returning an int, but return something else, like a string? Or don’t return a value at all? Either way, you’ll get a compile time error: something like “Cannot implicitly convert type ‘string’ to ‘int’” or “An object of a type convertible to ‘int’ is required” respectively.

Using a returned value

What do we mean by returning a value? It means that the code that calls the function receives a value back when it calls it, and can make use of that value. This allows us to subdivide our code into reusable blocks (functions) and to use the results of those blocks.

public int GetRandomNumber()
{
  return 4; // Chosen by fair dice roll
}

public void PrintRandomNumber()
{
  Console.WriteLine($"Today's random number is: {this.GetRandomNumber()}");
}
// Today's random number is: 4

Example 4 – Early Return

So far, all of our example functions have returned at the end, but there’s no reason you can’t return earlier in the function:

public void EarlyReturnExample()
{
  Console.WriteLine("Hello World");
  return;
  Console.WriteLine("This will not be written");
}
// This code outputs:
// Hello World

As you can see from the output of this example, the function ends when it hits the return statement, meaning that the second Console.WriteLine statement is not hit, and the phrase “This will not be written” isn’t written to the console.

This is a fairly contrived example, but this pattern can come in very useful if you need to stop processing early, e.g.:

public void PrintOddOrEven(int a)
{
  if (a%2 == 0)
  {
    Console.WriteLine($"{a} is even");
    return;
  }
  Console.WriteLine($"{a} is odd");
}
// Sample output:
// 2 is even
// 3 is odd

This function writes the value passed in, followed either by “is even” or “is odd” depending on the value passed in. The interesting use of the return statement is that, if the value is even, we use a return statement to exit from the function early. This means we can omit the else clause from our if statement, safe in the knowledge that we’ll never print both “is even” and “is odd” for the same value. Choosing to return early instead of using an else statement can make your code easier to read by reducing the amount of indentation in the rest of your code.
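
The same early return idea works in functions that return a value, where it’s often called a “guard clause”. Here’s a minimal sketch. The class, the method and the age bands are invented purely for illustration.

```csharp
public static class GuardSketch
{
    // Each check returns as soon as it knows the answer,
    // so there's no nesting and no else clause needed.
    public static string DescribeAge(int age)
    {
        if (age < 0)
        {
            return "invalid"; // guard: reject bad input first
        }

        if (age < 18)
        {
            return "minor";
        }

        return "adult"; // only reached if neither check above returned
    }
}
```

Because every branch returns, the compiler is also satisfied that all paths through the function produce a value.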

Unreachable Code Detected

This is a compile time warning and it means that the compiler has found some code that is unreachable: there is no way that the code could be executed. This is usually caused by an early return statement, for example:

public void PrintRandomNumber()
{
  Console.WriteLine($"Today's random number is: ");
  return;
  Console.WriteLine(4);
}
// Produces a compile time warning:
// Unreachable code detected
// When run it prints:
// Today's random number is: 

Usually the issue is a lot more subtle than this. If you do ever get this warning it’s always worth digging in and finding out the problem; it could save you hours of debugging later.

Return Multiple Values

In C# a method/function can either return one value or no value; it can’t return two or more values. However, there’s nothing in the rules to say that the value returned can’t itself be a group or list of things. Returning a tuple of values is the closest thing I know of to returning multiple values from a function in C#.

Returning a Tuple

My favourite way to return multiple things from a single function is to return them as a tuple. Tuples are a relatively new feature in C#, but are well worth learning about for situations such as these:

public (double, double) GetBothSquareRoots(int x)
{
  var a = Math.Sqrt(x);
  var b = 0 - a;
  return (a, b);
}
// Example return values:
// 9 -> 3, -3
// 100 -> 10, -10
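
As a sketch of how the calling code can use such a tuple, the elements can be given names in the return type and then either read by name or unpacked into separate variables (this needs C# 7.0 or later). The class name TupleSketch and the element names Positive and Negative are my own choices, not anything standard.

```csharp
using System;

public static class TupleSketch
{
    // Same idea as GetBothSquareRoots, but with named tuple elements.
    public static (double Positive, double Negative) GetBothSquareRoots(int x)
    {
        var root = Math.Sqrt(x);
        return (root, -root);
    }

    // Shows the two ways a caller can consume the tuple.
    public static string Describe(int x)
    {
        var roots = GetBothSquareRoots(x);      // read elements by name...
        var (positive, negative) = roots;       // ...or deconstruct into locals
        return $"{roots.Positive} and {negative}";
    }
}
```

Naming the elements means callers don’t have to remember which of Item1 and Item2 is which.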

Returning multiple values as a tuple has the advantage (over an array, or list) that the items do not need to be the same type:

public (string, System.Drawing.Color) MyFavouriteColour()
{
  return ("Forest Green", System.Drawing.Color.ForestGreen);
}

This is a contrived example (as you can get the name from the Color object) but there have been plenty of occasions where being able to return two or more things of different types from the same function has gotten me out of a jam.

That said, this counts as a “code smell”: it often suggests you’re doing something wrong, either that your methods are not specific enough or that you should be passing around a class rather than a collection of objects. It sometimes is appropriate to return a tuple, but consider getting someone with experience to review your code to make sure you’re not doing anything daft.

Returning a List (or IEnumerable)

public IEnumerable<int> GetPositiveNumbersBelow(int a)
{
  var list = new List<int>();
  for (var i = 1; i < a; i++)
  {
    list.Add(i);
  }
  return list;
}
// example returned values
// 3 -> [1,2]
// 5 -> [1,2,3,4]

Yield Return

The above example (Returning a List) brings us nicely on to yield return. This lets us return a list (technically an IEnumerable) of objects, but with a much cleaner syntax, by yield returning each list item individually. This means we don’t have to declare a list variable and add to it, making the example much cleaner:

public IEnumerable<int> GetPositiveNumbersBelowAgain(int a)
{
  for (var i = 1; i < a; i++)
  {
    yield return i;
  }
}
// Example return values:
// 4 -> [1,2,3]
// 6 -> [1,2,3,4,5]

I personally love how neat the yield return version of this example is, and I try to use yield return whenever it makes sense. Some IDEs like Visual Studio 2019 will prompt you to use yield return where appropriate.
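
One consequence of yield return worth knowing about is that the items are produced lazily: the body of the method only runs as far as the caller asks for items. The sketch below leans on that to describe a sequence with no end. The names YieldSketch and PowersOfTwo are just for illustration, and it assumes the caller only takes a handful of items (an int will overflow after 31 doublings).

```csharp
using System.Collections.Generic;

public static class YieldSketch
{
    // An "endless" sequence: 1, 2, 4, 8, ...
    // Nothing is computed until the caller starts iterating.
    public static IEnumerable<int> PowersOfTwo()
    {
        var value = 1;
        while (true)
        {
            yield return value; // hand one item back, pause here until the next is requested
            value *= 2;
        }
    }
}
```

A caller can then use LINQ’s Take, e.g. YieldSketch.PowersOfTwo().Take(5), to get 1, 2, 4, 8 and 16 without the loop running forever.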

Other similar statements

C# Break – leaving a loop

In the same way that a return statement can be used to leave a method/function, we can use a break statement to leave a loop, such as a while loop:

public IEnumerable<int> DoubleUntil100(int x)
{
  while(true)
  {
    x *= 2;
    if (x >= 100)
    {
      break;
    }
    yield return x;
  }
}
// Sample return values
// 3 -> [6,12,24,48,96]
// 5 -> [10,20,40,80]

This is a contrived example because we could have put our condition into the while() loop, but I hope you can see that the break statement causes us to leave the while loop when it gets executed. Unlike a return statement, we can’t include a value after a break.

C# Continue – finishing one iteration of a loop

What if you don’t want to exit a loop completely, but you do want to stop processing and move on to the next cycle in the loop? That’s where the continue statement comes in.

public IEnumerable<int> DoubleUntil100Excluding40(int x)
{
  while(x<50)
  {
    x *= 2;
    if (x == 40)
    {
      continue;
    }
    yield return x;
  }
}
// Sample return values
// 3 -> [6,12,24,48,96]
// 5 -> [10,20,80]

In this example, hitting 40 doesn’t cause the while loop to stop running, but it does skip the rest of that iteration, meaning the yield return never runs for 40 and so 40 is never included in the output. If you’re not familiar with yield return, it might be worth re-reading that section above.

What does “Not all code paths return a value” mean?

This is a topic that deserves its own post, but for now I’ll explain that when a function is declared as returning a value, all paths through that code must return a value. If this rule is violated, you’ll get a compile time error:

public int Nothing()
{
}
// Compile time error:
// 'Program.Nothing()': not all code paths return a value

That was a pretty basic example (the function is empty, so it’s clearly missing a return statement). But let’s take a look at a more involved example:

public int Collatz(int x)
{
  if (x == 1)
  {
    return 1;
  }
  else if (x%2 == 0)
  {
    return Collatz(x/2);
  }
}
// Compile time error:
// 'Program.Collatz(int)': not all code paths return a value

This function calls itself recursively if the number passed in is even, but if the number is not even – what happens? Execution continues to the end of the function, the ‘}’, without returning anything. This is a path through the code that doesn’t return a value, which is not allowed for a function that is declared as returning an int, so we get a compile time error.

How do we fix it? It’s necessary to make sure that all paths through the function result in a value being returned:

public int Collatz(int x)
{
  if (x == 1)
  {
    return x;
  }
  else if (x%2 == 0)
  {
    return Collatz(x/2);
  }
  else
  {
    return Collatz(3*x +1);
  }
}
// example return values:
// 3 -> 1
// 5 -> 1
// 42 -> 1

It’s worth noting that this doesn’t make sure your program will always return a value. In fact, the Collatz conjecture (that every positive integer plugged into the above recursive function will always end up returning 1) is, as yet, unproven; so it’s conceivable to enter a value that ends up with the program never stopping (or more likely overflowing). What the compile time check is doing is ensuring there are no obvious gaps in your code where you’ve forgotten to return a value, it leaves the rest up to you!
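
If the overflow worries you, here’s a sketch of a more defensive take on the same idea. It counts the steps taken to reach 1 using a loop rather than recursion, uses a long with checked arithmetic so an overflow throws an OverflowException instead of silently wrapping round, and rejects values below 1. The name CollatzSteps is made up for this example. Note that a path ending in a throw also keeps the compiler happy, as it doesn’t fall off the end of the function.

```csharp
using System;

public static class CollatzSketch
{
    // Counts how many steps it takes to get from x down to 1.
    public static int CollatzSteps(long x)
    {
        if (x < 1)
        {
            // Throwing counts as a valid way to end a code path.
            throw new ArgumentOutOfRangeException(nameof(x));
        }

        var steps = 0;
        while (x != 1)
        {
            // checked makes an overflow throw rather than wrap round.
            x = x % 2 == 0 ? x / 2 : checked(3 * x + 1);
            steps++;
        }

        return steps;
    }
}
```

For example, 3 goes 10, 5, 16, 8, 4, 2, 1, so CollatzSteps(3) returns 7.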

Conclusion

The return statement is an integral part of the C# programming language. We’ve seen how to use it to leave a function, when it can be omitted, how to use it to return values, how to return multiple values, what yield return does and when it’s useful, and much more besides. I hope you’ve enjoyed this deep dive into a statement that most people take for granted, and I hope you learned something along the way.

If you want a whole post on “Not all code paths return a value”, if something is still not clear to you, or you just want to show me some love, let me know in the comments!

C# Delay – How to pause code execution in C#

Introduction

When programming, it is common to want to pause execution for a certain amount of time and most common programming and scripting languages have some form of Sleep command built in to achieve this.

For example, when we’ve encountered a problem with a remote resource, it’s common to back-off (pause execution) for a short amount of time and retry.

Here I’ll go through the various options for introducing a delay in C#, ranging from the most basic (Thread.Sleep()), suitable for use in a single threaded console application, to more complicated versions for use in multi threaded interactive applications.

Add a delay in C# using Thread.Sleep()

// Will delay for three seconds
Thread.Sleep(3000);

Using Thread.Sleep() is the simplest way to introduce a delay in C# code, but it will block the thread it’s called on (the main thread, in a simple program) for the duration of the delay, so it’s only really appropriate for console applications. Assuming you’re happy with that, let’s dive into a more complete example:

using System;
using System.Threading;

class Program
{
  static void Main()
  {
    Console.WriteLine($"Delay starting at {DateTime.Now}");

    // Will delay for three seconds
    var milliseconds = 3000;
    Thread.Sleep(milliseconds);
    Console.WriteLine($"Finished delay at {DateTime.Now}");
  }
}

/* this code outputs:
Delay starting at 13/11/2020 11:59:39
Finished delay at 13/11/2020 11:59:42
*/

A common mistake when first using Thread.Sleep() is to forget the using directive, resulting in the following error:

error CS0103 C# The name 'Thread' does not exist in the current context

This is easily fixed by adding a “using System.Threading;” line at the top of the file, as in the above example.

The next thing to note is that Thread.Sleep() takes milliseconds as its argument, so if you want to sleep for 3 seconds you need to pass the number 3000. It’s possible to make your intent clearer using a TimeSpan like this:

Thread.Sleep(TimeSpan.FromSeconds(3));

But older versions of Thread.Sleep didn’t take a TimeSpan, so your mileage may vary.

Add a Delay in C# without blocking main thread using Task.Delay()

// Will delay for 3 seconds
await Task.Delay(3000);

There is an asynchronous version of Thread.Sleep called Task.Delay. If you’re not familiar with how asynchronous calls work in C# then I’m planning a series of posts on the topic (let me know if you want it in the comments!). Until that’s up, see the official docs if you need more info.

The idea here is to start a new task that runs the delay in the background, let’s look at an example:

using System;
using System.Threading.Tasks;

class Program
{
  async static Task Main()
  {
    Console.WriteLine($"Delay starting at {DateTime.Now}");

    // Will delay for 3 seconds
    await Task.Delay(3000);
    Console.WriteLine($"Finished delay at {DateTime.Now}");
  }
}

/* this code outputs:
Delay starting at 13/11/2020 12:23:09
Finished delay at 13/11/2020 12:23:12
*/

Once again if you forget the using, you might encounter the following error:

CS0246 C# The type or namespace name 'Task' could not be found (are you missing a using directive or an assembly reference?)

Be sure to add “using System.Threading.Tasks;” to avoid this.

This is a contrived example, and if you’ve got a console application as simple as this one there’s no need for the asynchronous version (just use Thread.Sleep()). Note also that we’ve had to make our Main method async and return a Task rather than void. Forgetting to do that can result in this error:

CS4033 C# The 'await' operator can only be used within an async method. Consider marking this method with the 'async' modifier and changing its return type to 'Task'.

So what can we do with this asynchronous delay that can’t be done with the basic Thread.Sleep()? Quite simply, we can get other things done while we wait for the delay to finish. Time for a further example:

using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
  static void Main()
  {
    Console.WriteLine($"Delay starting at {DateTime.Now}");

    //Sleep for 3 seconds in the background
    var delay = Task.Delay(TimeSpan.FromSeconds(3));

    var seconds = 0;
    while (!delay.IsCompleted)
    {
      // While we're waiting, note the time ticking past
      seconds++;
      Thread.Sleep(TimeSpan.FromSeconds(1));
      Console.WriteLine($"Waiting... {seconds}");
    }

    Console.WriteLine($"Finished delay at {DateTime.Now} after {seconds} seconds");
  }
}

/* this code outputs:
Delay starting at 13/11/2020 12:44:49
Waiting... 1
Waiting... 2
Waiting... 3
Finished delay at 13/11/2020 12:44:52 after 3 seconds
*/

This example makes use of the fact that Task.Delay() is running in the background, and allows the main thread to do some useful (or not!) work. In this example the main thread just outputs “Waiting… {seconds}”, but I’d argue that even that is useful as it provides feedback to the user that the console application is still actively working. It could easily be updated to print the % done or similar.

I hope I’ve not confused things by combining both Task.Delay() and Thread.Sleep() in one example!

For interactive (non-console) applications it’s especially important that the main thread can respond to inputs, allowing the program to remain interactive while delays are processed in the background.
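
To tie this back to the back-off-and-retry scenario from the introduction, here’s a rough sketch of how Task.Delay() might be used for it. RetrySketch and RetryAsync are names I’ve made up, and doubling the delay after each failure is just one possible policy, so treat this as a starting point rather than a finished implementation.

```csharp
using System;
using System.Threading.Tasks;

public static class RetrySketch
{
    // Hypothetical helper: runs the operation, and if it throws, waits and tries again.
    // The wait doubles after each failed attempt. The final failure is not caught.
    public static async Task<T> RetryAsync<T>(
        Func<Task<T>> operation, int maxAttempts, TimeSpan initialDelay)
    {
        var delay = initialDelay;
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // Pause without blocking the thread, then back off further.
                await Task.Delay(delay);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
        }
    }
}
```

Because the pause is awaited rather than slept, the calling thread is free to do other work while we wait for the remote resource to recover.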

Add a repeating delay in C# using one of the Timer classes

There are several timer classes that can be used to repeatedly trigger events after a certain period of time has elapsed:

System.Timers.Timer
System.Threading.Timer
System.Windows.Forms.Timer (.NET Framework only)
System.Web.UI.Timer
System.Windows.Threading.DispatcherTimer

Each of these timer classes has different functionality, with these remarks on MSDN giving more details of which to use depending on your requirements.

If you just want a single delay in C# then use either Thread.Sleep() or Task.Delay() as described above. However, if you’re after a repeating delay, a timer can be helpful.

For the purposes of the following examples, I’m going to use a System.Threading.Timer as it appears to be Microsoft’s preferred general purpose timer.

The thrust of using a timer comes from instantiating a new System.Threading.Timer(), to which you supply four arguments: callback, state, dueTime and period.

  • callback TimerCallback – this is the method that should be called whenever the timer fires
  • state object – this is an object that gets passed to the callback each time the timer fires (it can be null)
  • dueTime int/TimeSpan – this is how long to wait before the timer first fires
  • period int/TimeSpan – this is how long to wait between each subsequent firing of the timer

An example of such an instantiation might be:

var timer = new System.Threading.Timer(
  DoSomething,
  null,
  TimeSpan.FromSeconds(5),  // time to first firing
  TimeSpan.FromSeconds(1)); // delay for each subsequent firing

This starts a timer that will wait 5 seconds before calling DoSomething, which it will continue to do once a second after that.

The following example is more complete, showing you how to set up the callback, one way of tracking the number of times it’s called, and how to signal that the timer should finish and then stop it. Here’s the code:

using System;
using System.Threading;

class Program
{
  static void Main()
  {
    Console.WriteLine($"Delay starting at {DateTime.Now}");

    var idleWaiter = new IdleWaiter(3);

    // Create an AutoResetEvent to signal when the IdleWaiter has reached its limit
    var autoEvent = new AutoResetEvent(false);

    var timer = new System.Threading.Timer(
      idleWaiter.PrintWaiting,
      autoEvent,
      TimeSpan.FromSeconds(1),  // time to first firing
      TimeSpan.FromSeconds(1)); // delay for each subsequent firing

    // Wait until the autoevent signals;
    autoEvent.WaitOne();

    // Dispose of the timer
    timer.Dispose();
    Console.WriteLine($"Finished delay at {DateTime.Now} after {idleWaiter.TimesCalled} seconds");
 
  }
}

class IdleWaiter
{
  public IdleWaiter(int threshold)
  {
    this.Threshold = threshold;
  }

  public int TimesCalled { get; private set; }
  public int Threshold { get; }

  public void PrintWaiting(object stateInfo)
  {
    var autoEvent = (AutoResetEvent)stateInfo;
    Console.WriteLine($"Waiting... {++this.TimesCalled}");

    if (this.TimesCalled >= Threshold)
    {
      autoEvent.Set();
    }
  }
}

/* this code outputs:
Delay starting at 13/11/2020 13:44:39
Waiting... 1
Waiting... 2
Waiting... 3
Finished delay at 13/11/2020 13:44:42 after 3 seconds
*/

I know it’s a bit heavy, but in the above example I’ve created a new class IdleWaiter which is responsible for printing “Waiting…” each time it’s called, while tracking the number of times it’s been called and signalling (via an AutoResetEvent) when it’s reached a threshold.

When you run this code, the timer fires every second until it’s been run three times, then it signals that it’s reached its threshold and we stop the timer by disposing of it.

If we didn’t dispose of the timer it would keep on ticking once every second. You can try this for yourself by commenting out the dispose line and adding a Thread.Sleep() to stop the program exiting:

// Dispose of the timer
//timer.Dispose();
Console.WriteLine($"Finished delay at {DateTime.Now} after {idleWaiter.TimesCalled} seconds");

Thread.Sleep(5000);

If you run the above code with this change you get the following output:

Delay starting at 13/11/2020 13:56:32
Waiting... 1
Waiting... 2
Waiting... 3
Finished delay at 13/11/2020 13:56:35 after 3 seconds
Waiting... 4
Waiting... 5
Waiting... 6
Waiting... 7
Waiting... 8

Using a Timer might be the right choice if you want a task to repeat on a schedule, but given the added complexity, I’d probably stick to the other options for most use cases.
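
For comparison, here’s a sketch of what “sticking to the other options” might look like for a simple repeating job: an ordinary loop with a Task.Delay() in it. RepeatSketch and RepeatAsync are hypothetical names. Note this isn’t a drop-in replacement for a timer, since the time the action takes is added on to each interval.

```csharp
using System;
using System.Threading.Tasks;

public static class RepeatSketch
{
    // Hypothetical helper: waits for the interval, runs the action, and repeats.
    // The action receives the number of the current iteration, starting at 1.
    public static async Task RepeatAsync(Action<int> action, int times, TimeSpan interval)
    {
        for (var i = 1; i <= times; i++)
        {
            await Task.Delay(interval); // non-blocking pause between runs
            action(i);
        }
    }
}
```

Something like await RepeatSketch.RepeatAsync(i => Console.WriteLine($"Waiting... {i}"), 3, TimeSpan.FromSeconds(1)); would print three “Waiting...” lines roughly a second apart, with no callback class or AutoResetEvent needed.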

Conclusion

If you’re in a console app, or some other single threaded application, you can use Thread.Sleep() to trigger a delay, just be careful with the fact this takes milliseconds (or better yet use TimeSpan.FromSeconds()).

If you want to delay in an asynchronous way, allowing your main thread to do useful work in the interim, Task.Delay() is the way to go.

If you want to kick something off repeatedly with a delay in between, then you should be using a timer like System.Threading.Timer.

I’m sure there are other ways of adding a delay in C#, but I think I’ve covered off the most important ones. If there’s something you think I’ve missed that deserves to be included, if you think I’ve got something wrong, or if you just want to congratulate me on a job well done then please let me know in the comments.

Meanwhile, if you want to go deep on another area of C#, might I recommend my recent post on using (or abusing?) Linq style foreach.

C# Char to Int – How to convert a Char to an Int in C#

Introduction

It’s common to pull chars out of a string, especially with regular expressions, and need to treat them like a true number. The easiest way to convert a char to an int in C# is: int.Parse(myChar.ToString()).

With that said, it’s not always the safest or most efficient method. Here I’ll give you a few different options for how to convert a char to an int, together with a discussion of their pros and cons and when each approach might be appropriate.

Convert a Char to an Int using int.Parse()

As mentioned in the introduction, int.Parse() is the simplest method. If you’re certain that your char is an integer and are prepared to accept an exception being thrown if you’re wrong, then int.Parse() is your friend:

var myChar = '7';
var myInt = int.Parse(myChar.ToString());
Console.WriteLine(myInt);

/* this code outputs:
7
*/

Note the use of myChar.ToString() to convert the char into a string, as there’s no overload of int.Parse() that takes a char. If you want further details on this, check out the official docs.

But do bear in mind that you can expect a System.FormatException if the character you enter doesn’t represent an integer. You could wrap this statement in a try catch block, but there is a neater way to handle these cases using int.TryParse:

Convert a Char to an Int using int.TryParse()

If you’re not sure if your character represents an integer, but you want to try converting it anyway, then int.TryParse() is for you. Here’s an example of how to use it:

int result;
if (!int.TryParse(myChar.ToString(), out result))
{
  // Do something else
}

Or if you’re after a more complete example:

using System;

public class Program
{
  public static void Main()
  {
    var myChars = new char[] {'9', 'z', '½'};
    foreach (var myChar in myChars)
    {
      if (!int.TryParse(myChar.ToString(), out var result))
      {
        Console.WriteLine($"the char {myChar} does not represent an integer");
      }
      else
      {
        Console.WriteLine($"{result} is of type {result.GetType()}");
      }
    }
  }
}

/* this code outputs:
9 is of type System.Int32
the char z does not represent an integer
the char ½ does not represent an integer
*/

Using int.TryParse() is safe, in that it returns false rather than throwing an exception when the input can’t be parsed, and in my opinion it wins on the readability front. That said, keep reading for some faster, more concise and some might argue more correct options. In particular, the eagle eyed among you might have noticed in the above example that the char ½ does not represent an integer. That’s true, but it does represent a number! So how can we parse ½, ¾ or the other fraction symbols?

Convert a Char to an Int using Char.GetNumericValue()

The Char.GetNumericValue() method converts a char (or the character at a given position in a string) to a double, which we can then cast to an int as follows:

var number = (int)Char.GetNumericValue(myChar);

Or for a more complete example:

using System;

public class Program
{
  public static void Main()
  {
    var myChars = new char[] {'9', 'z', '½'};
    foreach (var myChar in myChars)
    {
      var number = Char.GetNumericValue(myChar);
      var myInt = (int)number;
      
      if (number == -1)
      {
        Console.WriteLine($"{myChar} has no numeric value");
      }
      else if (myInt == number)
      {
        Console.WriteLine($"{myChar} converts to the integer {myInt}");
      }
      else
      {
        Console.WriteLine($"{myChar} converts to {number}, which is not an integer");
      }
    }
  }
}

/* this code outputs:
9 converts to the integer 9
z has no numeric value
½ converts to 0.5, which is not an integer
*/

One of the things I like about char.GetNumericValue is that ½ correctly converts to 0.5. But although that’s a number (actually a System.Double), it’s not an int, so it’s not really appropriate for this article.

Also, when a character like ‘x’ is parsed using Char.GetNumericValue, no exception is thrown. Instead it’s given the result of -1, meaning you have to be careful and check explicitly for -1 errors in your code.
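
One way to make the -1 check harder to forget is to wrap it in a TryParse style helper. The following is only a sketch: the CharNumbers class and its TryGetNumericValue method are names I’ve made up for illustration, they are not part of .NET.

```csharp
using System;

public static class CharNumbers
{
  // Hypothetical helper (not part of .NET): returns false instead of
  // the -1 sentinel when the char has no numeric value.
  public static bool TryGetNumericValue(char c, out double value)
  {
    value = Char.GetNumericValue(c);
    return value != -1;
  }
}

public class Program
{
  public static void Main()
  {
    foreach (var myChar in new char[] {'9', 'x', '½'})
    {
      if (CharNumbers.TryGetNumericValue(myChar, out var value))
      {
        Console.WriteLine($"{myChar} has the numeric value {value}");
      }
      else
      {
        Console.WriteLine($"{myChar} has no numeric value");
      }
    }
  }
}
```

The caller now gets a bool to branch on, in the same way as with int.TryParse(), rather than having to remember what -1 means.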

It’s due to these oddities that I generally prefer int.TryParse(), but if your dataset is likely to contain fraction characters char.GetNumericValue() is worth being aware of. That said, if you’re after ways to convert arbitrary strings to doubles then you might want to look at Double.TryParse(), but that’s a story for another post.

I mentioned above that there is a faster, more concise method available, and I’d like to touch on it in the next section:

Convert a Char to an Int using character arithmetic

Before I go into the example, it’s worth explaining that each char is internally represented by a number (its UTF-16 code unit). The digits ‘0’ to ‘9’ are stored as the consecutive values 48 to 57.

We can check that this is true in C# as follows:

var myChars = new char[] {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'};
foreach (var myChar in myChars)
{
  Console.WriteLine($"{myChar} is {Convert.ToInt32(myChar)} internally");
}

/* this code outputs:
0 is 48 internally
1 is 49 internally
2 is 50 internally
3 is 51 internally
4 is 52 internally
5 is 53 internally
6 is 54 internally
7 is 55 internally
8 is 56 internally
9 is 57 internally
*/

So why is that important? Well, it means we can use arithmetic to very quickly and easily convert the characters 0-9 into integers as follows:

var myInt = myChar - '0';

Or for a complete example:

using System;

public class Program
{
  public static void Main()
  {
    var myChars = new char[] {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'};
    foreach (var myChar in myChars)
    {
      var myInt = myChar - '0';
      Console.WriteLine($"{myChar} is now {myInt} which is of type {myInt.GetType()}");
    }
  }
}

/* this code outputs:
0 is now 0 which is of type System.Int32
1 is now 1 which is of type System.Int32
2 is now 2 which is of type System.Int32
3 is now 3 which is of type System.Int32
4 is now 4 which is of type System.Int32
5 is now 5 which is of type System.Int32
6 is now 6 which is of type System.Int32
7 is now 7 which is of type System.Int32
8 is now 8 which is of type System.Int32
9 is now 9 which is of type System.Int32
*/

So this is clearly concise, and it’s likely to be quick, so why isn’t it my preferred method?

Firstly, it’s really not obvious when reading this code back what we’re trying to achieve. I’d say this code isn’t very readable. You could argue that a well placed comment could fix that:

// Convert the char into an int - for details see
// https://csharpsage.com/c-char-to-int#Convert_a_Char_to_Int_using_character_arithmetic
var myInt = myChar - '0';

But my counter argument would be that comments become stale, and it takes more effort to read even well commented unreadable code than code that is readable to begin with.

There’s also the question of what happens if we try and parse a character like ‘x’ with this method:

using System;

public class Program
{
  public static void Main()
  {
    var myChars = new char[] {'9', 'a', 'z', '½'};
    foreach (var myChar in myChars)
    {
      var myInt = myChar - '0';
      Console.WriteLine($"{myChar} is now {myInt} which is of type {myInt.GetType()}");
      if (myInt < 0 || myInt > 9)
      {
        Console.WriteLine($"{myChar} appears to be out of bounds");
      }
    }
  }
}

/* this code outputs:
9 is now 9 which is of type System.Int32
a is now 49 which is of type System.Int32
a appears to be out of bounds
z is now 74 which is of type System.Int32
z appears to be out of bounds
½ is now 141 which is of type System.Int32
½ appears to be out of bounds
*/

You’ll see that this method does not have any in built error checking, so we again need to be careful and check the bounds of the result ourselves.
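
If you do want to stick with character arithmetic, the bounds check can be tucked away in a small helper so it only has to be written once. This is a sketch under my own naming (DigitConverter and TryToDigit are hypothetical, not part of .NET):

```csharp
using System;

public static class DigitConverter
{
  // Hypothetical helper (not part of .NET): converts '0'-'9' using
  // character arithmetic and returns false for anything else.
  public static bool TryToDigit(char c, out int digit)
  {
    digit = c - '0';
    if (digit < 0 || digit > 9)
    {
      digit = 0;
      return false;
    }
    return true;
  }
}

public class Program
{
  public static void Main()
  {
    foreach (var myChar in new char[] {'9', 'a', '½'})
    {
      if (DigitConverter.TryToDigit(myChar, out var digit))
      {
        Console.WriteLine($"{myChar} converts to {digit}");
      }
      else
      {
        Console.WriteLine($"{myChar} is not in the range '0' to '9'");
      }
    }
  }
}

/* this code outputs:
9 converts to 9
a is not in the range '0' to '9'
½ is not in the range '0' to '9'
*/
```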

I don’t know about you, but I’m not one for being careful, so I like to stick to methods that do the fiddly stuff for me, like int.TryParse() described above.

Conclusion

There’s more than one way to skin a cat, and there’s more than one way to convert a char to an int in C#:

  1. If you’re confident it’s an int, use: int.Parse(myChar.ToString());
  2. If you’re not sure if your char represents an int, use: int.TryParse(myChar.ToString(), out var myInt);
  3. If you want to handle fractional chars (like ½) consider using char.GetNumericValue(), but be aware it returns -1 if the input doesn’t represent a number, and the output is a double not an int;
  4. If you’re confident your input is in the range ‘0’ to ‘9’ and speed is key, then you can do: var myInt = myChar - '0';

I hope this has brought some clarity to the subject of converting a char to an int in C#. If you’re interested in improving your C# skills, you might want to check out my recent post on C# interview questions.

C# List Length – How to get (and set) the Length of a List in C#

Introduction

On the face of it, this is a really easy one: just use List.Count:

Simple Example of getting List Length using List.Count in C#

var numbers = new List<int> {1, 2, 3};	
Console.WriteLine(numbers.Count);

/* this code outputs:
3
*/

Length Vs Count in C#

It’s common to use Arrays before using Lists (at least if you’re as old as I am!) so it often feels natural to use Length. That said, Length is not very widely available – it’s usually seen on Arrays and Strings:

var numbers = new int[] {1, 2, 3};
Console.WriteLine($"array length: {numbers.Length}");
	  
var myString = "some string";
Console.WriteLine($"string length: {myString.Length}");

/* this code outputs:
array length: 3
string length: 11
*/

List Count vs Capacity

List.Count will tell you how many items there are in your list, while List.Capacity gets (and sets) the number of items the list can hold without resizing.

By way of background, it’s worth reminding ourselves that Lists (unlike arrays) will resize dynamically: you can keep adding items to a list and it will grow to allow the items to fit. Capacity is added in chunks (it doubles each time the list runs out of room, starting from 4 in the run below), so resizing only happens occasionally, not every time an element is added:

using System;
using System.Collections.Generic;
					
var numbers = new List<int>();	
Console.WriteLine($"Count: {numbers.Count}");
Console.WriteLine($"Capacity: {numbers.Capacity}");
var prevCapacity = numbers.Capacity;
var loop = 0;
while (loop <= 8)
{
	numbers.Add(1);
	if (numbers.Capacity != prevCapacity)
	{
		prevCapacity = numbers.Capacity;
		loop++;
		Console.WriteLine($"Capacity: {numbers.Capacity}");
	}
}

/* this code outputs:
Count: 0
Capacity: 0
Capacity: 4
Capacity: 8
Capacity: 16
Capacity: 32
Capacity: 64
Capacity: 128
Capacity: 256
Capacity: 512
Capacity: 1024
*/

Setting the capacity of a List in C#

It’s also possible to control the capacity of a List manually, either when the List is initialised (as in this example where we set the capacity to 10):

var numbers = new List<int>(10) {1, 2, 3};	
Console.WriteLine($"Count: {numbers.Count}");
Console.WriteLine($"Capacity: {numbers.Capacity}");

/* this code outputs:
Count: 3
Capacity: 10
*/

Or after initialisation, as in this example where we set it to 20:

var numbers = new List<int> {1, 2, 3};	
Console.WriteLine($"Count: {numbers.Count}");
numbers.Capacity = 20;
Console.WriteLine($"Capacity: {numbers.Capacity}");

/* this code outputs:
Count: 3
Capacity: 20
*/

When setting the capacity, it might be tempting to set it to a value smaller than the list’s current size. Don’t do it – this will result in an exception being thrown at runtime:

using System.Collections.Generic;
					
var numbers = new List<int> {1, 2, 3};	
numbers.Capacity = 2;

/* this code outputs:
Unhandled exception. System.ArgumentOutOfRangeException: capacity was less than the current size. (Parameter 'value')
   at System.Collections.Generic.List`1.set_Capacity(Int32 value)
   at <Program>$.<Main>$(String[] args)
Command terminated by signal 6
*/
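
If the requested capacity comes from somewhere you don’t control, a small guard avoids the exception. This is a minimal sketch; ListCapacity.TrySetCapacity is a hypothetical helper of my own, not a framework method:

```csharp
using System;
using System.Collections.Generic;

public static class ListCapacity
{
  // Hypothetical helper (not part of .NET): only applies the requested
  // capacity when it can hold the items already in the list.
  public static bool TrySetCapacity<T>(List<T> list, int requested)
  {
    if (requested < list.Count)
    {
      return false;
    }

    list.Capacity = requested;
    return true;
  }
}

public class Program
{
  public static void Main()
  {
    var numbers = new List<int> {1, 2, 3};
    Console.WriteLine(ListCapacity.TrySetCapacity(numbers, 2));
    Console.WriteLine(ListCapacity.TrySetCapacity(numbers, 5));
    Console.WriteLine($"Capacity: {numbers.Capacity}");
  }
}

/* this code outputs:
False
True
Capacity: 5
*/
```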

ICollection Count property

List isn’t the only datatype in C# to have a Count property. In fact every type that implements the ICollection interface has one; some notable examples include Dictionary, HashSet and SortedSet. A more complete list is available in the ICollection docs.
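
To see this in action, the sketch below reads Count through the ICollection<T> interface for three different collection types. The CollectionInfo.CountOf helper is a made-up name, there purely to show that the same property is being used each time:

```csharp
using System;
using System.Collections.Generic;

public static class CollectionInfo
{
  // Hypothetical helper: works for anything implementing ICollection<T>.
  public static int CountOf<T>(ICollection<T> collection)
  {
    return collection.Count;
  }
}

public class Program
{
  public static void Main()
  {
    var dictionary = new Dictionary<string, int> { ["a"] = 1, ["b"] = 2 };
    var hashSet = new HashSet<int> {1, 2, 2, 3}; // the duplicate 2 is ignored
    var sortedSet = new SortedSet<int> {3, 1, 2};

    Console.WriteLine($"dictionary count: {CollectionInfo.CountOf(dictionary)}");
    Console.WriteLine($"hashSet count: {CollectionInfo.CountOf(hashSet)}");
    Console.WriteLine($"sortedSet count: {CollectionInfo.CountOf(sortedSet)}");
  }
}

/* this code outputs:
dictionary count: 2
hashSet count: 3
sortedSet count: 3
*/
```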

IEnumerable Count() method

There is another, more generic (implemented by more diverse objects) option – that’s to use the Linq IEnumerable.Count() method:

using System;
using System.Linq;

public class Program
{
  public static void Main()
  {
    var numbers = new int[] {1, 2, 3};
    Console.WriteLine($"array length: {numbers.Count()}");
  }
}

/* this code outputs:
array length: 3
*/

Note the line at the top saying “using System.Linq” – you’ll get an error if you don’t include that.

This method is more widely available than either the Length or Count properties, but it can be much slower: for a sequence that isn’t already a collection, it has to walk through every element to count them. For more in depth discussion on Count vs Count() you can see my post on the subject: C# Linq Count
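
To get a feel for where the cost comes from, here is a small sketch using a lazy sequence that keeps a tally of how many items it has handed out. The CountDemo class and its members are hypothetical names for this illustration only:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CountDemo
{
  // Tally of how many items the lazy sequence has produced so far.
  public static int ItemsVisited;

  // A lazy sequence: nothing is produced until somebody enumerates it.
  public static IEnumerable<int> LazyNumbers(int howMany)
  {
    for (var i = 0; i < howMany; i++)
    {
      ItemsVisited++;
      yield return i;
    }
  }
}

public class Program
{
  public static void Main()
  {
    var lazy = CountDemo.LazyNumbers(1000);
    Console.WriteLine($"Count(): {lazy.Count()}");
    Console.WriteLine($"items visited to get that count: {CountDemo.ItemsVisited}");
  }
}

/* this code outputs:
Count(): 1000
items visited to get that count: 1000
*/
```

A List, by contrast, already knows how many items it holds, so reading its Count property doesn’t need to visit any of them.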

C# Reverse List – How to Reverse a List (and other collections) in C#

Introduction

I’ve mentioned the Reverse method a couple of times now, so I feel it deserves its own post, comparing and contrasting the various implementations.

Most recently it came up in one of my technical interviews, and knowing it existed ultimately helped me to land the job, so I want to share this wonderful Linq extension method with you all. Let’s start with an example:

Example:

Given an array of numbers, print them all in reverse order.

using System;
using System.Linq;

public class Program
{
  public static void Main()
  {
    var numbers = new int[] {1, 2, 3};
    foreach(var number in numbers.Reverse())
    {
      Console.WriteLine(number);
    }
  }
}

/* this code outputs:
3
2
1
*/

This is a fairly straightforward example, but what’s going on here?

Well, the line “foreach(var number in numbers.Reverse())” is the interesting one: it causes us to iterate over the array in reverse order. Importantly, it uses my favoured Linq Reverse extension method.

Does this create a copy of the array? This Stack Overflow answer suggests that yes, it does. The reason being that changing this behaviour might break existing code that modifies the collection it’s iterating over.

Certainly there are some cases (e.g. IEnumerables of unknown size) where it’s necessary to loop through and reach the end of the collection before we can start reversing it.

But I still prefer it because it sticks to functional programming principles of not mutating/changing the original dataset. It’s also very handy because you can chain the calls.
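
As a quick illustration of chaining, the sketch below filters, reverses and transforms a sequence in a single expression, leaving the original array untouched. ReverseChaining.EvensReversedTimesTen is a hypothetical name chosen for this example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReverseChaining
{
  // Hypothetical helper: keeps the even numbers, reverses their order,
  // then multiplies each one by ten, all in one chained expression.
  public static IEnumerable<int> EvensReversedTimesTen(IEnumerable<int> source)
  {
    return source.Where(n => n % 2 == 0).Reverse().Select(n => n * 10);
  }
}

public class Program
{
  public static void Main()
  {
    var numbers = new int[] {1, 2, 3, 4, 5, 6};
    Console.WriteLine(String.Join(", ", ReverseChaining.EvensReversedTimesTen(numbers)));
    Console.WriteLine(String.Join(", ", numbers));
  }
}

/* this code outputs:
60, 40, 20
1, 2, 3, 4, 5, 6
*/
```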

List Reverse (without Linq)

Consider the following (non-functional) example which does mutate the original datastructure:

using System;
using System.Collections.Generic;

public class Program
{
  public static void Main()
  {
    var numbers = new List<int>() {1, 2, 3};
    numbers.Reverse();
    foreach(var number in numbers)
    {
      Console.WriteLine(number);
    }
  }
}

/* this code outputs:
3
2
1
*/

The difference here is subtle, but note that we’re not using System.Linq.

This is an older (pre-Linq) method for reversing a list and we lose the original (un-reversed) data. This change has come about because we’ve switched from using an Array to using a List and (unlike Array) List had its own .Reverse() implementation before the Linq extension method was introduced.

So how do we use the newer (and better?) Linq extension method with a list?

Using Linq Reverse with a List with Enumerable.Reverse(ourList)

To use the Linq extension method on a List we need to call it explicitly using Enumerable.Reverse():

using System;
using System.Collections.Generic;
using System.Linq;

public class Program
{
  public static void Main()
  {
    var numbers = new List<int>() {1, 2, 3};
    foreach(var number in Enumerable.Reverse(numbers))
    {
      Console.WriteLine(number);
    }
    Console.WriteLine();

    Console.WriteLine("We still have access to the original list:");
    foreach(var number in numbers)
    {
      Console.WriteLine(number);
    }
  }
}

/* this code outputs:
3
2
1

We still have access to the original list:
1
2
3
*/

Array.Reverse vs Linq Reverse for Arrays

The first example used an array and it defaulted to Linq, while the second example used a List but it had a Reverse implementation which pre-dated Linq. But hang on, Arrays are much older than Linq, why don’t they have a Reverse implementation?

It turns out there is a non-functional (reverse in place) reverse method for Arrays, it’s just accessed as a static method on Array:

using System;

public class Program
{
  public static void Main()
  {
    var numbers = new int[] {1, 2, 3};
    Array.Reverse(numbers);
    foreach(var number in numbers)
    {
      Console.WriteLine(number);
    }
  }
}

/* this code outputs:
3
2
1
*/

Conclusion

We’ve seen that the .Reverse() Linq method is great, it doesn’t mutate the underlying data and is very handy for iterating backwards through a collection.

We have to be careful when using Lists however, as they have their own .Reverse which behaves differently.

We can call Enumerable.Reverse(ourList) to get the modern functional behaviour with lists.

At some point I’ll do a deep dive on Functional style programming in C#, but until then, you might like to check out some more details on my recent Linq Interview questions (including the one using Reverse): Linq Interview Questions – How to prepare for an interview by example.

Linq Interview Questions by Example, how and why!

The following question was set for me recently. I had the advantage that my recruiter gave me a heads up on what to expect and I could play around with writing the clearest solution while not under the intense pressure of an interview.

I’m in two minds about pre-warning candidates what they should expect in an interview, on the one hand you’re undermining the interview process and potentially giving certain candidates an advantage. On the other hand (and this definitely applies to me), you’re not going to get the best out of a candidate that’s stressing due to being in an interview. Giving a heads up lets people prepare and ultimately remain calm and give a better performance.

Ultimately, I think that if you’re marking people strictly on their solution, you’re probably doing it wrong. You should use the coding question as a jumping off point for discussion and it’s the answers they give during the Q&A that should be the deciding factor.

I’m putting this here to give anyone interested enough, the chance to prepare.

The Linq Interview Question

“Given a string of words, write some code to reverse them.”

My Solution

using System;
using System.Linq;

public class Program
{
  public static void Main()
  {
    var myString = "The quick brown fox jumps over the lazy dog";
    Console.WriteLine(Reverse(myString));
  }

  public static string Reverse(string input)
  {
    return String.Join(" ", input.Split(' ').Reverse());
  }
}

All the work is done in this line, the rest is just scaffolding:

return String.Join(" ", input.Split(' ').Reverse());

Discussion

Breaking this down:

First, the string is split into an array of words using input.Split(‘ ‘). The official String.Split() documentation might help with understanding this.

Then, the Linq happens! We’re only using one method, but it’s a great one for this use case: .Reverse(). I’ve written about this method before in my post on Linq Except and its brethren.

Finally, the array is reconstituted back into a large string with String.Join() before being returned.

All in all, there’s not much to this solution, but especially in interviews, they’re often looking for the simple elegant solution.
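
If the one-liner feels too dense, the three steps described above can be written out one per line. This is just a sketch of the same idea; WordReverser and ReverseWords are hypothetical names, and I’ve called the Linq method as Enumerable.Reverse(words) rather than words.Reverse() purely to make it explicit which Reverse is being used:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordReverser
{
  public static string ReverseWords(string input)
  {
    // Step 1: split the sentence into an array of words.
    string[] words = input.Split(' ');

    // Step 2: reverse the order of the words using the Linq method,
    // which leaves the words array itself untouched.
    IEnumerable<string> reversed = Enumerable.Reverse(words);

    // Step 3: join the words back together with single spaces.
    return String.Join(" ", reversed);
  }
}

public class Program
{
  public static void Main()
  {
    Console.WriteLine(WordReverser.ReverseWords("The quick brown fox"));
  }
}

/* this code outputs:
fox brown quick The
*/
```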

Conclusion

If you want to read about some of the other Linq extension methods that you might get tested on, you could do worse than checking out my post on Except and other set based Linq methods: C# Linq Except: How to Get Items Not In Another List.

To put your mind at ease, yes I did pass this particular interview! Though the rounds after this did get trickier. I’ll save all the juicy bits for a future post.

C# Linq ForEach – How to Linq style loop over items in a List

Introduction

First a quick warning, I have occasionally used this construct in my code, but as part of writing this article I’ve come round to the idea that it’s often a bad idea! That said, to paraphrase Randall Munroe: “The Rules of [coding] are like magic spells. If you never acquire them, then not using them says nothing.” So let’s do this, shall we?

List.ForEach() example

The following code will print out one line for each element in a list using Linq like syntax:

var numbers = new List<int>() { 1, 2, 3 };
numbers.ForEach(x => Console.WriteLine(x));

/* this code outputs:
1
2
3
*/

Note though, that this is an instance method on List itself (which lives in the System.Collections.Generic namespace), not an extension method. So there is nothing Linq about this method or syntax, it just looks like Linq.
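
To underline that no Linq is involved, the sketch below compiles without a using System.Linq line. List.ForEach() simply takes an Action to run against each item, so it accepts a method group or a lambda with side effects. The ForEachDemo.Sum helper is a hypothetical name used only for this illustration:

```csharp
using System;
using System.Collections.Generic;

public static class ForEachDemo
{
  // Hypothetical helper: adds up a list using List.ForEach().
  public static int Sum(List<int> numbers)
  {
    var total = 0;
    // The lambda is an Action<int>; its side effect is updating total.
    numbers.ForEach(x => total += x);
    return total;
  }
}

public class Program
{
  public static void Main()
  {
    var numbers = new List<int>() { 1, 2, 3 };

    // A method group works too, as Console.WriteLine accepts an int.
    numbers.ForEach(Console.WriteLine);
    Console.WriteLine($"total: {ForEachDemo.Sum(numbers)}");
  }
}

/* this code outputs:
1
2
3
total: 6
*/
```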

C# Linq ForEach Where – Execute an action foreach item in a collection where a condition is true

The example above will perform the WriteLine method on every item in a list. Sometimes though, you only want to perform such an action on certain items.

This is easy to do by using a Where clause to filter the items, before using ForEach. But be careful! The Where clause will result in an IEnumerable, which needs to be converted to a List before we can use List’s ForEach. For example:

var numbers = new List<int>() { 1, 2, 3, 4, 5 };
numbers.Where(x => x > 2)
    .ToList()
    .ForEach(x => Console.WriteLine(x));

/* this code outputs:
3
4
5
*/

Linq ForEach Where In

This is one for those coming from an SQL background, for them WHERE IN is a very common construct. It can be done in C# using .Contains() as follows:

var numbers = new List<int>() { 1, 2, 3, 4, 5 };
var squares = new List<int>() { 2, 4, 9 };
numbers.Where(x => squares.Contains(x))
    .ToList()
    .ForEach(x => Console.WriteLine(x));

/* this code outputs:
2
4
*/

Linq ForEach Multiple Actions

All the examples so far have used Console.WriteLine() to print the result, but what if we want to perform multiple actions within a Linq style ForEach? That can be achieved as follows:

var numbers = new List<int>() { 1, 2, 3, 4, 5 };

numbers.Where(num => num > 2)
  .ToList()
  .ForEach( number =>
  {
    var square = number * number;
    Console.WriteLine($"{number} squared is {square}");
  });

/* this code outputs:
3 squared is 9
4 squared is 16
5 squared is 25
*/

But hang on, the .ToList() smells like a hack, it will create a new copy of the data, potentially wasting memory and computation time. Can we do any better?

Well, at this point you might as well use a foreach loop instead:

var numbers = new List<int>() { 1, 2, 3, 4, 5 };

foreach (var number in numbers.Where(num => num > 2))
{
  var square = number * number;
  Console.WriteLine($"{number} squared is {square}");
}

/* this code outputs:
3 squared is 9
4 squared is 16
5 squared is 25
*/

But there is another way… We could implement a Linq style .ForEach ourselves if we really want to:

C# Linq ForEach IEnumerable – implementing it ourselves

It turns out that it’s really rather simple to implement this ourselves:

public static void ForEach<T>(this IEnumerable<T> sequence, Action<T> action)
{
  if (action == null)
  {
    throw new ArgumentNullException(nameof(action));
  }

  foreach(T item in sequence)
  {
    action(item);
  }
}

With our own implementation of .ForEach for IEnumerables we can then write code like this (note, no need for .ToList() and its associated performance problems!):

var numbers = new List<int>() { 1, 2, 3, 4, 5 };
numbers.Where(x => x > 2).ForEach(x => Console.WriteLine(x));

/* this code outputs:
3
4
5
*/
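
One detail that’s easy to trip over: extension methods have to live in a static, non-nested class. Putting the pieces together, a complete program might look something like the sketch below (the class name EnumerableExtensions is my own choice, and I’ve added a null check on the sequence as well):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Extension methods must be declared in a static, non-nested class.
public static class EnumerableExtensions
{
  public static void ForEach<T>(this IEnumerable<T> sequence, Action<T> action)
  {
    if (sequence == null)
    {
      throw new ArgumentNullException(nameof(sequence));
    }

    if (action == null)
    {
      throw new ArgumentNullException(nameof(action));
    }

    foreach (T item in sequence)
    {
      action(item);
    }
  }
}

public class Program
{
  public static void Main()
  {
    var numbers = new List<int>() { 1, 2, 3, 4, 5 };
    var seen = new List<int>();

    // Where returns an IEnumerable<int>, so this uses our extension method.
    numbers.Where(x => x > 2).ForEach(x => seen.Add(x));

    Console.WriteLine(String.Join(", ", seen));
  }
}

/* this code outputs:
3, 4, 5
*/
```

Calling .ForEach() directly on a List still uses List’s own method, since instance methods take priority over extension methods.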

But hang on, if it’s that easy, why isn’t it part of the standard implementation?

Why doesn’t .ForEach work with IEnumerables out of the box?

As explained above, the built-in ForEach doesn’t work for IEnumerables, it only works on a List. Why is that?

The closest thing I could find to an official answer on this came from this blog post, to summarise: “[it] violates the functional programming principles… [and] adds zero new representational power to the language”.

The first argument is that Linq expressions are assumed to not have side effects, while .ForEach is explicitly there to create side effects. This results in code which potentially doesn’t do what the person reading it expects.

I also found this argument about lazy evaluation interesting: when I’m working with an IEnumerable I don’t expect the expression to be evaluated until I call .ToList() or similar – should calling .ForEach() on an IEnumerable evaluate it? I’m pretty sure that yes, it should, but I can see that it’s not obvious (so probably worth avoiding).
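
Lazy evaluation is easier to see than to describe, so here is a small sketch that logs when the Where predicate actually runs. LazyDemo.Run is a hypothetical name for this illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo
{
  // Returns a log showing the order in which things actually happen.
  public static List<string> Run()
  {
    var log = new List<string>();
    var numbers = new List<int>() { 1, 2, 3 };

    // Defining the query does not run the predicate.
    var query = numbers.Where(x =>
    {
      log.Add($"checking {x}");
      return x > 1;
    });
    log.Add("query defined");

    // The predicate only runs once something enumerates the query.
    var results = query.ToList();
    log.Add($"query evaluated, {results.Count} results");

    return log;
  }
}

public class Program
{
  public static void Main()
  {
    foreach (var line in LazyDemo.Run())
    {
      Console.WriteLine(line);
    }
  }
}

/* this code outputs:
query defined
checking 1
checking 2
checking 3
query evaluated, 2 results
*/
```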

The second official argument is basically, why would you bother when you have foreach? And while my coding style (heavily influenced by stylecop!) means .ForEach can look a lot cleaner, I have to admit that using a foreach loop is easier to remember, clear what it’s doing and isn’t exactly a hardship:

var numbers = new List<int>() { 1, 2, 3, 4, 5 };

foreach (var number in numbers.Where(num => num > 2))
{
  Console.WriteLine(number);
}

/* this code outputs:
3
4
5
*/

Conclusion

.ForEach() is easy to use, but it’s for List only (there is no true Linq ForEach).

.ToList() is a nice hack that we can use with IEnumerables (but probably shouldn’t)

It’s pretty easy to add our own IEnumerable .ForEach(), but it’s probably not worth it.

Just use foreach when you have an IEnumerable and your aim is to cause side effects.

I feel that I’ve acquired the knowledge of how to use a Linq style ForEach in my code, but I feel enlightened enough to know that (unless I already have a List) my code is probably better off without it.

If you’re into Linq, you might like this post on Except and other set based Linq extension methods: C# Linq Except: How to Get Items Not In Another List

How to Sort a C# Dictionary By Key (and when not to!)

Overview

Dictionaries in C# are implemented as a hash table. This allows near constant time lookups, but means they are inherently unsorted.

To do a one-off extract of the data from a dictionary, sorted by key, you can use the OrderBy Linq method as follows:

var sorted = myDictionary.OrderBy(x => x.Key);

This is not going to have the best performance, O(n*log(n)), as it needs to sort all the entries, hence why I said only use it for one-off ordering.

If you need to store the elements of your dictionary in order (because you need to repeatedly access them in order) then you should consider using a SortedList or a SortedDictionary instead:

var mySortedList = new SortedList<string, int>();
var mySortedDictionary = new SortedDictionary<string, int>();

The name SortedList is misleading and comes from its internal implementation (using lists and relying on binary search). It’s still a dictionary in that it maps keys to values. SortedDictionary uses a different implementation again, this time using a binary search tree.

By using these structures you can extract the list of elements in order in linear time O(n), but lose some performance in lookup and insertion times.

One-off sorting dictionary by key and by value

As mentioned in the overview, the Linq OrderBy method can be used to extract the elements of a dictionary and sort them. If you need to do this repeatedly you should consider the SortedList or SortedDictionary data structures below, but for one off sorting it’s ideal. In this section I’ll also show you how to sort a dictionary in descending order and how to sort a dictionary by value, all with example code you can reuse.

Linq OrderBy

OrderBy lets you sort a dictionary by its keys, or more accurately, it lets you extract an IOrderedEnumerable of KeyValuePairs from your dictionary.

var fruit = new Dictionary<string, int>
{
    ["apple"] = 1,
    ["pear"] = 4,
    ["banana"] = 6,
};

foreach (var item in fruit.OrderBy(x => x.Key))
{
    Console.WriteLine(item);
}

/* this code outputs:
[apple, 1]
[banana, 6]
[pear, 4]
*/

As mentioned in the overview, the dictionary doesn’t store the elements in order, so sorting them will be O(n*log(n)).

Linq OrderByDescending

There is also the OrderByDescending Linq method, which does as its name suggests: it reverses the order:

var fruit = new Dictionary<string, int>
{
    ["apple"] = 1,
    ["pear"] = 4,
    ["banana"] = 6,
};

foreach (var item in fruit.OrderByDescending(x => x.Key))
{
    Console.WriteLine(item);
}

/* this code outputs:
[pear, 4]
[banana, 6]
[apple, 1]
*/

Sort C# Dictionary by Value

To sort a dictionary by value we make use of the same OrderBy method, we just pass it a slightly different lambda expression:

var sorted = myDictionary.OrderBy(x => x.Value);

Or to show this in context:

var fruit = new Dictionary<string, int>
{
    ["apple"] = 1,
    ["pear"] = 4,
    ["banana"] = 6,
};

foreach (var item in fruit.OrderBy(x => x.Value))
{
    Console.WriteLine(item);
}

/* this code outputs:
[apple, 1]
[pear, 4]
[banana, 6]
*/

And of course, you can always sort it descending:

var sorted = myDictionary.OrderByDescending(x => x.Value);
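
When sorting by value, two entries can easily share the same value. The sketch below sorts by value descending and then uses the Linq ThenBy method to break ties by key, so the output order is predictable. DictionarySorting.ByValueDescending is a hypothetical helper name, and the extra "cherry" entry is only there to create a tie:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DictionarySorting
{
  // Hypothetical helper: highest values first, ties broken by key.
  public static List<KeyValuePair<string, int>> ByValueDescending(Dictionary<string, int> source)
  {
    return source
      .OrderByDescending(x => x.Value)
      .ThenBy(x => x.Key)
      .ToList();
  }
}

public class Program
{
  public static void Main()
  {
    var fruit = new Dictionary<string, int>
    {
      ["apple"] = 1,
      ["pear"] = 4,
      ["banana"] = 6,
      ["cherry"] = 4,
    };

    foreach (var item in DictionarySorting.ByValueDescending(fruit))
    {
      Console.WriteLine(item);
    }
  }
}

/* this code outputs:
[banana, 6]
[cherry, 4]
[pear, 4]
[apple, 1]
*/
```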

Dictionary Style Data Structures with Sorting Built In

When I say dictionary style, what I mean is they map keys to values as so:

var fruit = new SortedDictionary<string, int>
{
    ["apple"] = 1,
    ["pear"] = 4,
    ["banana"] = 6,
};

Console.WriteLine($"apple's value is: {fruit["apple"]}");

/* this code outputs:
apple's value is: 1
*/

When I say they have sorting built in, I mean they internally store their items in order, so it’s quick and easy O(n) to get the values out in order.

Both the SortedList and SortedDictionary have these properties.

C# SortedList

As mentioned above, the name of this data structure can be misleading. It maps keys to values so can be used just like a dictionary.

The name comes from its internal implementation using a list. It uses a binary search to find items by key (which is slower than the hash table implementation used by Dictionary).

As the name suggests, it stores its values in order. So if we iterate through the elements of the SortedList they come back in order:

var fruit = new SortedList<string, int>
{
    ["apple"] = 1,
    ["pear"] = 4,
    ["banana"] = 6,
};

foreach (var item in fruit)
{
    Console.WriteLine(item);
}

/* this code outputs:
[apple, 1]
[banana, 6]
[pear, 4]
*/

The important thing to note here, is that we didn’t use an OrderBy clause on our foreach line. The data is guaranteed to be returned in the order of the keys.
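
A side effect of the list based implementation is that a SortedList can also be accessed by position, via its Keys and Values properties and the IndexOfKey method. The following is a small sketch; SortedListPositions and its two methods are hypothetical helpers named for this example:

```csharp
using System;
using System.Collections.Generic;

public static class SortedListPositions
{
  // Hypothetical helpers: read the first and last keys by position.
  public static string FirstKey(SortedList<string, int> list)
  {
    return list.Keys[0];
  }

  public static string LastKey(SortedList<string, int> list)
  {
    return list.Keys[list.Count - 1];
  }
}

public class Program
{
  public static void Main()
  {
    var fruit = new SortedList<string, int>
    {
      ["apple"] = 1,
      ["pear"] = 4,
      ["banana"] = 6,
    };

    Console.WriteLine($"first key: {SortedListPositions.FirstKey(fruit)}");
    Console.WriteLine($"last key: {SortedListPositions.LastKey(fruit)}");
    Console.WriteLine($"position of banana: {fruit.IndexOfKey("banana")}");
    Console.WriteLine($"value at position 1: {fruit.Values[1]}");
  }
}

/* this code outputs:
first key: apple
last key: pear
position of banana: 1
value at position 1: 6
*/
```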

SortedDictionary

The only difference here is its internal implementation (using a tree structure) which can have some slightly different performance trade offs when it comes to lookups and insertions. For our purposes, this works just like a SortedList:

var fruit = new SortedDictionary<string, int>
{
    ["apple"] = 1,
    ["pear"] = 4,
    ["banana"] = 6,
};

foreach (var item in fruit)
{
    Console.WriteLine(item);
}

/* this code outputs:
[apple, 1]
[banana, 6]
[pear, 4]
*/

If you’re trying to decide between the two, consider trying both on a sample of your data and see if you can spot a difference. Personally, if there’s no significant performance difference, I use SortedDictionary as I feel the name is less likely to cause confusion.
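
If you want to try both on your own data, a crude timing harness along the following lines is one place to start. Treat it as a rough sketch only: SortedBenchmark.TimeInserts is a hypothetical helper, the random keys and the figure of 20,000 items are arbitrary assumptions, and a single Stopwatch run is far less reliable than a dedicated benchmarking library.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class SortedBenchmark
{
  // Hypothetical helper: times how long it takes to insert the given
  // keys into any dictionary style structure.
  public static TimeSpan TimeInserts(IDictionary<int, int> target, IEnumerable<int> keys)
  {
    var stopwatch = Stopwatch.StartNew();
    foreach (var key in keys)
    {
      target[key] = key;
    }
    stopwatch.Stop();
    return stopwatch.Elapsed;
  }
}

public class Program
{
  public static void Main()
  {
    // Arbitrary sample data: 20,000 random keys from a fixed seed.
    var random = new Random(42);
    var keys = new List<int>();
    for (var i = 0; i < 20000; i++)
    {
      keys.Add(random.Next());
    }

    var listTime = SortedBenchmark.TimeInserts(new SortedList<int, int>(), keys);
    var dictTime = SortedBenchmark.TimeInserts(new SortedDictionary<int, int>(), keys);

    Console.WriteLine($"SortedList inserts: {listTime.TotalMilliseconds} ms");
    Console.WriteLine($"SortedDictionary inserts: {dictTime.TotalMilliseconds} ms");
  }
}
```

The same helper can be pointed at lookups or at your real keys; the numbers will vary from machine to machine and run to run.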

Linq Reverse

What is the equivalent of OrderByDescending for these data structures? If you want to iterate through the elements of a SortedList or SortedDictionary in reverse order, then the Linq Reverse method is your friend:

var fruit = new SortedDictionary<string, int>
{
    ["apple"] = 1,
    ["pear"] = 4,
    ["banana"] = 6,
};

foreach (var item in fruit.Reverse())
{
    Console.WriteLine(item);
}

/* this code outputs:
[pear, 4]
[banana, 6]
[apple, 1]
*/

Summary

To sort a C# Dictionary by its keys as a one off, use OrderBy or OrderByDescending:

var sorted = myDictionary.OrderBy(x => x.Key);
var sortedDescending = myDictionary.OrderByDescending(x => x.Key);

To sort a C# Dictionary by its values as a one off, use OrderBy with x => x.Value as the lambda expression:

var sorted = myDictionary.OrderBy(x => x.Value);

If you need a datastructure that still has dictionary style value lookups by key, use either a SortedList or SortedDictionary:

var sortedList = new SortedList<string, int>();
var sortedDict = new SortedDictionary<string, int>();

To loop over these in descending order by key, use the Linq Reverse method:

foreach (var item in sortedList.Reverse()) { ... }

Conclusion

I hope this has cleared up any confusion there might have been around how to Sort a C# Dictionary by its keys or its values. It might even have brought some new data structures to your attention.

If you want to know more about how these data structures actually perform, leave a comment below and I’ll update this post with some real-world performance benchmarks.

Argo Workflows – Why You Need It!

Introduction

I’ve been getting down and dirty with Argo Workflows over the past few months as part of my day job. I’ve been evaluating it for use as a workflow automation tool for some risk analytics and I thought I would share some of my experiences. This is the first part in that series explaining what argo workflows is and what it can bring to you and your company.

What is Argo Workflows?

Argo Workflows describes itself as “an open source container-native workflow engine for orchestrating parallel jobs on Kubernetes”. In plain English, it’s a tool for chaining simple Kubernetes jobs/pods together into useful workflows. When I say chains, I really mean DAGs (directed acyclic graphs) which means you can build up very complicated workflows indeed!

Why Do I Need Argo Workflows?

Every company I’ve worked in has, over time, accumulated a lot of workflows. Chances are that over time, you’ve accumulated workflows too. Don’t believe me? Take a look at your cron jobs, Windows scheduler, or wherever you initiate your batch processing. Those jobs probably include some form of moving data around and/or processing data, i.e. they’re workflows!

Visibility

If those jobs have been built without the aid of a workflow tool, then they’re hidden workflows. A hidden workflow (a term I just invented by the way) is a workflow that is not easily visualised: you can’t see what’s going on inside it.

By porting these workflows into Argo, you gain visibility through Argo’s graph visualiser. This lets you quickly get a feeling for what your workflows look like, and hence a better understanding of what they’re doing.

Simplicity

The complexity comes in when there isn’t a clear line between the workflow code and the job code in your existing batch jobs. If you have beautifully architected batch jobs with amazing separation, this doesn’t apply to you, but who spends time architecting their batch jobs?!

In my experience batch jobs are hacked together in a low level scripting language (bash, cmd), with a smattering of high level languages thrown in only when the scripting language really couldn’t get the job done. The work they’re doing is boring and scripting languages aren’t fun to write in, so they’re unloved and typically unarchitected.

Argo incentivises you to separate the workflow code (workflows are built up of Argo Kubernetes resources using YAML) from the job code (written in any language, packaged as a container to run in Kubernetes). In this way you can take a mess of spaghetti batch code, and turn it into simple (dare I say reusable) components, orchestrated by Argo.

Kubernetes

This is a hard one to explain concisely as it encompasses a whole host of smaller benefits, such as:

Language Agnostic

Because the components are packaged as containers and run on Kubernetes, it doesn’t matter what language they’re written in. The aim here is to have each component be responsible for a single task and to have a straightforward interface (often JSON).
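As a rough illustration of that "single task, simple interface" idea, here is a minimal sketch of what such a component might look like in Python. The task itself (summing a list under a `values` key) and the field names are assumptions made up for the example. The only point is that the component reads JSON on stdin and writes JSON on stdout, so whatever calls it doesn't need to know what language it is written in.

```python
import json
import sys


def run(payload):
    """Hypothetical single task: summarise the numbers under 'values'.
    The 'values', 'count' and 'total' field names are illustrative only."""
    values = payload.get("values", [])
    return {"count": len(values), "total": sum(values)}


def main(stdin=sys.stdin, stdout=sys.stdout):
    # The whole interface: JSON document in on stdin, JSON document out on stdout.
    payload = json.load(stdin)
    json.dump(run(payload), stdout)
    stdout.write("\n")


if __name__ == "__main__":
    main()
```

Packaged in a container, a script like this could be swapped for an equivalent written in another language without the workflow definition having to change, as long as the JSON contract stays the same.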

Easy To Test

I haven’t done any automated testing of containers yet, but the same idea of simple components would lend itself very well to automated testing.

Scalability

Your jobs only use resources while they’re running, and it’s easy to spin up multiple copies of a job if you want things to run in parallel. Argo describes this as putting “a cloud-scale supercomputer at your fingertips”.

The Case Against Argo Workflows

It’s Not Yet Mature

It’s a new project and it’s currently being worked on heavily, which means things are changing and some features haven’t been built yet. You’re likely to encounter some bugs in the newest features, but you can expect them to be fixed fairly quickly.

The Community Isn’t Huge

If you’re anything like me, you’ll google for answers as soon as you have a question. For Argo, you can’t (yet) expect Google to have all the answers neatly packaged up for you in a Stack Overflow Q&A. Instead you should expect to have to read the docs and the GitHub issues.

It’s Another Tool To Learn

Whenever you’re bringing another tool into an enterprise you need to consider the cost of supporting that tool and training other developers on it. This isn’t a drawback unique to Argo, but you do need to ensure the advantages given above are significant enough to justify the cost.

Mitigating Factors

A responsive team: It is being worked on heavily and the guys doing the work are responsive. I’ve had same-day responses to issues I’ve raised on GitHub, and usually when I’ve spotted something that’s missing, that feature is already being worked on. There’s also an active Slack channel.

It’s fairly simple: The scope of the project is quite narrow and it’s building on the amazing piece of work that is Kubernetes for most of the heavy lifting. In essence this means that you’re unlikely to hit a roadblock, as you can usually get what you want done using Kubernetes features even without Argo (Argo helps though!).

Conclusion

I’m a fan of Argo. I think the concept is a great one and I can see the benefits it can bring to any organisation with a complicated back end. The idea of bringing good coding habits to unloved batch code, while giving me the freedom to write components in my language of choice, fills me with joy.

I hope this article has provided a good enough introduction to Argo to whet your appetite. If you’re looking for more info, stick around for part 2, where I’ll be going a bit more in depth into how to configure Argo to get the best out of it.