Data Races in Programming

And how to prevent them

Steven Curtis


Photo by Lars Kienle on Unsplash

When programmers create software, they typically want that software to be predictable.

Many pieces of software are deterministic: given the same set of inputs, they always produce the same output. Although non-deterministic algorithms have their uses (for example, to approximate answers), this article focuses on deterministic functions.

Determinism and Data Races

In the following Java method, we would generally want the output to always be the same for the same input. Here, we expect the sum of 4 and 5 to be 9.

public static int add(int number, int num) {
    return number + num;
}

int result = add(4, 5); // 9

This helps with testing, because the input wholly determines the output. More generally, we can see the difference between deterministic and non-deterministic functions in the diagram below:

The issue with non-deterministic functions is that, for an identical input, the output may be Y₁ on one run and Y₂ on another.
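To make the contrast concrete, here is a minimal sketch that places the add method from above next to a non-deterministic variant. The class name DeterminismDemo and the method addWithNoise are hypothetical, invented purely for illustration; the random term simply stands in for any hidden source of variation.

```java
import java.util.concurrent.ThreadLocalRandom;

class DeterminismDemo {
    // Deterministic: the result depends only on the arguments.
    static int add(int number, int num) {
        return number + num;
    }

    // Non-deterministic (hypothetical example): a random term of 0 or 1
    // is mixed in, so the same arguments can give different results
    // on different calls.
    static int addWithNoise(int number, int num) {
        return number + num + ThreadLocalRandom.current().nextInt(2);
    }

    public static void main(String[] args) {
        System.out.println(add(4, 5));          // always 9
        System.out.println(addWithNoise(4, 5)); // either 9 or 10
    }
}
```

A test for add can assert an exact value, whereas a test for addWithNoise can only check that the result falls within a range.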

Multithreaded Programming and Data Races

Multiple tasks or threads may access a shared resource, which can lead to undefined or unpredictable behaviour.

This is known as a data race, and tracking one down can be slow and problematic because of its non-deterministic nature. A bug or error may occur in only a small subset of executions and so appear not to be reproducible.

A data race can occur when both of the following are true:

  • Two or more threads access the same data concurrently
  • At least one of those accesses is a write operation

In the simple example below, Thread 1 and Thread 2 are in a race! Which one will win? Without any protection mechanism, either Thread 1 or Thread 2 can complete first:
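As a rough sketch of such a race in Java, the following program lets two threads increment the same unprotected counter. The names RaceDemo, counter and runUnsafe are assumptions made for this illustration and do not come from any library. Both conditions listed above are met: the threads access the same field concurrently, and every access includes a write.

```java
class RaceDemo {
    // Shared data with no protection mechanism (illustrative only).
    static int counter = 0;

    // Starts two threads that each increment the shared counter
    // incrementsPerThread times, then returns the final value.
    static int runUnsafe(int incrementsPerThread) throws InterruptedException {
        counter = 0;
        Runnable work = () -> {
            for (int i = 0; i < incrementsPerThread; i++) {
                counter++; // read, add, write back: not one atomic step
            }
        };
        Thread t1 = new Thread(work, "Thread 1");
        Thread t2 = new Thread(work, "Thread 2");
        t1.start();
        t2.start();
        t1.join(); // wait for both threads to finish
        t2.join();
        return counter;
    }

    public static void main(String[] args) throws InterruptedException {
        // Without a race this would be 200000; updates can be lost,
        // so the printed value may be lower and may differ between runs.
        System.out.println(runUnsafe(100_000));
    }
}
```

Because counter++ is a read followed by a write, one thread can overwrite an update the other has just made. The result depends on how the threads happen to be scheduled, which is why the same program can print different totals on different runs.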
