https://verdagon.dev/blog/python-data-races

 
 
Languages [?] Architecture
Github Sponsor us on GitHub! r/Vale Discord

Data Races in Python, Despite the Global Interpreter Lock

Feb 20, 2022  --  Evan Ovadia

I recently wrote an article on how a language can improve on Rust's
and Pony's concurrency, and one of my readers asked:

"Does Python have fearless concurrency? The Global Interpreter Lock
makes only one thread run at a time, so I'd assume there's no data
races."

Note that we're not talking about general race conditions, just data
races. A data race is when: 0

  * Two or more threads concurrently accessing a location of memory.
  * One or more of them is a write.
  * One or more of them is unsynchronized.

But in Python, every thread is synchronized, because of the Global
Interpreter Lock. So it can't have data races, right?

Data races only happen in parallel code, not concurrent code, right?
1

Data Races in Python, Despite the Global Interpreter Lock

  * A Python Data Race
  * The Intrigue
  * Thoughts on Detecting Races
  * Conclusion

Notes
[ ] Notes [-] Notes [+] 0 1
0

From the Rustonomicon.

1

In Python, since only one thread can ever run at a time, it's
concurrent. Concurrency is when we're doing multiple tasks at once,
but only progressing one task at a time. Parallelism is when we e.g.
use multiple cores to progress multiple threads at a time.

A Python Data Race

At first I thought that, although we need mutexes to avoid race
conditions, the GIL makes Python immune to mere data races.

However, that didn't seem right. I did a little bit of experimenting,
and made this program:

python
from threading import Thread from time import sleep counter = 0 def
increase(): global counter for i in range(0, 100000): counter =
counter + 1 threads = [] for i in range(0, 400): threads.append
(Thread(target=increase)) for thread in threads: thread.start() for
thread in threads: thread.join() print(f'Final counter: {counter}')
stdout
Final counter: 31735072

Surprisingly, we didn't get 40000000, we got 31735072. 2

This program's data race happens in the counter = counter + 1 line.
Let's pretend there are only two threads, and the program just
started.

  * Thread A loads 0 from counter into register* X. 3
  * Thread B loads 0 from counter into register Y.
  * Thread A adds 1 to register X, it now contains 1.
  * Thread B adds 1 to register Y, it now contains 1.
  * Thread A stores register X into counter, it now contains 1.
  * Thread B stores register Y into counter, it now contains 1.

As you can see, even though both threads incremented counter, it's
not 2, it's 1. This is a data race in action.

[ ] Notes [-] Notes [+] 2 3
2

This was on a 2.6 GHz 6-Core i7 Mac, for your computer to show a data
race you may need to change the 400 or 100000.

3

I say "register", but CPython doesn't actually use registers
directly; it load variables onto stack frames that are stored on the
heap, according to this post.

The Intrigue

This was actually pretty hard to discover. The first few experiments
failed, because Python is pretty smart about when it runs each
thread.

Python will only interrupt a thread if it's taking too long. From
anekix's StackOverflow answer:

In new versions, instead of using instruction count as a metric to
switch threads, a configurable time interval is used. The default
switch interval is 5 milliseconds.

So, it seems a thread needs to last longer than 5 milliseconds to
possibly trigger a data race. This explains why we had to do 100,000
iterations in each thread.

This policy makes it difficult to identify when one has data races.
However, it probably also reduces their frequency in production,
which is a nice benefit.

Thoughts on Detecting Races

As a language designer, I wonder if there was a missed opportunity in
here.

Go's map iteration order is random, in part because it prevents us
from accidentally relying on iteration order. We could take some
inspiration from that.

I wonder if, in development mode, Python could use a shorter interval
so that we notice any data races hiding in our code, and in release
mode, could use this longer (5ms) interval. 4

This also relates to one's philosophy on determinism. We know that:

  * Determinism helps a lot for reproducing bugs. Nobody likes
    heisenbugs. 5
  * Relying on non-guaranteed determinism can cause some bugs. For
    example, if we accidentally depend on a hash map's ordering in
    100 places, and then change our hash map's algorithm, we suddenly
    have 100 bugs.

These would seem to be incompatible, but a core goal of Vale is to
make races more obvious, and reproduce them easily.

We could do this with universal deterministic replayability, where in
development mode we record all non-deterministic inputs, such as
command line arguments, stdin, sockets, files, etc. 6 7, plus the
scheduling of all inter-thread messages and mutex lockings. This
might seem difficult, but it's possible for a language to guarantee
determinism, for example if it uses a region-based borrow checker,
and eliminates unsafe blocks and undefined behavior.

With that, every run would be random but recorded, and when we do
encounter a race, we could reproduce the race trivially by replaying
the recording.

Someday, perhaps problems like these will be a thing of the past!

[ ] Notes [-] Notes [+] 4 5 6 7
4

We could do this today, by using sys.setswitchinterval()!

5

A "heisenbug" is when we encounter a bug, but when investigating,
have difficulty making it happen again, because it's based on some
non-deterministic factor.

6

More specifically, it would just records any inputs from FFI into a
file.

7

This feature is about halfway complete in Vale: the FFI semantics and
FFI serialization are ready, but we don't record to a file yet.

Conclusion

TL;DR: Python threads can have data races!

This might all sounds pretty obvious to Python veterans, but it was
surprising to me. Maybe it will be surprising to others!

Thanks for reading! If you want to discuss more, come by the r/Vale
subreddit or the Vale discord, and if you enjoy these articles or
want to support our efforts, please sponsor us on GitHub!

- Evan O

Copyright (c) 2020 Evan Ovadia