https://verdagon.dev/blog/python-data-races Languages [?] Architecture Github Sponsor us on GitHub! r/Vale Discord Data Races in Python, Despite the Global Interpreter Lock Feb 20, 2022 -- Evan Ovadia I recently wrote an article on how a language can improve on Rust's and Pony's concurrency, and one of my readers asked: "Does Python have fearless concurrency? The Global Interpreter Lock makes only one thread run at a time, so I'd assume there's no data races." Note that we're not talking about general race conditions, just data races. A data race is when: 0 * Two or more threads concurrently accessing a location of memory. * One or more of them is a write. * One or more of them is unsynchronized. But in Python, every thread is synchronized, because of the Global Interpreter Lock. So it can't have data races, right? Data races only happen in parallel code, not concurrent code, right? 1 Data Races in Python, Despite the Global Interpreter Lock * A Python Data Race * The Intrigue * Thoughts on Detecting Races * Conclusion Notes [ ] Notes [-] Notes [+] 0 1 0 From the Rustonomicon. 1 In Python, since only one thread can ever run at a time, it's concurrent. Concurrency is when we're doing multiple tasks at once, but only progressing one task at a time. Parallelism is when we e.g. use multiple cores to progress multiple threads at a time. A Python Data Race At first I thought that, although we need mutexes to avoid race conditions, the GIL makes Python immune to mere data races. However, that didn't seem right. I did a little bit of experimenting, and made this program: python from threading import Thread from time import sleep counter = 0 def increase(): global counter for i in range(0, 100000): counter = counter + 1 threads = [] for i in range(0, 400): threads.append (Thread(target=increase)) for thread in threads: thread.start() for thread in threads: thread.join() print(f'Final counter: {counter}') stdout Final counter: 31735072 Surprisingly, we didn't get 40000000, we got 31735072. 2 This program's data race happens in the counter = counter + 1 line. Let's pretend there are only two threads, and the program just started. * Thread A loads 0 from counter into register* X. 3 * Thread B loads 0 from counter into register Y. * Thread A adds 1 to register X, it now contains 1. * Thread B adds 1 to register Y, it now contains 1. * Thread A stores register X into counter, it now contains 1. * Thread B stores register Y into counter, it now contains 1. As you can see, even though both threads incremented counter, it's not 2, it's 1. This is a data race in action. [ ] Notes [-] Notes [+] 2 3 2 This was on a 2.6 GHz 6-Core i7 Mac, for your computer to show a data race you may need to change the 400 or 100000. 3 I say "register", but CPython doesn't actually use registers directly; it load variables onto stack frames that are stored on the heap, according to this post. The Intrigue This was actually pretty hard to discover. The first few experiments failed, because Python is pretty smart about when it runs each thread. Python will only interrupt a thread if it's taking too long. From anekix's StackOverflow answer: In new versions, instead of using instruction count as a metric to switch threads, a configurable time interval is used. The default switch interval is 5 milliseconds. So, it seems a thread needs to last longer than 5 milliseconds to possibly trigger a data race. This explains why we had to do 100,000 iterations in each thread. This policy makes it difficult to identify when one has data races. However, it probably also reduces their frequency in production, which is a nice benefit. Thoughts on Detecting Races As a language designer, I wonder if there was a missed opportunity in here. Go's map iteration order is random, in part because it prevents us from accidentally relying on iteration order. We could take some inspiration from that. I wonder if, in development mode, Python could use a shorter interval so that we notice any data races hiding in our code, and in release mode, could use this longer (5ms) interval. 4 This also relates to one's philosophy on determinism. We know that: * Determinism helps a lot for reproducing bugs. Nobody likes heisenbugs. 5 * Relying on non-guaranteed determinism can cause some bugs. For example, if we accidentally depend on a hash map's ordering in 100 places, and then change our hash map's algorithm, we suddenly have 100 bugs. These would seem to be incompatible, but a core goal of Vale is to make races more obvious, and reproduce them easily. We could do this with universal deterministic replayability, where in development mode we record all non-deterministic inputs, such as command line arguments, stdin, sockets, files, etc. 6 7, plus the scheduling of all inter-thread messages and mutex lockings. This might seem difficult, but it's possible for a language to guarantee determinism, for example if it uses a region-based borrow checker, and eliminates unsafe blocks and undefined behavior. With that, every run would be random but recorded, and when we do encounter a race, we could reproduce the race trivially by replaying the recording. Someday, perhaps problems like these will be a thing of the past! [ ] Notes [-] Notes [+] 4 5 6 7 4 We could do this today, by using sys.setswitchinterval()! 5 A "heisenbug" is when we encounter a bug, but when investigating, have difficulty making it happen again, because it's based on some non-deterministic factor. 6 More specifically, it would just records any inputs from FFI into a file. 7 This feature is about halfway complete in Vale: the FFI semantics and FFI serialization are ready, but we don't record to a file yet. Conclusion TL;DR: Python threads can have data races! This might all sounds pretty obvious to Python veterans, but it was surprising to me. Maybe it will be surprising to others! Thanks for reading! If you want to discuss more, come by the r/Vale subreddit or the Vale discord, and if you enjoy these articles or want to support our efforts, please sponsor us on GitHub! - Evan O Copyright (c) 2020 Evan Ovadia