[HN Gopher] Data Race Patterns in Go ___________________________________________________________________ Data Race Patterns in Go Author : statenjason Score : 67 points Date : 2022-06-10 19:35 UTC (3 hours ago) (HTM) web link (eng.uber.com) (TXT) w3m dump (eng.uber.com) | thallavajhula wrote: | >We developed a system to detect data races at Uber using a | dynamic data race detection technique. | | By system do you mean a process or a tool that detects these? | JamesSwift wrote: | Really good article, and gave me several ideas to track down some | gremlins that have been bugging me in a go codebase recently. | bumper_crop wrote: | Sorry to say, but these hit close to home for me. A lot of the | synchronization paradigms in Go are easy to misuse, but lead the | author into thinking it's okay. the WaitGroup one is particularly | poignant for me, since the race detector doesn't catch it. | | I'll add one other data race goof: atomic.Value. Look at the | implementation. Unlike pretty much every other language I've | seen, atomic.Value isn't really atomic, since the concrete type | can't ever change after being set. This stems from fact that | interfaces are two words rather than one, and they can't be | (hardware) atomically set. To fix it, Go just documents "hey, | don't do that", and then panics if you do. | Groxx wrote: | The lack of generics has forced all Go concurrency to be | intrusive (i.e. implemented by the person using _literally any | concurrency_ ), and yeah. It's horrifyingly error-prone in my | experience. It means everyone needs to be an expert, and lol, | everyone is not an expert. | | Generics might save us. Expect to see `Locker<T>` and | `Atomic<T>` types cropping up. Unbounded buffered thread-safe | queues backing channels. Etc. I'm very, very much looking | forward to it. | morelisp wrote: | > atomic.Value isn't really atomic, since the concrete type | can't ever change after being set. | | How does this mean it's _non-atomic_? As far as I know you can | still never Load() a partial Store(). (Also, even if it was | possible, this would never be a good idea...) | bumper_crop wrote: | That's why I opened with "Look at the implementation". Go is | unable to store the type and the pointer at the same time, so | it warps what "atomic" means. Pretty much every other | language has atomic mean "one of these will win, one will | lose". Go says "one will win, one will panic and destroy the | goroutine. | | In fact, it's even worse than that. If the Store() caller | goes to sleep between setting the type and storing the | pointer, it causes every Goroutine that calls Load() to | block. They can't make forward progress if the store caller | hangs. | jchw wrote: | This is pretty cool. 50 million lines of code is quite a large | corpus to work off of. | | I'm surprised by some of them. For example, go vet nominally | catches misuses of mutexes, so it's surprising that even a few of | those slipped through. I wonder if those situations are a bit | more complicated than the example. | | Obviously, the ideal outcome is that static analysis can help | eliminate as many issues as possible, by restricting the | language, discouraging bad patterns, or giving the programmer | more tools to detect bugs. gVisor, for example, has a really | interesting tool called checklocks: | | https://github.com/google/gvisor/tree/master/tools/checklock... | | While it definitely has some caveats, ideas like these should | help Go programs achieve a greater degree of robustness. | Obviously, this class of error would be effectively prevented by | borrow checking, but I suppose if you want programming language | tradeoffs more tilted towards robustness, Rust already has a lot | of that covered. | likeabbas wrote: | > I suppose if you want programming language tradeoffs more | tilted towards robustness, Rust already has a lot of that | covered | | Does anyone _not_ want robustness of their language to cover | their mistakes? | metadat wrote: | > 2. Slices are confusing types that create subtle and hard-to- | diagnose data races | | The "Slices" example is just nasty! Like, this is just damning | for Go's promise of _" _relatively_ easy and carefree | concurrency"_. | | Think about it for a second or two, | | >> The reference to the slice was resized in the middle of an | append operation from another async routine. | | What exactly happens in these cases? How can I trust myself, as a | fallible human being, to reason about such cases when I'm trying | to efficiently roll up a list of results. :-/ | | Compared to every other remotely mainstream language, perhaps | even C++, these are extremely subtle and sharp.. nigh, _razor | sharp_ edges. Yuck. | | One big takeaway is this harsh realization: Golang guarantees are | scant more than what is offered by pure, relatively naive and | unadulterated BASH shell programming. I still will use it, but | with newfound fear. | | As a multi-hundred-kloc-authoring-gopher: I love Go, and this | article is _killing me_ inside. Go appears extremely sloppy at | the edges of the envelope and language boundaries, moreso than | even I had ever realized prior to now. | | Full-disclosure: I am disgusted by the company that is Uber, but | I'm grateful to the talented folks who've cast a light on this | cesspool region of Golang. Thank you! | | p.s. inane aside: I never would've guessed that in 2022, Java | would start looking more and more appealing in new ways. Until | now I've been more or less "all-in" on Go for years. | masklinn wrote: | > The "Slices" example is just nasty! | | I've got to say I'm not _entirely_ clear on what they talk | about specifically. | | Is it simply that the `results` inside the goroutine will be | desync'd from `myResults` (and so the call to myAppend will | interact oddly with additional manipulations of results), or is | it that the copy can be made mid-update, and `result` itself | could be incoherent? | symfoniq wrote: | I have the same question. | | They talk about the "meta fields" of a slice. Is the problem | that these "meta fields" (e.g. slice length and capacity) are | passed by value, and that by copying them, they can get out | of sync between coroutines? | aaronbee wrote: | I believe they made a mistake with that example. It doesn't | look unsafe to me because the myResults sliced passed to the | goroutine is not used. Or perhaps the racy part was left out | of their snippet. | | Below is what might be what they have meant. This code | snippet is racy because an unsafe read of myResults is done | to pass it to the goroutine and then that version of | myResults is passed to safeAppend: func | ProcessAll(uuids []string) { var myResults []string | var mutex sync.Mutex safeAppend := func(results | []string, res string) { mutex.Lock() | myResults = append(myResults, res) mutex.Unlock() | } for _, uuid := range uuids { go | func(id string, results []string) { res := | Foo(id) safeAppend(myResults, id) | }(uuid, myResults) # <<< unsafe read of myResults } | } | | EDIT: Formatting and clarity | [deleted] | Philip-J-Fry wrote: | So a slice consists of a pointer to a backing array, a length | and a capacity. If you don't use a pointer and pass this | slice around you will _copy_ it. | | This is problematic because even though you copy it, you're | still pointing at the same backing array. | | Therefore, a backing array with data like [1,2,3,4,5] could | be pointed at by 2 slice headers (slice metadata) looking | like | | A: {len: 2, cap: 10} [1,2] B: {len:5, cap: 10} [1,2,3,4,5] | | So any append operations on slice A will mess up the data in | that backing array. | | Now, sometimes your append will resize the slice, in which | case the data is copied and a slice with a new larger backing | array is returned. If this was happening concurrently then | you'd lose the data in racing appends. | | If the append doesn't need to resize the slice, then you'll | overwrite the data in the backing array. And so you'll | corrupt the data in the slice. | | Here's an example I threw together: | https://go.dev/play/p/qRUKUwIf3vx | | Although the code in the post doesn't actually look like it | has an issue. Their tooling just flagged it up as it | _potentially_ has an issue if the copy was actually used in | the function. But the `safeAppend` function targets the | correct slice each time. | morelisp wrote: | Without disagreeing that it's an enormous footgun, one good way | to avoid such slice issues is to use the uncommon `a[x:y:z]` | form to ensure the slice can't grow. As we're starting to write | a lot of generic slice functions with 1.18, we're using this | form in almost all of them which may add elements. | masklinn wrote: | > one good way to avoid such slice issues is to use the | uncommon `a[x:y:z]` form to ensure the slice can't grow. | | Do you mean you always use `a[x:y:y]` in order to ensure | there is no extra capacity and any append will have to copy | the slice? | | Is append _guaranteed_ to create a new slice (and copy over | the data) if the parameter is at capacity? Because if it | _could_ realloc internally then I don 't think this trick is | safe. | morelisp wrote: | > Because if it could realloc internally then I don't think | this trick is safe. | | Slices are 3 word values of (ptr, len, cap). They cannot be | "realloced internally", changing any of those three things | requires creating a new slice. | masklinn wrote: | Of course but the new slice could be (ptr, len+1, cap+x) | because realloc() was able to expand the buffer in-place. | Which yields essentially the same behaviour as an append | call with leftover capacity. | | But I guess realloc is a libc function, and Go probably | goes for mmap directly and would implement its own | allocator, and so might not do that. Unless / until they | decide to add support for it. | throwaway894345 wrote: | > What exactly happens in these cases? How can I trust myself, | as a fallible human being, to reason about such cases when I'm | trying to efficiently roll up a list of results. :-/ | | For me: minimize shared mutable data. If I really can't get rid | of some shared mutable data, I mutex it or use atomics or | similar. This works _very well_ --I almost never run into data | races this way, but it is a discipline rather than a technical | control, so you might have to deal with coworkers who lack this | particular discipline. | metadat wrote: | Absolutely, the disappointing part is that as code authors, | we need to constantly remember about various (otherwise | appealing and even encouraged by the syntax,) footguns and | "never approach suxh areas" of (totally valid) syntax. | | Reminds me of programming in Javascript (it's extreme | example, but the similarity is there). | throwaway894345 wrote: | Yeah, it's a bit disappointing. It doesn't bother me too | much, but it could be improved by a linter which could help | you find shared mutable state. Without a concept of "const" | (for complex types, anyway), I'm not sure how feasible such | a linter would be. | jchw wrote: | Slices may just be one of the best and worst parts of Go. | They're cumbersome, their behavior sometimes feels | 'inexplicable,' and even as an experienced developer you are | likely to eventually fallen into one of the traps where your | 'obvious' code isn't so obvious. | | That said... when programming in programming languages | _without_ a slice type, I always want to have one. And though | it 's confusing at times, the design does actually make sense; | without a doubt, it's hard to think of how you would improve on | the actual underlying design. | | I really wish that Go's container types were persistent | immutable or some-such. It wouldn't solve _everything_ , but it | feels to me like if they could've managed to do that, it | would've been a lot easier to reason about. | knorker wrote: | Indeed. Probably the most common data structure ever used, | list of stuff, Go managed to make subtle and full of | surprises. A knife without a handle. | | This makes the stated reason for the delay of generics hard | to understand. They didn't wait to get | list/vector/array/slice right. | masklinn wrote: | > And though it's confusing at times, the design does | actually make sense; without a doubt, it's hard to think of | how you would improve on the actual underlying design. | | Go slices are absolutely the worst type in Go, because out of | laziness they serve as both slices and vectors rather than | have a separate, independent, and opaque vector types. | | This schizophrenia is the source of most if not all their | traps and issues. | | > I really wish that Go's container types were persistent | immutable or some-such. | | That would go against everything Go holds dear, since it's | allergic to immutability and provides no support whatsoever | for it (aside from simple information hiding). | travisd wrote: | Worth noting that some of these can be detected statically -- and | some are detected by go vet (e.g., passing a sync.Mutex by | value). I don't think it detects the wg.Add bug, but that seems | relatively straightforward(+) to add a check for. | | (+famous last words, I know) | [deleted] | dubswithus wrote: | > We developed a system to detect data races at Uber using a | dynamic data race detection technique. This system, over a period | of six months, detected about 2,000 data races in our Go code | base, of which our developers already fixed ~1,100 data races. | | This isn't open source, correct? ___________________________________________________________________ (page generated 2022-06-10 23:00 UTC)