[HN Gopher] Data Race Patterns in Go
       ___________________________________________________________________
        
       Data Race Patterns in Go
        
       Author : statenjason
       Score  : 67 points
       Date   : 2022-06-10 19:35 UTC (3 hours ago)
        
 (HTM) web link (eng.uber.com)
 (TXT) w3m dump (eng.uber.com)
        
       | thallavajhula wrote:
       | >We developed a system to detect data races at Uber using a
       | dynamic data race detection technique.
       | 
       | By system do you mean a process or a tool that detects these?
        
       | JamesSwift wrote:
       | Really good article, and gave me several ideas to track down some
       | gremlins that have been bugging me in a go codebase recently.
        
       | bumper_crop wrote:
       | Sorry to say, but these hit close to home for me. A lot of the
       | synchronization paradigms in Go are easy to misuse, but lead the
       | author into thinking it's okay. the WaitGroup one is particularly
       | poignant for me, since the race detector doesn't catch it.
       | 
       | I'll add one other data race goof: atomic.Value. Look at the
       | implementation. Unlike pretty much every other language I've
       | seen, atomic.Value isn't really atomic, since the concrete type
       | can't ever change after being set. This stems from fact that
       | interfaces are two words rather than one, and they can't be
       | (hardware) atomically set. To fix it, Go just documents "hey,
       | don't do that", and then panics if you do.
        
         | Groxx wrote:
         | The lack of generics has forced all Go concurrency to be
         | intrusive (i.e. implemented by the person using _literally any
         | concurrency_ ), and yeah. It's horrifyingly error-prone in my
         | experience. It means everyone needs to be an expert, and lol,
         | everyone is not an expert.
         | 
         | Generics might save us. Expect to see `Locker<T>` and
         | `Atomic<T>` types cropping up. Unbounded buffered thread-safe
         | queues backing channels. Etc. I'm very, very much looking
         | forward to it.
        
         | morelisp wrote:
         | > atomic.Value isn't really atomic, since the concrete type
         | can't ever change after being set.
         | 
         | How does this mean it's _non-atomic_? As far as I know you can
         | still never Load() a partial Store(). (Also, even if it was
         | possible, this would never be a good idea...)
        
           | bumper_crop wrote:
           | That's why I opened with "Look at the implementation". Go is
           | unable to store the type and the pointer at the same time, so
           | it warps what "atomic" means. Pretty much every other
           | language has atomic mean "one of these will win, one will
           | lose". Go says "one will win, one will panic and destroy the
           | goroutine.
           | 
           | In fact, it's even worse than that. If the Store() caller
           | goes to sleep between setting the type and storing the
           | pointer, it causes every Goroutine that calls Load() to
           | block. They can't make forward progress if the store caller
           | hangs.
        
       | jchw wrote:
       | This is pretty cool. 50 million lines of code is quite a large
       | corpus to work off of.
       | 
       | I'm surprised by some of them. For example, go vet nominally
       | catches misuses of mutexes, so it's surprising that even a few of
       | those slipped through. I wonder if those situations are a bit
       | more complicated than the example.
       | 
       | Obviously, the ideal outcome is that static analysis can help
       | eliminate as many issues as possible, by restricting the
       | language, discouraging bad patterns, or giving the programmer
       | more tools to detect bugs. gVisor, for example, has a really
       | interesting tool called checklocks:
       | 
       | https://github.com/google/gvisor/tree/master/tools/checklock...
       | 
       | While it definitely has some caveats, ideas like these should
       | help Go programs achieve a greater degree of robustness.
       | Obviously, this class of error would be effectively prevented by
       | borrow checking, but I suppose if you want programming language
       | tradeoffs more tilted towards robustness, Rust already has a lot
       | of that covered.
        
         | likeabbas wrote:
         | > I suppose if you want programming language tradeoffs more
         | tilted towards robustness, Rust already has a lot of that
         | covered
         | 
         | Does anyone _not_ want robustness of their language to cover
         | their mistakes?
        
       | metadat wrote:
       | > 2. Slices are confusing types that create subtle and hard-to-
       | diagnose data races
       | 
       | The "Slices" example is just nasty! Like, this is just damning
       | for Go's promise of _" _relatively_ easy and carefree
       | concurrency"_.
       | 
       | Think about it for a second or two,
       | 
       | >> The reference to the slice was resized in the middle of an
       | append operation from another async routine.
       | 
       | What exactly happens in these cases? How can I trust myself, as a
       | fallible human being, to reason about such cases when I'm trying
       | to efficiently roll up a list of results. :-/
       | 
       | Compared to every other remotely mainstream language, perhaps
       | even C++, these are extremely subtle and sharp.. nigh, _razor
       | sharp_ edges. Yuck.
       | 
       | One big takeaway is this harsh realization: Golang guarantees are
       | scant more than what is offered by pure, relatively naive and
       | unadulterated BASH shell programming. I still will use it, but
       | with newfound fear.
       | 
       | As a multi-hundred-kloc-authoring-gopher: I love Go, and this
       | article is _killing me_ inside. Go appears extremely sloppy at
       | the edges of the envelope and language boundaries, moreso than
       | even I had ever realized prior to now.
       | 
       | Full-disclosure: I am disgusted by the company that is Uber, but
       | I'm grateful to the talented folks who've cast a light on this
       | cesspool region of Golang. Thank you!
       | 
       | p.s. inane aside: I never would've guessed that in 2022, Java
       | would start looking more and more appealing in new ways. Until
       | now I've been more or less "all-in" on Go for years.
        
         | masklinn wrote:
         | > The "Slices" example is just nasty!
         | 
         | I've got to say I'm not _entirely_ clear on what they talk
         | about specifically.
         | 
         | Is it simply that the `results` inside the goroutine will be
         | desync'd from `myResults` (and so the call to myAppend will
         | interact oddly with additional manipulations of results), or is
         | it that the copy can be made mid-update, and `result` itself
         | could be incoherent?
        
           | symfoniq wrote:
           | I have the same question.
           | 
           | They talk about the "meta fields" of a slice. Is the problem
           | that these "meta fields" (e.g. slice length and capacity) are
           | passed by value, and that by copying them, they can get out
           | of sync between coroutines?
        
           | aaronbee wrote:
           | I believe they made a mistake with that example. It doesn't
           | look unsafe to me because the myResults sliced passed to the
           | goroutine is not used. Or perhaps the racy part was left out
           | of their snippet.
           | 
           | Below is what might be what they have meant. This code
           | snippet is racy because an unsafe read of myResults is done
           | to pass it to the goroutine and then that version of
           | myResults is passed to safeAppend:                 func
           | ProcessAll(uuids []string) {         var myResults []string
           | var mutex sync.Mutex         safeAppend := func(results
           | []string, res string) {           mutex.Lock()
           | myResults = append(myResults, res)           mutex.Unlock()
           | }              for _, uuid := range uuids {           go
           | func(id string, results []string) {             res :=
           | Foo(id)             safeAppend(myResults, id)
           | }(uuid, myResults) # <<< unsafe read of myResults         }
           | }
           | 
           | EDIT: Formatting and clarity
        
             | [deleted]
        
           | Philip-J-Fry wrote:
           | So a slice consists of a pointer to a backing array, a length
           | and a capacity. If you don't use a pointer and pass this
           | slice around you will _copy_ it.
           | 
           | This is problematic because even though you copy it, you're
           | still pointing at the same backing array.
           | 
           | Therefore, a backing array with data like [1,2,3,4,5] could
           | be pointed at by 2 slice headers (slice metadata) looking
           | like
           | 
           | A: {len: 2, cap: 10} [1,2] B: {len:5, cap: 10} [1,2,3,4,5]
           | 
           | So any append operations on slice A will mess up the data in
           | that backing array.
           | 
           | Now, sometimes your append will resize the slice, in which
           | case the data is copied and a slice with a new larger backing
           | array is returned. If this was happening concurrently then
           | you'd lose the data in racing appends.
           | 
           | If the append doesn't need to resize the slice, then you'll
           | overwrite the data in the backing array. And so you'll
           | corrupt the data in the slice.
           | 
           | Here's an example I threw together:
           | https://go.dev/play/p/qRUKUwIf3vx
           | 
           | Although the code in the post doesn't actually look like it
           | has an issue. Their tooling just flagged it up as it
           | _potentially_ has an issue if the copy was actually used in
           | the function. But the `safeAppend` function targets the
           | correct slice each time.
        
         | morelisp wrote:
         | Without disagreeing that it's an enormous footgun, one good way
         | to avoid such slice issues is to use the uncommon `a[x:y:z]`
         | form to ensure the slice can't grow. As we're starting to write
         | a lot of generic slice functions with 1.18, we're using this
         | form in almost all of them which may add elements.
        
           | masklinn wrote:
           | > one good way to avoid such slice issues is to use the
           | uncommon `a[x:y:z]` form to ensure the slice can't grow.
           | 
           | Do you mean you always use `a[x:y:y]` in order to ensure
           | there is no extra capacity and any append will have to copy
           | the slice?
           | 
           | Is append _guaranteed_ to create a new slice (and copy over
           | the data) if the parameter is at capacity? Because if it
           | _could_ realloc internally then I don 't think this trick is
           | safe.
        
             | morelisp wrote:
             | > Because if it could realloc internally then I don't think
             | this trick is safe.
             | 
             | Slices are 3 word values of (ptr, len, cap). They cannot be
             | "realloced internally", changing any of those three things
             | requires creating a new slice.
        
               | masklinn wrote:
               | Of course but the new slice could be (ptr, len+1, cap+x)
               | because realloc() was able to expand the buffer in-place.
               | Which yields essentially the same behaviour as an append
               | call with leftover capacity.
               | 
               | But I guess realloc is a libc function, and Go probably
               | goes for mmap directly and would implement its own
               | allocator, and so might not do that. Unless / until they
               | decide to add support for it.
        
         | throwaway894345 wrote:
         | > What exactly happens in these cases? How can I trust myself,
         | as a fallible human being, to reason about such cases when I'm
         | trying to efficiently roll up a list of results. :-/
         | 
         | For me: minimize shared mutable data. If I really can't get rid
         | of some shared mutable data, I mutex it or use atomics or
         | similar. This works _very well_ --I almost never run into data
         | races this way, but it is a discipline rather than a technical
         | control, so you might have to deal with coworkers who lack this
         | particular discipline.
        
           | metadat wrote:
           | Absolutely, the disappointing part is that as code authors,
           | we need to constantly remember about various (otherwise
           | appealing and even encouraged by the syntax,) footguns and
           | "never approach suxh areas" of (totally valid) syntax.
           | 
           | Reminds me of programming in Javascript (it's extreme
           | example, but the similarity is there).
        
             | throwaway894345 wrote:
             | Yeah, it's a bit disappointing. It doesn't bother me too
             | much, but it could be improved by a linter which could help
             | you find shared mutable state. Without a concept of "const"
             | (for complex types, anyway), I'm not sure how feasible such
             | a linter would be.
        
         | jchw wrote:
         | Slices may just be one of the best and worst parts of Go.
         | They're cumbersome, their behavior sometimes feels
         | 'inexplicable,' and even as an experienced developer you are
         | likely to eventually fallen into one of the traps where your
         | 'obvious' code isn't so obvious.
         | 
         | That said... when programming in programming languages
         | _without_ a slice type, I always want to have one. And though
         | it 's confusing at times, the design does actually make sense;
         | without a doubt, it's hard to think of how you would improve on
         | the actual underlying design.
         | 
         | I really wish that Go's container types were persistent
         | immutable or some-such. It wouldn't solve _everything_ , but it
         | feels to me like if they could've managed to do that, it
         | would've been a lot easier to reason about.
        
           | knorker wrote:
           | Indeed. Probably the most common data structure ever used,
           | list of stuff, Go managed to make subtle and full of
           | surprises. A knife without a handle.
           | 
           | This makes the stated reason for the delay of generics hard
           | to understand. They didn't wait to get
           | list/vector/array/slice right.
        
           | masklinn wrote:
           | > And though it's confusing at times, the design does
           | actually make sense; without a doubt, it's hard to think of
           | how you would improve on the actual underlying design.
           | 
           | Go slices are absolutely the worst type in Go, because out of
           | laziness they serve as both slices and vectors rather than
           | have a separate, independent, and opaque vector types.
           | 
           | This schizophrenia is the source of most if not all their
           | traps and issues.
           | 
           | > I really wish that Go's container types were persistent
           | immutable or some-such.
           | 
           | That would go against everything Go holds dear, since it's
           | allergic to immutability and provides no support whatsoever
           | for it (aside from simple information hiding).
        
       | travisd wrote:
       | Worth noting that some of these can be detected statically -- and
       | some are detected by go vet (e.g., passing a sync.Mutex by
       | value). I don't think it detects the wg.Add bug, but that seems
       | relatively straightforward(+) to add a check for.
       | 
       | (+famous last words, I know)
        
       | [deleted]
        
       | dubswithus wrote:
       | > We developed a system to detect data races at Uber using a
       | dynamic data race detection technique. This system, over a period
       | of six months, detected about 2,000 data races in our Go code
       | base, of which our developers already fixed ~1,100 data races.
       | 
       | This isn't open source, correct?
        
       ___________________________________________________________________
       (page generated 2022-06-10 23:00 UTC)