[HN Gopher] A new ProtoBuf generator for Go
       ___________________________________________________________________
        
       A new ProtoBuf generator for Go
        
       Author : tanoku
       Score  : 206 points
       Date   : 2021-06-03 17:50 UTC (5 hours ago)
        
 (HTM) web link (vitess.io)
        
       | jzelinskie wrote:
       | I hadn't realized that Gogo was in such a bad spot with the
       | upstream Go protobuf changes. There was lots of drama when the
       | changes were made and I guess that overshadowed any optics I had
       | on Gogo.
       | 
       | Making vtprotobuf an additional protoc plugin seems like the
       | Right Thing(tm), although it's a shame how complicated protoc
       | commands end up becoming for mature projects. I'm pretty tempted
       | to port Authzed over to this and run some benchmarks -- our
       | entire service requires e2e latency under 20ms, so every little
       | bit counts. The biggest performance win is likely just having an
       | unintrusive interface for pooling allocated protos.
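
A minimal sketch of what pooling allocated messages can look like in Go, using the standard library's `sync.Pool`. The `Request` type, its `Reset` method, and the `getRequest`/`putRequest` helpers are hypothetical stand-ins for a generated message; this is not vtprotobuf's generated API.

```go
package main

import (
	"fmt"
	"sync"
)

// Request stands in for a generated protobuf message (hypothetical type).
type Request struct {
	ID      int64
	Payload []byte
}

// Reset clears the fields but keeps the Payload capacity for reuse.
func (r *Request) Reset() {
	r.ID = 0
	r.Payload = r.Payload[:0]
}

// requestPool hands out *Request values and allocates only when empty.
var requestPool = sync.Pool{
	New: func() interface{} { return new(Request) },
}

// getRequest fetches a message from the pool.
func getRequest() *Request { return requestPool.Get().(*Request) }

// putRequest resets a message and returns it to the pool.
// The caller must not touch r after this call.
func putRequest(r *Request) {
	r.Reset()
	requestPool.Put(r)
}

func main() {
	r := getRequest()
	r.ID = 42
	r.Payload = append(r.Payload, "hello"...)
	fmt.Println(r.ID, len(r.Payload))
	putRequest(r)

	// Whether this is the same object or a fresh one, it starts out empty.
	r2 := getRequest()
	fmt.Println(r2.ID, len(r2.Payload))
	putRequest(r2)
}
```

The pool may drop its contents at any garbage collection, so it reduces allocations on a hot path but gives no guarantee that a given `Get` reuses an object.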
        
         | jeffbee wrote:
         | Proto message unmarshal in Go for a small message should be 5
         | orders of magnitude below 20ms, shouldn't even begin to matter
         | until you are sweating individual microseconds.
        
           | lttlrck wrote:
           | The significance of 20ms isn't clear so this is hard to
           | judge.
           | 
           | Perhaps they have significant external (network) latency
           | leaving only a few ms budget for the application stack - so
           | they could easily be up against a wall.
        
           | morelisp wrote:
           | Until the GC kicks in and steals a full 200usec + a bunch of
           | your throughput...
           | 
           | (Holy shit, who is downvoting this? It's literally the whole
           | article!)
        
             | harikb wrote:
             | Properly written Go code (or even Java for that matter)
             | will try to minimize allocations. For Java, unless I am
             | mistaken pause-less GC is only offered by Azul - $$
        
               | morelisp wrote:
               | Yeah, the whole point of the article is that gRPC v2 (and
               | frankly v1 for that matter) are not "properly written" to
               | do this.
        
               | RhodesianHunter wrote:
               | >or even Java
               | 
               | Just in case you may be unaware, the latest GCs for Java
               | (Shenandoah, ZGC) are miles ahead of anything available
               | for Go due to sheer age and manpower. Parallel and
               | Pauseless are easily achievable in most cases.
        
               | morelisp wrote:
               | Java's GC is better but Go's GC is also parallel and
               | "pauseless" - iirc ZGC is 50-500usec which is comparable
               | to Go's target 200usec.
               | 
               | The point is, neither is "five orders of magnitude" below
               | 20ms. And neither needs zero CPU even if it doesn't block
               | other threads.
        
               | geodel wrote:
               | > Latest GCs for Java (Shenandoah, ZGC) are miles ahead
               | of anything available.
               | 
               | Beyond hyperbole, do you have any actual comparison of Go
               | vs Java GC performance?
        
             | throwaway894345 wrote:
             | If your path is sensitive to 200us of latency you should
             | probably optimize your application and tune your GC.
             | Typically 200us for freeing all unreachable memory is not a
             | big deal.
        
               | jcelerier wrote:
               | > If your path is sensitive to 200us of latency you
               | should probably optimize your application and tune your
               | GC.
               | 
               | okay, you've done this, three years later and it's the
               | same thing again since you need to accommodate the new
               | features. your users haven't upgraded their computers.
               | what do you do ?
        
           | brandmeyer wrote:
           | 3% regression in QPS, 20% regression in CPU, and 5%
           | regression in memory usage according to the article. Those
           | are considerably worse than "5 orders of magnitude below".
        
             | harikb wrote:
             | GP meant 5 orders of magnitude below "20 ms". 20 ms is a
             | lot of time.
             | 
             | There is nothing one can do to, say, a 1 kilobyte buffer
             | that will take more than 1 ms in _any_ language. My own Go
             | code doesn't take more than a few microseconds per message.
        
               | brandmeyer wrote:
               | GP's root claim is that protobuf
               | serialization/deserialization performance shouldn't
               | matter, on an article where a user is _specifically
               | demonstrating that it does matter_.
        
               | joshuamorton wrote:
               | The usecase described in the article, and the usecase
               | described in the top post in this thread aren't the same
               | usecase. If you aren't throughput bound, a 5% regression
               | in parse speed doesn't matter if your goal is to stay
               | under 20ms and parsing takes 17 us. Sure it now takes 19
               | us, which is a regression of 2 us out of 20ms, or
               | 1/10000th of your time.
        
         | rapsey wrote:
         | > our entire service requires e2e latency under 20ms
         | 
         | Why are you using Go then?
        
           | kodah wrote:
           | 20ms is a pretty considerable amount of time WRT E2E
           | transaction time in today's world. Can you expand on your
           | concerns with Go?
        
             | mcronce wrote:
             | It's not really suitable for latency-critical applications.
             | 
             | EDIT: Fixed unfortunate typo
        
               | fcantournet wrote:
               | You can 100% write services with P999 < 20ms in Go. Not
               | even trying that hard. Go is entirely suitable for this
               | kind of constraint; I dare say that's Go's main target.
               | 
               | P99 < 1ms, that's when you're going to want to switch it
               | up.
        
               | somethingwitty1 wrote:
               | was the double-negative intentional? I've used Go for
               | sub-millisecond needs. So 20ms seems like it would be a
               | reasonable choice from where I'm sitting.
        
               | mcronce wrote:
               | It was not intentional, thanks for asking...very
               | unfortunate typo ;)
               | 
               | Go doesn't give you control over inline vs indirect
               | allocation, instead relying on escape analysis, which is
               | notoriously finicky. Seemingly unrelated changes, along
               | with compiler upgrades, can ruin your carefully optimized
               | code.
               | 
               | This is especially heinous because it uses a GC;
               | unnecessary allocations have a disproportionately large
               | impact on your application performance. One or the other
               | wouldn't be nearly as bad.
               | 
               | Time and time again we see reports from
               | organizations/projects with perfectly fine average
               | latency, but horrendous p95+ times, when written in Go -
               | some going as far as to do straight-up insane
               | optimizations (see Dgraph) or rewrite in other languages.
        
       | jen20 wrote:
       | I'm not sure that the phrasing in the article is particularly
       | fair:
       | 
       | > The maintainers of Gogo, understandably, were not up to the
       | gigantic task.
       | 
       | I'm 99% sure they are "up to" (as in "capable of") doing so, they
       | are just not "up for" it (as in, "will not do it").
        
         | jahewson wrote:
         | Yes I assume the author meant "not up for"
        
         | Zababa wrote:
         | They could be "not up to" because of lack of resources,
         | probably time and/or money. I think that's what is implied,
         | rather than lack of technical knowledge.
        
         | lux wrote:
         | I got the sense that they meant "not willing" but I agree
         | that's one of those English phrases that can easily be
         | misconstrued towards the more negative interpretation.
         | 
         | That said, I love the detailed post and the interesting
         | solution, and the commitment to performance!
        
       | n0x1m wrote:
       | the biggest current problem with Go and ProtoBuf is swagger
       | support when using it for API returns. Enums are not supported
       | for example. The leniency of protojson can't be used in other
       | languages that build on top of the swagger docs.
        
       | PostThisTooFast wrote:
       | Is there one for Kotlin yet? It's pretty pathetic that Google's
       | own protocol lacks native support for its most popular operating
       | system.
        
         | HNLogInsSuckAss wrote:
         | Yes, I was surprised by this. We ended up using the Java ones
         | only two years ago because of the lack of a Kotlin generator.
         | Google had a blog post talking about what a struggle it was for
         | some reason, but meanwhile someone had already created a decent
         | Swift generator.
        
         | hn_go_brrrrr wrote:
         | Yes: https://developers.google.com/protocol-buffers/docs/kotlintu...
        
           | HNLogInsSuckAss wrote:
           | Those must've been released only in the last couple of years.
           | In 2019 there was still no Kotlin generator. The OP shouldn't
           | have been modded down, because that is indeed pathetic.
           | 
           | https://medium.com/digitalfrontiers/a-dance-with-protocols-k...
        
       | jupp0r wrote:
       | Using CPU utilization as a performance metric can be extremely
       | misleading. My favorite article on the subject is from Brendan
       | Gregg:
       | 
       | http://www.brendangregg.com/blog/2017-05-09/cpu-utilization-...
       | 
       | A much better way to test the influence of the new compiler would
       | be to test the actual throughput at which saturation is achieved
       | (which is what the benchmarks in the C++ grpc library measure to
       | assess their performance).
        
         | dkhenry wrote:
         | There is a fairly robust set of benchmarks that are run to test
         | out performance improvements[1] and macro benchmarks are the
         | ultimate test of holistic improvement. CPU isn't a great proxy,
         | but one of the biggest problems in real world performance on
         | this specific system ( databases in general ) is latency. CPU
         | time is a really good proxy for latency so by taking a look at
         | CPU time we can get an idea of how the system will respond
         | under "normal" conditions.
         | 
         | 1.https://benchmark.vitess.io/macrobench
        
         | et1337 wrote:
         | In this case the regression also caused a 3% decrease in
         | throughput.
        
       | gilgad13 wrote:
       | Maybe I'm missing something, but my read of
       | golang/protobuf#364[1] was that part of the motivation for the
       | re-organization in protobuf-go v2 was to allow for optimizations
       | like gogoprotobuf to be developed without requiring a complete
       | fork. I totally understand that the authors of gogoprotobuf do
       | not have the time to re-architect their library to use these
       | hooks, but best I can figure this generator does not use these
       | hooks either. Instead it defines additional member functions, and
       | wrappers that look for those specialized functions and fallback
       | to the generic ones if not found.
       | 
       | For example, it looks like pooled decoders could be implemented
       | by setting a custom unmarshaller through the ProtoMethods[2] API.
       | 
       | I wonder why not? Did the authors of the vtprotobuf extension not
       | want to bite off that much work? Is the new API not sufficient to
       | do what they want (thus failing some of the goals expressed in
       | golang/protobuf#364)?
       | 
       | [1]: https://github.com/golang/protobuf/issues/364
       | 
       | [2]:
       | https://pkg.go.dev/google.golang.org/protobuf@v1.26.0/reflec...
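
The "look for a specialized function, fall back to the generic one" pattern described above is usually done in Go with an optional interface and a type assertion. The sketch below assumes made-up names (`fastUnmarshaler`, `UnmarshalFast`, `genericUnmarshal`) and is not the code that vtprotobuf actually generates.

```go
package main

import (
	"errors"
	"fmt"
)

// fastUnmarshaler is a hypothetical optional interface that messages
// with generated, specialized code would implement.
type fastUnmarshaler interface {
	UnmarshalFast(data []byte) error
}

// genericUnmarshal stands in for a slower, reflection-based decoder.
func genericUnmarshal(data []byte, msg interface{}) error {
	if msg == nil {
		return errors.New("nil message")
	}
	return nil
}

// Unmarshal prefers the specialized method when the message has one and
// falls back to the generic path otherwise. It reports which path ran.
func Unmarshal(data []byte, msg interface{}) (string, error) {
	if m, ok := msg.(fastUnmarshaler); ok {
		return "fast", m.UnmarshalFast(data)
	}
	return "generic", genericUnmarshal(data, msg)
}

// Optimized has the specialized method; Plain does not.
type Optimized struct{ Raw []byte }

func (o *Optimized) UnmarshalFast(data []byte) error {
	o.Raw = append(o.Raw[:0], data...)
	return nil
}

type Plain struct{}

func main() {
	path, err := Unmarshal([]byte("abc"), &Optimized{})
	fmt.Println(path, err)
	path, err = Unmarshal([]byte("abc"), &Plain{})
	fmt.Println(path, err)
}
```

The trade-off is that callers must go through the wrapper to get the fast path, whereas a hook inside the library would speed up existing call sites unchanged.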
        
         | alecthomas wrote:
         | I haven't looked in more detail, but one blocker is that
         | `ProtoMethods() *methods` returns a private type, making it
         | effectively unimplementable outside this package.
        
           | zeeboo wrote:
           | So, I thought this at one point, too. But it turns out that
           | methods is a type alias to an unnamed type, so there's no
           | package level privacy issues:
           | https://github.com/protocolbuffers/protobuf-go/blob/v1.26.0/...
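
A simplified, single-file sketch of why an unexported alias to an unnamed type does not block outside implementations: the alias is only another spelling of the struct type, so code that cannot name the alias can write the struct type out in full. The field set below is invented for illustration and does not match the real protobuf-go definition.

```go
package main

import "fmt"

// methods is an unexported alias for an unnamed struct type.
// The fields are made up for this sketch.
type methods = struct {
	Flags uint64
	Size  func() int
}

// hasMethods mirrors an interface whose signature mentions the alias.
type hasMethods interface {
	ProtoMethods() *methods
}

// external never uses the alias name: it spells out the identical
// unnamed struct type, as code in another package would have to.
type external struct{}

func (external) ProtoMethods() *struct {
	Flags uint64
	Size  func() int
} {
	return &struct {
		Flags uint64
		Size  func() int
	}{Flags: 2, Size: func() int { return 0 }}
}

func main() {
	// This assignment compiles because the two types are identical.
	var h hasMethods = external{}
	fmt.Println(h.ProtoMethods().Flags)
}
```

Across real packages this only works when every field name in the struct is exported, since unexported field names from different packages are never identical.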
        
             | [deleted]
        
       | shoefindortz wrote:
       | > Arenas are, however, unfeasible to implement in Go because it
       | is a garbage collected language.
       | 
       | If you are willing to use cgo, google already implemented one for
       | gapid.
       | 
       | https://github.com/google/gapid/tree/master/core/memory/aren...
        
         | pjmlp wrote:
         | Not only that, there are other garbage collected languages like
         | D, Nim and C# that offer the language features to do arenas
         | without having to touch any C code.
         | 
         | There is still so much education to do.
        
           | throwaway894345 wrote:
           | Do I misunderstand what arenas are? I thought it was just
           | "allocate this big array as a single allocation rather than N
           | little allocations"? If so, how is that not supported in Go?
           | (e.g., `arena := make([]Foo, 1000000000)`)
        
             | slimsag wrote:
             | An arena allocator allows you to store many allocations _of
             | different types_ in the same single chunk of memory, and
             | then free all of them at one point in time.
        
               | throwaway894345 wrote:
               | Why can't you do this in Go? I'm 99% sure we can allocate
               | a massive array of bytes using safe Go and use unsafe to
               | cast a chunk of bytes to an instance of a type. This
               | isn't type safe, but neither would the equivalent C code.
        
               | slimsag wrote:
               | That's what this whole thread is about: you can literally
               | do just that.
        
               | throwaway894345 wrote:
               | > That's what this whole thread is about: you can
               | literally do just that
               | 
               | I don't know how you get that from the thread:
               | 
               | > Arenas are, however, unfeasible to implement in Go
               | because it is a garbage collected language.
               | 
               | > If you are willing to use cgo, google already
               | implemented one for gapid.
               | 
               | > there are other garbage collected languages like D, Nim
               | and C# that offer the language features to do arenas
               | without having to touch any C code.
               | 
               | It seems like the above statements implicitly or
               | explicitly claim that this isn't feasible in Go without
               | C.
        
               | pjmlp wrote:
               | You are misunderstanding the thread, I just mentioned
               | some of the languages I like (still waiting for Go's
               | generics), and the comment I was replying to made an
               | assertion about an implementation that uses cgo.
               | 
               | Both of us are dismissing the assertion that "Arenas are,
               | however, unfeasible to implement in Go because it is a
               | garbage collected language."
               | 
               | You can do manual memory allocation via a syscall into
               | the host OS, use unsafe to cast memory blocks to the
               | types that you want and then clean it all up with defer,
               | assuming the arena is only usable inside a lexical
               | region, otherwise extra care is needed to avoid leaks.
        
               | throwaway894345 wrote:
               | Fair enough.
        
               | shoefindortz wrote:
               | I was proposing the cgo option because it's already
               | implemented.
               | 
               | I _think_ allocating a slice of contiguous bytes and
               | using unsafe pointers should work fine as long as you are
               | very cautious about structs/vars with pointers into the
               | buffer getting freed by the GC.
        
               | throwaway894345 wrote:
               | > I _think_ allocating a slice of contiguous bytes and
               | using unsafe pointers should work fine as long as you are
               | very cautious about structs/vars with pointers into the
               | buffer getting freed by the GC
               | 
               | Go's GC is conservative, so I don't think you need to
               | take any special caution in that regard. I would expect
               | that you just need to take care that your casts are
               | correct (e.g., that you aren't casting overlapping
               | regions of memory as distinct objects).
        
               | acrispino wrote:
               | Go went to a precise GC with version 1.3
        
               | throwaway894345 wrote:
               | Oh wow, I didn't realize.
        
           | p_l wrote:
           | Aren't arenas old news in GC languages in general?
           | 
           | Most of the time, their non-presence is due to general pools
           | being just as good most of the time, or people simply not
           | needing them that much with modern GC
        
             | pjmlp wrote:
             | Yes, so I really did not get how such an assertion was
             | made.
             | 
             | Probably lack of experience with machine friendly code.
        
           | dimitrios1 wrote:
           | I can't believe we've managed to have this lengthy of a
           | discussion about GC languages and speed without anyone
           | mentioning rust. Has HN turned a corner?
        
             | shoefindortz wrote:
             | Rust has an arena allocator too[1], but it is implemented
             | with 165(!!!) usages of unsafe. :)
             | 
             | [1] https://github.com/fitzgen/bumpalo
        
               | coder543 wrote:
               | This is far from the only arena allocator written in
               | Rust.
               | 
               | From the same author, a zero-unsafe arena allocator:
               | https://github.com/fitzgen/generational-arena
               | 
               | There are many, _many_ arena implementations available
               | with varying characteristics. It's disingenuous to act
               | like Rust requires the author of an arena library to
               | write "unsafe" everywhere.
        
             | pjmlp wrote:
             | Maybe, don't know.
             | 
             | As far as I'm concerned, although I like Rust, I only see it
             | for scenarios where any kind of memory allocation is very
             | precious, Ada/SPARK and MISRA-C style.
             | 
             | I have been using GC languages with C++ like features, or
             | polyglot codebases, for almost 20 years to think otherwise.
             | 
             | Most of the time developers learn about _new_ and miss out
             | on the low level language features.
             | 
             | It is a matter of balance, either trying to do everything
             | in a single language, or eventually write a couple of
             | functions in a lower level language that are then used as
             | building blocks for the rest of the application.
             | 
             | No need to throw away the ecosystem and developer tooling
             | just to rewrite a data structure.
        
               | azth wrote:
               | Would you consider codecs or heavy numerical simulations
               | to fall under those memory allocation scenarios that
               | you'd use Rust for as well?
        
       | flakiness wrote:
       | I wonder what Google is thinking about the v2 performance. It's
       | well known that protobuf processing is a heavy tax on their data
       | center [1]. It's hard to imagine they just leave it slow. Or do
       | they?
       | 
       | [1] https://research.google/pubs/pub44271/
        
         | justicezyx wrote:
         | There was a project to develop an ASIC (probably bundled inside
         | NIC) to do protobuf parsing. At some point Sanjay did a change
         | to proto API that rendered that project less appealing.
         | 
         | Disclaimer: Google had a lot of internal stuff they considered
         | important to their core tech competencies. For example, no open
         | source about Google paxos APIs and infrastructure, networking,
         | etc.
        
       ___________________________________________________________________
       (page generated 2021-06-03 23:00 UTC)