[HN Gopher] A new ProtoBuf generator for Go
___________________________________________________________________
A new ProtoBuf generator for Go

Author : tanoku
Score  : 206 points
Date   : 2021-06-03 17:50 UTC (5 hours ago)

(HTM) web link (vitess.io)
(TXT) w3m dump (vitess.io)

| jzelinskie wrote:
| I hadn't realized that Gogo was in such a bad spot with the upstream Go protobuf changes. There was lots of drama when the changes were made and I guess that overshadowed any optics I had on Gogo.
|
| Making vtprotobuf an additional protoc plugin seems like the Right Thing(tm), although it's a shame how complicated protoc commands end up becoming for mature projects. I'm pretty tempted to port Authzed over to this and run some benchmarks -- our entire service requires e2e latency under 20ms, so every little bit counts. The biggest performance win is likely just having an unintrusive interface for pooling allocated protos.
| jeffbee wrote:
| Proto message unmarshal in Go for a small message should be 5 orders of magnitude below 20ms, shouldn't even begin to matter until you are sweating individual microseconds.
| lttlrck wrote:
| The significance of 20ms isn't clear so this is hard to judge.
|
| Perhaps they have significant external (network) latency leaving only a few ms budget for the application stack - so they could easily be up against a wall.
| morelisp wrote:
| Until the GC kicks in and steals a full 200usec + a bunch of your throughput...
|
| (Holy shit, who is downvoting this? It's literally the whole article!)
| harikb wrote:
| Properly written Go code (or even Java for that matter) will try to minimize allocations. For Java, unless I am mistaken pause-less GC is only offered by Azul - $$
| morelisp wrote:
| Yeah, the whole point of the article is that gRPC v2 (and frankly v1 for that matter) are not "properly written" to do this.
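Editor's sketch: the comments above mention pooling allocated protos and minimizing allocations. A minimal illustration of that idea with the standard library's `sync.Pool` might look like the following. `Message`, `GetMessage` and `PutMessage` are hypothetical names chosen for illustration; this is not the API that vtprotobuf or any protobuf library generates.

```go
package main

import (
	"fmt"
	"sync"
)

// Message is a hypothetical stand-in for a generated protobuf struct.
type Message struct {
	ID      int64
	Payload []byte
}

// Reset clears the fields but keeps the Payload backing array for reuse.
func (m *Message) Reset() {
	m.ID = 0
	m.Payload = m.Payload[:0]
}

// messagePool hands out *Message values and allocates only when empty.
var messagePool = sync.Pool{
	New: func() interface{} { return new(Message) },
}

// GetMessage returns an empty Message, reusing a pooled one if available.
func GetMessage() *Message {
	return messagePool.Get().(*Message)
}

// PutMessage resets m and returns it to the pool.
// The caller must not touch m afterwards.
func PutMessage(m *Message) {
	m.Reset()
	messagePool.Put(m)
}

func main() {
	m := GetMessage()
	m.ID = 42
	m.Payload = append(m.Payload, "hello"...)
	fmt.Println(m.ID, len(m.Payload))
	PutMessage(m)

	n := GetMessage() // may or may not be the same object as m
	fmt.Println(n.ID, len(n.Payload))
	PutMessage(n)
}
```

The pool gives no guarantee that a returned object is reused, so the only property callers can rely on is that whatever `GetMessage` returns is empty.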
| RhodesianHunter wrote:
| >or even Java
|
| Just in case you may be unaware, the latest GCs for Java (Shenandoah, ZGC) are miles ahead of anything available for Go due to sheer age and manpower. Parallel and Pauseless are easily achievable in most cases.
| morelisp wrote:
| Java's GC is better but Go's GC is also parallel and "pauseless" - iirc ZGC is 50-500usec which is comparable to Go's target 200usec.
|
| The point is, neither is "five orders of magnitude" below 20ms. And neither needs zero CPU even if it doesn't block other threads.
| geodel wrote:
| > Latest GCs for Java (Shenandoah, ZGC) are miles ahead of anything available.
|
| Beyond hyperbole, do you have any actual comparison of Go vs Java GC performance?
| throwaway894345 wrote:
| If your path is sensitive to 200us of latency you should probably optimize your application and tune your GC. Typically 200us for freeing all unreachable memory is not a big deal.
| jcelerier wrote:
| > If your path is sensitive to 200us of latency you should probably optimize your application and tune your GC.
|
| okay, you've done this, three years later and it's the same thing again since you need to accommodate the new features. your users haven't upgraded their computers. what do you do?
| brandmeyer wrote:
| 3% regression in QPS, 20% regression in CPU, and 5% regression in memory usage according to the article. Those are considerably worse than "5 orders of magnitude below".
| harikb wrote:
| GP meant 5 orders of magnitude below "20 ms". 20 ms is a lot of time.
|
| There is nothing one can do to, say, a 1 kilobyte buffer that will cross 1 ms in _any_ language. My own Go code doesn't cross more than a few microseconds per message.
| brandmeyer wrote:
| GP's root claim is that protobuf serialization/deserialization performance shouldn't matter, on an article where a user is _specifically demonstrating that it does matter_.
| joshuamorton wrote:
| The usecase described in the article, and the usecase described in the top post in this thread aren't the same usecase. If you aren't throughput bound, a 5% regression in parse speed doesn't matter if your goal is to stay under 20ms and parsing takes 17 us. Sure it now takes 19 us, which is a regression of 2 us out of 20ms, or 1/10000th of your time.
| rapsey wrote:
| > our entire service requires e2e latency under 20ms
|
| Why are you using Go then?
| kodah wrote:
| 20ms is a pretty considerable amount of time WRT E2E transaction time in today's world. Can you expand on your concerns with Go?
| mcronce wrote:
| It's not really suitable for latency-critical applications.
|
| EDIT: Fixed unfortunate typo
| fcantournet wrote:
| You can 100% write services with P999 < 20ms in Go. Not even trying that hard. Go is entirely suitable for this kind of constraint, I dare say that's Go's main target.
|
| P99 < 1ms, that's when you're going to want to switch it up.
| somethingwitty1 wrote:
| was the double-negative intentional? I've used Go for sub-millisecond needs. So 20ms seems like it would be a reasonable choice from where I'm sitting.
| mcronce wrote:
| It was not intentional, thanks for asking...very unfortunate typo ;)
|
| Go doesn't give you control over inline vs indirect allocation, instead relying on escape analysis, which is notoriously finicky. Seemingly unrelated changes, along with compiler upgrades, can ruin your carefully optimized code.
|
| This is especially heinous because it uses a GC; unnecessary allocations have a disproportionately large impact on your application performance. One or the other wouldn't be nearly as bad.
|
| Time and time again we see reports from organizations/projects with perfectly fine average latency, but horrendous p95+ times, when written in Go - some going as far as to do straight-up insane optimizations (see Dgraph) or rewrite in other languages.
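Editor's sketch: the escape analysis point above can be seen on a toy example. The two functions below are hypothetical illustrations, not code from the article. Building with `go build -gcflags=-m` asks the compiler to print its escape decisions, and the exact output depends on the Go version and on inlining, which is the fragility the comment describes.

```go
package main

import "fmt"

// Point is a small struct used only for this illustration.
type Point struct{ X, Y int }

// sumByValue keeps p as a plain local value; nothing requires it to
// outlive the call, so the compiler is free to keep it off the heap.
func sumByValue(x, y int) int {
	p := Point{X: x, Y: y}
	return p.X + p.Y
}

// newPoint returns the address of a local, so p has to outlive the call.
// The compiler usually reports this as a heap allocation unless the call
// is inlined into a caller that keeps the pointer local.
func newPoint(x, y int) *Point {
	p := Point{X: x, Y: y}
	return &p
}

func main() {
	fmt.Println(sumByValue(1, 2))
	q := newPoint(3, 4)
	fmt.Println(q.X + q.Y)
}
```

Both functions compute the same kind of result; only the way the value leaves the function differs, and that alone can decide whether a hot path allocates.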
| jen20 wrote:
| I'm not sure that the phrasing in the article is particularly fair:
|
| > The maintainers of Gogo, understandably, were not up to the gigantic task.
|
| I'm 99% sure they are "up to" (as in "capable of") doing so, they are just not "up for" it (as in, "will not do it").
| jahewson wrote:
| Yes I assume the author meant "not up for"
| Zababa wrote:
| They could be "not up to" because of lack of resources, probably time and/or money. I think that's what is implied, rather than lack of technical knowledge.
| lux wrote:
| I got the sense that they meant "not willing" but I agree that's one of those English phrases that can easily be misconstrued towards the more negative interpretation.
|
| That said, I love the detailed post and the interesting solution, and the commitment to performance!
| n0x1m wrote:
| the biggest current problem with Go and ProtoBuf is swagger support when using it for API returns. Enums are not supported for example. The leniency of protojson can't be used in other languages that build on top of the swagger docs.
| PostThisTooFast wrote:
| Is there one for Kotlin yet? It's pretty pathetic that Google's own protocol lacks native support for its most popular operating system.
| HNLogInsSuckAss wrote:
| Yes, I was surprised by this. We ended up using the Java ones only two years ago because of the lack of a Kotlin generator. Google had a blog post talking about what a struggle it was for some reason, but meanwhile someone had already created a decent Swift generator.
| hn_go_brrrrr wrote:
| Yes: https://developers.google.com/protocol-buffers/docs/kotlintu...
| HNLogInsSuckAss wrote:
| Those must've been released only in the last couple of years. In 2019 there was still no Kotlin generator. The OP shouldn't have been modded down, because that is indeed pathetic.
|
| https://medium.com/digitalfrontiers/a-dance-with-protocols-k...
| jupp0r wrote:
| Using CPU utilization as a performance metric can be extremely misleading. My favorite article on the subject is from Brendan Gregg:
|
| http://www.brendangregg.com/blog/2017-05-09/cpu-utilization-...
|
| A much better way to test the influence of the new compiler would be to test the actual throughput at which saturation is achieved (which is what the benchmarks in the C++ gRPC library measure to assess its performance).
| dkhenry wrote:
| There is a fairly robust set of benchmarks that are run to test out performance improvements[1] and macro benchmarks are the ultimate test of holistic improvement. CPU isn't a great proxy, but one of the biggest problems in real world performance on this specific system (databases in general) is latency. CPU time is a really good proxy for latency so by taking a look at CPU time we can get an idea of how the system will respond under "normal" conditions.
|
| [1] https://benchmark.vitess.io/macrobench
| et1337 wrote:
| In this case the regression also caused a 3% decrease in throughput.
| gilgad13 wrote:
| Maybe I'm missing something, but my read of golang/protobuf#364[1] was that part of the motivation for the re-organization in protobuf-go v2 was to allow for optimizations like gogoprotobuf to be developed without requiring a complete fork. I totally understand that the authors of gogoprotobuf do not have the time to re-architect their library to use these hooks, but best I can figure this generator does not use these hooks either. Instead it defines additional member functions, and wrappers that look for those specialized functions and fall back to the generic ones if not found.
|
| For example, it looks like pooled decoders could be implemented by setting a custom unmarshaller through the ProtoMethods[2] API.
|
| I wonder why not? Did the authors of the vtprotobuf extension not want to bite off that much work?
| Is the new API not sufficient to do what they want (thus failing some of the goals expressed in golang/protobuf#364)?
|
| [1]: https://github.com/golang/protobuf/issues/364
|
| [2]: https://pkg.go.dev/google.golang.org/protobuf@v1.26.0/reflec...
| alecthomas wrote:
| I haven't looked in more detail, but one blocker is that `ProtoMethods() *methods` returns a private type, making it effectively unimplementable outside this package.
| zeeboo wrote:
| So, I thought this at one point, too. But it turns out that methods is a type alias to an unnamed type, so there are no package level privacy issues: https://github.com/protocolbuffers/protobuf-go/blob/v1.26.0/...
| [deleted]
| shoefindortz wrote:
| > Arenas are, however, unfeasible to implement in Go because it is a garbage collected language.
|
| If you are willing to use cgo, google already implemented one for gapid.
|
| https://github.com/google/gapid/tree/master/core/memory/aren...
| pjmlp wrote:
| Not only that, there are other garbage collected languages like D, Nim and C# that offer the language features to do arenas without having to touch any C code.
|
| There is still so much education to do.
| throwaway894345 wrote:
| Do I misunderstand what arenas are? I thought it was just "allocate this big array as a single allocation rather than N little allocations"? If so, how is that not supported in Go? (e.g., `arena := make([]Foo, 1000000000)`)
| slimsag wrote:
| An arena allocator allows you to store many allocations _of different types_ in the same single chunk of memory, and then free all of them at one point in time.
| throwaway894345 wrote:
| Why can't you do this in Go? I'm 99% sure we can allocate a massive array of bytes using safe Go and use unsafe to cast a chunk of bytes to an instance of a type. This isn't type safe, but neither would the equivalent C code.
| slimsag wrote:
| That's what this whole thread is about: you can literally do just that.
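Editor's sketch: the approach discussed above (one big byte slice, `unsafe` casts, free everything at once) could be written roughly as follows. `Arena`, `allocPoint` and `allocHeader` are hypothetical names, and this is not the gapid implementation. The sketch assumes the stored types contain no Go pointers, since the garbage collector does not scan a `[]byte` for references.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Arena is a hypothetical bump allocator backed by one []byte.
// It is only safe for types that contain no Go pointers.
type Arena struct {
	buf []byte
	off uintptr
}

func NewArena(size int) *Arena { return &Arena{buf: make([]byte, size)} }

// Alloc reserves size bytes aligned to align (a power of two).
// It returns nil when size is zero or the arena is exhausted.
func (a *Arena) Alloc(size, align uintptr) unsafe.Pointer {
	if size == 0 || len(a.buf) == 0 {
		return nil
	}
	base := uintptr(unsafe.Pointer(&a.buf[0]))
	// Round the next free address up to the alignment, then convert
	// it back into an offset inside buf.
	start := ((base + a.off + align - 1) &^ (align - 1)) - base
	if start+size > uintptr(len(a.buf)) {
		return nil
	}
	a.off = start + size
	return unsafe.Pointer(&a.buf[start])
}

// Reset frees every allocation at once by rewinding the offset.
// Memory is not zeroed, and earlier pointers must not be used again.
func (a *Arena) Reset() { a.off = 0 }

type Point struct{ X, Y int32 }

type Header struct {
	ID  uint64
	Len uint16
}

// allocPoint and allocHeader are typed wrappers; each returns nil
// when the arena is full.
func allocPoint(a *Arena) *Point {
	return (*Point)(a.Alloc(unsafe.Sizeof(Point{}), unsafe.Alignof(Point{})))
}

func allocHeader(a *Arena) *Header {
	return (*Header)(a.Alloc(unsafe.Sizeof(Header{}), unsafe.Alignof(Header{})))
}

func main() {
	a := NewArena(1024)
	p, h := allocPoint(a), allocHeader(a) // two different types, one buffer
	p.X, p.Y = 3, 4
	h.ID, h.Len = 99, 7
	fmt.Println(*p, *h)
	a.Reset() // both allocations are released together
}
```

Values of different types share one allocation and are released together by `Reset`, which is the property slimsag describes. Structs that hold pointers into the Go heap would need a different design.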
| throwaway894345 wrote:
| > That's what this whole thread is about: you can literally do just that
|
| I don't know how you get that from the thread:
|
| > Arenas are, however, unfeasible to implement in Go because it is a garbage collected language.
|
| > If you are willing to use cgo, google already implemented one for gapid.
|
| > there are other garbage collected languages like D, Nim and C# that offer the language features to do arenas without having to touch any C code.
|
| It seems like the above statements implicitly or explicitly claim that this isn't feasible in Go without C.
| pjmlp wrote:
| You are misunderstanding the thread, I just mentioned some of the languages I like (still waiting for Go's generics), and the comment I was replying to made an assertion about an implementation that uses cgo.
|
| Both of us are dismissing the assertion that "Arenas are, however, unfeasible to implement in Go because it is a garbage collected language."
|
| You can do manual memory allocation via a syscall into the host OS, use unsafe to cast memory blocks to the types that you want and then clean it all up with defer, assuming the arena is only usable inside a lexical region, otherwise extra care is needed to avoid leaks.
| throwaway894345 wrote:
| Fair enough.
| shoefindortz wrote:
| I was proposing the cgo option because it's already implemented.
|
| I _think_ allocating a slice of contiguous bytes and using unsafe pointers should work fine as long as you are very cautious about structs/vars with pointers into the buffer getting freed by the GC.
| throwaway894345 wrote:
| > I _think_ allocating a slice of contiguous bytes and using unsafe pointers should work fine as long as you are very cautious about structs/vars with pointers into the buffer getting freed by the GC
|
| Go's GC is conservative, so I don't think you need to take any special caution in that regard.
| I would expect that you just need to take care that your casts are correct (e.g., that you aren't casting overlapping regions of memory as distinct objects).
| acrispino wrote:
| Go went to a precise GC with version 1.3
| throwaway894345 wrote:
| Oh wow, I didn't realize.
| p_l wrote:
| Aren't arenas old news in GC languages in general?
|
| Most of the time, their non-presence is due to general pools being just as good most of the time, or people simply not needing them that much with modern GC
| pjmlp wrote:
| Yes, so I really did not get how such an assertion came to be made.
|
| Probably lack of experience with machine friendly code.
| dimitrios1 wrote:
| I can't believe we've managed to have this lengthy of a discussion about GC languages and speed without anyone mentioning rust. Has HN turned a corner?
| shoefindortz wrote:
| Rust has an arena allocator too[1], but it is implemented with 165(!!!) usages of unsafe. :)
|
| [1] https://github.com/fitzgen/bumpalo
| coder543 wrote:
| This is far from the only arena allocator written in Rust.
|
| From the same author, a zero-unsafe arena allocator: https://github.com/fitzgen/generational-arena
|
| There are many, _many_ arena implementations available with varying characteristics. It's disingenuous to act like Rust requires the author of an arena library to write "unsafe" everywhere.
| pjmlp wrote:
| Maybe, don't know.
|
| As far as I am concerned, although I like Rust, I only see it for scenarios where any kind of memory allocation is very precious, Ada/SPARK and MISRA-C style.
|
| I have been using GC languages with C++ like features, or polyglot codebases, for almost 20 years, too long to think otherwise.
|
| Most of the time developers learn about _new_ and miss out on the low level language features.
|
| It is a matter of balance, either trying to do everything in a single language, or eventually write a couple of functions in a lower level language that are then used as building blocks for the rest of the application.
|
| No need to throw away the ecosystem and developer tooling just to rewrite a data structure.
| azth wrote:
| Would you consider codecs or heavy numerical simulations to fall under those memory allocation scenarios that you'd use Rust for as well?
| flakiness wrote:
| I wonder what Google is thinking about the v2 performance. It's well known that protobuf processing is a heavy tax on their data centers [1]. It's hard to imagine they just leave it slow. Or do they?
|
| [1] https://research.google/pubs/pub44271/
| justicezyx wrote:
| There was a project to develop an ASIC (probably bundled inside the NIC) to do protobuf parsing. At some point Sanjay made a change to the proto API that rendered that project less appealing.
|
| Disclaimer: Google had a lot of internal stuff they considered important to their core tech competencies. For example, no open source about Google Paxos APIs and infrastructure, networking, etc.
___________________________________________________________________
(page generated 2021-06-03 23:00 UTC)