[HN Gopher] The dark side of GraphQL: performance
       ___________________________________________________________________
        
       The dark side of GraphQL: performance
        
       Author : kamranahmedse
       Score  : 84 points
       Date   : 2020-01-01 19:07 UTC (3 hours ago)
        
 (HTM) web link (twitter.com)
 (TXT) w3m dump (twitter.com)
        
       | jensneuse wrote:
       | The title is misleading. The post doesn't discover any dark sides
       | of GraphQL. The post is about a potential performance problem
       | with a library that implements the GraphQL spec. There might be a
       | problem with the library itself. There might be a problem with
       | the use of said library. The author states that it takes 19ms to
       | fetch 20 recipes from a postgres database. This looks really
       | suspicious. Why does it take so long to fetch 20 indexed rows?
       | Maybe there's some general performance problem with the
       | application?
        
         | ctvo wrote:
         | You assume the rows are indexed and call a ~20ms average for
         | a database call "suspicious", but you're not concerned about
         | the 400ms of graphql-js validation in the flame chart shown
         | in the thread?
         | 
         | graphql-js is the reference implementation of GraphQL, so it's
         | not any random library.
        
           | jensneuse wrote:
           | The graphql-js library focuses on correctness, not on
           | performance. Facebook doesn't invoke it at runtime, only at
           | build time. They use persisted queries only. If you want a
           | high-performance server runtime, I wouldn't use Node.JS.
           | Especially for a complicated task like validating and
           | resolving a GraphQL query, Node.JS is the wrong tool. It's
           | too high level to tweak hot paths and tune the garbage
           | collector. So no, I don't think the flame graph is
           | suspicious. In my language of choice (Go) I could drill down
           | into memory and CPU consumption for each line of code to
           | find the bottleneck. Maybe this is possible for Node.JS too;
           | I don't know the tooling so well. If such a tool exists, a
           | detailed flame graph of the Node.JS application might help
           | in understanding the issue.
        
             | hn_throwaway_99 wrote:
             | OP is using Apollo Server, which is by far the most common
             | server implementation for GraphQL. It may well be there are
             | issues specific to Apollo, but it's definitely worth
             | getting to the bottom of based on how widely used Apollo
             | is.
             | 
             | There is nothing in the posts that identifies NodeJS as the
             | culprit, and based on the info I'd be very surprised if it
             | was. It seems most likely that the type validation is what
             | is taking so much time. But then again, strong types are
             | one of the main benefits of GraphQL. If anything, I've
             | found Node to be one of the easiest and most "natural"
             | server languages for GraphQL, and I have implemented
             | GraphQL servers in Node, Java and Python.
        
             | foota wrote:
             | This comment neglects the fact that people can and do use
             | node in situations where performance is important, and
             | 400ms is especially egregious. Having the obvious path for
             | using GraphQL server-side in JS perform terribly is
             | problematic.
        
               | andrewingram wrote:
               | For a point of comparison, when I was using graphql-js
               | around 3 years ago, I was benchmarking things pretty
               | carefully and the main bottleneck wasn't graphql-js -- I
               | had comparable (or faster) response times to equivalent
               | existing hand-crafted JSON endpoints.
               | 
               | But if you're fetching a lot more data than you need for
               | a typical UI, you might run into bottlenecks.
        
             | sciolistse wrote:
             | Regarding profiling in Node.JS (in case anyone interested
             | is unaware): if you start your application with the
             | "--inspect" argument and then open devtools in
             | Chrome/Chromium, a little Node icon shows up in the top
             | left corner.
             | 
             | If you click on that you can get performance flame graphs /
             | tables, memory profiling, and there's also a REPL for the
             | process, as well as a list of loaded source files so you
             | can set breakpoints through there if you like, as well as
             | modify the files on the fly if you need something more for
             | debugging.
             | 
             | It can be very useful, and works pretty much the same as
             | the normal web devtools.
        
               | city41 wrote:
               | There is also ndb for a very similar effect. What I
               | really like about ndb is it works in front of just about
               | anything, for example `ndb yarn test`
               | 
               | https://github.com/GoogleChromeLabs/ndb
        
             | swyx wrote:
             | iirc fb doesn't even use graphql-js much? the whole thing
             | was embedded in PHP Ents. graphql-js was written purely
             | for the open-sourcing. ofc things may have changed somewhat
             | in recent years but i doubt if
        
       | m_ke wrote:
       | Sounds to me like an issue that comes with coupling of validation
       | with serialization. A lot of these API frameworks combine the
       | two, with the goal of automating validation when receiving data
       | from clients, but then also do that validation when serializing
       | response data, which should already be validated if it's sitting
       | in your database.
       | 
       | I've run into similar issues with FastAPI and DRF when dealing
       | with really large payloads.
        
       | hamandcheese wrote:
       | I've seen similar issues in graphql-ruby. Even if I hardcode the
       | data in my resolvers, it takes hundreds to thousands of ms to
       | render a list with some moderate nesting.
        
       | dclowd9901 wrote:
       | Forgive me if this sounds a bit "hindsight 20/20", but I feel
       | like performance was always a lower consideration when it came to
       | utilizing graphql. The win is in reducing overhead around
       | providing new endpoints.
       | 
       | Like React, it eschews performance for the sake of
       | enterprise-level scaling. This shouldn't come as a surprise to
       | anyone, given that both of these came from one of the largest
       | dev organizations in the world.
        
         | toomim wrote:
         | > performance was always a lower consideration when it came to
         | utilizing graphql
         | 
         | That's strange, because I thought the main selling point was to
         | consume only the data you need. The client specifies exactly
         | which fields it wants, so it doesn't over-fetch, and that is
         | supposed to make things faster.
        
           | wolfgang42 wrote:
           | GraphQL the protocol/language was designed for performance,
           | but (when I tried GraphQL, which was several years ago) the
           | server-side implementations seem to have had much less of a
           | focus on it.
           | 
           | It's true that the _client_ doesn't over-fetch (and also
           | doesn't need multiple round-trips), but at least when I tried
           | the gql-js library it required the _server_ to over-fetch: it
           | would ask for individual records, and then do the field
           | plucking/record joining itself; there was no way to
           | intercept the query along the way to find out which fields it
           | needed so you could only fetch those.
           | 
           | I get the impression that the server libraries were designed
           | to work with a document store or "fat" REST API that is only
           | capable of taking a single ID and returning the entire
           | record. In this situation it makes sense to have a separate
           | middleware server to keep the big fetches and round-trips
           | inside the datacenter and only give the client exactly what
           | it needs, and needing a little more server power isn't a big
           | deal. But, if you need to do something more sophisticated
           | (even something as simple as only fetching certain fields
           | from the datastore), they were no help whatsoever; when I was
           | looking into it there wasn't even a way to parse the query
           | into an AST and do the rest of the query planning yourself.
        
             | gavinray wrote:
             | Echoing this, GraphQL is just a specification, and it is up
             | to library authors how that spec is implemented.
             | 
             | I think there might be a disconnect or misunderstanding in
             | the developer about this. GraphQL is sort of like the Flux
             | pattern for MVVM architecture. It isn't so much a thing as
             | an idea.
        
         | spamizbad wrote:
         | While it's certainly true performance can be a trade-off...
         | 400ms+ response times are annoyingly slow. I'm not sure the
         | trade-off is worth it unless it's some really exotic endpoint
         | you've created.
        
         | nbardy wrote:
         | "Eschews performance" isn't the right way to put it. React
         | lets you do 90% of UI work in performant ways. It has good,
         | predictable performance for the majority of work and lets you
         | move through a lot of simple UI tasks quickly, so you can
         | spend time on performance in the parts of your app that
         | matter. The situations where you really need performance
         | tuning are going to be unique to your specific app and data.
        
       | picardo wrote:
       | I'm not sure if this is mentioned in the thread, but one of the
       | reasons requests take so long to return is that GQL
       | initializes the entire record in memory and then reduces it
       | back to only the fields you wanted. This can be a big problem
       | if you have a deeply nested data model and potentially many
       | results. The memory consumption can go through the roof. I find
       | that the best approach in those cases is to create a one-off
       | REST endpoint (or to create a field higher up the GQL
       | hierarchy) and hand-roll the SQL query.
        
         | benawad wrote:
         | I thought it could have been a memory problem too, but the VPS
         | didn't show any sign of a spike:
         | https://twitter.com/benawad/status/1212404379371917313
         | 
         | I do think it's related to my nested object, though:
         | https://twitter.com/benawad/status/1212407236284338176
        
         | viraptor wrote:
         | > when GQL initializes the entire record in memory
         | 
         | GQL is an idea not an implementation. I don't believe there's
         | anything preventing actual software from optimising this case.
         | Or am I missing something here? The query defines what you're
         | asking for so extra data does not necessarily need to be
         | fetched.
        
           | xtagon wrote:
           | You're correct. It's common for GraphQL API implementations
           | to batch all the parent record's fields up front, but it's
           | not the only way. One alternative method is to traverse the
           | whole query object and generate one big query to your
           | database (SQL, graph database, what have you) instead of
           | batching queries per table/object. This has trade-offs.
           | Sometimes it's more performant, especially for smaller
           | queries, but for larger queries it can actually be slower
           | because joining lots of tables into one query causes some
           | duplicated data and transfer overhead (assuming you're using
           | SQL). I have a feeling that this method would perform very
           | well if your GraphQL data was backed by an actual graph
           | database though.
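To make the per-table batching xtagon describes concrete, here is a minimal TypeScript sketch that counts round trips against an in-memory stand-in for a database. The `listings` and `projects` tables, their columns, and the `query` helper are all hypothetical; a real server would more likely use a batching library or SQL `IN (...)` clauses.

```typescript
// Hypothetical in-memory "tables"; names and shapes are illustrative only.
type Listing = { id: number; companyId: number };
type Project = { id: number; listingId: number; name: string };

const listings: Listing[] = [
  { id: 1, companyId: 10 },
  { id: 2, companyId: 10 },
  { id: 3, companyId: 10 },
];
const projects: Project[] = [
  { id: 100, listingId: 1, name: "a" },
  { id: 101, listingId: 2, name: "b" },
  { id: 102, listingId: 2, name: "c" },
];

let queryCount = 0;

// Stand-in for one database round trip: filter a table and count the call.
function query<T>(table: T[], predicate: (row: T) => boolean): T[] {
  queryCount += 1;
  return table.filter(predicate);
}

// Naive resolution: one query for the parent rows, then one per parent.
function projectsPerListingNaive(companyId: number): Map<number, Project[]> {
  const result = new Map<number, Project[]>();
  for (const listing of query(listings, (l) => l.companyId === companyId)) {
    result.set(listing.id, query(projects, (p) => p.listingId === listing.id));
  }
  return result;
}

// Batched resolution: one query per table, then group children in memory.
function projectsPerListingBatched(companyId: number): Map<number, Project[]> {
  const parents = query(listings, (l) => l.companyId === companyId);
  const ids = new Set(parents.map((l) => l.id));
  const children = query(projects, (p) => ids.has(p.listingId));
  const result = new Map<number, Project[]>();
  for (const parent of parents) result.set(parent.id, []);
  for (const child of children) result.get(child.listingId)!.push(child);
  return result;
}

// Run a strategy and report how many queries it issued.
function countQueries(run: () => unknown): number {
  queryCount = 0;
  run();
  return queryCount;
}

const naiveQueries = countQueries(() => projectsPerListingNaive(10));
const batchedQueries = countQueries(() => projectsPerListingBatched(10));
```

With three listings, the naive version issues four queries and the batched version two. The gap grows with the number of parent rows, which is the case batching is meant to address.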
        
         | np_tedious wrote:
         | How deeply do you mean "entire record"? I am pretty sure that
         | this concern only applies to those fields which have the same
         | resolver
        
         | greenpizza13 wrote:
         | Things have matured quite a bit. With Apollo Server it's
         | possible to fully understand which fields are being requested
         | before creating and running, for example, an SQL query.
         | Fetching only the requested data for a given query reduces the
         | in-memory footprint. Most people get the whole data object and
         | then let GQL select the subset of fields the user asked for,
         | but for cases where performance is a problem there is another
         | solution.
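One way to read "understand which fields are being requested" is to inspect the selection set a resolver receives and name only those columns in the query. The sketch below is an assumption-laden illustration, not Apollo's API: `FieldSelection` is a simplified stand-in for the field nodes that graphql-js exposes to resolvers, and the `recipes` table and column mapping are hypothetical.

```typescript
// Simplified stand-in for the selection data a resolver receives; the real
// graphql-js types are richer (fragments, aliases, arguments, directives).
type FieldSelection = {
  kind: "Field";
  name: { value: string };
  selectionSet?: { selections: FieldSelection[] };
};

// Hypothetical mapping from GraphQL field names to SQL column names.
const RECIPE_COLUMNS: Record<string, string> = {
  id: "id",
  title: "title",
  createdAt: "created_at",
};

// Collect the scalar fields requested directly under `node`, skipping nested
// objects (those would be handled by their own resolvers or joins).
function requestedColumns(
  node: FieldSelection,
  columns: Record<string, string>,
): string[] {
  const selections = node.selectionSet?.selections ?? [];
  const result: string[] = [];
  for (const selection of selections) {
    const column = columns[selection.name.value];
    if (selection.selectionSet === undefined && column !== undefined) {
      result.push(column);
    }
  }
  return result;
}

// Build a SELECT that names only the requested columns. Column names come
// from the allow-list above, never directly from client input.
function buildSelect(table: string, node: FieldSelection): string {
  const cols = requestedColumns(node, RECIPE_COLUMNS);
  // Always include the primary key so rows can be matched up later.
  const unique = Array.from(new Set(["id", ...cols]));
  return `SELECT ${unique.join(", ")} FROM ${table}`;
}

// Equivalent of the query `recipes { title createdAt author { name } }`.
const recipesNode: FieldSelection = {
  kind: "Field",
  name: { value: "recipes" },
  selectionSet: {
    selections: [
      { kind: "Field", name: { value: "title" } },
      { kind: "Field", name: { value: "createdAt" } },
      {
        kind: "Field",
        name: { value: "author" },
        selectionSet: {
          selections: [{ kind: "Field", name: { value: "name" } }],
        },
      },
    ],
  },
};

const sql = buildSelect("recipes", recipesNode);
```

Here `sql` names only `id`, `title`, and `created_at`; the nested `author` selection is left for a separate resolver or join.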
        
           | picardo wrote:
           | I haven't used Apollo Server lately. But the way you describe
           | it doesn't address the core issue, which is the
           | initialization of the intermediate objects in-memory. So just
           | to give an example, if I wanted to query for the projects of
           | listings of my company, I can write it this way in GQL:
           | me { company { listings { projects { id name } } } }
           | 
           | This will initialize: a User, a Company, Listings and
           | Projects of all listings.
           | 
           | I can also write this in SQL using a couple of joins and
           | return an array. The memory consumption is trivial in
           | comparison to the original request.
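As a rough sketch of the "couple of joins" alternative picardo mentions, the nested path me -> company -> listings -> projects can be flattened into one query. Every table, column, and alias below is hypothetical, and the `$1` placeholder assumes Postgres-style parameters.

```typescript
// One hop in a nested GraphQL path, described as a SQL join.
// All table and column names are hypothetical.
type Hop = { table: string; alias: string; on: string };

const ROOT = { table: "users", alias: "u" };

// me -> company -> listings -> projects
const PATH: Hop[] = [
  { table: "companies", alias: "c", on: "c.id = u.company_id" },
  { table: "listings", alias: "l", on: "l.company_id = c.id" },
  { table: "projects", alias: "p", on: "p.listing_id = l.id" },
];

// Alias of the last hop, whose columns are the only ones returned.
const LEAF_ALIAS = "p";

// Build a single parameterized query that returns only the leaf columns,
// so no intermediate User/Company/Listing objects need to be materialized.
function buildJoinQuery(leafColumns: string[]): string {
  const select = leafColumns.map((col) => `${LEAF_ALIAS}.${col}`).join(", ");
  const joins = PATH.map((hop) => `JOIN ${hop.table} ${hop.alias} ON ${hop.on}`);
  return [
    `SELECT ${select}`,
    `FROM ${ROOT.table} ${ROOT.alias}`,
    ...joins,
    `WHERE ${ROOT.alias}.id = $1`,
  ].join("\n");
}

const projectsSql = buildJoinQuery(["id", "name"]);
```

The result is a flat list of project rows for one user. Whether this beats per-level resolution depends on how much row duplication the joins introduce.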
        
           | andrewem wrote:
           | You say "problem there is another solution" - what is the
           | other solution? (I'm guessing it involves somehow telling
           | Apollo Server which fields/related objects you will need?)
        
         | sergiotapia wrote:
         | I'm positive there's a way to only load the relevant data in
         | your stack.
         | 
         | In Elixir with Absinthe we can resolve to the specific fields
         | we need, so we don't load entire records and then slim them
         | down.
        
           | picardo wrote:
           | I never used Absinthe, but if you're initializing an ORM in
           | your resolver, loading the entire record into memory is
           | unavoidable. How does Absinthe get around that? (Sounds like
           | it generates the SQL?)
        
         | city41 wrote:
         | It's also possible to look at what fields were requested in the
         | GraphQL query and use them to aid what gets fetched.
        
       | CharlesW wrote:
       | Reading the thread, this isn't a "dark side of GraphQL" but a
       | "dark side of not understanding how to debug/improve performance
       | in my software dependency".
        
         | lasdfas wrote:
         | Not sure why you are getting downvoted. The person actually
         | states that they don't know how to debug: "honestly, I'm not
         | 100% sure the best way to debug from here." They are just
         | looking at Datadog stats and not finding the root cause. They
         | could do some basic JS debugging of the open source library to
         | figure out the issue. Blaming Apollo would be a stretch (it
         | may not even be the issue, since they haven't done any
         | debugging), but blaming the GraphQL protocol goes way too far.
        
           | jmull wrote:
           | > Not sure why you are getting downvoted.
           | 
           | I don't know why anyone downvotes as they do, but the
           | previous post is an irrelevant argument about semantics, so
           | in my opinion it deserves to be downvoted.
           | 
           | Actually, now that I think of it, it's a little worse than
           | that. The OP is being criticized for not understanding how to
           | debug or improve the performance of their dependency _while
           | actively engaged in figuring out how to debug and improve the
           | performance of their dependency_. (People respond with
           | questions, OP provides substantive answers, there's a
           | back-and-forth, OP forms an idea that it's related to a
           | deeply nested schema, and so on.)
        
       | ex3ndr wrote:
       | Quite strange. GQL server source code literally just walks the
       | fields and resolves promises; it's very simple and
       | straightforward.
       | 
       | We had something like this in our backend. Times this long
       | usually mean that something is hogging the event loop and
       | blocking everything else from executing.
       | 
       | It could be anything. For example, it could be async hooks,
       | which make things ~1000 times slower if you use a lot of
       | promises (resolving a field is often just a promise), since
       | the overhead is per promise. In general, recent Node.js can
       | handle a huge number of promises with little to no overhead,
       | so, again: something is wrong with the Node.js setup, some
       | library is clogging the event loop, or it's something deeper
       | in Node.js internals. It is not an issue with GQL itself: if
       | you have GQL performance issues, it means your server is
       | super slow at processing just about anything. Our team was
       | shocked by the performance; it turned out that Node.js is
       | super fast and that it is some libraries (like sequelize)
       | that kill performance, but GQL is not one of them.
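The "walking the fields and resolving" loop ex3ndr describes can be sketched in a few lines. This is a much-simplified, synchronous illustration with hypothetical types (`Selection`, `ResolverMap`); a real executor such as graphql-js also handles promises, arguments, fragments, and type checks.

```typescript
// A selection is a field name plus optional nested selections.
type Selection = { name: string; children?: Selection[] };

// A resolver gets the parent value and returns the field's value.
// Real executors also pass arguments, context, and info, and allow promises.
type Resolver = (parent: any) => unknown;
type ResolverMap = Record<string, Resolver>;

let resolverCalls = 0;

// Walk the selections depth-first, resolving each field against its parent.
function execute(
  parent: unknown,
  selections: Selection[],
  resolvers: ResolverMap,
): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const sel of selections) {
    // Fall back to a plain property read when no resolver is registered.
    const resolve: Resolver = resolvers[sel.name] ?? ((p: any) => p?.[sel.name]);
    resolverCalls += 1;
    const value = resolve(parent);
    const children = sel.children;
    if (children === undefined || value === null || value === undefined) {
      out[sel.name] = value;
    } else if (Array.isArray(value)) {
      // Lists: execute the nested selections once per element.
      const items: Record<string, unknown>[] = [];
      for (const item of value) items.push(execute(item, children, resolvers));
      out[sel.name] = items;
    } else {
      out[sel.name] = execute(value, children, resolvers);
    }
  }
  return out;
}

// Hard-coded data shaped like the nested example query from the thread.
const data = {
  me: {
    company: {
      listings: [
        { projects: [{ id: 1, name: "a" }, { id: 2, name: "b" }] },
        { projects: [{ id: 3, name: "c" }] },
      ],
    },
  },
};

const nestedQuery: Selection[] = [
  { name: "me", children: [
    { name: "company", children: [
      { name: "listings", children: [
        { name: "projects", children: [{ name: "id" }, { name: "name" }] },
      ] },
    ] },
  ] },
];

const result = execute(data, nestedQuery, {});
```

The walk makes one resolver call per field per object, so the work scales with the number of nodes in the response rather than with the size of the query text. Any fixed per-call overhead is multiplied accordingly.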
        
       | coding123 wrote:
       | Hmmmm... if anything the performance of a graphql query should
       | generally outshine REST in nearly any category of performance.
       | From the sound of things, the performance issue doesn't make any
       | sense. He's using DataLoader, and he is certain it's not related
       | to DataLoader anyway. So maybe some dependency he's using is the
       | wrong version.
        
       ___________________________________________________________________
       (page generated 2020-01-01 23:00 UTC)