[HN Gopher] How Shopify made storefront response times 4x faster
       ___________________________________________________________________
        
       How Shopify made storefront response times 4x faster
        
       Author : vaillancourtmax
       Score  : 136 points
       Date   : 2020-08-20 20:39 UTC (2 hours ago)
        
 (HTM) web link (engineering.shopify.com)
 (TXT) w3m dump (engineering.shopify.com)
        
       | momonga wrote:
       | I wish the article detailed the performance issues with the old
       | implementation, and why those issues necessitated a rewrite
       | (other than "strong primitives" and "difficult to retrofit").
        
       | ww520 wrote:
       | The performance related bits:
       | 
       | - Handcrafted SQL.
       | 
       | - Reduce memory usage, e.g. use mutable map.
       | 
       | - Aggressive caching with layers of caches, DB result cache, app
       | level object cache, and HTTP cache. Some DB queries are
       | partitioned and each partitioned result is cached in key-value
       | store.
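
A minimal sketch of the layered caching idea in the list above, assuming plain Hashes stand in for each cache layer and a block stands in for the database query. `LayeredCache` and its methods are hypothetical names for illustration, not anything from the article or from Shopify's code.

```ruby
# Hypothetical read-through lookup across ordered cache layers.
# Each layer is a plain Hash standing in for an in-process object cache,
# a key-value store, and so on.
class LayeredCache
  attr_reader :loads

  def initialize(layers)
    @layers = layers # fastest layer first
    @loads = 0       # how many times we fell through to the loader block
  end

  # Returns the cached value, or runs the block (the "database query")
  # on a full miss and writes the result into every layer.
  def fetch(key)
    @layers.each_with_index do |layer, i|
      next unless layer.key?(key)

      value = layer[key]
      # Backfill the faster layers that missed.
      @layers[0...i].each { |faster| faster[key] = value }
      return value
    end

    @loads += 1
    value = yield
    @layers.each { |layer| layer[key] = value }
    value
  end
end
```

A hit in a slower layer is copied into the faster ones, so repeated reads of the same key get cheaper over time. Expiry and invalidation are left out of this sketch.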
        
       | tehlike wrote:
       | This is very interesting. N+1 and lazy loading have been a very
       | common problem that profilers can spot, but eager loading also
       | has a cartesian product problem where if you have an entity
       | with 6 sub items, and 100 of another sub item, you'll end up
       | getting 600 rows to construct a single object / view model.
       | 
       | I have been recently playing with RavenDB (from my all time
       | favorite engineer turned CEO), it approaches most of these as an
       | indexing problem in the database, where the view models are
       | calculated offline as part of indexing pipeline. It approaches
       | the problem from a very pragmatic angle. Its goal is to be a
       | database that is very application centric.
       | 
       | Still to be seen if we will end up adopting, but it'll be
       | interesting to play with.
       | 
       | Disclaimer: I am a former NHibernate contributor, and have been
       | very intimate with AR features and other pitfalls.
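
The row-count arithmetic in the comment above can be checked with a short sketch. The collection names and the two helper methods are hypothetical; arrays stand in for the result sets a database would return.

```ruby
# Hypothetical entity with two child collections, sized as in the comment.
SUB_ITEMS   = (1..6).map   { |i| "sub_item_#{i}" }
OTHER_ITEMS = (1..100).map { |i| "other_item_#{i}" }

# A single query joining the root to both collections returns one row per
# (sub item, other item) pair.
def joined_rows(root, first, second)
  first.product(second).map { |a, b| [root, a, b] }
end

# Loading each collection with its own query returns one row per child.
def separate_rows(root, first, second)
  first.map { |a| [root, a] } + second.map { |b| [root, b] }
end

puts joined_rows("entity", SUB_ITEMS, OTHER_ITEMS).size   # 600
puts separate_rows("entity", SUB_ITEMS, OTHER_ITEMS).size # 106
```

The joined result grows with the product of the collection sizes, while separate queries grow with their sum.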
        
         | balfirevic wrote:
         | Didn't NHibernate have the cartesian product problem solved in
         | a neat way by having various fetch strategies?
         | 
         | You could specify to eagerly load some collections and have
         | NHibernate issue additional select statement to load the
         | children, producing maximum of 2-3 queries (depending on the
         | eager-loading depth) but avoiding both N+1 problem and
         | cartesian row explosion problem.
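
A rough sketch of the query counts being compared here, assuming a hypothetical in-memory `FakeDb` that only counts the queries issued against it. This is not NHibernate's API; it only illustrates lazy per-parent loading versus one additional select for all children.

```ruby
# Hypothetical in-memory "database" that counts the queries it receives.
class FakeDb
  attr_reader :query_count

  def initialize(children_by_parent)
    @children_by_parent = children_by_parent
    @query_count = 0
  end

  # The root query.
  def parent_ids
    @query_count += 1
    @children_by_parent.keys
  end

  # One query per parent: the N+1 pattern when called in a loop.
  def children_of(parent_id)
    @query_count += 1
    @children_by_parent.fetch(parent_id, [])
  end

  # One query for all parents, like SELECT ... WHERE parent_id IN (...).
  def children_of_all(parent_ids)
    @query_count += 1
    parent_ids.map { |id| [id, @children_by_parent.fetch(id, [])] }.to_h
  end
end

# Lazy loading: 1 root query plus 1 query per parent.
def load_lazily(db)
  db.parent_ids.map { |id| [id, db.children_of(id)] }.to_h
end

# Separate-select strategy: 1 root query plus 1 query for all children.
def load_with_second_select(db)
  db.children_of_all(db.parent_ids)
end
```

With three parents, `load_lazily` issues four queries and `load_with_second_select` issues two, and both return the same data. The second select still has to wait for the root query, since it needs the parent ids.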
        
           | tehlike wrote:
           | yes, that's the common method, but you still end up issuing
           | multiple network calls. The problem with issuing select
           | statements to load the children is you have to wait on the
           | first query (root) to finish so you can issue others which
           | adds to the network latency (usually low, but it also
           | depends). It's still not as good as having materialized
           | viewmodels on server where you can issue a single query to
           | get everything you need. The disadvantage is the storage
           | cost, though.
        
             | balfirevic wrote:
             | I went and looked at the docs to refresh my memory - there
             | was also a subquery fetch strategy where you didn't have to
             | wait for the root entity to load, but that comes at the
             | expense of searching through data twice - which might or
             | might not be worth it, depending on how complicated the
             | query is.
             | 
             | I do wish relational databases (PostgreSQL and SQL Server
             | specifically, since I work with those) had better support
             | for automatically updated real-time materialized views.
             | 
             | Anyway, thanks for working on NHibernate - I miss some of
             | its configurability and advanced capabilities.
        
       | pqdbr wrote:
       | Some of the listed optimizations were:
       | 
       | > We carefully vet what we eager-load depending on the type of
       | request and we optimize towards reducing instances of N+1
       | queries.
       | 
       | > Reducing Memory Allocations
       | 
       | > Implementing Efficient Caching Layers
       | 
       | All of those steps seem pretty standard ways of optimizing a
       | Rails application. I wish the article made it clearer why they
       | decided to pursue such a complex route (the whole custom
       | Lua/nginx routing and two applications instead of a monolith).
       | 
       | Shopify surely has tons of Rails experts and I assume they
       | pondered a lot before going for this unusual rewrite, so of
       | course they have their reasons, but I really didn't understand
       | (from the article) what they accomplished here that they couldn't
       | have done in the Rails monolith.
       | 
       | You don't need to ditch Rails if you just don't want to use
       | ActiveRecord.
        
         | [deleted]
        
         | pqdbr wrote:
         | Someone replied but deleted right when I was posting this
         | answer, so I'm replying to myself:
         | 
         | What I didn't understand was why the listed performance
         | optimizations couldn't be implemented in the monolith itself
         | and instead required developing a new application, which is still
         | Ruby.
         | 
         | In a production env, the request reaches the Rails controller
         | pretty fast.
         | 
         | I know for a fact that the view layer (.html.erb) can be a
         | little slow if you compare it to, say, just a `render json:`,
         | but if you're still going to be sending fully-rendered HTML
         | pages over the wire, the listed optimizations (caching, query
         | optimization and memory allocation) could all be implemented in
         | Rails itself to a huge extent, and that's what I'd love to know
         | more about.
        
           | nthj wrote:
           | They talk about reducing memory allocations. My guess is the
           | rest of the app is very large and they're benefiting from not
           | sharing memory and GC with that.
           | 
           | Of course, everything you said is true for a small-to-medium
           | sized Rails application.
           | 
           | They likely could have explored a separate Rails app to meet
           | this goal, but then they have to maintain the dependency tree
           | and security risks twice. And if the Rails core refactors
           | away any optimizations they make, they have to maintain and
           | integrate with those.
           | 
           | There's definitely some wiggle room and a judgement call here
           | but their custom implementation has merit.
        
           | cbothner wrote:
           | Don't forget that a Shopify store is 100% customizable by
           | merchants using Liquid (Turing complete, not that you should
           | try). There is no .html.erb layer. Think of Storefront
           | Renderer as a Liquid interpreter using optimized presenters
           | for the business models.
        
         | pushrax wrote:
         | (contributor here)
         | 
         | The project does still use code from Rails. Some parts of
         | ActiveSupport in particular are really not worth rewriting, it
         | works fine and has a lot of investment already.
         | 
         | The MVC part of Rails is not used for this project, because the
         | storefront of Shopify works in a very different way than a CRUD
         | app, and doesn't benefit nearly as much. Custom code is a lot
         | smaller and easier to understand and optimize. Outside of
         | storefront, Shopify still benefits a lot from Rails MVC.
         | 
         | I'll also add that storefront serves a majority of requests
         | made to Shopify but it's a surprisingly tiny fraction of the
         | actual code.
        
           | joshmn wrote:
           | > because the storefront of Shopify works in a very different
           | way than a CRUD app
           | 
           | Any interesting/successful patterns, or resources on said
           | patterns, that you can share?
        
             | pushrax wrote:
             | "Not a CRUD app" isn't a design decision, it's just that
             | storefront is almost entirely read-only, and the views are
             | merchant-provided Liquid code.
        
             | tekstar wrote:
             | Shopify's storefront is based around a liquid renderer
             | instance. If you look up how objects are added to the
             | liquid context that is pretty similar to the overall
             | pattern (or at least was back when I worked there, hi
             | pushrax :)
        
           | adrr wrote:
           | If it's Rails, can you just expose the Rails cache object in
           | liquid? Give control to merchants. That would yield bigger
           | speed improvements.
        
       | notsureaboutpg wrote:
       | Most commenters are focused on the optimizations made, but I
       | actually think the custom routing and verification mechanism is
       | the interesting bit.
       | 
       | That kind of a tool could be handy in lots of scenarios
       | (comparing the same service written in two different languages or
       | with different dependencies, etc).
       | 
       | But how does their verifier mechanism deal with changes in the
       | production database between responses? If the response of the
       | legacy service comes first and the response of the new service
       | comes after, in between both responses (the request being the
       | same) couldn't the data from the database change and thus result
       | in the responses not passing verification when they otherwise
       | should have? How do they maneuver around that issue?
       | 
       | Great write-up by the way! I really liked it :)
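
A small sketch of the race described in the question above, plus one possible mitigation. Both methods are hypothetical, the backends are any callables, and the retry approach is an assumption for illustration; the article does not say how Shopify's verifier handles this case.

```ruby
# Hypothetical verifier: sends the same request to two backends and
# reports whether the response bodies match.
def responses_match?(legacy, candidate, request)
  legacy.call(request) == candidate.call(request)
end

# One possible mitigation for data changing mid-comparison (an assumption,
# not something the article describes): only count a divergence if it
# persists across every attempt.
def verified?(legacy, candidate, request, attempts: 2)
  attempts.times.any? { responses_match?(legacy, candidate, request) }
end
```

If a write lands between the two reads, `responses_match?` reports a false mismatch, while `verified?` passes once a later attempt sees both backends read the same data. A backend that is actually wrong fails every attempt. Retrying cannot fully rule out repeated writes, so this reduces false positives without eliminating them.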
        
       | polote wrote:
       | tldr: rewrote the backend focusing on speed
       | 
       | Which is good. At Reddit they would have tried to rewrite
       | everything on ReasonML and then tried to prove at the end that it
       | is now faster
        
       | kn8 wrote:
       | Is the new implementation still Rails?
        
         | bsaul wrote:
         | That's also my question after reading this post. When trying to
         | shave off milliseconds by going for a full rewrite, moving away
         | from ruby seems like an obvious decision...at least
         | intuitively..
        
           | crispyporkbites wrote:
           | Ruby is more than fast enough for the web
        
           | sbarre wrote:
           | Obvious how?
           | 
           | Are you going to restructure literally thousands of employees
           | and their teams, staffed with Rubyists and organized around
           | your current setup?
           | 
           | Will you re-hire and/or re-train everyone?
           | 
           | That doesn't seem so obvious... At the scale of a team like
           | Shopify, refactoring to a different language is probably a
           | non-starter.
        
             | nicoburns wrote:
             | If you have thousands of rubyists then you surely have
             | hundreds who also know other languages? Seems to make sense
             | to use a fast language for the small performance sensitive
             | part of your codebase.
        
               | Jach wrote:
               | Seems also that since Ruby is not going to be taught as
               | part of people's normal formal education in programming,
               | you can expect Rubyists to be on average more capable
               | of... learning new things.
               | 
               | So yes, "re-train". Give everyone a book on the new
               | language, maybe pay for some online courses from
               | pluralsight or wherever, cancel meetings for a week. You
               | can learn _a lot_ faster than in a school environment
               | when you've got paid 8 hour days to put into a single
               | subject + coworkers to chat with.
               | 
               | Besides, it's not like they get to avoid learning
               | new things anyway, even if you restrict it to the Ruby
               | ecosystem. In the JS world (which I'm sure they all know
               | too, as one tends to when working on web sites even if
               | you're mostly back-end) as new revisions of the language
               | come out people have to keep up with the syntax and
               | changing idioms.
               | 
               | "For some reason, programmers _love_ to learn new stuff,
               | as long as it's not syntax." -- Steve Yegge
        
             | cookiecaper wrote:
             | Yeah. Consider that BigCos end up writing transpilers and
             | new runtimes for their target platforms before rewriting
             | the application, which would entail discarding the decades
             | of built-in bugfixes and application logic as well as
             | reconstructing the organization around a different platform
             | -- HipHop for PHP, Grumpy, etc. A language change is no
             | small thing in any company of appreciable size.
        
           | cutler wrote:
           | Their monolith was written in Rails so Ruby alone was not the
           | source of slow performance. In fact the solution was more to
           | do with cloning the database in order to be able to isolate
           | reads and writes so not even a programming language problem
           | at all.
        
         | k__ wrote:
         | At least it's still Ruby. They wrote how they had to write non-
         | idiomatic Ruby code to get better performance.
        
         | bretthopper wrote:
         | No, it's still Ruby but built directly on top of Rack.
        
           | top_sigrid wrote:
           | How do you know?
        
             | mikeyouse wrote:
             | Because he apparently works at Shopify?
             | 
             | https://twitter.com/swalkinshaw
        
         | mikepurvis wrote:
         | I'm assuming the details of exactly what the new implementation
         | is have been deliberately withheld for some future post where
         | they talk specifics (especially if it's something exciting like
         | Rust/Elixir/Go). This keeps the focus of this post on the
         | approach to migration, using the old implementation as a
         | reference in order to burn down the list of divergences, etc.
        
       ___________________________________________________________________
       (page generated 2020-08-20 23:00 UTC)