[HN Gopher] How Shopify made storefront response times 4x faster
___________________________________________________________________
How Shopify made storefront response times 4x faster

Author : vaillancourtmax
Score  : 136 points
Date   : 2020-08-20 20:39 UTC (2 hours ago)

(HTM) web link (engineering.shopify.com)
(TXT) w3m dump (engineering.shopify.com)

| momonga wrote:
| I wish the article detailed the performance issues with the old
| implementation, and why those issues necessitated a rewrite
| (other than "strong primitives" and "difficult to retrofit").
| ww520 wrote:
| The performance-related bits:
|
| - Handcrafted SQL.
|
| - Reduce memory usage, e.g. use a mutable map.
|
| - Aggressive caching with layers of caches: DB result cache,
| app-level object cache, and HTTP cache. Some DB queries are
| partitioned and each partitioned result is cached in a key-value
| store.
| tehlike wrote:
| This is very interesting. N+1 and lazy loading have been a very
| common problem that profilers can spot, but eager loading also
| has a cartesian product problem where if you have an entity
| with 6 subitems of one kind and 100 of another, you'll end up
| getting 600 rows to construct a single object / view model.
|
| I have been recently playing with RavenDB (from my all-time
| favorite engineer turned CEO); it approaches most of these as an
| indexing problem in the database, where the view models are
| calculated offline as part of the indexing pipeline. It approaches
| the problem from a very pragmatic angle. Its goal is to be a
| database that is very application centric.
|
| Still to be seen if we will end up adopting it, but it'll be
| interesting to play with.
|
| Disclaimer: I am a former NHibernate contributor, and have been
| very intimate with AR features and other pitfalls.
| balfirevic wrote:
| Didn't NHibernate have the cartesian product problem solved in
| a neat way by having various fetch strategies?
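The row-count trade-off in the comments above can be reproduced with a minimal sketch. It assumes a hypothetical schema (`product`, `variant`, `image`) in an in-memory SQLite database; none of these names come from Shopify or NHibernate, and the two functions only imitate a join-based eager load and a select-per-collection fetch strategy.

```python
import sqlite3

# Hypothetical schema: one parent with two independent child collections.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY);
    CREATE TABLE variant (id INTEGER PRIMARY KEY, product_id INTEGER);
    CREATE TABLE image   (id INTEGER PRIMARY KEY, product_id INTEGER);
""")
conn.execute("INSERT INTO product (id) VALUES (1)")
conn.executemany("INSERT INTO variant (product_id) VALUES (?)", [(1,)] * 6)
conn.executemany("INSERT INTO image (product_id) VALUES (?)", [(1,)] * 100)


def join_fetch(conn, product_id):
    """Eager load with a single query that joins both collections."""
    return conn.execute(
        """
        SELECT p.id, v.id, i.id
        FROM product p
        JOIN variant v ON v.product_id = p.id
        JOIN image i ON i.product_id = p.id
        WHERE p.id = ?
        """,
        (product_id,),
    ).fetchall()


def select_fetch(conn, product_id):
    """Eager load with one query per collection (three round trips)."""
    product = conn.execute(
        "SELECT id FROM product WHERE id = ?", (product_id,)
    ).fetchall()
    variants = conn.execute(
        "SELECT id FROM variant WHERE product_id = ?", (product_id,)
    ).fetchall()
    images = conn.execute(
        "SELECT id FROM image WHERE product_id = ?", (product_id,)
    ).fetchall()
    return product, variants, images


joined_rows = join_fetch(conn, 1)
product, variants, images = select_fetch(conn, 1)

print(len(joined_rows))                            # 600 rows, one query
print(len(product) + len(variants) + len(images))  # 107 rows, three queries
```

The single join returns 6 x 100 = 600 rows for one parent, while the per-collection variant returns 1 + 6 + 100 = 107 rows at the cost of two extra round trips.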
|
| You could specify to eagerly load some collections and have
| NHibernate issue additional select statements to load the
| children, producing a maximum of 2-3 queries (depending on the
| eager-loading depth) but avoiding both the N+1 problem and the
| cartesian row explosion problem.
| tehlike wrote:
| Yes, that's the common method, but you still end up issuing
| multiple network calls. The problem with issuing select
| statements to load the children is you have to wait on the
| first query (root) to finish so you can issue the others, which
| adds to the network latency (usually low, but it also
| depends). It's still not as good as having materialized
| viewmodels on the server where you can issue a single query to
| get everything you need. The disadvantage is the storage
| cost, though.
| balfirevic wrote:
| I went and looked at the docs to refresh my memory - there
| was also a subquery fetch strategy where you didn't have to
| wait for the root entity to load, but that comes at the
| expense of searching through data twice - which might or
| might not be worth it, depending on how complicated the
| query is.
|
| I do wish relational databases (PostgreSQL and SQL Server
| specifically, since I work with those) had better support
| for automatically updated real-time materialized views.
|
| Anyway, thanks for working on NHibernate - I miss some of
| its configurability and advanced capabilities.
| pqdbr wrote:
| Some of the listed optimizations were:
|
| > We carefully vet what we eager-load depending on the type of
| request and we optimize towards reducing instances of N+1
| queries.
|
| > Reducing Memory Allocations
|
| > Implementing Efficient Caching Layers
|
| All of those steps seem pretty standard ways of optimizing a
| Rails application. I wish the article had made it clearer why they
| decided to pursue such a complex route (the whole custom
| Lua/nginx routing and two applications instead of a monolith).
|
| Shopify surely has tons of Rails experts and I assume they
| pondered a lot before going for this unusual rewrite, so of
| course they have their reasons, but I really didn't understand
| (from the article) what they accomplished here that they couldn't
| have done in the Rails monolith.
|
| You don't need to ditch Rails if you just don't want to use
| ActiveRecord.
| [deleted]
| pqdbr wrote:
| Someone replied but deleted their comment right as I was posting
| this answer, so I'm replying to myself:
|
| What I didn't understand was why the listed performance
| optimizations couldn't be implemented in the monolith itself
| and instead required the development of a new application, which
| is still Ruby.
|
| In a production env, the request reaches the Rails controller
| pretty fast.
|
| I know for a fact that the view layer (.html.erb) can be a
| little slow if you compare it to, say, just a `render json:`,
| but if you're still going to be sending fully-rendered HTML
| pages over the wire, the listed optimizations (caching, query
| optimization and memory allocation) could all be implemented in
| Rails itself to a huge extent, and that's what I'd love to know
| more about.
| nthj wrote:
| They talk about reducing memory allocations. My guess is the
| rest of the app is very large and they're benefiting from not
| sharing memory and GC with that.
|
| Of course, everything you said is true for a
| small-to-medium-sized Rails application.
|
| They likely could have explored a separate Rails app to meet
| this goal, but then they have to maintain the dependency tree
| and security risks twice. And if the Rails core refactors
| away any optimizations they make, they have to maintain and
| integrate with those.
|
| There's definitely some wiggle room and a judgement call here
| but their custom implementation has merit.
| cbothner wrote:
| Don't forget that a Shopify store is 100% customizable by
| merchants using Liquid (Turing complete, not that you should
| try).
| There is no .html.erb layer. Think of Storefront
| Renderer as a Liquid interpreter using optimized presenters
| for the business models.
| pushrax wrote:
| (contributor here)
|
| The project does still use code from Rails. Some parts of
| ActiveSupport in particular are really not worth rewriting; they
| work fine and have a lot of investment already.
|
| The MVC part of Rails is not used for this project, because the
| storefront of Shopify works in a very different way than a CRUD
| app, and doesn't benefit nearly as much. Custom code is a lot
| smaller and easier to understand and optimize. Outside of
| storefront, Shopify still benefits a lot from Rails MVC.
|
| I'll also add that storefront serves a majority of requests
| made to Shopify but it's a surprisingly tiny fraction of the
| actual code.
| joshmn wrote:
| > because the storefront of Shopify works in a very different
| way than a CRUD app
|
| Any interesting/successful patterns you can share, or resources
| on said patterns?
| pushrax wrote:
| "Not a CRUD app" isn't a design decision, it's just that
| storefront is almost entirely read-only, and the views are
| merchant-provided Liquid code.
| tekstar wrote:
| Shopify's storefront is based around a Liquid renderer
| instance. If you look up how objects are added to the
| Liquid context, that is pretty similar to the overall
| pattern (or at least was back when I worked there, hi
| pushrax :)
| adrr wrote:
| If it's Rails, can you just expose the Rails cache object in
| Liquid? Give control to merchants. That would yield bigger
| speed improvements.
| notsureaboutpg wrote:
| Most commenters are focused on the optimizations made, but I
| actually think the custom routing and verification mechanism is
| the interesting bit.
|
| That kind of a tool could be handy in lots of scenarios
| (comparing the same service written in two different languages or
| with different dependencies, etc).
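As a rough illustration of the kind of tool described above, here is a minimal sketch of a response verifier. It is not Shopify's mechanism: `Response`, `verify`, and the three handlers are hypothetical names, both implementations are modeled as plain functions, and the sketch assumes the legacy response is the one served. It compares responses naively and does nothing about data changing between the two calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Response:
    status: int
    body: str


# Hypothetical handler type: takes a request path, returns a Response.
Handler = Callable[[str], Response]


def verify(path: str, legacy: Handler, candidate: Handler) -> Tuple[Response, List[str]]:
    """Send the same request to both implementations and list divergences.

    Assumption: the legacy response is always the one returned to the
    caller; the candidate response is only compared against it.
    """
    expected = legacy(path)
    actual = candidate(path)
    divergences = []
    if expected.status != actual.status:
        divergences.append(f"status: {expected.status} != {actual.status}")
    if expected.body != actual.body:
        divergences.append("body differs")
    return expected, divergences


# Stand-ins for the old and new implementations of one service.
def legacy_handler(path: str) -> Response:
    return Response(200, f"<h1>{path}</h1>")


def matching_handler(path: str) -> Response:
    return Response(200, f"<h1>{path}</h1>")


def diverging_handler(path: str) -> Response:
    return Response(200, f"<h2>{path}</h2>")


served, ok_diffs = verify("/products/shirt", legacy_handler, matching_handler)
_, bad_diffs = verify("/products/shirt", legacy_handler, diverging_handler)

print(ok_diffs)   # []
print(bad_diffs)  # ['body differs']
```

Collecting divergences as a list rather than failing fast keeps the served response unaffected and leaves a record that can be worked through later.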
|
| But how does their verifier mechanism deal with changes in the
| production database between responses? If the response of the
| legacy service comes first and the response of the new service
| comes after, in between both responses (the request being the
| same) couldn't the data from the database change and thus result
| in the responses not passing verification when they otherwise
| should have? How do they maneuver around that issue?
|
| Great write-up by the way! I really liked it :)
| polote wrote:
| tldr: rewrote the backend focusing on speed
|
| Which is good. At Reddit they would have tried to rewrite
| everything in ReasonML and then tried to prove at the end that it
| is now faster
| kn8 wrote:
| Is the new implementation still Rails?
| bsaul wrote:
| That's also my question after reading this post. When trying to
| shave off milliseconds by going for a full rewrite, moving away
| from Ruby seems like an obvious decision...at least
| intuitively..
| crispyporkbites wrote:
| Ruby is more than fast enough for the web
| sbarre wrote:
| Obvious how?
|
| Are you going to restructure literally thousands of employees
| and their teams, staffed with Rubyists and organized around
| your current setup?
|
| Will you re-hire and/or re-train everyone?
|
| That doesn't seem so obvious... At the scale of a team like
| Shopify, refactoring to a different language is probably a
| non-starter.
| nicoburns wrote:
| If you have thousands of Rubyists then you surely have
| hundreds who also know other languages? Seems to make sense
| to use a fast language for the small performance-sensitive
| part of your codebase.
| Jach wrote:
| Seems also that since Ruby is not going to be taught as
| part of people's normal formal education in programming,
| you can expect Rubyists to be on average more capable
| of... learning new things.
|
| So yes, "re-train".
| Give everyone a book on the new
| language, maybe pay for some online courses from
| Pluralsight or wherever, cancel meetings for a week. You
| can learn _a lot_ faster than in a school environment
| when you've got paid 8-hour days to put into a single
| subject + coworkers to chat with.
|
| Besides, it's not like they get to avoid learning
| new things anyway, even if you restrict it to the Ruby
| ecosystem. In the JS world (which I'm sure they all know
| too, as one tends to when working on web sites even if
| you're mostly back-end) as new revisions of the language
| come out people have to keep up with the syntax and
| changing idioms.
|
| "For some reason, programmers _love_ to learn new stuff,
| as long as it's not syntax." -- Steve Yegge
| cookiecaper wrote:
| Yeah. Consider that BigCos end up writing transpilers and
| new runtimes for their target platforms before rewriting
| the application, which would entail discarding the decades
| of built-in bugfixes and application logic as well as
| reconstructing the organization around a different platform
| -- HipHop for PHP, Grumpy, etc. A language change is no
| small thing in any company of appreciable size.
| cutler wrote:
| Their monolith was written in Rails so Ruby alone was not the
| source of slow performance. In fact the solution was more to
| do with cloning the database in order to be able to isolate
| reads and writes, so not even a programming language problem
| at all.
| k__ wrote:
| At least it's still Ruby. They wrote how they had to write
| non-idiomatic Ruby code to get better performance.
| bretthopper wrote:
| No, it's still Ruby but built directly on top of Rack.
| top_sigrid wrote:
| How do you know?
| mikeyouse wrote:
| Because he apparently works at Shopify?
|
| https://twitter.com/swalkinshaw
| mikepurvis wrote:
| I'm assuming the details of exactly what the new implementation
| is have been deliberately withheld for some future post where
| they talk specifics (especially if it's something exciting like
| Rust/Elixir/Go). This keeps the focus of this post on the
| approach to migration, using the old implementation as a
| reference in order to burn down the list of divergences, etc.
___________________________________________________________________
(page generated 2020-08-20 23:00 UTC)