[HN Gopher] We upgraded an old, 3PB large, Elasticsearch cluster...
___________________________________________________________________
We upgraded an old, 3PB large, Elasticsearch cluster without downtime
Author : ollieparsley
Score  : 108 points
Date   : 2022-11-11 15:54 UTC (7 hours ago)
(HTM) web link (underthehood.meltwater.com)
(TXT) w3m dump (underthehood.meltwater.com)
| metadat wrote:
| I've heard horror stories from friends about working at
| meltwater. Setting that aside for a moment, this is an amazing
| software engineering achievement.
|
| Pulling off this level of scale with Elasticsearch is no easy
| feat and very impressive from a technical perspective. When
| you're running ES with petabytes of mission-critical data as a
| core service powering the universe of a business, cluster
| rebuilds aren't an option (or maybe they are, as a last resort,
| but absolutely will not be acceptable on an ongoing basis).
|
| Relying on Elasticsearch mega-clusters in this manner is akin to
| running an ultra-marathon with a really sharp pair of scissors
| glued to each hand. Or maybe even more extreme than my
| (admittedly lame) analogy.
|
| Running nodes with such high shard counts is an appreciably
| precarious proposition, because there is a fair amount of
| overhead in the Elasticsearch management protocol. I wonder what
| the performance testing strategy entailed.
|
| I have a lot of respect for the engineers working to make this
| project and service a success story. When it comes to
| Elasticsearch at scale, such outcomes are the exception.
| karlney wrote:
| Thank you for those kind words.
|
| And yes, we have had our fair share of pain with the old cluster
| for sure.
|
| The new version (7.17) is still behaving a lot better so far
| and feels a lot more predictable.
| trendy0 wrote:
| What were the stories?
| karlney wrote:
| One time, a few years ago, a particularly nasty query was
| executed over and over again and it took a few hours to find
| it and then block it.
|
| And during that time so many nodes had become slow and
| unresponsive that another (for us) previously unseen memory
| leak started to occur.
|
| Nodes kept building up queues of unanswered ping requests on
| them. And the requests contained our 100 MB cluster
| state, so the heaps filled up and even more nodes became
| unresponsive.
|
| And from then on the whole thing turned into a death spiral
| of doom.
|
| After trying, and failing, to get it under control for 48
| hours we gave up and rebuilt the whole cluster from scratch,
| using the snapshots we store on S3.
|
| The recovery took another 90 hours or so. That was not a fun
| week.
| andrelaszlo wrote:
| I usually don't explain my downvotes, and I thought that your
| comment was good overall, but the "horror stories from friends
| about working at meltwater" without explaining what they are
| just makes it a bit unfair.
|
| As criticism, it's very vague, and as someone who doesn't work
| at Meltwater (for the last 5 years or so at least) it doesn't
| give me any information either. Well, except that there are
| rumors about Meltwater, but that would be true about any large
| corporation.
|
| Maybe I misunderstood and the horror stories were about ES, but
| I got it as being about the company itself. Could you expand?
| What type of stories? :)
| krallja wrote:
| While we're telling ES war stories:
|
| FogBugz was still on twelve Elasticsearch 1.6 nodes when I left
| in 2018. We also had a custom plugin (essentially requesting
| facets that weren't stored in Elasticsearch back from FogBugz),
| which was the main reason we hadn't spent much time thinking
| about upgrading it. To keep performance adequate, we scheduled
| cache flush operations that, even at the time, we knew were
| pants-on-head crazy to be doing in production.
| I can't remember if we were running 32-bit or 64-bit with
| Compressed OOPs.
|
| Kiln was on an even older version, v1.4 if I remember correctly.
| And one of the shards had a corruption warning, yet it didn't
| seem to affect stability or results. But that wasn't a fun
| cluster to operate, since it refused to do certain types of
| maintenance because of the supposed corruption.
|
| Hopefully the newer versions are easier to migrate between. I
| don't remember what exactly was preventing us from upgrading, but
| I'm sure part of it was wanting to avoid a full reindex.
| rjh29 wrote:
| It's good to hear stories of real-world systems. If you only
| look at blog posts you get the idea that everyone is doing
| everything perfectly, but of course it's not really like that
| at all...
| andrelaszlo wrote:
| Congrats on finishing that monster migration!
|
| > In order to control how queries are executed, we have built a
| plugin which exposes a set of custom query types. We use these
| query types to provide functionality and performance
| optimisations not available in stock Elasticsearch. For example,
| we have implemented wildcards within phrases, with support for
| executing within SpanNear queries. We optimise "*" to a
| match-all-query. And a whole lot of other things.
|
| Did you port your in-house plugins? Seems like a big blocker.
| karlney wrote:
| Thank you. Yes, it was a massive project.
|
| I don't want to spoil the other blog posts but we managed to
| solve almost all of our custom use cases without modifying
| Elasticsearch itself. We still have one custom plugin but only
| to enhance functionality, not for performance and stability
| reasons.
| semi-extrinsic wrote:
| While I fully understand why you run this thing with 300+
| nodes as you do, I have to wonder, just for fun - could you
| actually fit this whole thing on a single large server?
| Looks like something with 16 TiB RAM and 2 PiB SSD storage is
| actually a server you could theoretically buy today?
| karlney wrote:
| We feel that ~300 nodes strikes a good balance in the
| cattle vs pets philosophy.
|
| Going up to i3en.12xlarge (or equivalent) would probably
| have worked as well.
|
| But after that the impact of losing just one node would be
| too big.
| andrelaszlo wrote:
| Cool! Will stay tuned for the next post :)
| permb wrote:
| Such an amazing engineering team that the world doesn't know
| about (based in Gothenburg, Sweden).
|
| Disclaimer: I was once part of it
| taf2 wrote:
| I did an upgrade with the team from 1.7 to 7.5.2 a few years ago.
| We used Terraform to build the 7.5.2 cluster with about 28 nodes.
| First we did a snapshot to upgrade the data from 1.7 to 2.4 and
| we synced by having our applications write to both. To get them
| to a synced state right before snapshotting, we set a Redis key
| that told our application servers to start writing every document
| changed or created to a Redis set, so we would have a set of all
| things changed since the snapshot. This was to account for the
| time between snapshotting and getting the new cluster up. Once we
| had the set of changes synced, we could test queries by switching
| a customer account to read from 2.4 via another Redis set of
| upgrade accounts. Once we were confident and saw no new
| deprecations, we did the process again for 5.6 and then 7.5... as
| I recall we could skip 6.x. It was an intense few weeks but
| definitely worth it for us. We also cleaned up our deployment to
| have a dedicated set of master, data and client nodes.
| yeldarb wrote:
| Apologies for the shameless plug, but it strikes me that this
| might be the most relevant place on the Internet right now to
| reach a bunch of Elasticsearch experts who might be interested...
| we're using Elasticsearch to index over 100M images for
| multimodal vector search & looking to expand our team:
| https://www.ycombinator.com/companies/roboflow/jobs/fYL4yzG-...
| endisneigh wrote:
| Is there no other search database that can be persisted other
| than Elastic/Lucene/Solr?
|
| I get that there's little money to be made in these things but
| it's surprising. Seems like most full text search engines are
| relatively simple plug-ins to existing databases or in memory
| only.
| bratao wrote:
| Yes, there is. We moved from ES to Vespa (vespa.ai) and never
| looked back. We got better results, speed and WAY lower
| maintenance costs. I really don't understand how underrated
| this project is.
| murkt wrote:
| How do you deal with Vespa's query language, YQL?
| bratao wrote:
| I was also suspicious: "Great, another one that wants to
| reinvent SQL". But in practice it works very well, to the
| point that I enjoy it.
| morelisp wrote:
| Nearly a decade ago (oh god) I converted some overdesigned five
| node ES mess to https://github.com/mchaput/whoosh. It's
| (obviously) not the fastest or anything, but it was more than
| good enough for low-dozens of GBs of mostly static data.
___________________________________________________________________
(page generated 2022-11-11 23:01 UTC)