[HN Gopher] We upgraded an old, 3PB large, Elasticsearch cluster...
       ___________________________________________________________________
        
       We upgraded an old, 3PB large, Elasticsearch cluster without
       downtime
        
       Author : ollieparsley
       Score  : 108 points
       Date   : 2022-11-11 15:54 UTC (7 hours ago)
        
 (HTM) web link (underthehood.meltwater.com)
 (TXT) w3m dump (underthehood.meltwater.com)
        
       | metadat wrote:
       | I've heard horror stories from friends about working at
        | Meltwater. Setting that aside for a moment, this is an amazing
       | software engineering achievement.
       | 
       | Pulling off this level of scale with Elasticsearch is no easy
       | feat and very impressive from a technical perspective. When
       | you're running ES with petabytes of mission critical data as a
       | core service powering the universe of a business, cluster
       | rebuilds aren't an option (or maybe they are, as a last resort,
       | but absolutely will not be acceptable on an ongoing basis).
       | 
       | Relying on Elasticsearch mega-clusters in this manner is akin to
       | running an ultra-marathon with a really sharp pair of scissors
       | glued in each hand. Or maybe even more extreme than my
       | (admittedly lame) analogy.
       | 
       | Running nodes with such high shard counts is an appreciably
       | precarious proposition, because there is a fair amount of
       | overhead in the Elasticsearch management protocol. I wonder what
       | the performance testing strategy entailed.
       | 
       | I have a lot of respect for the engineers working to make this
       | project and service a success story. When it comes to
       | Elasticsearch at scale, such outcomes are the exception.
        
         | karlney wrote:
         | Thank you for those kind words.
         | 
          | And yes, we have had our fair share of pain with the old
          | cluster, for sure.
         | 
          | The new version (7.17) is behaving a lot better so far and
          | feels a lot more predictable.
        
         | trendy0 wrote:
         | What were the stories?
        
           | karlney wrote:
            | One time, a few years ago, a particularly nasty query was
            | executed over and over again, and it took a few hours to
            | find it and then block it.
           | 
            | And during that time so many nodes had become slow and
            | unresponsive that another, previously unseen (for us)
            | memory leak started to occur.
           | 
            | Nodes kept building up queues of unanswered ping requests,
            | and the requests contained our 100 MB cluster state, so the
            | heaps filled up and even more nodes became unresponsive.
           | 
           | And from then on the whole thing turned into a death spiral
           | of doom.
           | 
            | After trying, and failing, to get it under control for 48
            | hours, we gave up and rebuilt the whole cluster from
            | scratch using the snapshots we store on S3.
           | 
           | The recovery took another 90 hours or so. That was not a fun
           | week.
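            | 
            | For illustration, a minimal sketch of that restore-from-S3
            | step, assuming the stock Elasticsearch snapshot APIs (the
            | endpoint address, repository and snapshot names here are
            | illustrative, not our actual setup):
            | 
            |     import requests
            | 
            |     ES = "http://localhost:9200"  # hypothetical address
            | 
            |     # Register the S3 repository (needs the repository-s3
            |     # plugin), then restore every index from a snapshot.
            |     requests.put(f"{ES}/_snapshot/s3_backup", json={
            |         "type": "s3",
            |         "settings": {"bucket": "my-es-snapshots"},
            |     })
            |     requests.post(
            |         f"{ES}/_snapshot/s3_backup/snap_1/_restore",
            |         json={"indices": "*", "include_global_state": False},
            |     )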
        
         | andrelaszlo wrote:
          | I usually don't explain my downvotes, and I thought your
          | comment was good overall, but mentioning "horror stories from
          | friends about working at Meltwater" without explaining what
          | they are makes it a bit unfair.
         | 
          | As criticism, it's very vague, and as someone who hasn't
          | worked at Meltwater (for the last 5 years or so, at least) it
          | doesn't give me any information either. Well, except that
          | there are rumors about Meltwater, but that would be true of
          | any large corporation.
         | 
          | Maybe I misunderstood and the horror stories were about ES,
          | but I read it as being about the company itself. Could you
          | expand? What type of stories? :)
        
       | krallja wrote:
       | While we're telling ES war stories:
       | 
        | FogBugz was still on twelve Elasticsearch 1.6 nodes when I left
        | in 2018. We also had a custom plugin (essentially requesting
        | facets that weren't stored in Elasticsearch back from FogBugz),
       | which was the main reason we hadn't spent much time thinking
       | about upgrading it. To keep performance adequate, we scheduled
       | cache flush operations that, even at the time, we knew were
       | pants-on-head crazy to be doing in production. I can't remember
       | if we were running 32-bit or 64-bit with Compressed OOPs.
       | 
       | Kiln was on an even older version, v1.4 if I remember correctly.
       | And one of the shards had a corruption warning, yet it didn't
       | seem to affect stability or results. But that wasn't a fun
       | cluster to operate, since it refused to do certain types of
       | maintenance because of the supposed corruption.
       | 
       | Hopefully the newer versions are easier to migrate between. I
       | don't remember what exactly was preventing us from upgrading, but
       | I'm sure part of it was wanting to avoid a full reindex.
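        | 
        | For reference, cache flushes like that would presumably go
        | through something like Elasticsearch's clear-cache API; a
        | minimal sketch of the idea (the index name and address are
        | illustrative, and the scheduling is left out):
        | 
        |     import requests
        | 
        |     ES = "http://localhost:9200"  # hypothetical address
        | 
        |     # Drop the caches for one index; in 1.x this endpoint was
        |     # POST /{index}/_cache/clear, typically run from a cron job.
        |     requests.post(f"{ES}/cases/_cache/clear")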
        
         | rjh29 wrote:
         | It's good to hear stories of real-world systems. If you only
         | look at blog posts you get the idea that everyone is doing
         | everything perfectly, but of course it's not really like that
         | at all...
        
       | andrelaszlo wrote:
       | Congrats on finishing that monster migration!
       | 
       | > In order to control how queries are executed, we have built a
       | plugin which exposes a set of custom query types. We use these
       | query types to provide functionality and performance
       | optimisations not available in stock Elasticsearch. For example,
       | we have implemented wildcards within phrases, with support for
       | executing within SpanNear queries. We optimise "*" to a match-
       | all-query. And a whole lot of other things.
       | 
        | Did you port your in-house plugins? Seems like a big blocker.
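        | 
        | The "*" optimisation from the quote can be sketched in plain
        | query-DSL terms (this is just the idea, with an illustrative
        | field name, not their plugin code):
        | 
        |     def rewrite_wildcard(field: str, pattern: str) -> dict:
        |         # A bare "*" matches every document, so serve it with
        |         # match_all instead of enumerating every term in the
        |         # field.
        |         if pattern == "*":
        |             return {"match_all": {}}
        |         return {"wildcard": {field: {"value": pattern}}}
        | 
        |     assert rewrite_wildcard("body", "*") == {"match_all": {}}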
        
         | karlney wrote:
         | Thank you. Yes it was a massive project.
         | 
            | I don't want to spoil the other blog posts, but we managed
            | to solve almost all of our custom use cases without
            | modifying Elasticsearch itself. We still have one custom
            | plugin, but only to enhance functionality, not for
            | performance and stability reasons.
        
           | semi-extrinsic wrote:
           | While I fully understand why you run this thing with 300+
           | nodes as you do, I have to wonder, just for fun - could you
            | actually fit this whole thing on a single large server? It
            | looks like something with 16 TiB of RAM and 2 PiB of SSD
            | storage is a server you could theoretically buy today.
        
             | karlney wrote:
             | We feel that ~300 nodes strikes a good balance in the
             | cattle vs pets philosophy.
             | 
             | Going up to i3en.12xlarge (or equivalent) would probably
             | have worked as well.
             | 
              | But after that the impact of losing just one node would
              | be too big.
        
           | andrelaszlo wrote:
           | Cool! Will stay tuned for the next post :)
        
       | permb wrote:
       | Such an amazing engineering team that the world doesn't know
       | about (based in Gothenburg, Sweden).
       | 
       | Disclaimer: I was once part of it
        
       | taf2 wrote:
        | I did an upgrade with the team from 1.7 to 7.5.2 a few years
        | ago. We used Terraform to build the 7.5.2 cluster with about
        | 28 nodes. First we did a snapshot to upgrade the data from 1.7
        | to 2.4, and we synced by having our applications write to
        | both. To get them to a synced state right before snapshotting,
        | we set a Redis key that told our application servers to start
        | writing every document changed or created to a Redis set, so
        | we would have a set of all things changed since the snapshot.
        | This was to account for the time between snapshotting and
        | getting the new cluster up.
        | 
        | Once we had the set of changes synced, we could test queries
        | by switching a customer account to read from 2.4 via another
        | Redis set of upgraded accounts. Once we were confident and saw
        | no new deprecations, we did the process again for 5.6 and then
        | 7.5. As I recall, we could skip 6.x. It was an intense few
        | weeks but definitely worth it for us. We also cleaned up our
        | deployment to have dedicated sets of master, data and client
        | nodes.
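        | 
        | A minimal sketch of that Redis change-tracking idea, assuming
        | it's document ids that get recorded (the key names and client
        | wiring are illustrative):
        | 
        |     import redis
        | 
        |     r = redis.Redis()
        |     TRACK_FLAG = "es_upgrade:track"     # hypothetical keys
        |     CHANGED_SET = "es_upgrade:changed"
        | 
        |     def on_document_write(doc_id: str) -> None:
        |         # While the flag is set, record every created or
        |         # updated document for re-sync after the snapshot.
        |         if r.exists(TRACK_FLAG):
        |             r.sadd(CHANGED_SET, doc_id)
        | 
        |     def drain_changes():
        |         # Pop recorded ids for reindexing into the new cluster.
        |         while (doc_id := r.spop(CHANGED_SET)) is not None:
        |             yield doc_id.decode()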
        
       | yeldarb wrote:
        | Apologies for the shameless plug, but it strikes me that this
        | might be the most relevant place on the Internet right now to
        | reach a bunch of Elasticsearch experts who might be
        | interested... we're using Elasticsearch to index over 100M
        | images for multimodal vector search and are looking to expand
        | our team:
       | https://www.ycombinator.com/companies/roboflow/jobs/fYL4yzG-...
        
       | endisneigh wrote:
        | Is there no search database other than Elastic/Lucene/Solr
        | that can be persisted?
        | 
        | I get that there's little money to be made in these things,
        | but it's surprising. It seems like most full-text search
        | offerings are relatively simple plug-ins to existing
        | databases, or in-memory only.
        
         | bratao wrote:
          | Yes, there is. We moved from ES to Vespa (vespa.ai) and never
          | looked back. We got better results, better speed, and WAY
          | lower maintenance costs. I really don't understand why this
          | project is so underrated.
        
           | murkt wrote:
           | How do you deal with Vespa's query language, YQL?
        
             | bratao wrote:
              | I was also suspicious: "Great, another one that wants to
              | reinvent SQL". But in practice it works very well, to
              | the point that I actually enjoy it.
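              | 
              | For the curious, a minimal sketch of what a YQL query
              | looks like against Vespa's query endpoint (the host and
              | field names are illustrative):
              | 
              |     import requests
              | 
              |     # Roughly the YQL equivalent of an Elasticsearch
              |     # match query.
              |     resp = requests.get(
              |         "http://localhost:8080/search/",
              |         params={"yql": 'select * from sources * where '
              |                        'default contains "elasticsearch"'},
              |     )
              |     print(resp.json())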
        
         | morelisp wrote:
          | Nearly a decade ago (oh god) I converted some overdesigned
          | five-node ES mess to https://github.com/mchaput/whoosh. It's
          | (obviously) not the fastest or anything, but it was more than
          | good enough for low dozens of GBs of mostly static data.
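          | 
          | A minimal sketch of Whoosh usage for that kind of workload
          | (the schema, paths and documents are illustrative):
          | 
          |     import os
          |     from whoosh.fields import ID, TEXT, Schema
          |     from whoosh.index import create_in
          |     from whoosh.qparser import QueryParser
          | 
          |     # Build a tiny on-disk index and run one query.
          |     os.makedirs("indexdir", exist_ok=True)
          |     ix = create_in("indexdir", Schema(path=ID(stored=True),
          |                                       body=TEXT))
          |     writer = ix.writer()
          |     writer.add_document(path=u"/a", body=u"mostly static data")
          |     writer.commit()
          | 
          |     with ix.searcher() as searcher:
          |         query = QueryParser("body", ix.schema).parse("static")
          |         for hit in searcher.search(query):
          |             print(hit["path"])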
        
       ___________________________________________________________________
       (page generated 2022-11-11 23:01 UTC)