[HN Gopher] Elasticsearch from the Bottom Up (2013)
       ___________________________________________________________________
        
       Elasticsearch from the Bottom Up (2013)
        
       Author : bobjordan
       Score  : 71 points
       Date   : 2020-01-02 09:54 UTC (1 days ago)
        
 (HTM) web link (www.elastic.co)
 (TXT) w3m dump (www.elastic.co)
        
       | pixelmonkey wrote:
       | There's also a YouTube recording of a talk with similar content
       | by the same author from EuroPython 2014. Helped me out when I was
       | adopting ES at scale in that time period. (And the principles are
       | pretty timeless to modern ES, too.)
       | 
       | https://youtu.be/PpX7J-G2PEo
       | 
       | If you like this, you might also enjoy my deep dive on Lucene
       | (the indexing technology underneath Elasticsearch) in "Lucene:
       | The Good Parts":
       | 
       | https://blog.parse.ly/post/1691/lucene/?utm_source=hn
        
       | bratao wrote:
       | Shameless plug from someone who wants this project to flourish.
       | Check https://vespa.ai as an alternative to Elasticsearch.
       | Migrating from ES to it, I got faster search, never had to face
       | an unhealthy node, and gained native tensor support (and native
       | ANN is coming soon:
       | https://github.com/vespa-engine/vespa/issues/9747).
       | 
       | Very mature, and still progressing at a breakneck rate
       | (https://blog.vespa.ai)
        
         | atombender wrote:
         | Vespa looks pretty good, at least in terms of performance and
         | operation. I've been evaluating it myself. I'm less happy about
         | everything else.
         | 
         | It's got a mishmash of odd APIs, lots of XML, several query
         | languages, lots of weird little quirks. It doesn't feel modern.
         | It's pretty clear that this is originally an in-house project,
         | developed over many years by many people, where not as much
         | effort has been spent on consistent/cohesive design or
         | documentation.
         | 
         | One rough area is the approach to schemas and indexing. Rather
         | than let you define a "clean" schema and put in _your_ data and
         | then have Vespa index it in all the ways it knows about,
         | you're forced to essentially reshape your data into a format
         | compatible with Vespa, which brings with it some severe
         | restrictions. For example, Vespa will not index arbitrarily
         | nested structured data. If you have something like {categories:
         | [{id: 1}]}, Vespa will not index that. You have to flatten any
         | array data to the top level. Nested maps and arrays are mostly
         | not supported, although it's hard to tell from the
         | documentation what is supported.
         | 
         | Vespa is also very obviously skewed toward ranking, not
         | filtering. You can't search by exact string matching: You can't
         | do something like "topic = 'news'". You only get case
         | insensitive substring search. It's got a lot of ranking functions
         | but very little that's optimized for filtering.
         | 
         | Overall, I'm a bit surprised that Vespa's authors position it
         | as an Elasticsearch competitor, because you certainly cannot
         | just port an app that uses ES over to it.
         | 
         | To be sure, it's got lots of interesting features such as ML
         | integration, and, again, performance and clustering design
         | seem good. But it still feels very much like a niche product.
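A minimal Python sketch of the kind of reshaping described above, where nested data such as {categories: [{id: 1}]} has to be flattened to top-level fields before indexing. The function name `flatten_document` and the underscore naming scheme are assumptions made for illustration; this is not Vespa's API or its document format, only one plausible way to pre-process documents on the client side.

```python
from collections import defaultdict
from typing import Any


def flatten_document(doc: dict[str, Any], sep: str = "_") -> dict[str, Any]:
    """Flatten nested maps and arrays of maps into top-level fields.

    Scalars outside arrays keep a single value; scalars found inside
    arrays are collected into one top-level list per flattened key.
    The key naming (parent + sep + child) is a hypothetical convention.
    """
    flat: dict[str, Any] = {}
    multi: dict[str, list[Any]] = defaultdict(list)

    def walk(value: Any, key: str, in_array: bool) -> None:
        if isinstance(value, dict):
            for child_key, child in value.items():
                child_path = f"{key}{sep}{child_key}" if key else child_key
                walk(child, child_path, in_array)
        elif isinstance(value, list):
            for item in value:
                walk(item, key, True)
        elif in_array:
            multi[key].append(value)
        else:
            flat[key] = value

    walk(doc, "", False)
    flat.update(multi)
    return flat


if __name__ == "__main__":
    print(flatten_document({"categories": [{"id": 1}]}))
```

One cost of this approach: once an array of objects is flattened into parallel top-level lists, the link between sibling fields of the same array element is lost, so a query can no longer require that two conditions hold on the same element.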
        
           | bratao wrote:
           | I migrated from ES, and I do not agree that it doesn't feel
           | modern. The middleware logic container and live
           | reconfiguration are mind-blowing. About those two things:
           | 
           | - Nested (For my use cases, this is a problem I do not have.
           | For more complex cases, there is parent-child
           | https://blog.vespa.ai/post/174589826190/parent-child-in-
           | vesp...)
           | 
           | - Exact match (use the exact match
           | https://docs.vespa.ai/documentation/reference/search-
           | definit... )
        
             | draw_down wrote:
             | GP: I have this problem with Vespa.
             | 
             | Parent: I do not have that problem.
        
             | atombender wrote:
             | By modern I mean the approach to configuring and running,
             | and the myriad of languages used: Antiquated XML for some
             | things, a homegrown DSL for others, JSON for query results,
             | then multiple languages for expressing various parts of the
             | query -- it's pretty chaotic.
             | 
             | Another thing that felt antiquated: The whole notion of
             | uploading an "application". I can appreciate the benefits
             | of controlling the lifecycle of the configuration and have
             | Vespa distribute it to nodes. But when you start out, that
             | "application" is just one or two files, and yet you have to
             | create a whole directory structure for it, as opposed to
             | just POSTing individual configs to REST endpoints like you
             | can do with ES. It feels very "Java".
             | 
             | The document you linked to is a different type of exact
             | match. I've been through this, and even posted a GitHub
             | issue. Mysteriously, a Vespa developer replied that nobody
             | had ever needed exact string matching, so nobody had
             | bothered to implement it.
             | 
             | Parent/child is not applicable to what I was talking about,
             | I think. I'm not talking about hierarchical relationships.
             | 
             | For my part, most of my work is in structured data, not
             | text or vector-based ranking, and Vespa really doesn't seem
             | to be designed for that.
             | 
             | ES also has a very, very good aggregation API. Vespa's
             | aggregation syntax is odd and seemingly much more limited.
        
       | misterman0 wrote:
       | I used to use Lucene back in the 1.x days when a fuzzy search was
       | a complete table scan. It was quite a surprise to see how your
       | single term fuzzy query was interpreted as one term query for
       | each fuzzy hit OR-ed together. The Lucene team soon realized they
       | needed to code a Levenshtein automaton but none of them had ever
       | done that before. They pulled several all-nighters reading math
       | papers and coding and when they succeeded they were so happy they
       | told the world about it [0]. It's a great story.
       | 
       | [0] https://dzone.com/articles/lucenes-fuzzyquery-100-times
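A rough Python sketch of the brute-force behaviour described above: every term in the dictionary is compared against the query term, and the fuzzy query becomes the union (OR) of the postings of all terms within the edit distance. The function names and the toy inverted index are assumptions for illustration; this is not Lucene's code, and it is not the Levenshtein automaton that replaced this approach.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def expand_fuzzy_term(term: str, term_dictionary, max_edits: int = 2) -> list[str]:
    """Scan every known term and keep those within max_edits of term."""
    return sorted(t for t in term_dictionary if levenshtein(term, t) <= max_edits)


def fuzzy_search(term: str, inverted_index: dict[str, set[int]], max_edits: int = 2) -> set[int]:
    """OR together the postings of every term the fuzzy query expands to."""
    matches: set[int] = set()
    for candidate in expand_fuzzy_term(term, inverted_index, max_edits):
        matches |= inverted_index[candidate]
    return matches


if __name__ == "__main__":
    # Toy index: term -> set of document ids (hypothetical data).
    index = {"lucene": {1, 2}, "lucent": {3}, "lucid": {5}, "search": {4}}
    print(expand_fuzzy_term("lucene", index, max_edits=1))
    print(fuzzy_search("lucene", index, max_edits=1))
```

The cost is one edit-distance computation per term in the dictionary for every query, which is why this scales poorly as the vocabulary grows.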
        
       | inertiatic wrote:
       | One thing that really bothers me about ES is that compared to
       | Solr, some terms that have a specific meaning in Lucene are
       | either not used for the corresponding concept or even worse re-
       | used for a different one.
       | 
       | It sometimes makes explaining the underlying implementation a bit
       | harder to people who are Lucene-agnostic but are ES users, with
       | no good reason apart from, I would guess, brand differentiation?
        
       | MuffinFlavored wrote:
       | Does Elasticsearch need to be as complicated as it is?
       | 
       | I was surprised to find there wasn't an Elasticsearch + Kibana
       | competitor that is "simpler".
       | 
       | I just want to be able to store JSON logs with a timestamp + a
       | bunch of fields then search them in a nice little UI later.
       | Apparently, that's pretty hard to do right.
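For a sense of the baseline being asked for, here is a small Python sketch that parses newline-delimited JSON logs and filters them by time range and exact field values. The record layout (a `timestamp` field in ISO 8601 format plus arbitrary other fields) and the function names are assumptions for illustration, not any product's format or API.

```python
import json
from datetime import datetime


def parse_log_lines(lines):
    """Parse newline-delimited JSON records, skipping blank lines.

    Assumes each record has a "timestamp" field in ISO 8601 format.
    """
    records = []
    for line in lines:
        if line.strip():
            record = json.loads(line)
            record["timestamp"] = datetime.fromisoformat(record["timestamp"])
            records.append(record)
    return records


def search_logs(records, start=None, end=None, **fields):
    """Return records with start <= timestamp < end and matching fields."""
    results = []
    for record in records:
        timestamp = record["timestamp"]
        if start is not None and timestamp < start:
            continue
        if end is not None and timestamp >= end:
            continue
        if all(record.get(name) == value for name, value in fields.items()):
            results.append(record)
    return results


if __name__ == "__main__":
    raw = [
        '{"timestamp": "2020-01-02T09:54:00", "level": "error", "service": "api"}',
        '{"timestamp": "2020-01-02T10:15:00", "level": "info", "service": "api"}',
        '{"timestamp": "2020-01-03T23:00:00", "level": "error", "service": "web"}',
    ]
    logs = parse_log_lines(raw)
    for hit in search_logs(logs, end=datetime(2020, 1, 3), level="error"):
        print(hit)
```

This version reads every record on every query. Avoiding that full scan as the data grows is the part that needs an index, and it is where the complexity starts.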
        
         | fizx wrote:
         | I mean, that product is called LogDNA.
         | 
         | But Elasticsearch has evolved into a whole bunch of things to
         | meet everyone's needs. There's a way to do what you want and
         | simply, but you have to find the simple path in the middle of
         | the big product.
        
         | inertiatic wrote:
         | I think the question you should be asking is, does my use case
         | require a tool as powerful as ES?
         | 
         | And only you can answer that fully.
        
         | softwaredoug wrote:
         | I mean, the same reason that SQL isn't any simpler? It takes
         | arbitrary data, and you can filter and aggregate it in
         | arbitrary ways? People aren't just doing log data... it's just
         | been heavily adopted for that purpose
         | 
         | I think the "simpler" version of ES+Kibana is probably a
         | spreadsheet.
        
         | dijit wrote:
         | Elasticsearch is really just clustered Lucene with some nice
         | features wrapping it. You can probably get away with something
         | that also wraps Lucene, though Elasticsearch has a dominant
         | position precisely because it is quite full-featured and easy
         | to run.
        
         | chasers wrote:
         | I built https://logflare.app for exactly this.
        
         | jrudolph wrote:
         | My team is using Loki + Grafana and we're pretty happy with it.
         | It's pretty basic but it does what you expect it to do just
         | fine.
         | 
         | We ditched Elastic as it was a super massive PITA to operate (&
         | a resource hog at that). I'll admit I'm not an expert at ELK at
         | all, but tbh I was absolutely surprised just how bad Elastic +
         | Kibana was for our basic log uses when they tout it as one of
         | their mainstays. Or we were just exceptionally stupid, who
         | knows. In any case, the experience we had with it didn't
         | motivate us to become ELK experts at all. Our pet peeves:
         | 
         | - The Kibana UI needlessly wastes tons of screen space on
         | whitespace
         | 
         | - makes it hard to dig down into logs
         | 
         | - never seems to find exact string matches when we wanted it to
         | and instead returns "helpful" fuzzy matches
         | 
         | - Kibana has no qualms sending requests to Elastic that will
         | happily kill your node instead of applying sensible paging /
         | query timeouts. I mean that's why I'm using Kibana and not
         | writing my own elastic frontend...
        
         | itronitron wrote:
         | Solr is quite a bit more straightforward than ElasticSearch,
         | but still a bit complicated in my opinion.
        
         | mountaineer wrote:
         | > JSON logs with a timestamp + a bunch of fields then search
         | them
         | 
         | There is S3+Athena for this with AWS and Google can store/query
         | JSON with BigQuery. The nice little UI doesn't come with it,
         | but at least you don't have to spin up an Elastic cluster.
        
           | liveoneggs wrote:
           | AWS CloudWatch Logs Insights can do a lot natively without
           | the S3/Athena step now
        
       | brasetvik wrote:
       | Hey. I wrote that a long time ago. Funny to see it re-surface and
       | glad it's helpful. :)
       | 
       | There is a presentation version of it here:
       | https://www.youtube.com/watch?v=PpX7J-G2PEo&feature=youtu.be (I
       | made the presentation first and then wrote the blog posts)
       | 
       | I wrote a follow-up called Elasticsearch from the Top Down
       | here: https://www.elastic.co/blog/found-elasticsearch-top-down
        
       ___________________________________________________________________
       (page generated 2020-01-03 23:00 UTC)