[HN Gopher] Elasticsearch from the Bottom Up (2013)
___________________________________________________________________
Elasticsearch from the Bottom Up (2013)

Author : bobjordan
Score : 71 points
Date : 2020-01-02 09:54 UTC (1 day ago)

(HTM) web link (www.elastic.co)
(TXT) w3m dump (www.elastic.co)

| pixelmonkey wrote:
| There's also a YouTube recording of a talk with similar content by the same author from EuroPython 2014. Helped me out when I was adopting ES at scale in that time period. (And the principles are pretty timeless to modern ES, too.)
|
| https://youtu.be/PpX7J-G2PEo
|
| If you like this, you might also enjoy my deep dive on Lucene (the indexing technology underneath Elasticsearch) in "Lucene: The Good Parts":
|
| https://blog.parse.ly/post/1691/lucene/?utm_source=hn
|
| bratao wrote:
| Shameless plug from someone who wants this project to flourish. Check https://vespa.ai as an alternative to Elasticsearch. Migrating from ES to it, I got faster search, never had to face an unhealthy node, and got native tensor support (and native ANN is coming soon: https://github.com/vespa-engine/vespa/issues/9747).
|
| Very mature, and still progressing at a breakneck rate (https://blog.vespa.ai)
|
| atombender wrote:
| Vespa looks pretty good, at least in terms of performance and operation. I've been evaluating it myself. I'm less happy about everything else.
|
| It's got a mishmash of odd APIs, lots of XML, several query languages, lots of weird little quirks. It doesn't feel modern. It's pretty clear that this was originally an in-house project, developed over many years by many people, where not as much effort has been spent on consistent/cohesive design or documentation.
|
| One rough area is the approach to schemas and indexing.
| Rather than let you define a "clean" schema and put in _your_ data and then have Vespa index it in all the ways it knows about, you're forced to essentially reshape your data into a format compatible with Vespa, which brings with it some severe restrictions. For example, Vespa will not index arbitrarily nested structured data. If you have something like {categories: [{id: 1}]}, Vespa will not index that. You have to flatten any array data to the top level. Nested maps and arrays are mostly not supported, although it's hard to tell from the documentation what is supported.
|
| Vespa is also very obviously skewed toward ranking, not filtering. You can't search by exact string matching: You can't do something like "topic = 'news'". You only get case-insensitive substring search. It's got lots of ranking functions but very little that's optimized for filtering.
|
| Overall, I'm a bit surprised that Vespa's authors position it as an Elasticsearch competitor, because you certainly cannot just port an app that uses ES over to it.
|
| To be sure, it's got lots of interesting features such as ML integration, and, again, performance and clustering design seem good. But it still feels very much like a niche product.
|
| bratao wrote:
| I migrated from ES, and I do not agree that it doesn't feel modern. The middleware logic container and live reconfiguration are mind-blowing. About those two things:
|
| - Nested (For my use cases, this is a problem I do not have. For more complex cases, there is parent-child https://blog.vespa.ai/post/174589826190/parent-child-in-vesp...)
|
| - Exact match (use the exact match https://docs.vespa.ai/documentation/reference/search-definit...)
|
| draw_down wrote:
| GP: I have this problem with Vespa.
|
| Parent: I do not have that problem.
|
| atombender wrote:
| By modern I mean the approach to configuring and running, and the myriad of languages used: Antiquated XML for some things, a homegrown DSL for others, JSON for query results, then multiple languages for expressing various parts of the query -- it's pretty chaotic.
|
| Another thing that felt antiquated: The whole notion of uploading an "application". I can appreciate the benefits of controlling the lifecycle of the configuration and having Vespa distribute it to nodes. But when you start out, that "application" is just one or two files, and yet you have to create a whole directory structure for it, as opposed to just POSTing individual configs to REST endpoints like you can do with ES. It feels very "Java".
|
| The document you linked to is a different type of exact match. I've been through this, and even posted a GitHub issue. Mysteriously, a Vespa developer replied that nobody had ever needed exact string matching, so nobody had bothered to implement it.
|
| Parent/child is not applicable to what I was talking about, I think. I'm not talking about hierarchical relationships.
|
| For my part, most of my work is in structured data, not text or vector-based ranking, and Vespa really doesn't seem to be designed for that.
|
| ES also has a very, very good aggregation API. Vespa's aggregation syntax is odd and seemingly much more limited.
|
| misterman0 wrote:
| I used to use Lucene back in the 1.x days when a fuzzy search was a complete table scan. It was quite a surprise to see how your single-term fuzzy query was interpreted as one term query for each fuzzy hit, OR-ed together. The Lucene team soon realized they needed to code a Levenshtein automaton, but none of them had ever done that before. They pulled several all-nighters reading math papers and coding, and when they succeeded they were so happy they told the world about it [0]. It's a great story.
|
| https://dzone.com/articles/lucenes-fuzzyquery-100-times
|
| inertiatic wrote:
| One thing that really bothers me about ES is that compared to Solr, some terms that have a specific meaning in Lucene are either not used for the corresponding concept or, even worse, reused for a different one.
|
| It sometimes makes explaining the underlying implementation a bit harder to people who are Lucene-agnostic but are ES users, with no good reason apart from, I would guess, brand differentiation?
|
| MuffinFlavored wrote:
| Does Elasticsearch need to be as complicated as it is?
|
| I was surprised to find there wasn't an Elasticsearch + Kibana competitor that is "simpler".
|
| I just want to be able to store JSON logs with a timestamp + a bunch of fields then search them in a nice little UI later. Apparently, that's pretty hard to do right.
|
| fizx wrote:
| I mean, that product is called LogDNA.
|
| But Elasticsearch has evolved into a whole bunch of things to meet everyone's needs. There's a way to do what you want simply, but you have to find the simple path in the middle of the big product.
|
| inertiatic wrote:
| I think the question you should be asking is, does my use case require a tool as powerful as ES?
|
| And only you can answer that fully.
|
| softwaredoug wrote:
| I mean, the same reason that SQL isn't any simpler? It takes arbitrary data, and you can filter and aggregate it in arbitrary ways? People aren't just doing log data... it's just been heavily adopted for that purpose.
|
| I think the "simpler" version of ES+Kibana is probably a spreadsheet.
|
| dijit wrote:
| Elasticsearch is really just clustered Lucene with some nice features wrapping it. You can probably get away with something that also wraps Lucene. Though Elasticsearch has a dominant position precisely because it is quite full-featured and easy to run.
|
| chasers wrote:
| I built https://logflare.app for exactly this.
|
| jrudolph wrote:
| My team is using Loki + Grafana and we're pretty happy with it. It's pretty basic but it does what you expect it to do just fine.
|
| We ditched Elastic as it was a super massive PITA to operate (& a resource hog at that). I'll admit I'm not an expert at ELK at all, but tbh I was absolutely surprised just how bad Elastic + Kibana was for our basic log uses when they tout it as one of their mainstays. Or we were just exceptionally stupid, who knows. In any case, the experience we had with it didn't motivate us to become ELK experts at all. Our pet peeves:
|
| - The Kibana UI needlessly wastes tons of screen space on whitespace
|
| - makes it hard to dig down into logs
|
| - never seems to find exact string matches when we wanted it to and instead returns "helpful" fuzzy matches
|
| - Kibana has no qualms sending requests to Elastic that will happily kill your node instead of applying sensible paging / query timeouts. I mean that's why I'm using Kibana and not writing my own Elastic frontend...
|
| itronitron wrote:
| Solr is quite a bit more straightforward than Elasticsearch, but still a bit complicated in my opinion.
|
| mountaineer wrote:
| > JSON logs with a timestamp + a bunch of fields then search them
|
| There is S3+Athena for this with AWS, and Google can store/query JSON with BigQuery. The nice little UI doesn't come with it, but at least you don't have to spin up an Elastic cluster.
|
| liveoneggs wrote:
| AWS CloudWatch Logs Insights can do a lot natively without the S3/Athena step now.
|
| brasetvik wrote:
| Hey. I wrote that a long time ago. Funny to see it resurface and glad it's helpful.
| :)
|
| There is a presentation version of it here: https://www.youtube.com/watch?v=PpX7J-G2PEo&feature=youtu.be (I made the presentation first and then wrote the blog posts)
|
| I wrote a follow-up called Elasticsearch from the Top Down here: https://www.elastic.co/blog/found-elasticsearch-top-down
___________________________________________________________________
(page generated 2020-01-03 23:00 UTC)