[HN Gopher] MeiliSearch: Zero-config alternative to Elasticsearc...
___________________________________________________________________
MeiliSearch: Zero-config alternative to Elasticsearch, made in Rust

Author : qdequelen
Score  : 142 points
Date   : 2020-03-25 16:01 UTC (6 hours ago)

(HTM) web link (github.com)
(TXT) w3m dump (github.com)

| otterley wrote:
| MeiliSearch appears to be more of an alternative to Lucene than it is to Elasticsearch. Lucene is the search engine that runs on a single instance; ES is the horizontally-scalable distribution and aggregation layer atop the instances. Absent a similar aggregation layer, MeiliSearch isn't "elastic" as the comparison implies.
| jpgvm wrote:
| You might be thinking of Solr. Which is the server developed by the Lucene team. Lucene is used in most full-text search systems written in Java.
|
| Also for bonus points there is a distributed version of Solr called Solr Cloud.
| tpayet wrote:
| Actually Lucene is the library for search that Elastic uses under the hood. Lucene does not provide any HTTP API, which Elastic does. Before using Lucene, you have to build the interface around it.
|
| In this way MeiliSearch is comparable to ES, especially for site search and app search working out of the box as standard with its http api.
|
| MeiliSearch does not offer distribution yet, but it is something the team is working on :)
| MuffinFlavored wrote:
| The real power of Elasticsearch for me is the ability to filter logs by:
|
| 1. exact match this nested JSON field (with support for lists of values)
|
| 2. negative match this nested JSON field (with support for lists of values)
|
| coupled with the ability to filter by "timeframe", then pump it through to visualizations (tables/graphs) in Kibana
|
| MeiliSearch would be cool if it spoke the API Kibana expects from Elasticsearch
| time0ut wrote:
| Nice. This looks promising. Very clean API. I like the focus on a narrow use case.
|
| Do you have any information on security topics like using TLS, client authentication, etc?
| Kerollmops wrote:
| Currently we think this kind of security can be enabled by a simple nginx setup, allowing autorefresh of certificates easily (e.g. certbot). But in the future we will probably handle that in the engine itself.
| throw03172019 wrote:
| Are the documents stored on disk or only in memory?
| tpayet wrote:
| We are using LMDB as the key/value store, so the documents are memory-mapped (usually on disk, and in memory when needed)
| seemslegit wrote:
| Hardly an "alternative to Elastic search" if only because the latter is scalable beyond a single machine.
|
| This overhyped description coupled with on-by-default analytics suggests to me MeiliSearch should be dismissed regardless of potential usefulness or technical merit.
| greendave wrote:
| The analytics seem pretty benign.
|
| "We send events to our Amplitude instance to be aware of the number of people who use MeiliSearch. We only send the platform on which the server runs once by day. No other information is sent. If you do not want us to send events, you can disable these analytics by using the MEILI_NO_ANALYTICS env variable."
| seemslegit wrote:
| The practice itself is malignant, either explicitly ask upon first run or require a MEILI_YES_ANALYTICS env variable to enable it.
| turdnagel wrote:
| That would require configuration. This is zero-config.
| computerex wrote:
| It'd still be zero-config to provide its primary function. I don't think anyone would say anything against MeiliSearch or not consider it zero-config had they decided to enable analytics off an env var rather than having analytics be sent by default.
| throw03172019 wrote:
| We use Algolia and use the public API keys with search filters encoded so they can only search their data (i.e. account_id:123)
|
| Is there anything similar here?
| Otherwise all the queries need to go through our servers first to ensure the filter is present.
| Kerollmops wrote:
| The current API key system is a simple and temporary solution.
|
| We will work on a more feature-full API key system including the one you are talking about. This is on our roadmap IIRC.
| eliseumds wrote:
| Pretty heavy user of ES here, and one cannot compare the two products.
| beastman82 wrote:
| But Rust! Lol
| heipei wrote:
| I know the project doesn't claim it, but the title somewhat implies this: I honestly don't understand people claiming ElasticSearch is hard to operate, especially not at small scales. If anything, ElasticSearch for me has been one of the easiest pieces of infrastructure to operate, for me pretty much "zero-config". Let me elaborate: You can run ElasticSearch via Docker command-line, if you want a cluster you just supply IPs of the other nodes. Then you start indexing documents with simple HTTP calls. You can add or remove nodes at any time and don't have to do anything but to start another ElasticSearch instance. If you run out of space or performance just start another node. Everything needed for management, indexing, search is available through HTTP APIs, no tools needed.
|
| Clustered ElasticSearch has been rock-solid for me and I've used it in anger many times. The level of maintenance needed is close to zero, both initially and long-term. Compare that with the abysmal experience of setting up a sharded MongoDB cluster for example...
|
| Please enlighten me how ElasticSearch is "a lot of work to operate" (heard that one multiple times), and what you're comparing it to.
| winrid wrote:
| I've had issues scaling writes to it. You can get around it, but maybe this would be better in a high write environment.
| rodgerd wrote:
| > I honestly don't understand people claiming ElasticSearch is hard to operate, especially not at small scales.
|
| The problem is that ES is deceptively simple to operate. As millions of people who have found things like their medical records shared with the world can attest.
| jniedrauer wrote:
| I've been bitten by elasticsearch twice in my career, and I've seen others bitten by it as well. Once you put it in production, you can't just run it from docker on your workstation. You have to set up a cluster with enough capacity for whatever load you're going to throw at it, gracefully handle failures, updates, scaling up as load increases, etc.
|
| There are so many switches and dials to tune, and unless you really learn it in depth, you won't know which ones you need. It's difficult to even determine what hardware requirements you have. And it's a hard sell to tell your business guys "I think elasticsearch will work better if we give it more... CPU? Memory? Disk speed? I'm not really sure." and can't provide any concrete metrics to back that up.
|
| Another place where footguns abound is upgrading from one version to another, _especially_ if you've got plugins installed. There are tricks that you have to learn the hard way.
|
| At this point, I think long and hard before reaching for a solution like elasticsearch. If I've got a DBA whose entire job it is to master the tech and wield it expertly, that's one thing. But if I'm part of an early stage startup, I just can't justify the lost time and potential for catastrophe.
| dijit wrote:
| Elasticsearch is easier than mongo in some ways and harder in others.
|
| I run a few 10TiB ES clusters (which, is not much to be fair) but infrequently find that I have to reindex or reshard the cluster because I can't just add another node. There's something to be said for understanding the index rotation too, and access patterns.
|
| It's easy to make an ES cluster, it's difficult to maintain one, it's nearly impossible to debug one*.
|
| * - if you consider that "it's slow" is what you have to debug.
| jniedrauer wrote:
| > if you consider that "it's slow" is what you have to debug.
|
| This is exactly it. This is a problem you encounter with every database engine, but in most of them you can quickly find the bottleneck and fix it. With elasticsearch... it's a frustrating and expensive game of trial and error.
| Kerollmops wrote:
| MeiliSearch is "zero-config" compared to ElasticSearch in terms of setup to make it work for end-user instant and relevant search engine. Our engine follows the Algolia engine in terms of typo-tolerance, relevancy, and speed.
|
| Here is a little comparison to enlighten your questions: https://docs.meilisearch.com/resources/comparison_to_alterna....
| heipei wrote:
| Thanks, hadn't seen that, that makes a lot more sense. I agree that ElasticSearch is definitely not "zero-config" when it comes to building certain bespoke applications on it that go beyond simple filtering or query-relevance document search.
| ksec wrote:
| Maybe add Vespa [1] to the comparison?
|
| [1] https://vespa.ai
| Bedon292 wrote:
| While this might be an alternative for that one specific use case (search bar), it does not feel like a viable alternative to ES. I am sure it is great at that specific case, and don't want to knock them on that. But, I have never used ES for a simple search like they are. When I use ES, I want to store billions of records redundantly and search them by text, time, and/or location. And then create visualizations with the results.
|
| When I first read the title I thought it might be a Rust based Lucene engine or something, and thought that would be pretty cool. Though no idea how that would work. On its own, this is a pretty nifty little tool, however I think the framing as an ES alternative is what feels wrong to me, and apparently others in the comments as well.
| ellimilial wrote:
| Seconding.
| Text searching is a horribly hairy problem. I know 2 businesses for which the main source of income is tuning ES/Solr to particular user needs. Starting from performance, through templating case-specific queries to custom plugins.
| nicoburns wrote:
| https://github.com/tantivy-search/tantivy is a Rust based Lucene-alike.
| ghh wrote:
| I wanted to mention Sonic [1] as another lightweight document indexing alternative written in Rust, when I found MeiliSearch to provide a thoughtful comparison page [2]
|
| [1] https://github.com/valeriansaliou/sonic
|
| [2] https://docs.meilisearch.com/resources/comparison_to_alterna...
| udfalkso wrote:
| Sounds more like a potential alternative to Sphinx than Elastic Search.
|
| sphinxsearch.com/
| bryanrasmussen wrote:
| ok I just looked through things a bit but the phrase 0 config worries me - first off I could conceivably run ElasticSearch with 0 configuration but then it needs to make decisions as to what types things are, and how things should be analyzed, and sometimes those decisions are not what I want.
|
| Often ElasticSearch makes a mistake in typing because the programmer has made a mistake in data format, if you fixed that mistake your data would now not fit the format that ElasticSearch has chosen for it (actually don't know if this is still a problem because it has been years since I have run without all my fields being mapped first) but actually don't see how it couldn't be a problem.
|
| so theoretically if you didn't want to go through the trouble of defining a mapping you could just reindex all your data fixed in such a way that ElasticSearch will choose a better type for individual fields but why would you do this?
|
| And I mean what does MeiliSearch do? I wonder - because looking through this code here https://github.com/meilisearch/MeiliSearch/blob/master/meili...
| (and not being a Rust guy my understanding of it is probably off) but it seems like maybe it is no configuration because it expects you to follow its semantics. Which to be fair lots of things do, at the base level, everything has a title, description, date.
|
| But if I have a domain with different or probably more advanced semantics what happens?
|
| Search Engines are generally configurable because you want to add other fields and rank hits in those fields higher than other things, or maybe do a specific search that only targets those fields - like say Brands based search.
|
| on preview: lots of other people with similar views it seems, I got maybe a bit ranty just because the title sets me off when it just is so wrong it even seems like lying.
| kvz wrote:
| Is there already a browser library that can talk to MeiliSearch?
| Kerollmops wrote:
| Yes, there is, you can find all clients on this documentation page: https://docs.meilisearch.com/resources/sdks.html
|
| Note that we are reworking the js library and there will probably be React integration too!
| ghayes wrote:
| The goal of ElasticSearch, I always thought, was that it scales horizontally and can handle the loss of multiple nodes without availability- or data-loss. It's interesting to build a single-server replacement, and this can likely work for many use-cases, but it's definitely a different approach from ElasticSearch itself.
| tpayet wrote:
| Replication for MeiliSearch is on its way :) The main differentiator is that MeiliSearch algorithms are made for end-user search not for complex queries. MeiliSearch focuses on site search or app search, not analytics on hyper large datasets
| rjammala wrote:
| what is the size of the largest dataset that you have indexed with MeiliSearch?
| tpayet wrote:
| We are currently working with this dataset: https://data.discogs.com/?prefix=data/2020/
|
| It's a dataset of 107M songs, 7.6 GB of compressed files which represents 250 GB of disk usage by MeiliSearch. We are indexing the release, song and artist names.
|
| We also work with a dataset of 2M cities that we can index in less than 2 minutes when the db uses 3 shards.
| nodesocket wrote:
| Is it just replication (can sustain node failures) or also sharding the data?
| tpayet wrote:
| We are working on both replication (for high availability and we may use the Raft consensus) and distribution (sharding to scale horizontally and keeping low latency)
| rockwotj wrote:
| Is there a point of contact for this work? A GitHub issue open? This is an area I'd be interested in.
___________________________________________________________________
(page generated 2020-03-25 23:00 UTC)