[HN Gopher] MeiliSearch: Zero-config alternative to Elasticsearc...
       ___________________________________________________________________
        
       MeiliSearch: Zero-config alternative to Elasticsearch, made in Rust
        
       Author : qdequelen
       Score  : 142 points
       Date   : 2020-03-25 16:01 UTC (6 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | otterley wrote:
       | MeiliSearch appears to be more of an alternative to Lucene than
       | it is to Elasticsearch. Lucene is the search engine that runs on
       | a single instance; ES is the horizontally-scalable distribution
       | and aggregation layer atop the instances. Absent a similar
       | aggregation layer, MeiliSearch isn't "elastic" as the comparison
       | implies.
        
         | jpgvm wrote:
         | You might be thinking of Solr. Which is the server developed by
         | the Lucene team. Lucene is used in most full-text search
         | systems written in Java.
         | 
         | Also for bonus points there is a distributed version of Solr
         | called Solr Cloud.
        
         | tpayet wrote:
         | Actually Lucene is the library for search that Elastic uses
         | under the hood. Lucene does not provide any HTTP API, which
         | Elastic does. Before using Lucene, you have to build the
         | interface around it.
         | 
         | In this way MeiliSearch is comparable to ES, especially for
         | site search and app search working out of the box as standard
         | with its http api.
         | 
         | MeiliSearch does not offer distribution yet, but it is
         | something the team is working on :)
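As a rough illustration of the "HTTP API out of the box" point, the sketch below only builds a search URL and does not send a request. It assumes a local server on port 7700, an index named `movies` (hypothetical), and the `GET /indexes/<uid>/search?q=...` route shown in the project README at the time; check the current documentation before relying on it.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, index_uid: str, query: str, limit: int = 20) -> str:
    """Build a search URL of the form <base>/indexes/<uid>/search?q=...&limit=...

    The route shape is an assumption taken from the README, not a guarantee
    about any particular MeiliSearch version.
    """
    params = urlencode({"q": query, "limit": limit})
    return f"{base_url.rstrip('/')}/indexes/{index_uid}/search?{params}"


# Hypothetical host and index name, for illustration only.
url = build_search_url("http://127.0.0.1:7700", "movies", "botman robin", limit=2)
print(url)
```

Any HTTP client (curl, a browser, `urllib.request`) could then fetch that URL; with plain Lucene you would have to write that HTTP layer yourself.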
        
       | MuffinFlavored wrote:
       | The real power of Elasticsearch for me is the ability to filter
       | logs by:
       | 
       | 1. exact match this nested JSON field (with support for lists of
       | values)
       | 
       | 2. negative match this nested JSON field (with support for lists
       | of values)
       | 
       | coupled with the ability to filter by "timeframe", then pump it
       | through to visualizations (tables/graphs) in Kibana
       | 
       | MeiliSearch would be cool if it spoke the API Kibana expects from
       | Elasticsearch
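The kind of log filter described above can be sketched as an Elasticsearch `bool` query: `filter` clauses with `terms` for the exact matches, `must_not` with `terms` for the negative matches, and a `range` clause for the timeframe. The field names and values below are hypothetical, and the dotted paths assume plain object fields; fields mapped with the `nested` type would need a `nested` query instead.

```python
import json


def build_log_filter(include, exclude, time_field="@timestamp", since="now-15m"):
    """Build a request body for an Elasticsearch search.

    include / exclude map a dotted field path to a list of values that
    must match / must not match. time_field and since are assumptions.
    """
    must_match = [{"terms": {field: values}} for field, values in include.items()]
    timeframe = [{"range": {time_field: {"gte": since}}}]
    must_not = [{"terms": {field: values}} for field, values in exclude.items()]
    return {"query": {"bool": {"filter": must_match + timeframe, "must_not": must_not}}}


# Hypothetical fields, for illustration only.
body = build_log_filter(
    include={"kubernetes.namespace": ["prod", "staging"]},
    exclude={"http.response.status": [200, 204]},
)
print(json.dumps(body, indent=2))
```

The resulting body is what would be sent to an index's `_search` endpoint; Kibana builds similar queries behind its filter bar.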
        
       | time0ut wrote:
       | Nice. This looks promising. Very clean API. I like the focus on a
       | narrow use case.
       | 
       | Do you have any information on security topics like using TLS,
       | client authentication, etc?
        
         | Kerollmops wrote:
         | Currently we think this kind of security can be enabled by a
          | simple nginx setup, allowing auto-refresh of certificates easily
         | (e.g. certbot). But in the future we will probably handle that
         | in the engine itself.
        
       | throw03172019 wrote:
       | Are the documents stored on disk or only in memory?
        
         | tpayet wrote:
         | We are using LMDB as the key/value store, so the documents are
         | memory-mapped (usually on disk, and in memory when needed)
        
       | seemslegit wrote:
       | Hardly an "alternative to Elasticsearch" if only because the
       | latter is scalable beyond a single machine.
       | 
       | This overhyped description coupled with on-by-default analytics
       | suggests to me MeiliSearch should be dismissed regardless of
       | potential usefulness or technical merit.
        
         | greendave wrote:
         | The analytics seem pretty benign.
         | 
         | "We send events to our Amplitude instance to be aware of the
         | number of people who use MeiliSearch. We only send the platform
         | on which the server runs once by day. No other information is
         | sent. If you do not want us to send events, you can disable
         | these analytics by using the MEILI_NO_ANALYTICS env variable."
        
           | seemslegit wrote:
           | The practice itself is malignant, either explicitly ask upon
           | first run or require a MEILI_YES_ANALYTICS env variable to
           | enable it.
        
             | turdnagel wrote:
             | That would require configuration. This is zero-config.
        
               | computerex wrote:
               | It'd still be zero-config to provide its primary
               | function. I don't think anyone would say anything against
               | MeiliSearch or not consider it zero-config had they
               | decided to enable analytics off an env var rather than
               | having analytics be sent by default.
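The difference being argued over is only which way the default points. A minimal sketch, assuming that "set" means "present with a non-empty value" (how MeiliSearch actually parses `MEILI_NO_ANALYTICS` is not stated in the thread) and treating `MEILI_YES_ANALYTICS` as the purely hypothetical variable proposed above:

```python
import os


def analytics_enabled_opt_out(env=os.environ) -> bool:
    """Opt-out model: analytics are on unless MEILI_NO_ANALYTICS is set."""
    return not env.get("MEILI_NO_ANALYTICS")


def analytics_enabled_opt_in(env=os.environ) -> bool:
    """Opt-in model: analytics are off unless the (hypothetical)
    MEILI_YES_ANALYTICS variable is set."""
    return bool(env.get("MEILI_YES_ANALYTICS"))


# With no configuration at all, the two models disagree.
no_config = {}
print(analytics_enabled_opt_out(no_config))  # True
print(analytics_enabled_opt_in(no_config))   # False
```

Both variants need zero configuration for the search function itself; they differ only in what an unconfigured install sends.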
        
       | throw03172019 wrote:
       | We use Algolia and use the public API keys with search filters
       | encoded so they can only search their data (i.e. account_id:123).
       | 
       | Is there anything similar here? Otherwise all the queries need to
       | go through our servers first to ensure the filter is present.
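One way to picture "a public key with the filter encoded" is a token that carries the filter plus an HMAC over it, so the search server can trust a filter it receives straight from the browser. The sketch below is a generic illustration of that idea, not Algolia's actual key format and not a MeiliSearch feature; the function names, secret, and token layout are all assumptions.

```python
import base64
import hashlib
import hmac


def sign_filter(secret: bytes, filter_expr: str) -> str:
    """Return a token embedding filter_expr and an HMAC-SHA256 over it."""
    mac = hmac.new(secret, filter_expr.encode(), hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(f"{mac}{filter_expr}".encode()).decode()


def verify_filter(secret: bytes, token: str):
    """Return the embedded filter if the signature checks out, else None."""
    try:
        raw = base64.urlsafe_b64decode(token.encode()).decode()
    except (ValueError, UnicodeDecodeError):
        return None
    mac, filter_expr = raw[:64], raw[64:]  # a SHA-256 hex digest is 64 characters
    expected = hmac.new(secret, filter_expr.encode(), hashlib.sha256).hexdigest()
    return filter_expr if hmac.compare_digest(mac.encode(), expected.encode()) else None


secret = b"server-side-secret"  # hypothetical; never shipped to the client
token = sign_filter(secret, "account_id:123")
print(verify_filter(secret, token))

# A client that swaps in another account's filter invalidates the signature.
raw = base64.urlsafe_b64decode(token).decode()
forged = base64.urlsafe_b64encode((raw[:64] + "account_id:456").encode()).decode()
print(verify_filter(secret, forged))
```

The application server signs the filter once per user; the search server then applies whatever filter the verified token carries, so queries no longer need to be proxied.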
        
         | Kerollmops wrote:
         | The current API key system is a simple and temporary solution.
         | 
         | We will work on a more feature-full API key system including
         | the one you are talking about. This is on our roadmap IIRC.
        
       | eliseumds wrote:
       | Pretty heavy user of ES here, and one cannot compare the two
       | products.
        
         | beastman82 wrote:
         | But Rust! Lol
        
       | heipei wrote:
       | I know the project doesn't claim it, but the title somewhat
       | implies this: I honestly don't understand people claiming
       | ElasticSearch is hard to operate, especially not at small scales.
       | If anything, ElasticSearch for me has been one of the easiest
       | pieces of infrastructure to operate, for me pretty much "zero-
       | config". Let me elaborate: You can run ElasticSearch via Docker
       | command-line, if you want a cluster you just supply IPs of the
       | other nodes. Then you start indexing documents with simple HTTP
       | calls. You can add or remove nodes at any time and don't have to
       | do anything but to start another ElasticSearch instance. If you
       | run out of space or performance just start another node.
       | Everything needed for management, indexing, search is available
       | through HTTP APIs, no tools needed.
       | 
       | Clustered ElasticSearch has been rock-solid for me and I've used
       | it in anger many times. The level of maintenance needed is close
       | to zero, both initially and long-term. Compare that with the
       | abysmal experience of setting up a sharded MongoDB cluster for
       | example...
       | 
       | Please enlighten me how ElasticSearch is "a lot of work to
       | operate" (heard that one multiple times), and what you're
       | comparing it to.
        
         | winrid wrote:
         | I've had issues scaling writes to it. You can get around it,
         | but maybe this would be better in a high write environment.
        
         | rodgerd wrote:
         | > I honestly don't understand people claiming ElasticSearch is
         | hard to operate, especially not at small scales.
         | 
         | The problem is that ES is deceptively simple to operate. As
         | millions of people who have found things like their medical
         | records shared with the world can attest.
        
         | jniedrauer wrote:
         | I've been bitten by elasticsearch twice in my career, and I've
         | seen others bitten by it as well. Once you put it in
         | production, you can't just run it from docker on your
         | workstation. You have to set up a cluster with enough capacity
         | for whatever load you're going to throw at it, gracefully
         | handle failures, updates, scaling up as load increases, etc.
         | 
         | There are so many switches and dials to tune, and unless you
         | really learn it in depth, you won't know which ones you need.
         | It's difficult to even determine what hardware requirements you
         | have. And it's a hard sell to tell your business guys "I think
         | elasticsearch will work better if we give it more... CPU?
         | Memory? Disk speed? I'm not really sure." and can't provide any
         | concrete metrics to back that up.
         | 
         | Another place where footguns abound is upgrading from one
         | version to another, _especially_ if you've got plugins
         | installed. There are tricks that you have to learn the hard
         | way.
         | 
         | At this point, I think long and hard before reaching for a
         | solution like elasticsearch. If I've got a DBA whose entire job
         | it is to master the tech and wield it expertly, that's one
         | thing. But if I'm part of an early stage startup, I just can't
         | justify the lost time and potential for catastrophe.
        
         | dijit wrote:
         | Elasticsearch is easier than mongo in some ways and harder in
         | others.
         | 
         | I run a few 10TiB ES clusters (which is not much, to be fair)
         | but infrequently find that I have to reindex or reshard the
         | cluster because I can't just add another node. There's
         | something to be said for understanding the index rotation too,
         | and access patterns.
         | 
         | It's easy to make an ES cluster, it's difficult to maintain
         | one, it's nearly impossible to debug one*.
         | 
         | * - if you consider that "it's slow" is what you have to debug.
        
           | jniedrauer wrote:
           | > if you consider that "it's slow" is what you have to debug.
           | 
           | This is exactly it. This is a problem you encounter with
           | every database engine, but in most of them you can quickly
           | find the bottleneck and fix it. With elasticsearch... it's a
           | frustrating and expensive game of trial and error.
        
         | Kerollmops wrote:
         | MeiliSearch is "zero-config" compared to ElasticSearch in terms
         | of the setup needed to get an instant, relevant search engine
         | for end users. Our engine follows the Algolia engine in terms
         | of typo-tolerance, relevancy, and speed.
         | 
         | Here is a little comparison that may answer your questions:
         | https://docs.meilisearch.com/resources/comparison_to_alterna....
        
           | heipei wrote:
           | Thanks, hadn't seen that, that makes a lot more sense. I
           | agree that ElasticSearch is definitely not "zero-config" when
           | it comes to building certain bespoke applications on it that
           | go beyond simple filtering or query-relevance document
           | search.
        
           | ksec wrote:
           | Maybe add Vespa [1] to the comparison?
           | 
           | [1] https://vespa.ai
        
       | Bedon292 wrote:
       | While this might be an alternative for that one specific use case
       | (search bar), it does not feel like a viable alternative to ES. I
       | am sure it is great at that specific case, and don't want to
       | knock them on that. But, I have never used ES for a simple search
       | like they are. When I use ES, I want to store billions of records
       | redundantly and search them by text, time, and/or location. And
       | then create visualizations with the results.
       | 
       | When I first read the title I thought it might be a Rust based
       | Lucene engine or something, and thought that would be pretty
       | cool. Though no idea how that would work. On its own, this is a
       | pretty nifty little tool, however I think the framing as an ES
       | alternative is what feels wrong to me, and apparently others in
       | the comments as well.
        
         | ellimilial wrote:
         | Seconding. Text searching is a horribly hairy problem. I know 2
         | businesses for which the main source of income is tuning
         | ES/Solr to particular user needs. Starting from performance,
         | through templating case-specific queries to custom plugins.
        
         | nicoburns wrote:
         | https://github.com/tantivy-search/tantivy is a Rust based
         | Lucene-alike.
        
       | ghh wrote:
       | I wanted to mention Sonic [1] as another lightweight document
       | indexing alternative written in rust, when I found MeiliSearch to
       | provide a thoughtful comparison page [2]
       | 
       | [1] https://github.com/valeriansaliou/sonic
       | 
       | [2]
       | https://docs.meilisearch.com/resources/comparison_to_alterna...
        
       | udfalkso wrote:
       | Sounds more like a potential alternative to Sphinx than Elastic
       | Search.
       | 
       | sphinxsearch.com/
        
       | bryanrasmussen wrote:
       | ok I just looked through things a bit but the phrase 0 config
       | worries me - first off I could conceivably run ElasticSearch with
       | 0 configuration but then it needs to make decisions as to what
       | types things are, and how things should be analyzed, and
       | sometimes those decisions are not what I want.
       | 
       | Often ElasticSearch makes a mistake in typing because the
       | programmer has made a mistake in data format, if you fixed that
       | mistake your data would now not fit the format that ElasticSearch
       | has chosen for it (actually don't know if this is still a problem
       | because it has been years since I have run without all my fields
       | being mapped first) but actually don't see how it couldn't be a
       | problem.
       | 
       | so theoretically if you didn't want to go through the trouble of
       | defining a mapping you could just reindex all your data fixed in
       | such a way that ElasticSearch will choose a better type for
       | individual fields but why would you do this?
       | 
       | And I mean what does MeiliSearch do? I wonder - because looking
       | through this code here
       | https://github.com/meilisearch/MeiliSearch/blob/master/meili...
       | (and not being a rust guy my understanding of it is probably off)
       | but it seems like maybe it is no configuration because it expects
       | you to follow its semantics. Which to be fair lots of things do,
       | at the base level, everything has a title, description, date.
       | 
       | But if I have a domain with different or probably more advanced
       | semantics what happens?
       | 
       | Search Engines are generally configurable because you want to add
       | other fields and rank hits in those fields higher than other
       | things, or maybe do a specific search that only targets those
       | fields - like say Brands based search.
       | 
       | on preview: lots of other people with similar views it seems, I
       | got maybe a bit ranty just because the title sets me off when it
       | just is so wrong it even seems like lying.
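To make the typing concern concrete: instead of letting Elasticsearch guess field types from the first documents it sees, an index can be created with an explicit mapping. The sketch below builds such a request body, assuming 7.x-style typeless mappings; the field names are hypothetical, and `"dynamic": "strict"` is included so that unmapped fields are rejected rather than guessed.

```python
import json


def explicit_mapping(fields: dict) -> dict:
    """Build an index-creation body with explicit field types.

    fields maps a field name to an Elasticsearch type name such as
    "text", "keyword", "float" or "date".
    """
    return {
        "mappings": {
            "dynamic": "strict",  # reject documents containing unmapped fields
            "properties": {name: {"type": type_name} for name, type_name in fields.items()},
        }
    }


# Hypothetical product-search fields, for illustration only.
body = explicit_mapping(
    {"title": "text", "brand": "keyword", "price": "float", "released": "date"}
)
print(json.dumps(body, indent=2))
```

This body would be sent when creating the index. A malformed document then fails loudly at index time instead of silently fixing the wrong type for a field.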
        
       | kvz wrote:
       | Is there already a browser library that can talk to MeiliSearch?
        
         | Kerollmops wrote:
         | Yes, there is, you can find all clients on this documentation
         | page: https://docs.meilisearch.com/resources/sdks.html
         | 
         | Note that we are reworking the js library and there will
         | probably be React integration too!
        
       | ghayes wrote:
       | The goal of ElasticSearch, I always thought, was that it scales
       | horizontally and can handle the loss of multiple nodes without
       | availability- or data-loss. It's interesting to build a single-
       | server replacement, and this can likely work for many use-cases,
       | but it's definitely a different approach from ElasticSearch
       | itself.
        
         | tpayet wrote:
         | Replication for MeiliSearch is on its way :) The main
         | differentiator is that MeiliSearch algorithms are made for end-
         | user search, not for complex queries. MeiliSearch focuses on
         | site search or app search, not analytics on hyper large datasets.
        
           | rjammala wrote:
           | what is the size of the largest dataset that you have indexed
           | with MeiliSearch?
        
             | tpayet wrote:
             | We are currently working with this dataset:
             | https://data.discogs.com/?prefix=data/2020/
             | 
             | It's a dataset of 107M songs, 7.6 GB of compressed files
             | which represents 250 GB of disk usage by MeiliSearch. We
             | are indexing the release, song and artist names.
             | 
             | We also work with a dataset of 2M cities that we can index
             | in less than 2 minutes when the db uses 3 shards.
        
           | nodesocket wrote:
           | Is it just replication (can sustain node failures) or also
           | sharding the data?
        
             | tpayet wrote:
             | We are working on both replication (for high availability
             | and we may use the Raft consensus) and distribution
             | (sharding to scale horizontally and keeping low latency)
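For readers unfamiliar with the two terms, here is a toy sketch of how sharding and replication differ: sharding decides which slice of the data a document belongs to, while replication decides how many nodes hold a copy of that slice. This is a generic illustration with hypothetical names, not MeiliSearch's design, and it leaves out consensus (e.g. Raft) entirely.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Sharding: map a document id to one shard with a stable hash."""
    digest = hashlib.sha256(doc_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def nodes_for(shard: int, nodes: list, replicas: int) -> list:
    """Replication: place a shard's primary and its replica copies on
    consecutive nodes, wrapping around the node list."""
    return [nodes[(shard + i) % len(nodes)] for i in range(replicas + 1)]


nodes = ["node-a", "node-b", "node-c"]  # hypothetical cluster
shard = shard_for("release-42", num_shards=3)
print(shard, nodes_for(shard, nodes, replicas=1))
```

With one replica per shard, losing any single node still leaves a copy of every shard; adding shards spreads indexing and query load across more machines.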
        
               | rockwotj wrote:
               | Is there a point of contact for this work? A GitHub issue
               | open? This is an area I'd be interested in.
        
       ___________________________________________________________________
       (page generated 2020-03-25 23:00 UTC)