[HN Gopher] Postgres full-text search: A search engine in a data... ___________________________________________________________________ Postgres full-text search: A search engine in a database (2021) Author : craigkerstiens Score : 126 points Date : 2022-07-11 18:07 UTC (4 hours ago) (HTM) web link (www.crunchydata.com) (TXT) w3m dump (www.crunchydata.com) | itsthecourier wrote: | we built a listings site similar to ebay in latin america after | buying the operation from an european company, | https://corotos.com | | we made use of bson in the listings table for metadata and FTS | inside postgres. spanish dictionaries, weighting, white-listings | of words and including metadata inside the tsvectors have been | awesome. really quick search, you never have unsync documents in | another part of the stack and you save a lot in storage | | glad to help anybody going this path, no regrets. | cpursley wrote: | A writeup or gist of this approach would be awesome. | mixcocam wrote: | I think SQLite is even more batteries included! | https://www.sqlite.org/fts5.html | canadiantim wrote: | More difficult to do facets with sqlite vs postgres though | shakezula wrote: | I used SQLite's built in fuzzy search and it is surprisingly | powerful. I used it to for full text search over a magic the | gathering card database and it was amazing. | cduzz wrote: | I've often wondered why someone doesn't reimplement | elasticsearch using sqlite as the back-end datastore, instead | of lucene. | | Or did I just describe foundationDB? | | I guess the real problem is that such a beast wouldn't have | kibana, and people _want_ kibana and put up with | elasticsearch to get it. | jfbaro wrote: | I have seen this presentation from a company that used PG Full | Text search for a pretty complex use case. Interesting -> | http://matheusoliveira.s3-website-us-east-1.amazonaws.com/pr... | (updated thanks to ddevault) | RowanH wrote: | Good link thinks. Have to say we're moving to pgSearch after | implementing SOLR alongside a PG/Rails backend. Makes the stack | simpler, less components to worry about (less headaches for gem | versioning dependencies with SOLR). Lot to be said for it after | reading through what's available now... | sandGorgon wrote: | are you planning to do relevance ? this is one of the | blockers for us - TFIDF vs BM25 or something. | | From what i know - PG fulltext seaarch does not implement | relevance. | rocmcd wrote: | This has been my understanding of the state of Postgres | full-text search. It's great if your search requirements | are fairly vanilla, but I haven't seen any solutions for | more advanced search needs, such boosting, relevance, | scoring, etc. | mamcx wrote: | An alternative is to just layer the FTS on top of vanilla | sql to get the extra stuff (this is what I do for my | eCommerce backend), so is pretty simple to have something | alike: SELECT .. -- Get the fts | IN ( FTS QUERY) ORDER BY -- The relevance is | hardcoded? mayber in another table that store th rankings? | ( Products, Inventory, | Invoices,.. ) | | I found is much easier and predictable if I code the | "rankings" based on the business logic instead of let the | FTS engine guess it. You can store that stuff as normal | columns or use the "sources" (ie: products, inventory) as | ways to know what could be more important to pull first. | | This have the nice property that our search results are ver | good and better: Never return non-sensical stuff! (like | searching for a apple in the store and get and blog post!) | ddevault wrote: | Please don't use URL shorteners. | | http://matheusoliveira.s3-website-us-east-1.amazonaws.com/pr... | jfbaro wrote: | Updated! Thanks | lettergram wrote: | I've written some tips on this before: | https://austingwalters.com/fast-full-text-search-in-postgres... | | Basically, it's easy to create triggers that'll update an index | for you on entry. Then you can create custom indices that have | key words, etc. | Latty wrote: | Is there a benefit to using a trigger over a generated stored | column? | lettergram wrote: | You can technically do way more robust things in a trigger. | If you don't need it then :shrug: but I often create fairly | robust updates | crtxcr wrote: | I was a bit disappointed by some limitations back then when I | tried it for a project of mine. When searching phrases where | ordering matters, phraseto_tsquery() does not quite work for | larger documents, as the tsvector position values are quite | limited: https://www.postgresql.org/docs/14/textsearch- | limitations.ht... Here I had much better success with sqlite's | FTS implemention. | cpursley wrote: | Here's the approach I settled on that allows for fuzzy search | (also some useful links at the bottom): | | https://gist.github.com/cpursley/e3586382c3a42c54ca7f5fef166... | | Downside is I haven't found a way to do weighting. | panabee wrote: | slightly off-topic, but if the goal is to store vast amounts of | protein and gene nucleotide sequences and allow wildcard searches | (e.g., %CTAAACGGG%), would postgres be the recommended approach? | | assume 200K proteins/genes with nucleotide sequences ranging from | 20 base pairs to 2.3 million base pairs. | giovannibonetti wrote: | Pg_trgrm based indexes works well to make this type of LIKE | %...% queries fast | jonatron wrote: | Last time I looked, Postgres full text search doesn't really do | CJK without third party parsers, which can be a bit of a deal | breaker. | zkirill wrote: | I love Postgres but lack of CJK full-text search is the main | reason why I am going with Lucene/Solr for in-app search. | | It's possible to make Postgres do amazing things (e.g. job | queue with SKIP LOCKED!) but at what point do you draw the line | and use the right tool for the job? | blowski wrote: | Any idea how well CJK works in general with Postgres? Or is | it just on full text where it's lacking? | zkirill wrote: | In general, we have absolutely no problem with CJK in | Postgres. Search is very application specific and some of | our users want to do things like kana + kanji search. [1] I | don't even know where you would start with that in | Postgres. | | We also need stuff like language detection and analysis of | mixed language data. | | A big advantage of going with a dedicated search tool is | that it teaches you what you don't know about search, and | it turned out that we knew pretty much nothing. | | [1] https://www.elastic.co/blog/implementing-japanese- | autocomple... | mistrial9 wrote: | the Japanese national phone company NTT is a major | postgresql contributor | chucky_z wrote: | When the feature doesn't exist! For existing | workflows/functionality; I think using a new tool for a | feature that doesn't (and won't for some time) exist is a | _perfectly_ good reason to introduce the tool. On top of that | you're looking at using something that is mature and well- | used, Lucene/Solr. | | If you wanted to be cheeky you could go ask your CTO if they | would foot the bill for someone like 2nd Quadrant to write | CJK full-text search into pgsql. ;) | | I've done this for some features/issues in random OSS | projects and if you feel someone else has the same problem I | always feel paying money to have it implemented is well worth | it. | [deleted] | simonw wrote: | The hardest part of building any search engine is keeping the | index up-to-date with changes made to the underlying data store. | | It's a solvable problem, but it's always a lot of work to build | and to keep working as the database schema changes in the future. | | This is why I really like PostgreSQL FTS: it's good enough that | for many projects I don't need to use an external search engine | any more - and it's way easier to keep the search index up-to- | date than if the index lives in a separate system. | | I wrote this tutorial on implementing faceted search with | PostgreSQL and Django a while ago: | https://simonwillison.net/2017/Oct/5/django-postgresql-facet... | canadiantim wrote: | Have you ever considered using PGroonga | (https://pgroonga.github.io/) to further extend FTS in | postgres? They've got a drilldown feature that is basically | their implementation of facets. Always wondered what your | thoughts on that might be? | | I've got a django setup with postgres and facets implemented as | you detailed in your blog (thank you for all the amazing info | over the years btw!), now I'm looking to extend it with | PGroonga and potentially considering Hasura to try and further | extend what I can do while still using Postgres as my single | source of truth without need for keeping any other service in | sync. | almog wrote: | > The hardest part of building any search engine is keeping the | index up-to-date with changes made to the underlying data | store. | | That makes sense and I'm curious to know if you're referring to | the challenges that arise from attempting to implement a | distributed heterogeneous transaction that includes system of | record (say RDBMS) and derived data (search engine), or did you | think about something that's more specific to search engines? | soperj wrote: | How hard is this on the database? | ravirajx7 wrote: | I am feeling real bad now as I used the like query as OP did | (although in MySQL) for implementing a custom search over two | fields ( Name & Description ) in one of my personal projects. I | thought it worked pretty well though and I was so happy about it. | giovannibonetti wrote: | Don't confuse full text search with word part search. If you | have a list of names, you're probably looking for the latter | one, where LIKE works fine and can be indexed with pg_trgrm. | phemartin wrote: | mongodb has its own full-text implementation with Atlas Search. | it has full Lucene capability. it's worth looking into. | | https://www.mongodb.com/docs/atlas/atlas-search/ | cpursley wrote: | If you're already running postgres, why would you set up mongo | for search instead of elastic, etc? | qohen wrote: | FYI, this crunchydata.com blog-post was discussed 11 months ago | on HN[0], in a thread awarded 601 points, with 137 comments. | | [0] https://news.ycombinator.com/item?id=27973497 | qohen wrote: | Among other things in that earlier thread, there's some | discussion of aspects of ZomboDB with its developer. | | _ZomboDB is a Postgres extension that enables efficient full- | text searching via the use of indexes backed by Elasticsearch._ | (From the project 's website[0]). | | Lots more info in readme at the project's github[1]. | | [0] http://zombodb.com | | [1] https://github.com/zombodb/zombodb | puika wrote: | Mandatory mention of the RUM extension | (https://github.com/postgrespro/rum) if this caught your eye. | Lots of tutorials and conference presentations out there | showcasing the advantages in terms of ranking, timestamps... | adhoc_slime wrote: | Very convenient for me that this article swings around, I'm | weighing pros and cons of PGSQL and elastic search for full-text | search for the business's use-case. | avereveard wrote: | Also consider full ngram search, so you can do client side | stemming and use a single index for all languages. It's not out | of the box, but installing the ngram extension is literally one | line. ___________________________________________________________________ (page generated 2022-07-11 23:00 UTC)