[HN Gopher] Ask HN: Books about full text search?
___________________________________________________________________

Ask HN: Books about full text search?

I would love to learn more about FTS at a very low level and I'm looking for books to read more on that topic. Any good suggestions?

Author : sopromo
Score  : 86 points
Date   : 2022-11-24 17:58 UTC

| pixelmonkey wrote:
| Take a look at my post "Lucene: The Good Parts":
|
| https://blog.parse.ly/lucene/
|
| The book mentioned there is Lucene in Action.
|
| And then this YouTube presentation by a Lucene/Elasticsearch
| committer will give you a nice overview of some related
| algorithms:
|
| https://youtu.be/eQ-rXP-D80U
|
| DamonHD wrote:
| Managing Gigabytes
|
| https://books.google.co.uk/books/about/Managing_Gigabytes.ht...
|
| Old but good!
|
| CoolestBeans wrote:
| Came here to recommend Managing Gigabytes as well. People these
| days are managing far more than gigabytes, but the fundamental
| ideas remain useful.
|
| 100k wrote:
| At a general audience level, "Index" is on my list to read. It
| covers the invention of the index up to digital search engines:
| https://www.nytimes.com/2022/02/09/books/review-index-histor...
|
| "Introduction to Information Retrieval" is a textbook which is
| available online: https://nlp.stanford.edu/IR-book/
| Here's a review:
| http://glinden.blogspot.com/2009/02/book-review-introduction...
|
| Another textbook which IMHO is a bit lower level is "Information
| Retrieval: Implementing and Evaluating Search Engines". The book
| website is down for me right now, but you can find it on Amazon
| here: https://www.amazon.com/Information-Retrieval-Implementing-Ev...
|
| Another commenter linked to "Relevant Search", which is great if
| you want to learn how to use a search engine effectively to
| improve relevance (as opposed to how to implement a search
| engine).
| It's old, but another book in that vein that was really
| helpful for me earlier in my career is Lucene in Action:
| https://www.amazon.com/Lucene-Action-Second-Covers-Apache/dp...
|
| tgv wrote:
| Check the reading lists of open courses on text retrieval, e.g.
| https://stanford.edu/class/cs276/
|
| binarymax wrote:
| "Relevant Search" by Doug Turnbull and John Berryman, published
| by Manning, is THE best book to get started with tuning search
| engines.
|
| I've been a search engineer for >10 years and this is always the
| first book I recommend.
|
| https://www.manning.com/books/relevant-search
|
| softwaredoug wrote:
| Aww, thanks Max <3
|
| francoisprunier wrote:
| Not a book, but this paper from 2019 covers a lot of ground and
| reviews the different topics extensively:
| https://tonellotto.github.io/publication/fntir/fntir_main.pd...
|
| fiedzia wrote:
| https://www.manning.com/books/relevant-search
|
| Also "Taming Text".
|
| arooaroo wrote:
| Manning also has a book on Lucene, the library that powers
| Solr and Elasticsearch. IIRC the book covered how Lucene
| actually works under the hood and would therefore act as a good
| reference on the subject in general.
|
| gardenfelder wrote:
| Taming Text is about building a question-answering system; it
| came out around the time Watson came online. It's not a plan
| but rather a cookbook of experiments using Apache products like
| Solr and OpenNLP, and it is a great tutorial on how question
| answering works.
|
| vdfs wrote:
| Lucene in Action is a good introduction to Lucene, which can be
| helpful for learning Elasticsearch (the most used FTS engine
| these days).
|
| _tom_ wrote:
| Lucene in Action covers Lucene 3.0 and is from 2010. The current
| version is 9.4.2. So much has changed.
|
| cb321 wrote:
| It's all in the Nim programming language, but if you prefer
| reading code or running diffs, then you might get a vague sense
| of (some) low-level nuts and bolts from:
| https://github.com/c-blake/nimsearch
|
| unixhero wrote:
| Just use Postgres full-text search; it's good enough:
| http://rachbelaid.com/postgres-full-text-search-is-good-enou...
|
| ssn wrote:
| Three reference textbooks are openly available:
|
| * Introduction to Information Retrieval,
| http://informationretrieval.org/
|
| * Information Retrieval in Practice,
| http://www.search-engines-book.com/
|
| * Entity-Oriented Search, https://eos-book.org/
|
| Modern Information Retrieval is also a classic reference. It is
| not openly available, but some contents are (were?) available
| online. Their site seems to be down, but the Internet Archive
| has a copy.
|
| Additional resources here:
|
| * https://nlp.stanford.edu/IR-book/information-retrieval.html
| http://web.archive.org/web/20220708135205/http://grupoweb.up...
|
| brudgers wrote:
| Not a book, but Hellerstein's CS186 from 2015, starting with
| Lecture 17, gave me a basic understanding (I think).
|
| Playlist:
| https://youtube.com/playlist?list=PLhMnuBfGeCDPtyC9kUf_hG_Qw...
|
| Also from that lecture series: the low level is always IO. One
| disk read tends to dwarf the cost of an O(n^2) in-memory
| algorithm.
|
| And IO is all about tuning caches and hardware for the specific
| structural relationships in the data, the way in which it is
| accessed, and the hardware everything runs on.
|
| Good luck.
___________________________________________________________________
(page generated 2022-11-24 23:00 UTC)
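brudgers' point that one disk read can dwarf an O(n^2) in-memory pass can be checked with rough arithmetic. The sketch below assumes about 10 ms for a random read on a spinning disk and about 1 ns per simple in-memory operation; both figures are illustrative assumptions, not from the thread, and `break_even_n` is a hypothetical helper.

```python
# Back-of-the-envelope check: how large can n get before an O(n^2)
# in-memory pass costs as much as a single random disk read?
# ASSUMED latencies (illustrative only, not taken from the thread):
DISK_READ_S = 10e-3  # ~10 ms random read on a spinning disk (assumption)
MEM_OP_S = 1e-9      # ~1 ns per simple in-memory operation (assumption)


def break_even_n(disk_s: float = DISK_READ_S, op_s: float = MEM_OP_S) -> int:
    """Hypothetical helper: largest n such that n*n in-memory operations
    take no longer than one disk read."""
    budget_ops = disk_s / op_s  # in-memory operations affordable per disk read
    n = int(budget_ops ** 0.5)
    # Guard against floating-point rounding so that n*n <= budget_ops < (n+1)^2.
    while n > 0 and n * n > budget_ops:
        n -= 1
    while (n + 1) * (n + 1) <= budget_ops:
        n += 1
    return n


if __name__ == "__main__":
    print(break_even_n())
```

Under those assumptions it prints 3162: an all-pairs pass over roughly 3,000 items costs about as much as one seek. With an assumed ~100 µs SSD read, `break_even_n(100e-6)` drops to 316, which is why the access pattern on disk matters more than in-memory asymptotics at this level.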