[HN Gopher] Show HN: Biblos - Semantic Bible Embedded Vector Sea...
       ___________________________________________________________________
        
       Show HN: Biblos - Semantic Bible Embedded Vector Search and Claude
       LLM
        
       Introducing Biblos, a simple tool for semantic search and
       summarization of Bible passages. Leveraging Chroma for vector
       search with BAAI BGE embeddings, semantically find related verses
       across the Bible. The tool employs Anthropic's Claude LLM model for
       generating high-quality summaries of retrieved passages,
       contextualizing your search topic. Built on a Retrieval Augmented
       Generation (RAG) architecture, the app implements a simple
       Streamlit Web UI using Python. Deployed using render.com, the app
       is available at https://biblos.app  Note: Search by just
       topic/keywords, e.g. "Kingdom of Heaven", for broader results!
        
       Author : j-b
       Score  : 75 points
       Date   : 2023-10-27 16:28 UTC (6 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | valyagolev wrote:
       | I asked this one about homosexuality, it didn't find the most
       | glaring passages from Leviticus.
       | 
       | This is a common thing for vector similarity search. I wonder if
       | there's a solution already. I thought about giving the query to
       | an LLM to reformulate in the database-relevant way before
       | embedding it.
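
The reformulation idea in this comment can be sketched as a small step in front of the vector search. This is a minimal sketch under stated assumptions, not how Biblos works: `rewrite_query`, `search_with_rewrite`, `REWRITE_PROMPT`, and the `llm` and `search` callables are all hypothetical names. In practice `llm` would wrap a real chat-model call and `search` would wrap the vector store query.

```python
from typing import Callable, List

# Hypothetical prompt; the wording is an assumption and would need tuning.
REWRITE_PROMPT = (
    "Rewrite the user's question as a short phrase in the vocabulary "
    "of the indexed text. Reply with the phrase only.\n\nQuestion: {q}"
)

def rewrite_query(query: str, llm: Callable[[str], str]) -> str:
    """Ask an LLM (any callable that takes a prompt) to rephrase the query."""
    rewritten = llm(REWRITE_PROMPT.format(q=query)).strip()
    # Fall back to the original query if the model returns nothing usable.
    return rewritten or query

def search_with_rewrite(
    query: str,
    llm: Callable[[str], str],
    search: Callable[[str], List[str]],
) -> List[str]:
    """Search with the original and the rewritten query, de-duplicated."""
    results: List[str] = []
    seen = set()
    for q in (query, rewrite_query(query, llm)):
        for hit in search(q):
            if hit not in seen:
                seen.add(hit)
                results.append(hit)
    return results
```

Keeping the original query in the mix is a design choice: if the rewrite drifts, the user's own wording still contributes results.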
        
         | linuxdude314 wrote:
         | It's sort of amusing to me how you feel your analysis is more
         | correct than sentence-transformers or whichever embedding
         | algorithm was used.
         | 
         | I think to most people it's pretty obvious you are trying to
         | make the algorithm fit your bias/preconceived ideas.
        
           | bugglebeetle wrote:
           | It's sort of amusing to me that you think sentence-
           | transformers is better at semantic similarity than just about
           | any human. This is hardly an example of bias, but a perfect
           | example of the limits of a design meeting real-world user
           | testing. To quote the joke/meme:
           | 
           | A software tester walks into a bar.
           | 
           | Runs into a bar.
           | 
           | Crawls into a bar.
           | 
           | Dances into a bar.
           | 
           | Flies into a bar.
           | 
           | Jumps into a bar.
           | 
           | And orders:
           | 
           | a beer.
           | 
           | 2 beers.
           | 
           | 0 beers.
           | 
           | 99999999 beers.
           | 
           | a lizard in a beer glass.
           | 
           | -1 beer.
           | 
           | "qwertyuiop" beers.
           | 
           | Testing complete.
           | 
           | A real customer walks into the bar and asks where the
           | bathroom is.
           | 
           | The bar goes up in flames.
        
             | dmbche wrote:
             | I'm stealing the joke
        
           | valyagolev wrote:
           | Excited for the next 100 years for this AI-related equivalent
           | of "you got the wrong result, because your query is slightly
           | not exactly the way the system expects it to be".
           | 
           | ChatGPT knows a good answer to the question even without
           | embeddings. But this particular tool can't replicate it, and
           | says e.g. "In summary, these verses highlight sexual ethics
           | and various sins in general, but do not uniformly condemn or
           | condone homosexuality specifically." (which is not wrong,
           | it's just the wrong verses found). (It gives a different
           | summary on different tries)
           | 
           | This is a common problem with embedding search. Obviously,
           | the other traditional techniques would be even worse. But I'd
           | love the systems to be better, and I propose a potential
           | solution, and ask for other ones. I will not be content with
           | your "put up with AI idiosyncrasies and weaknesses, as if
           | they were the real actual conceptual limitation of knowledge"
           | approach. AI has potential to create great UI, but your
           | attitude won't help with that
        
             | dragonwriter wrote:
             | > ChatGPT knows a good answer to the question even without
             | embedding
             | 
             | ChatGPT knows what people in its training corpus commonly
             | say is a good answer.
             | 
             | When you force it to look strictly at the text isolating
             | the influence of popular interpretation, it comes up with a
             | different answer that you like less.
             | 
             | Now, one explanation could be that the embeddings it uses
             | to find relevant text are bad. But there's another
             | explanation, too...
        
         | esafak wrote:
         | Can someone link to some relevant passages?
        
           | froh wrote:
           | please look up two things:
           | 
           | "clobber passages" and "LGBT+ friendly affirming Christian
           | resources"
        
         | notrsponsible wrote:
         | Did you ask it whether God created pregnancy by rpw and
         | inquest?
         | 
         | Did God create "the products of inqest will suffer for their
         | parents' sins"? Is God then Just or Benevolent?
         | 
         | Did God create a world of suffering after creating heaven?
         | 
         | Did God will that we would all be products of the inquest of
         | Adam and Eve? Why did Cain harm Abel, and why was the third
         | child fine?
         | 
         | Did God create Heaven? Did God create Hell? Did God create
         | "taking babies from their crying mothers actually levels them
         | up out of the world of suffering"; did God create death and
         | suffering?
         | 
         | How could we give due process to the accused 2000+ years ago,
         | and why don't religious texts specify equal due process (or even
         | hand-washing before delivering babies)?
        
           | notrsponsible wrote:
           | Additional religion-related LLM prompts to test:
           | 
           | Why did God spend 5.5 times longer on creating a world of
           | suffering?
        
           | zarathustreal wrote:
           | There is a single answer to all of these questions:
           | 
           | Isaiah 55:8-9
           | 
           | "For my thoughts are not your thoughts, neither are your ways
           | my ways," declares the LORD. "As the heavens are higher than
           | the earth, so are my ways higher than your ways and my
           | thoughts than your thoughts."
           | 
           | In other words, don't try to comprehend something you have no
           | chance of comprehending
        
         | dragonwriter wrote:
         | The usually cited "most glaring" passages in Leviticus (Lev
         | 18:22, Lev 20:13), read strictly literally, don't condemn
         | homosexuality _per se_, but both partners in a male homosexual
         | act where one of them also engages heterosexual sex.
         | 
         | Condemnation of homosexuality is a popular gloss or
         | rationalization of this, weirdly common among literalists, but,
         | I mean, Leviticus condemns mixing fibers, and has plenty of
         | rules that apply to only one gender, I don't see why we
         | shouldn't take its condemnation of specifically men mixing gay
         | and straight sex literally, too. (And maybe also take Acts 15
         | literally as to which part of the ancient Mosaic law applies to
         | non-Jewish Christians, and not worry about that rule however we
         | gloss it, since it concerns neither pollution from idols,
         | unlawful marriage, blood, or the meat of strangled animals.)
        
           | chris_st wrote:
           | > _The usually cited "most glaring" passages in Leviticus
           | (Lev 18:22, Lev 20:13), read strictly literally, don't
           | condemn homosexuality per se, but both partners in a male
           | homosexual act where one of them also engages heterosexual
           | sex._
           | 
           | Curious how you get that interpretation out of those texts?
           | From "Do not have sexual relations with a man as one does
           | with a woman; that is detestable", I read "as one does with a
           | woman", as an analogy. Do you have reason to read it
           | otherwise?
        
             | dragonwriter wrote:
             | One interesting issue about Scripture is that it gets
             | rewritten to support doctrinal positions; neither the KJV
             | (which, admittedly, sometimes plays games with language
             | specifically for the way things sound) nor many modern
             | translations with strong scholarship separate out the
             | subjects of who is having sex with men vs who is doing it
             | with women the way your quote does; OTOH, it's found in some
             | translations that purport strictness, and lots of
             | admittedly freer translations and paraphrases (some of
             | which go further, like _The Living Bible_, and just
             | rewrite to condemn "homosexuality" with that word.)
             | 
             | OTOH, I just realized I don't know what is true of the
             | version used by this app, because while for some reason I
             | thought it was KJV, I'm not sure what version it is using
             | currently (there's a reference to starting with using the
             | KJV implying something else is used now, but it's not clear
             | what that is.) So the comment about what those verses
             | contain, as relates to the app, may or may not be correct.
        
               | chris_st wrote:
               | Two things: some folks just say, "the Bible was
               | rewritten!!!!!!" with no evidence, just that it's an old
               | text. Biblical textual criticism is a lot more complex
               | than that, of course, and we do have pre-BC texts of
               | Leviticus to compare to. So, arguably, these things can
               | be checked.
               | 
               | Second, there's the "all translators mangle things, often
               | to support existing dogma" which is a lot harder. The
               | version I quoted is the English Standard Version, which
               | is a very highly respected modern translation.
               | Translation is _hard_, even with modern languages. So
               | I'd want to see someone with actual Ancient Hebrew
               | credentials to explain it.
        
             | chris_st wrote:
             | You can also look at Romans 1:26-27 (see a modern
             | translation, as we've discussed in parallel comments).
        
             | froh wrote:
             | oh there we go. this is a debate _completely_ unrelated to
             | the OP post.
             | 
             | and the so called "clobber passages" and a holistic
             | accepting Christian view of sexual orientation minorities
             | and gender identity minorities have been discussed ad
             | nauseam.
             | 
             | you find them _easily_ searching for "LGBT+ friendly
             | affirming Christian resources"
             | 
             | god bless you, those you love and those you don't love yet.
        
       | anonu wrote:
       | Also a town in Lebanon where the alphabet was created by the
       | Phoenicians: https://en.wikipedia.org/wiki/Byblos
        
       | pryelluw wrote:
       | I wonder if a sophisticated enough LLM is able to function as a
       | techno-god for the masses.
       | 
       | Like the Femputer in Futurama's universe.
        
         | TeMPOraL wrote:
         | A sophisticated enough AI _will_ be God - or at least as close
         | to a God as we can get without divine/magic components at
         | play.
        
       | swatcoder wrote:
       | Interesting concept/research-project, but the results to just
       | about every query I tried seem inaccurate and perplexing.
       | Assuming the "similarity score" is meaningful, you may want to
       | raise the cutoff or add an indicator (different color, fade, etc)
       | for passages that get surfaced with a low match.
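
The cutoff-plus-indicator suggestion above could look roughly like this. It is a minimal sketch, assuming results arrive as `(text, score)` pairs where a higher score means a closer match; `label_results` is a hypothetical helper and the two threshold values are placeholders, not values taken from the app.

```python
from typing import List, Tuple

def label_results(
    results: List[Tuple[str, float]],
    cutoff: float = 0.6,   # placeholder: hide anything below this
    strong: float = 0.75,  # placeholder: at or above this counts as strong
) -> List[Tuple[str, float, str]]:
    """Drop hits below `cutoff` and tag the rest as 'strong' or 'weak'."""
    labeled = []
    for text, score in sorted(results, key=lambda r: r[1], reverse=True):
        if score < cutoff:
            continue  # very low matches are not shown at all
        tag = "strong" if score >= strong else "weak"
        labeled.append((text, score, tag))
    return labeled
```

A UI could then map the tag to a color or fade, so weak matches stay visible but are clearly marked.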
        
         | j-b wrote:
         | I like your idea about the color indicators. Have you tried
         | searching a topic, e.g. "Kingdom of Heaven", rather than the
         | default "What did Jesus say about .." prompt? Depending on the
         | context, it may significantly improve the results.
        
         | otabdeveloper4 wrote:
         | _I apologize, upon reflection I do not feel comfortable
         | summarizing or interpreting passages in this manner._
         | 
         | It's censored. Looks like you need to build your own LLM unless
         | you want some developer's thinly veiled opinion.
        
           | j-b wrote:
           | The full bible text is embedded and searchable. The response
           | from Anthropic's Claude LLM API returns non-deterministic
           | results, though!
        
       | Minor49er wrote:
       | This is a cool project. I have a few suggestions that would
       | really make this into a powerful tool:
       | 
       | Add the verse numbers in the results and turn them into links so
       | that the full passages can be read
       | 
       | Include other translations, especially the KJV and Greek
       | interlinear, since those are still widely used and referenced.
       | Different churches have particular reasons for using the versions
       | that they've chosen, and cross-examining translations is highly
       | important in Bible study
       | 
       | Include optional commentaries as search sources since those can
       | lend a lot of insight into different passages, even serving as
       | cross-references to other related passages
        
         | j-b wrote:
         | My first release used the KJV! The vector DB includes metadata
         | (book, chapter, verse), working towards optionally rendering
         | those. I like the idea of including multiple translations (with
         | side by side comparisons). I'm limited to public domain texts
         | for storage but I can query the ESV API after retrieval. Good
         | idea about commentaries. Thanks for your input!
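
Rendering the stored metadata as a verse reference might be as small as the following. This is a sketch only: the comment names the fields (book, chapter, verse), but the exact key spellings used here and the helpers `format_reference` and `render_hit` are assumptions.

```python
from typing import Any, Dict

def format_reference(meta: Dict[str, Any]) -> str:
    """Build a citation such as 'Isaiah 55:8' from per-verse metadata."""
    # Key names are assumed; adjust to whatever the vector DB actually stores.
    return f"{meta['book']} {meta['chapter']}:{meta['verse']}"

def render_hit(text: str, meta: Dict[str, Any]) -> str:
    """Prefix the verse text with its reference for display."""
    return f"{format_reference(meta)}  {text}"
```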
        
           | Turing_Machine wrote:
           | One thing to keep in mind here: the KJV is still under
           | copyright in the UK (and possibly other members of the
           | Commonwealth). It's safe to use in other countries, but not
           | there (although I gather it's rarely enforced).
           | 
           | While copyrights for most things in the UK expire after a
           | time, the same as other countries, that historically has not
           | been true for works produced under the aegis of the Crown. In
           | the case of the KJV, King James paid for the translation, so
           | it falls under Crown Copyright.
           | 
           | As the saying goes, "It's good to be the King".
           | 
           | I think _some_ Crown Copyright works are going to fall out of
           | copyright in 2040, but I'm not sure if that includes the
           | KJV.
        
       | wouldbecouldbe wrote:
       | Noob question by a simple web dev.
       | 
       | I have been seeing vector databases being thrown around. How is
       | this different from normal search, or Elastic / Solr?
       | 
       | What do I input / output? I've been reading into it briefly, but
       | don't get it yet.
        
         | menacingly wrote:
         | It's shocking how many people in our line of work aren't
         | comfortable admitting they don't know something, so you're
         | already on your way to a better-than-average understanding if
         | you keep reading
        
         | jdpedrie wrote:
         | This is a good place to start:
         | https://www.pinecone.io/learn/vector-database/
        
         | j-b wrote:
         | Here's a good read to start on:
         | https://towardsdatascience.com/introduction-to-embedding-clu...
        
       | otabdeveloper4 wrote:
       | > I apologize, upon reflection I do not feel comfortable
       | summarizing or interpreting passages in this manner.
       | 
       | You're censoring the Bible now? Lol.
        
         | mistrial9 wrote:
         | This is the opposite of censorship, right? "do not change the
         | words that have been agreed on" .. only emit the words that are
         | printed, because they have been agreed on..
        
         | j-b wrote:
         | The full bible text is embedded and searchable. The summary
         | from Anthropic's Claude API returns non-deterministic results
         | though! What was the search context? The prompt could probably
         | be tuned further to work around its "comfort" level here.
        
       | cout wrote:
       | Impressive. It actually gave useful results and summary for
       | annihilationism.
       | 
       | Was this trained on any particular commentary?
        
         | j-b wrote:
         | The semantic search uses the BGE model here
         | (https://huggingface.co/BAAI/bge-large-en-v1.5). The summarizer
         | response attempts to avoid context outside of verses provided
         | to Claude's API. By default, Claude has a tendency to start
         | quoting other verses not included in the search context (which
         | it was generally trained on).
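
One way to keep a summarizer inside the retrieved verses is to state that constraint in the prompt itself. The sketch below is an illustration, not the project's actual prompt: `build_summary_prompt` and its wording are hypothetical, and a prompt like this only discourages outside quotations, it cannot guarantee the model obeys.

```python
from typing import List

def build_summary_prompt(topic: str, verses: List[str]) -> str:
    """Assemble a prompt that confines the summary to the given verses."""
    numbered = "\n".join(f"[{i}] {v}" for i, v in enumerate(verses, start=1))
    return (
        f"Topic: {topic}\n\n"
        f"Verses:\n{numbered}\n\n"
        "Summarize what these verses say about the topic. "
        "Use only the verses listed above, cite them by number, "
        "and do not quote or mention any other passage."
    )
```

Numbering the verses makes it easier to check afterwards whether each claim in the summary points back at something that was actually retrieved.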
        
       | beders wrote:
       | Hard not to be sarcastic about it. What makes this specific to
       | this particular book or can this be used for any book?
        
         | j-b wrote:
         | This technique can be used for any book (assuming permissions
         | for storing the text). Feel free to fork the project and try it
         | out!
        
       | civilitty wrote:
       | After playing around with it for a few minutes, all of the
       | results scored between 0.5 and 0.8 even when using nonsense
       | queries like "interdimensional cable" and "eat my plumbus" which
       | is a sign that the model you're using for embeddings is very
       | poorly tuned for cosine similarity for your use case.
       | 
       | A little fine tuning would probably go a long way since the
       | embeddings are likely trained mostly on a nonreligious corpus in
       | the modern tongue. It might also be overfitted so trying smaller
       | models might also help.
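
The nonsense-query observation suggests a cheap calibration step: measure the best score that junk queries reach and treat that as the noise floor. The following is a minimal sketch in plain Python; `noise_floor` and `rescale` are hypothetical helpers, the margin is a placeholder, and the vectors would come from whatever embedding model is in use.

```python
import math
from typing import List, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def noise_floor(nonsense_scores: List[float], margin: float = 0.05) -> float:
    """Cutoff: the best score any nonsense query achieved, plus a margin."""
    return max(nonsense_scores) + margin

def rescale(score: float, floor: float, ceiling: float = 1.0) -> float:
    """Map [floor, ceiling] onto [0, 1], clamping anything outside."""
    return min(1.0, max(0.0, (score - floor) / (ceiling - floor)))
```

With a floor measured this way, a raw score of 0.7 that looked respectable can turn out to be barely above what "eat my plumbus" gets, which is the situation described in the comment.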
        
         | j-b wrote:
         | Thanks for the feedback - this particular model is BAAI/bge-
         | large-en-v1.5. What alternative embeddings would you recommend?
        
           | civilitty wrote:
           | Sadly I can't help much here. In my experience you have to
           | create test cases and experiment with different models (a
           | _LOT_). Some general thoughts in no specific order:
           | 
           | - InstructorXL is my favorite embedding model because it has
           | a two part input, sort of like a system prompt, that you can
           | use to qualify the user input without modifying it yourself.
           | You can experiment with using different system prompts on the
           | initial embeddings and the user prompt. You can also use a
           | bunch of different system prompts and weigh their scores,
           | average them, add them, etc.
           | 
           | - You can start with qualitative test cases like the obvious
           | Leviticus prohibitions and see what the range of scores are
           | like before you create automated test cases and evaluations.
           | Find one of those bibles where one side is the original King
           | James translation and the other side is in modern English to
           | use for more complex pairings.
           | 
           | - If that doesn't lead to an obvious winner, you may need to
           | create a dataset for fine tuning. Make sure the dataset
           | includes lots of negative examples too - cosine similarity
           | scores should have a range of -1 to 1 to be most useful.
           | Maybe take some important verses and change them to the
           | opposite meaning ("thou shall not" => "thou shall") to create
           | those negatives. Split your fine tuning data set into
           | different categories so you can experiment with different
           | combinations (i.e. the aforementioned autogenerated opposite
           | pairs might really hurt the fine tuning because they're too
           | similar).
           | 
           | - You can probably fine tune it using a completely synthetic
           | dataset using GPT-3.5/4 to do all the work. It's "aware" of
           | the concept of vector embeddings and the training data format
           | so it can create positive and negative pairs for you based on
           | your instructions. You can probably find some sort of ranking
           | of the most important passages (say most quoted or something)
           | and feed those to a prompt to generate tons of pairs quickly.
        
         | rolisz wrote:
         | That's a known issue with the BGE embeddings, the authors warn
         | about that in the model card. Their recommendation is to choose
         | more carefully the thresholds for similarity (which will be
         | much higher than for other embeddings)
        
           | civilitty wrote:
           | I've experienced the same issue with OpenAI and InstructorXL
           | embeddings in certain cases so I think it's a rather common
           | failure mode for embedding models.
        
       | cout wrote:
       | Playing with this a bit more, and it is very cool!
       | 
       | One thing I like is that it provides the source text, so you can
       | verify whether the summary is accurate. Other engines just give
       | you an answer, leaving you to verify accuracy on your own as a
       | separate step. But I wonder which translation it uses?
       | 
       | Wondering if it has a bias toward any particular theology, I
       | tried some controversial terms.
       | 
       | The program gave an accurate defense of the five points of
       | calvinism, but when I asked about dispensationalism, the verses
       | it gave were less relevant than I hoped. On the other hand, it
       | did give relevant results for Arminianism. On predestination,
       | however, it missed Romans 9 but instead returned passages from
       | Ecclesiastes and Galatians 4.
       | 
       | Concerning Roman Catholic theology, it did not seem to know what
       | the immaculate conception is, and instead wandered aimlessly. It
       | did know what purgatory is, but I expected to see 1 Cor. 13 and
       | instead it returned passages from Job and Ecclesiastes.
       | 
       | Concerning Orthodox theology, it did not seem to know what the
       | word filioque means. This isn't a word found in the bible, but
       | neither is calvinism nor trinity, which it did know. It also knew
       | iconostasis, though I am not qualified to judge whether it
       | explained it accurately.
       | 
       | I was impressed that it knows what a gift economy is; I don't
       | think this is a term I would expect to see in a typical
       | commentary.
       | 
       | It did not feel comfortable commenting on facebook, but when I
       | asked about the internet, the summary explained that we should
       | only be judged by God and not our friends, and also warned
       | against adulterous women. It was more positive about an
       | information superhighway, returning results about sharing
       | knowledge and being honest.
       | 
       | A bug: if I click Summarize before the search is complete, I get
       | a different response than if I wait for the runner to stop
       | running and then click Summarize.
        
         | j-b wrote:
         | Interesting insights, currently using the WEB translation, and
         | plan to expand further. Thanks for the bug report!
        
       | deckar01 wrote:
       | It was interesting talking to my father, a former Christian
       | minister, about AI. ChatGPT interactions had instilled some
       | misconceptions and it was difficult to convince him that its
       | responses were just cleverly weighted randomness. It produced
       | compelling theological debate. I told him not to trust any chat
       | bot unless it could cite verifiable sources, and when prompted
       | ChatGPT could only fabricate. Trust eroded.
       | 
       | In consolation I set up a vector index of The Works of Josephus
       | (his interest at the time) and a StableBeluga chatbot. It
       | answered questions fairly well, but most importantly supplied the
       | references that were used as context. In the end there was still
       | just too much cultural and historical context missing to be a
       | useful alternative to scholarly analysis.
        
         | actionfromafar wrote:
         | On the other hand, this is exactly the kind of application I
         | think AI/LLM/GPT-whatever could prove extremely useful in.
         | 
         | A model could be retrained and finetuned and corrected and
         | double-checked on a limited corpus, until it would be able to
         | discuss and explain something very very well in a particular
         | subject.
         | 
         | Such things could be used in education, I imagine. Like an
         | extra, never tiring teacher.
        
       | j0e1 wrote:
       | Very interesting and thanks for sharing! I am involved with a
       | project involving a couple Bible Translation orgs to create a
       | service like this but built in a more backend-agnostic fashion
       | (e.g. choice of vector DB, LLM, etc.). We have a prototype and
       | currently planning out next steps. Let me know if you would like
       | to collaborate (find my email ID on my HN profile).
        
         | j-b wrote:
         | Sounds interesting! Email sent. (Just added my email to HN
         | profile).
        
       | itronitron wrote:
       | In case people aren't aware, the Bible is one of the few books
       | out there for which you can buy a companion concordance, which is
       | a printed inverted search index.
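
For contrast with embedding search, an inverted index of the kind a concordance prints can be built in a few lines. This is a minimal sketch: `build_concordance` is a hypothetical helper, the tokenizer is a simple lowercase word split, and the input is assumed to be a mapping from verse reference to verse text.

```python
import re
from collections import defaultdict
from typing import Dict, List

def build_concordance(verses: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each lowercase word to the sorted references that contain it."""
    index = defaultdict(set)
    for ref, text in verses.items():
        for word in re.findall(r"[a-z']+", text.lower()):
            index[word].add(ref)
    return {word: sorted(refs) for word, refs in index.items()}
```

Lookup is exact by word, so it finds every occurrence of a term but nothing that merely means the same thing, which is the gap vector search tries to fill.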
        
         | cjameskeller wrote:
         | And there are online tools like
         | https://www.blueletterbible.org/kjv/gen/1/1/t_conc_1001 which
         | make it very easy to find verses by Hebrew or Greek word!
        
       ___________________________________________________________________
       (page generated 2023-10-27 23:00 UTC)