[HN Gopher] PDF ChatBot - Upload, chat and interact with any PDF...
___________________________________________________________________
PDF ChatBot - Upload, chat and interact with any PDF document

Author : armcat
Score  : 45 points
Date   : 2023-04-03 20:01 UTC (2 hours ago)

(HTM) web link (askyourpdf.com)
(TXT) w3m dump (askyourpdf.com)

| da4id wrote:
| Just wanted to share my project too. I made a tutorial on how you
| can build the same thing here:
|
| https://docs.dopplerai.com/quick-start
| isuckatcoding wrote:
| I've tried two similar services and for some reason both of them
| cap at 200 pages. What's your limit?
| carlgreene wrote:
| Looks like 200 pages :(.
| _pdp_ wrote:
| Try unlimited pages at chatbotkit.com - there is a 1,000,000
| cap on the tokens though. Still more than all other services.
| wubbert wrote:
| [dead]
| agumonkey wrote:
| Kinda like having the author at your fingertips
| rockzom wrote:
| (((:::)))
| jrpt wrote:
| Nice, I made one too:
|
| https://docalysis.com/
|
| My take on this space is that it'll eventually be built into the
| operating system or PDF viewers, so you're going to have to do
| more than just "chat with a PDF" -- but chatting with PDFs is a
| great place to get started!
| bodge5000 wrote:
| I had an idea to do something like this combined with something
| like Zeal or DevDocs so you can have a kind of ChatGPT localised
| just to your specific language or framework. But I guess this
| does just that job, only in a far more general way
| Yackson0031 wrote:
| Yea AskYourPdf is also multilingual.
| dgco wrote:
| Just install Edge, open the Bing Chat sidebar with your PDF open,
| and ask your questions. You're welcome.
| https://twitter.com/sergeykarayev/status/1640764492018765824
| lxe wrote:
| I'm guessing langchain / llamaindex + openai API?
| iKlsR wrote:
| PDF chatbots are the new equivalent of todo apps...
| https://custombot.ai/.
| No doubt cool but loses the lustre after you've seen the umpteenth
| one that passes on the token cost to you and still hallucinates.
| rvz wrote:
| Almost all of them do the exact same thing, and the space is
| completely saturated with websites so similar that they look like
| a copy-and-paste job.
|
| There is nothing new or unique about any of them other than a
| new AI snake-oil to push their new grift on to users uploading
| sensitive PDFs to 'chat' with their document as 'the future'.
|
| Another race to the bottom until Microsoft Word or Google Docs
| releases the exact same thing for free with unlimited tokens.
| cloudking wrote:
| Has anyone ever had a problem they needed to solve by asking
| their PDF a question?
| kolinko wrote:
| Terms of service for various companies and other long and
| boring documents.
|
| My real estate agent wanted me to sign a document that is
| 10 pages long. I would prefer to use the bot to answer my
| questions and possibly verify it against other legal things.
|
| Tried the document with the service (after removing personal
| info), and it worked so-so. It could specify which paragraphs
| mention the commission, but couldn't extract info about how
| high the commission is.
|
| Perhaps it's because the document is in Polish. But GPT-3.5
| or 4 shouldn't have a problem with such queries.
| taf2 wrote:
| PDFs sure are annoying when you want to quickly jump through
| the docs... but really I wonder how this will do once the GPT-4
| API supports images... maybe then it can help me understand
| electronic data sheets... cause I'm still trying to figure
| out whether pin 0 was the SCL or SDA pin... and whether VCC was
| 3.3 or 1.8 volts...
| yakubin wrote:
| "What do I need to learn to understand you?"
| bowmessage wrote:
| Could be useful for a PDF textbook, perhaps?
| Lapsa wrote:
| why would you want to chat and interact with a PDF document?
| sporkl wrote:
| Doesn't seem to work well for music scores :(
|
| Tangentially, I haven't been able to find any software which has
| reliable OCR for music scores; they tend to be just bad enough to
| be useless. Was curious if any recent AI developments could be
| applied to this, but I don't have the expertise to look into this
| myself. If anyone has any thoughts or wants to look into this,
| please feel free to email me! (link to my website in profile,
| which has my email)
| thinkmassive wrote:
| Have you tried Audiveris?
|
| https://audiveris.github.io/audiveris/_pages/handbook/
|
| I haven't tried it but my first thought was to use something
| like Tesseract OCR, and I found this optical music recognition
| (OMR) project from there.
| anonymouse008 wrote:
| How does this work? Do you first scrape the PDF or do you have
| GPT-4 multimodal access? The privacy policy link is broken at the
| moment so I can't tell for sure
| jrpt wrote:
| I can't answer for theirs but I made one too:
|
| https://docalysis.com/
|
| The way it works is you first parse the PDF to analyze its
| text, then use an LLM along with the relevant text when
| answering user questions.
| laen wrote:
| Can you elaborate on how you parse the PDF? Are you simply
| converting it to text using a Python library or something
| more robust like GROBID[1]?
|
| 1: https://github.com/kermitt2/grobid
| Yackson0031 wrote:
| You just upload your PDF doc directly or via a URL and you are
| good to go.
| jimjimjim wrote:
| Interesting, but how many people are going to upload things they
| really shouldn't?
|
| ...
|
| You retain ownership of any PDF documents you upload to
| AskYourPdf. By uploading PDF documents to AskYourPdf, you grant
| AskYourPdf a non-exclusive, worldwide, royalty-free license to
| use, modify, reproduce, and distribute the PDF documents for the
| purpose of providing the AskYourPdf web application
|
| ...
| aaronharnly wrote:
| Yikes.
| Spivak wrote:
| That is the legal jargon required for them to ingest, index,
| and display the PDF you upload back to you.
| ke88y wrote:
| It is legal jargon that gives them the right to do that,
| but it also gives them a lot of other rights. If they only
| wanted to display the PDF back to you, they could effect
| that meaning very easily.
| meltedcapacitor wrote:
| bad actors are not gonna be stopped by their own "legal
| jargon"... the terms look like copy pasta or AI generated
| themselves. can't imagine the operators spent much time
| reading them.
|
| though maybe true bad actors would try harder to pretend
| being a company with some humans involved, rather than
| this openly anonymous site.
| Bjartr wrote:
| > they could effect that meaning very easily.
|
| Not using phrasing that has already been tested in court
| is easy, but fraught. If someone sues you because of a
| reasonable thing you did to display a document and you
| have this phrasing, it's open and shut because someone
| else has already litigated it and so there's legal
| precedent. If you use different phrasing and someone sues
| you, there's a greater chance you'll have an actual drawn
| out court case to convince a judge that your phrasing
| means what you wanted it to mean. Remember, the meaning
| of words and phrases in a legal context can differ almost
| arbitrarily from what they mean in a conversational one.
|
| As a business owner that just wants to get on and provide
| a service that displays a PDF you got sent, which do you
| go with, the one that lets your resources go to providing
| the service you intend to provide, or the one where
| there's a greater chance your resources will get tied up
| in a legal battle for the sake of making the terms almost
| no-one reads anyway a little nicer?
| tiedieconderoga wrote:
| Since you mention it, I have heard several of my friends and
| colleagues saying that they think ChatGPT could be their
| lawyer, doctor, and tax prep advisor if only they could send it
| documents for review.
| xkcd-sucks wrote:
| > purpose of providing the AskYourPdf web application
|
| For the purpose of funding it as a free service by selling
| upload content or derived metrics :)
___________________________________________________________________
(page generated 2023-04-03 23:00 UTC)
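
The approach jrpt describes in the thread (parse the PDF to text, then
give an LLM the relevant text along with the user's question) can be
sketched roughly as follows. This is a minimal sketch under stated
assumptions, not how AskYourPdf or docalysis is actually implemented:
the text is assumed to be already extracted from the PDF, relevance is
a toy word-overlap score standing in for whatever retrieval those
services really use, and the function names (`split_into_chunks`,
`top_chunks`, `build_prompt`) are hypothetical. The call to an LLM API
is deliberately left out.

```python
import re


def tokenize(text):
    """Lowercase the text and keep words longer than 3 characters
    (a crude stand-in for stop-word removal)."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3]


def split_into_chunks(text, max_words=50):
    """Split already-extracted document text into chunks of at most
    max_words whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def score(chunk, question):
    """Toy relevance score: number of distinct question words that
    also occur in the chunk. An assumption, not a real ranking model."""
    return len(set(tokenize(chunk)) & set(tokenize(question)))


def top_chunks(chunks, question, k=2):
    """Return the k highest-scoring chunks, best first."""
    return sorted(chunks, key=lambda c: score(c, question), reverse=True)[:k]


def build_prompt(chunks, question):
    """Combine the selected excerpts and the question into one prompt
    string that could be sent to an LLM."""
    context = "\n---\n".join(chunks)
    return (
        "Answer the question using only the excerpts below.\n\n"
        f"Excerpts:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    # Hypothetical document text, assumed to come from a PDF parser.
    document = (
        "The agent shall market the property. "
        "The commission is three percent of the sale price. "
        "The seller keeps the keys."
    )
    question = "How high is the commission?"
    chunks = split_into_chunks(document, max_words=9)
    print(build_prompt(top_chunks(chunks, question, k=1), question))
```

Only the best-matching excerpts go into the prompt, which is one way
to stay under a model's token limit on long documents; it also means
an answer can be missed when the retrieval step picks the wrong
excerpt.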