[HN Gopher] Show HN: Tuchu - Automatically highlight the importa...
___________________________________________________________________
Show HN: Tuchu - Automatically highlight the important parts of a document

Author : heresjohnny
Score  : 42 points
Date   : 2020-12-28 11:47 UTC (11 hours ago)

(HTM) web link (tuchu.app)
(TXT) w3m dump (tuchu.app)

| bzb6 wrote:
| I would name it "tuchus" and fill the web page with provocative
| imagery

| dang wrote:
| Please don't do this here.
|
| Edit: please really don't keep doing this here.

| monkeydust wrote:
| You need an example PDF on the page. Cycle through a few topical
| examples, e.g. the Brexit deal full text (it's over 1000 pages!)
|
| https://ec.europa.eu/transparency/regdoc/rep/1/2020/EN/COM-2...

| toufique wrote:
| Love this! Would be amazing as a Chrome Extension for web pages.

| theaussiestew wrote:
| Looks useful. What framework did you use to implement TextRank?

| 0xffff2 wrote:
| Is Firefox not supported? Tried submitting a paper and it says
| "File is not a PDF, too large, or corrupt" in Firefox. Seems to
| work fine in Chrome.
|
| In any case, the paper I submitted is one I coauthored, so I like
| to think I'm a reasonably good judge of what's important. Maybe
| the tool just isn't a good fit for my field or my writing, but
| the highlights appear to be essentially random.

| heresjohnny wrote:
| Hey HN,
|
| I created Tuchu because I wanted to increase my reading
| efficiency. It is a tool that automatically highlights the
| important parts of any document. Most documents take about two to
| three seconds to process. It's directed at students, researchers,
| or anyone with a reading list and little time for that matter.
|
| During my studies I had to go through a lot of literature, for
| example when I had to select relevant material for my thesis, or
| when I had to familiarize myself with a course's reading list.
| Tuchu helped me to get up to speed in these cases.
| What started off as a command-line Python script is now a web
| application that does its analysis without any back-end. I don't
| get to see your documents.
|
| The underlying algorithm that selects what's relevant is called
| TextRank, an unsupervised summarization method [1]. It models a
| document (or a collection thereof) as a fully connected graph.
| Its nodes are parts of the text -- I use sentences -- and the
| edges between them are weighted by a similarity measure, in my
| case simple word overlap. The sentences with the highest PageRank
| scores are then highlighted. For good measure, I also highlight
| sentences that contain signal words that -- in my academic
| experience -- signify importance.
|
| It's important to note that Tuchu is not a substitute for doing
| your own reading. It could make you a faster reader by directing
| your attention to the important parts, but you'll still have to
| ponder the true essence of a document yourself.
|
| [1] https://www.aclweb.org/anthology/W04-3252/

| karthikb wrote:
| How did Tuchu's highlighting compare to the abstract?

| donclark wrote:
| I don't have a PDF lying around that I could test it with. It
| would be nice if you had an example link to a PDF, or
| screenshots showing an example of the highlighting that the
| service does.

| yorwba wrote:
| The background image with the stylized images of highlighted
| documents is an ideal candidate for replacement with a
| screenshot of actual highlighted documents.

| krat0sprakhar wrote:
| Looks interesting! Is there a way to try it out on a webpage
| instead of a PDF?

| [deleted]

| quaintdev wrote:
| This kind of service would be amazing with a browser
| extension.
|
| Nowadays a lot of bloggers/reporters keep running around in
| circles before coming to the point. I actually wrote a related
| post about this a while back [0]. With a browser extension
| this could save readers a lot of time. Heck, I would pay for
| such a service!
|
| [0]: https://www.ankshilp.com/stop_beating_around_the_bush/

| donclark wrote:
| I agree as well. This may have a 2nd life as a browser
| extension to use on articles on the web.

| sologuardsman2 wrote:
| +1 to the browser extension idea

| rdhyee wrote:
| Very interesting. I'd love a way to download a highlighted
| version of the PDF that I fed to Tuchu.
|
| I uploaded a PDF version of a Wikipedia article to see what was
| selected, and at a quick glance, it's not obvious to me that
| the most important parts of the article have been highlighted.
| On the other hand, it's not obvious that trivial parts have
| been selected either -- leaving me intrigued to look further.

| aquajet wrote:
| I made a similar site to Tuchu a while ago that attaches the
| highlights to the PDF: https://anishthite.github.io/ailight/.
| It's a bit slow though; I've been trying to get it to run
| faster.

| sologuardsman2 wrote:
| Like the idea! Tested on a couple of random research papers with
| mixed but decent results. Really look forward to leveraging this
| sort of tool as it improves.
___________________________________________________________________
(page generated 2020-12-28 23:00 UTC)
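
For readers curious about the approach heresjohnny describes in the thread (sentences as graph nodes, word-overlap edge weights, PageRank to pick highlights, plus signal words), here is a minimal Python sketch. It is not Tuchu's code. The function names, the naive sentence splitter, the raw shared-word count as the similarity measure, the 0.85 damping factor, and the example signal words are all assumptions made for illustration.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(sentence):
    """Lowercased set of word tokens in a sentence."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences with weighted PageRank over a word-overlap graph."""
    n = len(sentences)
    if n == 0:
        return []
    bags = [words(s) for s in sentences]
    # Edge weight = number of shared words (assumed form of "word overlap").
    w = [[len(bags[i] & bags[j]) if i != j else 0 for j in range(n)]
         for i in range(n)]
    out = [sum(row) for row in w]  # total edge weight leaving each node
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on a share of its score,
            # proportional to the weight of the edge j -> i.
            rank = sum(w[j][i] / out[j] * scores[j]
                       for j in range(n) if out[j])
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


def highlight(text, top_k=2, signal_words=("importantly", "in conclusion")):
    """Return the top_k ranked sentences plus any containing a signal word.

    The default signal words are hypothetical examples.
    """
    sentences = split_sentences(text)
    scores = textrank(sentences)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:top_k])
    chosen |= {i for i, s in enumerate(sentences)
               if any(sw in s.lower() for sw in signal_words)}
    # Emit highlights in document order.
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    doc = ("Cats chase mice. Dogs chase cats. Mice eat cheese. "
           "The weather is nice today.")
    for sentence in highlight(doc, top_k=2, signal_words=()):
        print(sentence)
```

A sentence that shares no words with any other only ever receives the baseline `(1 - damping) / n` score, so it is highlighted only if a signal word pulls it in. The TextRank paper normalizes overlap by sentence length; this sketch skips that step to stay close to the "simple word overlap" wording in the thread.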