[HN Gopher] Show HN: Tuchu - Automatically highlight the importa...
       ___________________________________________________________________
        
       Show HN: Tuchu - Automatically highlight the important parts of a
       document
        
       Author : heresjohnny
       Score  : 42 points
       Date   : 2020-12-28 11:47 UTC (11 hours ago)
        
 (HTM) web link (tuchu.app)
 (TXT) w3m dump (tuchu.app)
        
       | bzb6 wrote:
       | I would name it "tuchus" and fill the web page with provocative
       | imagery
        
         | dang wrote:
         | Please don't do this here.
         | 
         | Edit: please really don't keep doing this here.
        
       | monkeydust wrote:
       | You need an example PDF on the page. Cycle through a few topical
       | examples, e.g. the Brexit deal full text (it's over 1000 pages!)
       | 
       | https://ec.europa.eu/transparency/regdoc/rep/1/2020/EN/COM-2...
        
       | toufique wrote:
       | Love this! Would be amazing as a Chrome Extension for web pages.
        
       | theaussiestew wrote:
       | Looks useful. What framework did you use to implement TextRank?
        
       | 0xffff2 wrote:
       | Is Firefox not supported? Tried submitting a paper and it says
       | "File is not a PDF, too large, or corrupt" in Firefox. Seems to
       | work fine in Chrome.
       | 
       | In any case, the paper I submitted is one I coauthored, so I like
       | to think I'm a reasonably good judge of what's important. Maybe
       | the tool just isn't a good fit for my field or my writing, but
       | the highlights appear to be essentially random.
        
       | heresjohnny wrote:
       | Hey HN,
       | 
       | I created Tuchu because I wanted to increase my reading
       | efficiency. It is a tool that automatically highlights the
       | important parts of any document. Most documents take about two to
       | three seconds to process. It's directed at students, researchers,
       | or anyone with a reading list and little time for that matter.
       | 
       | During my studies I had to go through a lot of literature, for
       | example when I had to select relevant material for my thesis, or
       | when I had to familiarize myself with a course's reading list.
       | Tuchu helped me to get up to speed in these cases. What started
       | off as a command-line Python script is now a web application that
       | does its analysis without any back-end. I don't get to see your
       | documents.
       | 
       | The underlying algorithm that selects what's relevant is called
       | TextRank, an unsupervised summarization method [1]. It models a
       | document (or a collection thereof) as a fully connected graph.
       | Its nodes are parts of the text -- I use sentences -- and the
       | edges between them are weighted by a similarity measure, in my
       | case simple word overlap. The subset of sentences with the
       | highest PageRank is then highlighted. For good measure, I also
       | highlight sentences that contain signal words that -- in my
       | academic experience -- signify importance.
       | 
       | It's important to note that Tuchu is not a substitute for doing
       | your own reading. It could make you a faster reader by directing
       | your attention to the important parts, but you'll still have to
       | ponder the true essence of a document yourself.
       | 
       | [1] https://www.aclweb.org/anthology/W04-3252/
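       | 
       | A minimal Python sketch of the procedure described above, not
       | Tuchu's actual code. It assumes a naive regex sentence splitter,
       | raw shared-word counts as edge weights, and plain power
       | iteration for PageRank. The SIGNAL_WORDS set and all function
       | names are hypothetical, chosen only for illustration.

```python
import re
from itertools import combinations

# Hypothetical signal words; the real list is not given in the thread.
SIGNAL_WORDS = {"therefore", "conclude", "importantly", "significant"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercased set of word tokens in a sentence."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences with weighted PageRank over a word-overlap graph."""
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]

    # Edge weight = number of words two sentences share (simple overlap).
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        weights[i][j] = weights[j][i] = float(len(tokens[i] & tokens[j]))

    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on a share of its score that is
            # proportional to the weight of the edge between j and i.
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


def highlight(text, top_k=3, signal_words=SIGNAL_WORDS):
    """Return the top_k ranked sentences plus any containing a signal word."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    scores = textrank(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:top_k])
    chosen |= {i for i, s in enumerate(sentences) if tokenize(s) & signal_words}
    # Keep the selected sentences in document order.
    return [sentences[i] for i in sorted(chosen)]
```

       | The TextRank paper [1] normalizes the overlap by sentence
       | length; the sketch uses the raw count to match the "simple word
       | overlap" wording, so longer sentences tend to score higher.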
        
         | karthikb wrote:
         | How did Tuchu's highlighting compare to the abstract?
        
         | donclark wrote:
         | I don't have a PDF lying around that I could test it with. It
         | would be nice if you had an example link to a PDF, or
         | screenshots showing an example of the highlighting that the
         | service does.
        
           | yorwba wrote:
           | The background image with the stylized images of highlighted
           | documents is an ideal candidate for replacement with a
           | screenshot of actual highlighted documents.
        
         | krat0sprakhar wrote:
         | Looks interesting! Is there a way to try it out on a webpage
         | instead of PDF?
        
           | [deleted]
        
           | quaintdev wrote:
           | This kind of service would be amazing with a browser
           | extension.
           | 
           | Nowadays a lot of bloggers/reporters keep running round in
           | circles before coming to the point. I actually wrote a post
           | about this a while back [0]. A browser extension could save
           | readers a lot of time. Heck, I would pay for such a
           | service!
           | 
           | [0]: https://www.ankshilp.com/stop_beating_around_the_bush/
        
           | donclark wrote:
           | I agree as well. This may have a 2nd life as a browser
           | extension to use on articles on the web.
        
           | sologuardsman2 wrote:
           | +1 to the browser extension idea
        
         | rdhyee wrote:
         | Very interesting. I'd love a way to download a highlighted
         | version of the PDF that I fed to Tuchu.
         | 
         | I uploaded a PDF version of a Wikipedia article to see what was
         | selected, and at a quick glance, it's not obvious to me that
         | the most important parts of the article have been highlighted.
         | On the other hand, it's not obvious that trivial parts have
         | been selected either -- leaving me intrigued to look further.
        
           | aquajet wrote:
           | I made a similar site to Tuchu a while ago that attaches the
           | highlights to the pdf: https://anishthite.github.io/ailight/.
           | It's a bit slow, though; I've been trying to get it to run
           | faster.
        
       | sologuardsman2 wrote:
       | Like the idea! Tested on a couple of random research papers with
       | mixed but decent results. Really look forward to leveraging this
       | sort of tool as it improves.
        
       ___________________________________________________________________
       (page generated 2020-12-28 23:00 UTC)