[HN Gopher] Show HN: 40k books on HN extracted using deep learning
       ___________________________________________________________________
        
        
       Author : tracyhenry
       Score  : 534 points
       Date   : 2021-09-20 16:58 UTC (6 hours ago)
        
 (HTM) web link (hacker-recommended-books.vercel.app)
        
       | sushisource wrote:
       | Heh, for a minute there I thought you meant Warhammer 40k books
       | specifically, and I thought that was a pretty funny thing to be
       | scraping from HN :)
        
         | bnbond wrote:
         | Same. I'm a little disappointed it's not.
        
         | russellbeattie wrote:
         | I'm pretty sure there are more Warhammer 40k books than there are
         | days in the year... It's like someone heard the term "space
         | opera" and thought that meant "soap opera in space".
         | 
         | Recommendations would include comments like, "This novel is
         | really the one that ties the previous 37 books together." or
         | "You might want to skip the next dozen books if you're
         | squeamish about things that ooze."
        
           | LanceH wrote:
           | While I don't consider the 40k books on par with the better
           | science fiction out there, I do enjoy that they bring a bit
           | of scale and what it means to space. It's a different take
           | from the rosy, post-scarcity, future of space. Bad things are
           | _really_ bad. Unattended good things turn bad on their own
           | just from drift.
           | 
           | Then there is the unashamed embrace of over-the-top
           | in so many different ways.
        
       | unmole wrote:
       | Interesting idea but not completely accurate. My own comment
       | about how I hated _Thinking, Fast and Slow_ seems to be counted
       | as a recommendation.
        
         | tracyhenry wrote:
         | Right, the model is not perfect with limited training dataset I
         | have (we hand labeled 4,000 - which is already tons of work for
         | a side project). But the intention was to filter out negative
         | ones.
        
           | jimmySixDOF wrote:
           | You did a stellar job here, thanks so much for this addition
           | to the community!
           | 
           | On labeling, if you have a method statement or some go-by
           | reference, I am sure you would get some support here. I know
           | I would help! Maybe package a few blocks of 100 unlabeled
           | comments with a readme and see what happens?
        
         | sampo wrote:
         | > My own comment about how I hated Thinking, Fast and Slow
         | seems to be counted as a recommendation.
         | 
         | What is the state of the art in sentiment analysis in natural
         | language processing? Would it be easy to add a feature that
         | recognizes whether the book was mentioned in a positive or
         | negative light?
        
         | munk-a wrote:
         | If you want to see some amusing "recommendations" I'd check out
         | The Communist Manifesto by Karl Marx and what comments it's
         | drawn. I think the network trying to find recommendations needs
         | to incorporate more sentiment analysis.
         | 
         | e.g. "Guards! Guards! by Sir Terry Pratchett is a great book" vs.
         | "I've never read anything as slow and uninteresting as The Two
         | Towers by J.R.R. Tolkien" or "I thought Seveneves by Neal
         | Stephenson was good - but it probably should've been two
         | separate books with the second half actually having some meat
         | to it."
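The thread doesn't describe the site's own classifier. As a hedged illustration of why this is harder than it looks, here is a toy lexicon-based polarity counter. The word lists and the `naive_polarity` function are hypothetical, and this is not how the site filters comments.

```python
import re

# Hypothetical cue-word lists; a real system would use a trained classifier.
POSITIVE = {"great", "good", "love", "loved", "recommend", "excellent"}
NEGATIVE = {"hated", "hate", "slow", "uninteresting", "boring", "nuts", "terrible"}


def naive_polarity(comment: str) -> str:
    """Label a comment by comparing counts of positive and negative cue words.

    Deliberately simplistic: it ignores negation, sarcasm, contrast words
    like "but", and which book a sentence is actually about.
    """
    words = re.findall(r"[a-z']+", comment.lower())
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "mixed/neutral"
```

On the three examples above, the Pratchett comment comes out positive and the Tolkien comment negative, but the Seveneves comment is also labelled positive: the counter sees "good" and finds no negative cue word in the criticism that follows "but".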
        
        
         | FranklinMaillot wrote:
         | Lessons: My Path to a Meaningful Life by Gisele Bundchen, the
         | top model, is probably the most out of place recommendation :)
         | None of the comments is about the book obviously, they just
         | mention the word "Lessons".
         | 
         | https://hacker-recommended-books.vercel.app/category/15/all-...
        
         | dang wrote:
         | Yes. I've removed the word "recommendations" from the title
         | because there are too many cases of negative mentions being
         | treated as recommendations.
         | 
         | Not a criticism! Sentiment analysis seems to remain an unsolved
         | problem.
         | 
         | See also
         | 
         | https://news.ycombinator.com/item?id=28598341
         | 
         | https://news.ycombinator.com/item?id=28596882
        
         | therealdrag0 wrote:
         | This was an amusing "extraction":
         | 
         | > I have not yet read the good book Atlas Shrugged but be sure
         | > to check it out based on your recommendation.
         | 
         | You're delusional. Where did I ever recommend reading Atlas
         | Shrugged? Ayn Rand is nuts.
        
         | jgwil2 wrote:
         | Yeah, I'm seeing some issues with _Code_ by Petzold citing
         | comments that are talking about e.g. _Code Complete_ or just
         | code in general, but with such a generic name (and given the
         | forum) it's actually pretty impressive to me that most
         | comments are identified correctly.
         | 
         | Edit: another one that is tough is _Open_ by Agassi - seems
         | most of these comments do not actually have anything to do with
         | the book. I would guess most one-word titles will have similar
         | issues.
        
           | tracyhenry wrote:
           | That's a correct observation. I'm guessing it has to do with
           | whether the words after _Open_ are indicative enough to the
           | model that they should be brought in together with _Open_. As
           | I said in other comments, with more training data this issue
           | will likely go away. And these tough comments are the best
           | candidates for new training data.
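A common way to use BERT for this kind of extraction is token classification with BIO tags, where each token is labelled as beginning a title, continuing one, or outside any title. The thread doesn't confirm that setup; the tag names `B-BOOK`/`I-BOOK` and the `decode_bio` helper are hypothetical, and real BERT input is split into WordPiece subwords that would need merging first. The sketch shows why the words after _Open_ matter: a title only grows when the model tags the next token as a continuation.

```python
from typing import List


def decode_bio(tokens: List[str], tags: List[str]) -> List[str]:
    """Join tokens tagged B-BOOK / I-BOOK into title strings.

    A title starts at a B-BOOK tag and grows through the I-BOOK tags that
    follow it. Any other tag ends the current title; an I-BOOK tag with no
    open title is treated like an O tag.
    """
    titles: List[str] = []
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == "B-BOOK":
            if current:  # a new title starts right after another one
                titles.append(" ".join(current))
            current = [token]
        elif tag == "I-BOOK" and current:
            current.append(token)
        else:
            if current:
                titles.append(" ".join(current))
            current = []
    if current:  # flush a title that runs to the end of the comment
        titles.append(" ".join(current))
    return titles
```

If "Zero to One" is tagged B, I, I it comes out as one title; if the model tags "to" as O instead, only "Zero" survives and can then be matched to the wrong book.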
        
       | jedwhite wrote:
       | Hey this is really awesome! Well done.
       | 
       | You mentioned transformers and BERT for large NLP models. I've
       | been playing around with this too and it's a really powerful
       | approach. Have you used spacy-transformers? [0]
       | 
       | The approach is pretty cool and can be used with BERT,
       | GPT-2/Hugging Face etc.
       | 
       | I'm just starting to experiment with GPT-J and thinking of trying
       | this approach also [1].
       | 
       | Anyway, totally awesome project and the results are really good.
       | This stuff really is almost unreasonably effective!
       | 
       | [0] https://explosion.ai/blog/spacy-transformers
       | 
       | [1] https://6b.eleuther.ai/
        
         | tracyhenry wrote:
         | Thanks! I used Huggingface's pretrained BERT.
        
           | malshe wrote:
           | This is really impressive! Can you please elaborate more on
           | the way you labeled the data? I think usually there is a lot
           | to learn from labeling methods.
        
           | jedwhite wrote:
           | This is a really good application of it. Getting NER right
           | for something like book titles with so much name collision
           | with other domains and entity types is really hard, and this
           | works great on something that most people would never realize
           | would be so hard!
        
           | sillysaurusx wrote:
           | Please write up how you did this! It may seem easy or
           | straightforward, but I assure you it's black magic to a lot
           | of people.
        
       | maiensch wrote:
       | Love it, will you do a write-up on how to replicate this with
       | other sources? I'm currently analyzing both Indie Hackers and
       | StartupsForTheRestOfUs interview transcripts, and this could be
       | a fun analysis!
        
       | alanbernstein wrote:
       | This is great. I just read Permutation City, which I
       | coincidentally see recommended on HN all the time, so I was
       | surprised not to see it in the search results or the top of the
       | fiction or scifi lists. Any idea why that is?
        
         | tracyhenry wrote:
         | That might be because the book database I used is quite limited,
         | sadly.
        
       | Tycho wrote:
       | Sounds good. Blocked by my work firewall though.
       | 
       | A few years ago I found an article that was something like '100
       | short books everyone should read before they're 40'. It was a mix
       | of fiction and non-fiction. I've never been able to find it
       | again! But I really liked the list because these are books you
       | can consume in a few hours and may be life changing.
       | 
       | I remember a few of the titles: Games People Play, Meditations,
       | The Prince, The Art of War. (I suppose it may have been non-
       | fiction only, although I think _The Awakening_ may have been on
       | there.)
       | 
       | Wish I could find the link again.
        
         | ZeroGravitas wrote:
         | If it was Oliver Sacks' Awakenings then it is non-fiction,
         | though it did get turned into a movie.
        
           | Tycho wrote:
           | Different book - Kate Chopin
        
        
         | Rd6n6 wrote:
         | I don't understand how software engineers get away with
         | browsing the internet for fun at work when nobody else can
        
           | themodelplumber wrote:
           | Sounds more like personal development than fun?
        
             | sillysaurusx wrote:
             | That's a bit like saying watching porn is more for personal
             | development than fun. Perhaps you'll learn something, but
             | it's incidental.
             | 
             | I've learned a lot from HN. But it wouldn't be good to fool
             | myself into thinking that an employer wants to fund my
             | personal development in this regard. Otherwise, they'd pay
             | me to HN all day.
             | 
             | The crux of the issue is that it's impossible to work 8
             | hours every day. We all invent lies to fill the downtime.
        
               | themodelplumber wrote:
               | Is all that hyperbole really necessary? Each new sentence
               | seems primed to leak edge and corner cases. Without
               | giving more attention to such a rhetorical blind spot, I
               | wonder how one could imagine they know the crux from the
               | passenger side door.
        
               | sillysaurusx wrote:
               | Which sentence is mistaken?
        
               | themodelplumber wrote:
               | The one with all the generalizations
        
               | sillysaurusx wrote:
               | If it's mistaken, it should be easy to explain why.
               | Otherwise I'm inclined to believe it's merely an
               | uncomfortable truth.
               | 
               | Would your employer pay you to HN all day? If not,
               | precisely how much of your day are they comfortable with
               | you HN'ing? Are you sure it's officially approved?
        
           | chadcmulligan wrote:
           | https://xkcd.com/303/
           | 
           | Waiting for Compiles is the usual, there's a lot of waiting
           | in software - waiting for compiles, scripts to run, someone
           | else to do something.
        
       | cyberge99 wrote:
       | I'm curious as to why you didn't choose to monetize with
       | affiliate links.
       | 
       | It seems a simple and easily justifiable reward. I didn't click the
       | links, but hopefully you used smile.amazon for charity.
       | 
       | This is novel and useful. Thank you.
        
       | kvathupo wrote:
       | In anticipation of getting flagged into oblivion, am I the only
       | one who's disappointed in this selection of books?
       | 
       | Of course, taste is subjective, and it should perhaps be expected
       | that much of the list is in line with what is read by the general
       | public, but many of the books are either presenting fact or
       | attempting to convince the reader of the veracity of a certain
       | viewpoint. I'd like to read more open-ended works that ask for
       | interpretation on the part of the reader or, at the least, don't
       | explicitly spell out what they want the reader to walk away with.
       | (certainly some books here fit the bill, e.g. Infinite Jest,
       | Pride & Prejudice, etc.). Again, interests are subjective.
       | 
       | In light of this, book recommendations?
        
         | themodelplumber wrote:
         | Personally I wouldn't recommend others' books to someone who is
         | left unfulfilled by such a huge list. I would rather recommend
         | writing or other subjectively-pinned activities, to hold the
         | subject accountable and help them stay out of the critic zone
         | long enough to find their way into more fulfilling growth.
        
         | awillen wrote:
         | I think that's just the nature of pulling books from HN
         | comments - a lot of those comments are trying to convince
         | people of a viewpoint, so it seems unsurprising that this is
         | the kind of list you'd end up with.
         | 
         | Not good or bad, just a function of where they're coming from.
         | 
         | And as for book recommendations, Children of Time by Adrian
         | Tchaikovsky.
        
       | figassis wrote:
       | This is amazing, thank you.
        
       | rustmachine wrote:
       | Cool project, and cool results. As an anthropologist who reads
       | HN as a way to keep abreast of the tech community and tech
       | insights, it's interesting to see Atlas Shrugged as one of the
       | most often recommended books. Interesting and maybe slightly
       | disturbing. HN would make for quite interesting source material
       | for someone who wanted to study tech culture.
        
         | dang wrote:
         | I'd be careful about that generalization. This software seems
         | to be going more by mentions than by recommendations - e.g. the
         | top reply to https://news.ycombinator.com/item?id=16323808
         | ("Ask HN: Which are the most damaging books you've read?") is
         | being counted as a recommendation.
         | 
         | Sentiment analysis is hard. In fact I've never seen it work
         | yet.
        
       | concernedctzn wrote:
       | Found it interesting that I couldn't find results for Knuth (The
       | Art of Computer Programming) or SICP on here. Maybe the casual
       | way we refer to these texts is hard to detect as a reference to a
       | book, or their importance is just implied community knowledge?
        
        
         | tracyhenry wrote:
         | If there is no search result for the book name then it just
         | means it's not in my current book database (which is limited).
        
       | supperburg wrote:
       | Surprised to see "A Pattern Language" on there. I've read most of
       | it in preparation for building my house. It's more of a
       | dictionary than a book but it's unbelievably useful. It's just a
       | huge list of little things that an architect would notice over
       | the span of his career. Little things that are important but not
       | obvious. If you're building a house, another really good book is
       | "what not to build."
       | 
       | I also recommend "Islamic Imperialism" from Yale, "The Bomb in My
       | Garden" by Mahdi Obeidi, and "Nothing to Envy."
        
         | gjm11 wrote:
         | Most likely the main reason "A Pattern Language" is popular
         | here on HN is that it spawned a movement in software
         | engineering:
         | https://en.wikipedia.org/wiki/Software_design_pattern
         | 
         | (Plus the fact that it's a good book on its own terms. At
         | least, it is so far as I can tell; I am not an architect and
         | maybe some of the advice in it is actually terrible. But it
         | _seems_ almost always reasonable and frequently insightful, and
         | it's well written, and the "pattern language" idea that
         | software engineering borrowed from it is a nice one. (Though
         | the software-engineering borrowings don't generally amount to
         | actual pattern languages as opposed to miscellaneous grab-bags
         | of alleged patterns.)
        
       | amelius wrote:
       | Perhaps you can do the same for research papers. Would the code
       | need to be changed in any way?
        
         | tracyhenry wrote:
         | Not much - but it needs a new set of training data for research
         | papers. Btw - there seems to be an existing website for this
         | already: https://www.hackernewspapers.com/ Although it only
         | looks for posts.
         | 
         | I'd assume that arXiv links are often there, so it's a problem
         | that can be addressed with an easier solution (just looking for
         | arXiv links).
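A hedged sketch of that "just look for arXiv links" approach: a regular expression over comment text that pulls out new-style arXiv IDs. The `arxiv_ids` helper is hypothetical, and old-style identifiers such as `hep-th/9901001` are deliberately not handled.

```python
import re
from typing import List

# New-style arXiv IDs (YYMM.NNNN or YYMM.NNNNN, optional vN suffix) inside
# abs/ or pdf/ URLs. Old-style IDs like hep-th/9901001 are not matched.
ARXIV_URL = re.compile(
    r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(?:v\d+)?", re.IGNORECASE
)


def arxiv_ids(comment: str) -> List[str]:
    """Return unique arXiv IDs linked in a comment, in order of first appearance."""
    seen: List[str] = []
    for match in ARXIV_URL.finditer(comment):
        paper_id = match.group(1)  # version suffix is dropped
        if paper_id not in seen:
            seen.append(paper_id)
    return seen
```

Stripping the version suffix means `1706.03762v5` and `1706.03762` count as the same paper, which is usually what you want when tallying mentions.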
        
       | personjerry wrote:
       | The problem with reading book lists like this is that nobody has
       | time to read all the books. There's a ton of crap out there and I
       | want HN to help me filter through them.
       | 
       | Thus the problem with existing solutions is NOT "limited recall"
       | or "insufficient rules" or "no Amazon link".
       | 
       | And the problem with this "solution" is that there is no
       | justification for why a book is great and applicable to my
       | circumstances, and people have to trust your black box. Otherwise
       | I'm likely to waste my time, just like reading books from any
       | other crappy recommendation engine.
       | 
       | With a deep learning model reducing all the reviews to "book
       | names" you've successfully removed the value of the book
       | discussions themselves. Therefore, for me this engine and all
       | similar engines are strictly worse than simply going through the
       | actual big threads themselves, i.e.
       | https://news.ycombinator.com/item?id=21900498
       | 
       | Edit: I've just seen the embedded comments by switching to a
       | desktop browser. It's a nice addition. However, for me to make
       | sure I'm not wasting my time going through arbitrary books and
       | comments, I would need to know why a book is ranked highly
       | compared to other books. And I want to be sure that ranking is
       | tailored to me, at a very, very high accuracy.
        
         | adewinter wrote:
         | > With a deep learning model reducing all the reviews to "book
         | names" you've successfully removed the value of the book
         | discussions themselves.
         | 
         | It literally shows each comment in full that it extracted the
         | book name from. It also includes a link to the comment in the
         | original thread. What more could you possibly want?
        
           | personjerry wrote:
           | Oh, I was on mobile and could not see the comments section.
           | It's interesting for sure. But what I want in particular is
           | to learn why a book is ranked highly compared to other books.
           | And I want to be sure that ranking is tailored to me, at a
           | very, very high accuracy.
        
             | lbriner wrote:
             | They are ranked highly because of the number of times they
             | are recommended in a comment.
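Taking that description at face value, here is a hedged sketch of count-based ranking. The `(comment_id, book_title)` input shape and the `rank_by_mentions` name are assumptions, and it ignores the sentiment and length factors the author mentions elsewhere in the thread.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_by_mentions(extracted: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Rank books by the number of distinct comments that mention them.

    `extracted` holds (comment_id, book_title) pairs. Duplicate pairs are
    dropped first, so a book named twice in one comment counts once.
    Ties are broken alphabetically to keep the output deterministic.
    """
    counts = Counter(title for _, title in set(extracted))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Counting distinct comments rather than raw mentions keeps one enthusiastic commenter from inflating a book's rank.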
        
         | FredPret wrote:
         | There's no way around the black box element of a book review,
         | but Nassim Taleb suggests waiting a few decades and, if the
         | book is still well known, then reading it.
        
           | bachmeier wrote:
           | Wow. What a helpful piece of advice (I guess he's smarter
           | than the rest of us so it's hard to understand the genius of
           | his strategy). Any mention of the cost of missing out on the
           | content in the book for a few decades?
        
             | phgn wrote:
             | The idea behind reading older books is that they're already
             | proven to be useful - it's a filter for you to spend less
             | time on useless information. It's generally called the
             | "Lindy effect".
        
             | lifekaizen wrote:
             | Love this question. I could imagine him suggesting reading
             | academic papers for cutting edge things; like his 'barbell'
             | exercise strategy of mostly walking with occasional HIITs.
        
             | dwighttk wrote:
             | Just read older books until there aren't any and then move
             | on to newer ones. There are too many that are a couple
             | decades old for you to ever run out.
             | 
             | Also occasionally break the rule for a book you want to
             | read. It isn't like that would kill you.
        
           | dhosek wrote:
           | I guess it's ok to read _The C Programming Language_, then.
        
             | FredPret wrote:
             | Technical manuals are more like a journal in the sense that
             | you have to read all the new ones if you want to keep up.
             | 
             | Novels, philosophies, and histories are works that can
             | stand the test of time if they're good enough.
        
         | tracyhenry wrote:
         | > I want HN to help me filter through them.
         | 
         | The comments panel shows the actual recommendations. And the
         | books are ranked by number of recommendations. Is this not
         | enough?
        
       | deaddabe wrote:
       | Impressive work.
       | 
       | What data source are you using for the books, authors and covers?
       | I looked at OpenLibrary [1] but the covers are not the same, so I
       | suppose it is something else? Maybe Amazon directly somehow?
       | 
       | [1] https://openlibrary.org/search?q=zero+to+one&mode=everything
        
         | tracyhenry wrote:
         | I crawled about 20k books from Amazon. Thanks for pointing me to
         | OpenLibrary!
        
       | tinmandespot wrote:
       | This makes me happy
        
       | justinzollars wrote:
       | Wow! This is amazing. Nice work! :) made my day
        
       | leobg wrote:
       | Very cool. This one's wrong though: "Zero: The Biography of a
       | Dangerous Idea". Comments are talking about other books with
       | "zero" in the title, such as Thiel's "Zero To One". Perhaps parse
       | longer titles first, and eliminate them, before matching for
       | shorter titles? Great MVP. Had in fact been thinking about how
       | great it would be to gather book data from HN myself just
       | yesterday. So am really happy to see that someone actually made
       | it. Plus, it looks great and is fun to use.
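leobg's longest-first idea is a dictionary-matching heuristic rather than what the site's model does, but it is easy to sketch. Assumptions: a flat list of known titles and plain case-insensitive string matching; `match_titles` is a hypothetical helper. It stops "Zero" from matching inside "Zero to One", though it would still match "Open" in "open source", which is the kind of case a context-aware model is meant to handle.

```python
import re
from typing import List


def match_titles(comment: str, known_titles: List[str]) -> List[str]:
    """Find known titles in a comment, trying longer titles first.

    Each match is blanked out of the working text, so a shorter title
    (e.g. "Zero") cannot match inside text already claimed by a longer
    one (e.g. "Zero to One"). Word-boundary anchors assume each title
    starts and ends with a letter or digit.
    """
    text = comment
    found: List[str] = []
    for title in sorted(known_titles, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(title) + r"\b", re.IGNORECASE)
        if pattern.search(text):
            found.append(title)
            # Overwrite every occurrence with spaces of the same length.
            text = pattern.sub(lambda m: " " * len(m.group(0)), text)
    return found
```

A title still counts as found when it appears on its own elsewhere in the same comment, so "Zero" is reported if the comment also mentions it separately.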
        
         | tracyhenry wrote:
         | Thanks. In theory this is the model's fault for not learning that
         | "Zero to One" should be treated as a whole title. It's one
         | limitation I mentioned in my root comment. Should be fixable
         | with more training data!
        
       | awinter-py wrote:
       | It confused rationalist Harry Potter fanfic with 'Harry Potter
       | Hogwarts hardcover journal and Elder Wand pen set', amazing.
        
         | wizzwizz4 wrote:
         | To my knowledge, J.K. Rowling doesn't allow people to sell Harry
         | Potter fanfic, even though she's fine with its existence.
        
       | begueradj wrote:
       | some are interesting
        
       | bonniejawker wrote:
       | Are you planning to open source the app? Could you do one for
       | lobste.rs too?
        
       | GuB-42 wrote:
       | Interesting lack of 1984, even though it is mentioned way too
       | often. The lesser-known "Animal Farm" and other dystopias like
       | "Brave New World" and "Fahrenheit 451" are here.
       | 
       | Is it because it is a number?
        
         | tracyhenry wrote:
         | It's because my book database doesn't have it. In fact my model
         | identifies 132 mentions of 1984, some examples:
         | 
         | https://news.ycombinator.com/item?id=20285306
         | 
         | https://news.ycombinator.com/item?id=12518804
         | 
         | https://news.ycombinator.com/item?id=22724495
        
           | SquishyPanda23 wrote:
           | This is great thank you.
           | 
           | On the topic of Brave New World, the site categorizes it as a
           | Reference.
        
         | SquishyPanda23 wrote:
         | The book title is "Nineteen Eighty-Four", but nobody spells it
         | that way.
         | 
         | The app may need to special case it.
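A minimal sketch of that special-casing, assuming a hand-curated alias table keyed on a normalized spelling. `ALIASES` and `canonical_title` are hypothetical names. Per tracyhenry's reply, the model already finds mentions of 1984, so a table like this only helps once the book is in the database; and since "1984" is also a year, the model's context still has to decide that a mention is a book.

```python
from typing import Dict


def _normalize(text: str) -> str:
    """Lower-case, turn hyphens into spaces, and collapse runs of whitespace."""
    return " ".join(text.lower().replace("-", " ").split())


# Hypothetical, hand-curated alias table: normalized spelling -> database title.
ALIASES: Dict[str, str] = {
    "1984": "Nineteen Eighty-Four",
    "nineteen eighty four": "Nineteen Eighty-Four",
}


def canonical_title(mention: str) -> str:
    """Return the canonical title for a mention, or the mention unchanged."""
    return ALIASES.get(_normalize(mention), mention)
```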
        
       | guidovranken wrote:
       | Nice. The Hacker News archive contains a wealth of great
       | information. I've previously performed similar extractions like
       | OP but with grep and SQL. I've also looked for people who have
       | accurately predicted the stock market (I did identify one pro
       | investor. He's now into NFTs). I've found so much cool stuff,
       | spending whole nights looking for interesting users and reading
       | their entire post histories and being blown away by many
       | insightful posts. I've been considering making a blog consisting
       | entirely of insightful HN posts that I come across.
        
         | moneywoes wrote:
         | Do you mind sharing which investor?
        
         | air7 wrote:
         | Please do. That sounds super interesting.
        
       | Andrew_nenakhov wrote:
       | Ok #3 is Dune. It'll surely be super helpful in building my
       | interstellar empire.
       | 
       | Step 1: make elites addicted to drugs
       | 
       | Step 2: monopolize drug trade
       | 
       | Step 3: install a religious fundamentalist regime with yourself
       | at its head
       | 
       | (All very logical until this point, but the next step might be a
       | problem. Can anyone offer advice?)
       | 
       | Step 4: transform into a worm
       | 
       | ??!
        
         | robotresearcher wrote:
         | Don't forget the step of achieving prescience, which allows you
         | to figure out what the '??!' is.
        
           | Andrew_nenakhov wrote:
           | That's what drugs are for, no?!
        
       | defect0 wrote:
       | Noticed an issue. Some, but not all, comments referencing Strunk
       | and White's Elements of Style are showing up instead as Erin
       | Gates' Elements of Style: Designing a Home & a Life.
        
         | tracyhenry wrote:
         | Good catch! This is the limitation mentioned in my root comment
         | - the algorithm will fail when two books have similar names.
         | The partial solution is to look at authors too when available.
         | Something to be included in the future.
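A minimal sketch of the author-based tie-break described above, assuming each candidate book carries a list of author surnames. `Book` and `pick_by_author` are hypothetical, and returning `None` when zero or several candidates match is a design choice that leaves the decision to the model or a human.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Book:
    title: str
    author_surnames: List[str]


def pick_by_author(comment: str, candidates: List[Book]) -> Optional[Book]:
    """Choose among books with similar titles using author surnames.

    Returns the single candidate whose author surname appears as a whole
    word in the comment, or None if no candidate or several match.
    """
    hits = [
        book
        for book in candidates
        if any(
            re.search(r"\b" + re.escape(name) + r"\b", comment, re.IGNORECASE)
            for name in book.author_surnames
        )
    ]
    return hits[0] if len(hits) == 1 else None
```

Whole-word matching keeps a surname like "White" from firing on words such as "whitespace".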
        
       | leobg wrote:
       | BTW, going through that list, I see why I love the HN crowd. 70%
       | of those books I've read myself, and did so before coming to HN.
       | There must be some strong personality type filtering going on.
        
         | reducesuffering wrote:
         | I think it's been quite obvious there's some personality type
         | filtering going on, as with most online communities. I'm quite
         | curious how it'd be quantified. Surely software engineers,
         | startup founders, ADHD, INTJ, and Myers-Briggs-is-bogus types
         | are overrepresented. Might tell us a bit more about
         | ourselves...
        
         | cinntaile wrote:
         | There's a strange error in there. The Art of War by Sun Tzu is
         | listed twice. Why is that, since it finds the right book and
         | author?
        
       | qwert12345887 wrote:
       | Can this be done to get a list of blogs posted here, with topic
       | analysis?
        
       | baby wrote:
       | Interestingly, it cannot differentiate between the different Harry
       | Potter recommendations (the original books, fanfics, and that book
       | on philosophy that mentions Harry Potter).
        
       | spookyuser wrote:
       | This is really incredible!
       | 
       | A while ago I created something adjacent to this that looks for
       | Hacker News reviews of books on Goodreads
       | (https://github.com/spookyuser/hacker-reads)
       | 
       | So I'm very curious how you managed to find book titles, I ran
       | into a lot of issues trying to figure out, for example, with
       | "Clean Code" whether to search for "Clean Code" or "Clean Code: A
       | Handbook of Agile Software Craftsmanship" since people mentioning
       | the book used both forms. And of course someone mentioning
       | just "Clean Code" might be referring to the concept not the book.
       | I ended up settling on `${titleMinusColon} - ${author}` but I'd
       | love to know what your approach was given that you used deep
       | learning to search.
       | 
       | EDIT: Just read your comment below on your approach, very
       | interesting!
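The template above is JavaScript; here is a hedged Python equivalent under one assumption about what `titleMinusColon` means (everything before the first colon). `goodreads_query` is a hypothetical name, and the linked repository may build the string differently.

```python
def goodreads_query(title: str, author: str) -> str:
    """Build a search string shaped like `${titleMinusColon} - ${author}`.

    Assumes "titleMinusColon" means the part of the title before the first
    colon, i.e. the subtitle is dropped.
    """
    short_title = title.split(":", 1)[0].strip()
    return f"{short_title} - {author}"
```

Dropping the subtitle makes "Clean Code" and "Clean Code: A Handbook of Agile Software Craftsmanship" produce the same query, which addresses the two-forms problem, though not the book-versus-concept ambiguity.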
        
       | jp42 wrote:
       | Slightly off-topic. The best book recommendations I've gotten came
       | from one of the following; dedicated recommendation services or
       | apps never worked for me:
       | 
       | - told by friend
       | 
       | - someone I admire read a book and commented on it
       | 
       | - I'm working on some problem, and during that exploration I came
       | across books.
       | 
       | - random people mentioning book on platform like HN on a
       | topic/post of my interest.
        
         | rahimnathwani wrote:
         | The most life-changing book recommendation I got was from HN:
         | 'Teach your child to read in 100 easy lessons'.
        
           | jp42 wrote:
           | Thanks for the comment! My son is 3y8m. He knows his letters
           | and many words. Looks like this book could be what he needs to
           | get to the next level.
        
         | TakerofVita wrote:
         | Yeah, in my experience a lot of 'general' book reviews are
         | super critical and don't really try to hook you. Going through
         | several reviews, you just come away with the collected gripes
         | and nitpicks of what is otherwise a good book.
         | 
         | I find that I get sold on a lot more when it is just a random
         | single comment on some thread somewhere that focuses on a
         | single aspect of a book.
         | 
         | If you can find a hyper specific subreddit/forum/etc. for a
         | sub-genre you like, then you will spend more time reading books
         | than reviews...
        
         | spookyuser wrote:
         | > random people mentioning book on platform like HN on a
         | topic/post of my interest.
         | 
         | Same! Some of my favorite book recommendations have especially
         | come from this one, I don't know why but a one line comment on
         | a HN thread of "what book changed your life" has become my
         | favorite way for discovering books.
        
       | ramraj07 wrote:
       | Great work but do note that the list basically looks slightly
       | better than an Amazon list (Atlas Shrugged lol). I think some
       | effort into more useful ranking (looking for metrics of
       | controversiality or maybe page rank) might make it more useful!
        
         | vavooom wrote:
         | I am also curious to know whether the number of votes is
         | integrated into the ranking at all, possibly weighted. You could
         | also try NLP text sentiment analysis to influence the model.
         | 
         | Regardless, fantastic work already!
        
           | tracyhenry wrote:
           | Right now the ranking is a simple combination of sentiment
           | and length. Including the number of votes definitely sounds
           | useful!
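
One way to picture the ranking described above is a weighted score over sentiment and comment length, with a vote term added as suggested. This is a minimal sketch and not the app's actual ranking function. The `Comment` fields, the log scaling, and the default weights are all assumptions, and the sketch presumes that a per-comment sentiment score and vote count are available.

```python
import math
from dataclasses import dataclass


@dataclass
class Comment:
    text: str
    sentiment: float  # assumed score in [-1, 1] from some sentiment model
    votes: int        # assumed per-comment vote count


def score(c: Comment, w_sent: float = 1.0, w_len: float = 0.5, w_votes: float = 0.5) -> float:
    """Hypothetical ranking: weighted sum of sentiment, log word count and log votes."""
    length_term = math.log1p(len(c.text.split()))  # diminishing returns for very long comments
    votes_term = math.log1p(max(c.votes, 0))       # clamp negatives, dampen large counts
    return w_sent * c.sentiment + w_len * length_term + w_votes * votes_term


def rank(comments: list[Comment]) -> list[Comment]:
    """Sort comments by descending score."""
    return sorted(comments, key=score, reverse=True)
```

The log terms stop a single very long or heavily upvoted comment from drowning out everything else. The weights would need tuning against labeled data.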
        
       | FinanceAnon wrote:
       | Awesome idea and nice looking UI! I will definitely visit when I
       | am looking for new books to read.
       | 
       | One thing that I've noticed is that when I select another book,
       | the scrollbar in the comment section doesn't automatically scroll
       | up.
        
         | tracyhenry wrote:
         | I know this but I wasn't able to fix it. Would love suggestions
         | on how to keep the scroll position in one div (for the books)
         | but not the other (for the comments) when doing client-side
         | navigation using Next.js...
        
       | zeristor wrote:
       | Can't find my recommendations for J. Scott Turner's "The Extended
       | Organism"
       | 
       | To summarise: organisms evolving to change the environment around
       | them to their benefit. I went to Foyles one day intending to buy a
       | book on termite mounds; termite mounds are one chapter in this
       | book.
       | 
       | I found out a year too late that UCL had hosted a talk by Dr
       | Turner.
        
       | soheil wrote:
       | "You're delusional. Where did I ever recommend reading Atlas
       | Shrugged? Ayn Rand is nuts."
       | 
       | Interesting that that's one of the recommendations.
        
       | the_arun wrote:
       | Thinking out loud here: what is the difference between Google's
       | search algorithm and AI-based deep learning? They are both trying
       | to do the same thing, I guess: structuring unstructured data?
        
       | mdp2021 wrote:
       | Suggestions: any way to notify you (your system) of book
       | recommendations your processing has missed?
       | 
       | You could have a form to notify you of a post that seems not to
       | have been processed, e.g.
       | "https://news.ycombinator.com/item?id=28549134", or "id=28591398"
       | etc.
       | 
       | (BTW: great work, and thank you for your invaluable service)
        
       | tracyhenry wrote:
       | Although viewable on mobile, this app is best viewed on larger
       | screens! :)
        
       | wombatmobile wrote:
       | I like this a lot.
       | 
       | The longer extracts are more useful than the shorter extracts.
       | 
       | For Brave New World, I noticed the first 100 - 200 comments are
       | short, and not useful as reviews so much as indicators of
       | preference. Then after that, the comments are longer, and hence,
       | more useful because they explain something.
       | 
       | It would be useful to be able to filter by comment length, so as
       | to distinguish between Opinion Mode and Review Mode.
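
A length filter like this could be as simple as sorting comments into two buckets by word count. The following is a minimal sketch. The function name and the 40-word cutoff are arbitrary assumptions, not features of the app.

```python
def split_by_length(comments: list[str], min_words: int = 40) -> tuple[list[str], list[str]]:
    """Split comments into (opinions, reviews) by whitespace word count."""
    opinions: list[str] = []
    reviews: list[str] = []
    for text in comments:
        # Short comments signal preference; longer ones tend to explain something.
        bucket = reviews if len(text.split()) >= min_words else opinions
        bucket.append(text)
    return opinions, reviews
```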
        
       | srcreigh wrote:
       | You helped me spend $150 on books! Two comments:
       | 
       | 1. I regret that you earned $0 for helping me spend so much on
       | books. Have you considered setting up affiliate links or a
       | donation button? Maybe affiliate links as a service will be your
       | next project.
       | 
       | 2. The Amazon links are for Amazon.com, but I'm in Canada. Maybe
       | easy internationalized Amazon affiliate links will be your next
       | project.
        
         | xpe wrote:
         | I regret that so many people regret that other people are not
         | monetizing.
        
           | lostgame wrote:
           | You know what, if commenter OP finds value in the services
           | offered; and wishes to compensate the author of the software
           | - just gonna say - I have no problem with that.
        
             | mihaic wrote:
             | You might not have a problem with that, but some of us
             | dislike the idea that monetization has to become
             | omnipresent, as it changes everything.
        
               | srcreigh wrote:
               | Affiliate programs are the most anti-big corp
               | monetization strategy ever.
               | 
               | Considering I already buy books on Amazon, if there's
               | any way I can just find an affiliate (any affiliate),
               | Amazon gets 5.5% less revenue.
               | 
               | For tracyhenry, they would get ~$8.25 CAD straight out of
               | Amazon's pocket for my $150 purchase.
               | 
               | https://associates.amazon.ca/help/node/topic/GRXPHT8U84RAYDX...
        
               | soperj wrote:
               | except it becomes much harder to find genuine
               | recommendations for things on the web.
        
               | scns wrote:
               | You can use Pi-hole's monetization link.
               | 
               | (edit) just scroll down that page:
               | https://pi-hole.net/donate/#sponsorship
        
               | dublinben wrote:
               | Amazon would get even less revenue if you bought your
               | books somewhere else, like https://bookshop.org/ or
               | directly from an independent book store.
        
               | thaufeki wrote:
               | A Patreon/crypto address to make a donation to is the
               | compromise here, surely.
        
               | ijidak wrote:
               | How do you pay your bills?
               | 
               | People should get paid for work.
               | 
               | Whether that work is having a job. Or making a website.
               | 
               | I don't see the difference...
               | 
               | Someone who does useful work deserves wages.
        
           | MathCodeLove wrote:
           | I regret that some people seem to think that any sort of
           | compensation for services rendered or monetization in any way
           | is automatically bad or wrong somehow.
        
             | darwinwhy wrote:
             | I regret having read this entire comment chain.
        
               | amelius wrote:
               | I regret that you did not get compensated for your lost
               | time.
        
             | ijidak wrote:
             | Agree.
             | 
             | People should get paid for work.
             | 
             | Whether that work is having a job. Or making a website.
             | 
             | Someone who does useful work deserves wages.
             | 
             | Even 2,000 years ago the Bible said: "the worker deserves
             | his wages."
             | 
             | Most of the people who are against monetization are
             | perfectly happy to get paid by their employer.
             | 
             | Is direct employment the only morally upright way to
             | receive payment for hard work?
        
         | gricardo99 wrote:
         | > You helped me spend $150 on books!
         | 
         | Check your local library. Depending on where you are, it could
         | be a fantastic resource for books.
        
           | malshe wrote:
           | After reading such comments here on HN, last month I got
           | myself a local library card and it has turned out to be a
           | great decision! I am using the Libby app to get digital books
           | and even audiobooks! Absolutely fantastic.
        
         | rahimnathwani wrote:
         | For #2, there are services OP could use, that will
         | automatically switch links to the right country store, e.g.
         | https://geniuslink.com/how-it-works/for-affiliates
        
       | cweill wrote:
       | Great execution, and very neat app!
       | 
       | But what's wrong with using Amazon affiliate links? If
       | anything, monetizing would be great, since it would give you more
       | incentive to maintain this wonderful application. And it doesn't
       | cost us users anything.
        
         | tracyhenry wrote:
         | Great point. I'm on a student visa which forbids any non-work
         | income. That's one reason why :)
        
       | nautilius wrote:
       | Amazing and super useful: If I start reading today, and I read a
       | book a day, it'll only take 112 years to finish, assuming that no
       | additional books will be recommended in the next century.
        
       | inanutshellus wrote:
       | I'm reminded of Goodhart's Law... So long as your project remains
       | secret, it'll be valuable. Once someone sees money being made from
       | it, it'll kick off inauthentic recommendations... anyway... a
       | high-quality problem to have, I guess!
        
       | bachmeier wrote:
       | Interesting idea, but this is _mentions_ of books, not
       | recommendations. It includes comments by someone who's reading
       | the book, has it on their reading list, or read it and thought it
       | was terrible.
        
         | tracyhenry wrote:
         | The intention was to only show recommendations. But because of
         | limited training data (we hand labeled ~4000 comments), the
         | model wasn't able to filter out bad ones effectively. More
         | training data should be able to solve it.
        
       | zsmi wrote:
       | It's a really interesting project. And I am sure it's really
       | hard.
       | 
       | I was curious how many times some common textbooks were mentioned
       | but didn't find them via the search, which could be user error.
       | To give a specific example, none of the books in this comment
       | thread were found:
       | 
       | https://news.ycombinator.com/item?id=19893447
       | 
       | Comment text like this: "CMOS VLSI Design: A Circuits and Systems
       | Perspective (4th Edition)" by Weste and Harris
       | 
       | should've been caught, right?
        
         | tracyhenry wrote:
         | It could be that I don't have this book in my book database.
        
       | nickthemagicman wrote:
       | Came here for the Warhammer stayed for the book recc's.
        
       | rahimnathwani wrote:
       | This is awesome. The best thing is that it's so fast to navigate.
       | I like how the HN comments are styled just like on HN.
       | 
       | A couple of thoughts:
       | 
       | * It would be great if each book were to have its own URL (for
       | sharing).
       | 
       | * Consider allowing author input in the search, e.g. if I want
       | to find the book 'Who' by Geoff Smart, the single-word title
       | isn't specific enough to show that book at the top of the search
       | results.
        
         | soco wrote:
         | If I look for one single word and that single word _is_ the
         | answer, shouldn't that be the very first result? I mean, that's
         | a 100% match right there...
        
           | rahimnathwani wrote:
           | If the dataset were perfect, maybe. But if a book with a
           | single-word title has only a few comments, it's plausible that
           | most/all of those comments are false matches.
           | 
           | In the case of the book I searched 'Who', showing it in 4th
           | position seemed about right.
        
           | tracyhenry wrote:
           | Yeah, the search can definitely be improved (e.g. to include
           | the author). Right now it's SELECT * FROM books WHERE name LIKE
           | '%{search_string}%'
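
The query as quoted builds SQL by inserting the search string directly into it, and that pattern is open to SQL injection. Below is a sketch of a parameterized version that also matches authors, as suggested upthread. It uses Python's built-in `sqlite3` purely for illustration. The `author` column and the `search_books` name are assumptions about the schema, not the app's actual code.

```python
import sqlite3


def search_books(conn: sqlite3.Connection, query: str) -> list[tuple[str, str]]:
    """Return (name, author) rows where every search term matches the title or the author.

    Only "?" placeholders are formatted into the SQL string; user input is passed as
    parameters, so it cannot change the structure of the query.
    """
    terms = query.split()
    if not terms:
        return []
    clause = " AND ".join("(name LIKE ? OR author LIKE ?)" for _ in terms)
    params: list[str] = []
    for term in terms:
        pattern = f"%{term}%"
        params.extend([pattern, pattern])
    return conn.execute(f"SELECT name, author FROM books WHERE {clause}", params).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (name TEXT, author TEXT)")
    conn.executemany(
        "INSERT INTO books VALUES (?, ?)",
        [("Who", "Geoff Smart"), ("Who Moved My Cheese?", "Spencer Johnson")],
    )
    print(search_books(conn, "who smart"))  # [('Who', 'Geoff Smart')]
```

Any `%` or `_` the user types will still behave as a `LIKE` wildcard. Escaping them with an `ESCAPE` clause would be a further refinement. By default, SQLite's `LIKE` ignores case for ASCII characters, which works well for title search.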
        
       | artursapek wrote:
       | this is awesome, thanks for making it
        
       | MarcScott wrote:
       | HN really likes Neal Stephenson. I've never read a book of his
       | that I didn't love, so I will definitely be looking through more
       | of the recommended fiction from the community here.
        
         | samuel wrote:
         | REAMDE was crap, IMO, and I'm a Stephenson fan.
         | 
         | And the problem with Stephenson is that he's rarely succinct,
         | so a bad book from him turns into a huge loss of time.
        
           | macintux wrote:
           | I addressed that problem with _Seveneves_ by skimming about
           | 1/3rd of it.
        
       | samuel wrote:
       | This is amazing. Thank you!
       | 
       | Does it take into account negative reviews/comments? I have seen
       | that _Why We Sleep_ is being recommended in the 6 months tab, but,
       | while it was received with a lot of praise, it was soon
       | criticized by other researchers in the field, and I would expect
       | that the HN crowd would have followed that trend.
        
         | tracyhenry wrote:
         | When I labeled the comments, I didn't label books that were
         | criticized. So in theory the model should filter out negative
         | reviews. But currently the training dataset is pretty limited
         | in size so you still can see some negative ones. I suspect that
         | with more training data this problem will go away.
        
       | jeron wrote:
       | 40k good books out there and I can only read like 24 a year if I
       | really push myself
        
       | ehutch79 wrote:
       | It has Atlas Shrugged in the top 10?
        
         | gjm11 wrote:
         | "Atlas Shrugged" is a polarizing book: people tend to either
         | love it or hate it. And the people who love it love to tell
         | other people how great it is, whereas many of the people who
         | hate it just don't talk about it (because there's generally
         | little need to talk about the badness of bad books).
         | 
         | I think a book list is more useful if it has some books in it
         | that some love and some hate, rather than only books that no
         | one minds very much. Maybe some of them will turn out to be
         | ones I love.
         | 
         | (I happen not to be a Rand fan myself.)
        
       | Borlands wrote:
       | Brilliant
        
       | tracyhenry wrote:
       | Hi HN!
       | 
       | I built this small app in my spare time to aggregate books
       | recommended on Hacker News. I personally find books recommended
       | on HN super helpful, so I thought this was a way I could
       | contribute back.
       | 
       | This book aggregation idea is not new. A bunch of sites have done
       | similar things [1, 2, 3].
       | 
       | Yet one common limitation of those sites is that they have
       | limited recall (i.e. not able to get a comprehensive set of book
       | mentions), and thus don't paint an accurate picture of what the
       | top books are. They're all based on insufficient rules, e.g.,
       | looking for Amazon Links. As you can see from my app, people
       | often do not include Amazon links when recommending a book.
       | 
       | I wondered, why can't we just match book names? Well, it's not so
       | easy. Some books have pretty short names, e.g. Meditations [4],
       | or Steve Jobs [5]. Some book names are also the names of movies,
       | e.g. Ready Player One [6]. Simply matching the names of the
       | books would produce a whole lot of irrelevant results.
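
To make the problem concrete, here is a minimal sketch of the naive approach: whole-word, case-insensitive matching against a list of titles. The three titles come from the examples in the paragraph above, and the function name is hypothetical. In the usage lines, every match is a false positive.

```python
import re

TITLES = ("Meditations", "Steve Jobs", "Ready Player One")


def naive_mentions(comment: str, titles: tuple[str, ...] = TITLES) -> list[str]:
    """Return titles that appear in the comment as whole words, ignoring case."""
    return [
        title
        for title in titles
        if re.search(rf"\b{re.escape(title)}\b", comment, flags=re.IGNORECASE)
    ]


print(naive_mentions("Steve Jobs was famously demanding at Apple."))  # ['Steve Jobs']
print(naive_mentions("The Ready Player One movie beat my expectations."))  # ['Ready Player One']
print(naive_mentions("I do short meditations before work."))  # ['Meditations']
```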
       | 
       | This is where Deep Learning comes into play. Recent advances in
       | large NLP models (transformers and BERT in particular) have made
       | machine language understanding unprecedentedly accurate. It
       | enables me to fine-tune a BERT model on a couple thousand labeled
       | HN comments and accurately predict whether each word in a comment
       | is part of a book title or not, a task commonly known as Named
       | Entity Recognition (NER).
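
Token-classification NER models commonly output one tag per token using a BIO scheme. In that scheme, B- marks the first token of an entity, I- marks a continuation, and O marks a token outside any entity. Those tags then have to be grouped back into title strings. The sketch below shows that decoding step. It assumes word-level tokens, meaning BERT's subword pieces have already been merged back into words, and it uses a hypothetical `BOOK` label. It illustrates the general technique, not the app's code.

```python
def decode_bio(tokens: list[str], tags: list[str], entity: str = "BOOK") -> list[str]:
    """Group consecutive tokens tagged B-<entity>/I-<entity> into entity strings."""
    spans: list[str] = []
    current: list[str] = []
    for token, tag in zip(tokens, tags):
        if tag == f"B-{entity}":
            # A new entity starts: flush any open span first.
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == f"I-{entity}" and current:
            # Continue the open span.
            current.append(token)
        else:
            # "O", another label, or an I- tag with no open span: close any open span.
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


tokens = ["You", "should", "read", "Ready", "Player", "One", "and", "Meditations", "."]
tags = ["O", "O", "O", "B-BOOK", "I-BOOK", "I-BOOK", "O", "B-BOOK", "O"]
print(decode_bio(tokens, tags))  # ['Ready Player One', 'Meditations']
```

Dropping an I- tag that has no preceding B- is only one convention. Some decoders instead treat it as the start of a new span.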
       | 
       | As a result, my app is able to present a whole lot more results
       | while maintaining desirable accuracy. For example, NER works
       | pretty well on the tough examples I mentioned ([4, 5, 6]).
       | Compared to prior sites, my app captures 9-50X more mentions and
       | thus presents a much more complete picture of what books are
       | recommended on HN.
       | 
       | Furthermore, I've made sure that the comments are presented well
       | in the UI because the recommendations are just as useful as the
       | books. I highlighted the mentioned book name, and used a custom
       | NLP-based ranking function to sort the comments. These are non-
       | trivial improvements over prior sites, which I hope you can find
       | useful.
       | 
       | Nevertheless, this app is not without limitations: 1) matching
       | book names would fail when two books have the same or similar
       | names; 2) although not often, this approach would wrongly
       | classify some short stop-word names [7] and 3) sometimes NER
       | fails to see that the commenter actually hates the book. These
       | problems can be alleviated with more Deep Learning. For 1), one
       | can use BERT to recognize the authors mentioned, which can then be
       | used as a filtering criterion. 2) and 3) should be fixable with
       | more training data (currently there are only ~4,000 hand-labeled
       | HN comments).
       | 
       | Lastly, I'd like to especially thank my gf who helped me label
       | ~1,000 comments, which boosted the model accuracy by 5 percent! I
       | also want to thank the people who create and maintain the
       | Hacker News BigQuery dataset [8]. And of course, thanks to
       | everyone on HN who recommends books to others.
       | 
       | Hope you enjoy this app! Feedback and suggestions are welcome :)
       | 
       | [1] https://news.ycombinator.com/item?id=15169611
       | 
       | [2] https://news.ycombinator.com/item?id=10924741
       | 
       | [3] https://news.ycombinator.com/item?id=12365693
       | 
       | [4] https://hacker-recommended-books.vercel.app/category/0/all-t...
       | 
       | [5] https://hacker-recommended-books.vercel.app/category/1/all-t...
       | 
       | [6] https://hacker-recommended-books.vercel.app/category/0/all-t...
       | 
       | [7] https://hacker-recommended-books.vercel.app/category/12/past...
       | 
       | [8] https://news.ycombinator.com/item?id=19304326
       | 
       | P.S. The Amazon links are NOT sponsored. This app is free of
       | monetization.
        
       | oakfr wrote:
       | This is really cool stuff. Would be really nice to do the same
       | for movies :)
        
       | metalliqaz wrote:
       | A book that I and others have recommended doesn't show up in the
       | database:
       | 
       | Animal, Vegetable, Junk: A History of Food, from Sustainable to
       | Suicidal by Mark Bittman
        
       | endofreach wrote:
       | Amazing. I appreciate that there are no affiliate links. But I
       | honestly think you should add affiliate links.
       | 
       | Also, if it makes sense, have a monthly list.
        
       | godmode2019 wrote:
       | This is very impressive, well done on deploying this.
       | 
       | 95% of the books I have ever read or owned are in the first 20
       | pages.
       | 
       | It's almost as much fun to read the comment chain about each book.
       | 
       | You must be independently wealthy, because I know no one cares if
       | there is an affiliate link. I believe affiliates are always paid
       | to the last cookie you have.
        
       | oakfr wrote:
       | @tracyhenry: how does the system work exactly? I cannot find any
       | documentation on your website.
        
         | tracyhenry wrote:
         | hey, you can scroll down to find a long comment of mine
         | documenting the approach.
        
           | dang wrote:
           | https://news.ycombinator.com/item?id=28596207
        
       ___________________________________________________________________
       (page generated 2021-09-20 23:00 UTC)