[HN Gopher] Show HN: 40k books on HN extracted using deep learning ___________________________________________________________________ Show HN: 40k books on HN extracted using deep learning Author : tracyhenry Score : 534 points Date : 2021-09-20 16:58 UTC (6 hours ago) (HTM) web link (hacker-recommended-books.vercel.app) (TXT) w3m dump (hacker-recommended-books.vercel.app) | sushisource wrote: | Heh, for a minute there I thought you meant Warhammer 40k books | specifically, and I thought that was a pretty funny thing to be | scraping from HN :) | bnbond wrote: | Same. I'm a little disappointed it's not. | russellbeattie wrote: | I'm pretty sure there's more Warhammer 40k books than there are | days in the year... It's like someone heard the term "space | opera" and thought that meant "soap opera in space". | | Recommendations would include comments like, "This novel is | really the one that ties the previous 37 books together." or | "You might want to skip the next dozen books if you're | squeamish about things that ooze." | LanceH wrote: | While I don't consider the 40k books on par with the better | science fiction out there, I do enjoy that they bring a bit | of scale and what it means to space. It's a different take | from the rosy, post-scarcity, future of space. Bad things are | _really_ bad. Unattended good things turn bad on their own | just from drift. | | Then there is the unashamed embrace of over-the-top | in so many different ways. | unmole wrote: | Interesting idea but not completely accurate. My own comment | about how I hated _Thinking, Fast and Slow_ seems to be counted | as a recommendation. | tracyhenry wrote: | Right, the model is not perfect with the limited training dataset I | have (we hand labeled 4,000 - which is already tons of work for | a side project). But the intention was to filter out negative | ones. | jimmySixDOF wrote: | You did a stellar job here thanks so much for this addition | to the community!
| | On labeling, if you have a method statement or some go-by | reference I am sure you would get some support here - I know | I would help! Maybe package a few blocks of 100 unlabeled | comments with a readme & see what happens? | sampo wrote: | > My own comment about how I hated Thinking, Fast and Slow | seems to be counted as a recommendation. | | What is the level of sentiment analysis in natural language | processing? Would it be easy to add the feature, to recognize | whether the book was mentioned in a positive or negative light? | munk-a wrote: | If you want to see some amusing "recommendations" I'd check out | The Communist Manifesto by Karl Marx and what comments it's | drawn. I think the network trying to find recommendations needs | to incorporate more sentiment analysis. | | i.e. "Guards Guards by Sir Terry Pratchett is a great book" vs. | "I've never read anything as slow and uninteresting as The Two | Towers by J.R.R. Tolkien" or "I thought Seveneves by Neal | Stephenson was good - but it probably should've been two | separate books with the second half actually having some meat | to it." | [deleted] | FranklinMaillot wrote: | Lessons: My Path to a Meaningful Life by Gisele Bundchen, the | top model, is probably the most out of place recommendation :) | None of the comments is about the book obviously, they just | mention the word "Lessons". | | https://hacker-recommended-books.vercel.app/category/15/all-... | dang wrote: | Yes. I've removed the word "recommendations" from the title | because there are too many cases of negative mentions being | treated as recommendations. | | Not a criticism! Sentiment analysis seems to remain an unsolved | problem. | | See also | | https://news.ycombinator.com/item?id=28598341 | | https://news.ycombinator.com/item?id=28596882 | therealdrag0 wrote: | This was an amusing "extraction": > I have | not yet read the good book Atlas Shrugged but be sure to check | it out based on your recommendation. You're | delusional.
Where did I ever recommend reading Atlas | Shrugged? Ayn Rand is nuts. | jgwil2 wrote: | Yeah, I'm seeing some issues with _Code_ by Petzold citing | comments that are talking about e.g. _Code Complete_ or just | code in general, but with such a generic name (and given the | forum) it's actually pretty impressive to me that most | comments are identified correctly. | | Edit: another one that is tough is _Open_ by Agassi - seems | most of these comments do not actually have anything to do with | the book. I would guess most one-word titles will have similar | issues. | tracyhenry wrote: | That's a correct observation. I'm guessing it has to do with | whether the words after _Open_ are indicative enough to the | model that they should be brought in together with _Open_. As | I said in other comments, with more training data this issue | will likely go away. And these tough comments are the best | candidates. | jedwhite wrote: | Hey this is really awesome! Well done. | | You mentioned transformers and BERT for large NLP models. I've | been playing around with this too and it's a really powerful | approach. Have you used spacy-transformers? [0] | | The approach is pretty cool and can be used with BERT, | GPT-2/Hugging Face etc. | | I'm just starting to experiment with GPT-J and thinking of trying | this approach also [1]. | | Anyway, totally awesome project and the results are really good. | This stuff really is almost unreasonably effective! | | [0] https://explosion.ai/blog/spacy-transformers | | [1] https://6b.eleuther.ai/ | tracyhenry wrote: | Thanks! I used Huggingface's pretrained BERT. | malshe wrote: | This is really impressive! Can you please elaborate more on | the way you labeled the data? I think usually there is a lot | to learn from labeling methods. | jedwhite wrote: | This is a really good application of it.
Getting NER right | for something like book titles with so much name collision | with other domains and entity types is really hard, and this | works great on something that most people would never realize | would be so hard! | sillysaurusx wrote: | Please write up how you did this! It may seem easy or | straightforward, but I assure you it's black magic to a lot | of people. | maiensch wrote: | Love it, will you do a write-up on how to replicate this with | other sources? I'm currently analyzing both Indie Hackers and | StartupsForTheRestOfUs Interview Transcriptions and this could be | a fun analysis! | alanbernstein wrote: | This is great. I just read Permutation City, which I | coincidentally see recommended on HN all the time, so I was | surprised not to see it in the search results or the top of the | fiction or scifi lists. Any idea why that is? | tracyhenry wrote: | That might be that the book database I used is quite limited, | sadly. | Tycho wrote: | Sounds good. Blocked by my work firewall though. | | A few years ago I found an article that was something like '100 | short books everyone should read before they're 40'. It was a mix | of fiction and non-fiction. I've never been able to find it | again! But I really liked the list because these are books you | can consume in a few hours and may be life changing. | | I remember a few of the titles: Games People Play, Meditations, | The Prince, The Art of War. (I suppose it may have been non- | fiction only, although I think _The Awakening_ may have been on | there.) | | Wish I could find the link again. | ZeroGravitas wrote: | If it was Oliver Sacks' Awakenings then it is non-fiction, | though it did get turned into a movie. | Tycho wrote: | Different book - Kate Chopin | [deleted] | Rd6n6 wrote: | I don't understand how software engineers get away with | browsing the internet for fun at work when nobody else can | themodelplumber wrote: | Sounds more like personal development than fun? 
| sillysaurusx wrote: | That's a bit like saying watching porn is more for personal | development than fun. Perhaps you'll learn something, but | it's incidental. | | I've learned a lot from HN. But it wouldn't be good to fool | myself into thinking that an employer wants to fund my | personal development in this regard. Otherwise, they'd pay | me to HN all day. | | The crux of the issue is that it's impossible to work 8 | hours every day. We all invent lies to fill the downtime. | themodelplumber wrote: | Is all that hyperbole really necessary? Each new sentence | seems primed to leak edge and corner cases. Without | giving more attention to such a rhetorical blind spot, I | wonder how one could imagine they know the crux from the | passenger side door. | sillysaurusx wrote: | Which sentence is mistaken? | themodelplumber wrote: | The one with all the generalizations | sillysaurusx wrote: | If it's mistaken, it should be easy to explain why. | Otherwise I'm inclined to believe it's merely an | uncomfortable truth. | | Would your employer pay you to HN all day? If not, | precisely how much of your day are they comfortable with | you HN'ing? Are you sure it's officially approved? | chadcmulligan wrote: | https://xkcd.com/303/ | | Waiting for Compiles is the usual, there's a lot of waiting | in software - waiting for compiles, scripts to run, someone | else to do something. | cyberge99 wrote: | I'm curious as to why you didn't choose to monetize with | affiliate links. | | It seems a simple and easily justifiable reward. I didn't click the | links, but hopefully you used smile.amazon for charity. | | This is novel and useful. Thank you. | kvathupo wrote: | In anticipation of getting flagged into oblivion, am I the only | one who's disappointed in this selection of books?
| | Of course, taste is subjective, and it should perhaps be expected | that much of the list is in line with what is read by the general | public, but many of the books are either presenting fact or | attempting to convince the reader of the veracity of a certain | viewpoint. I'd like to read more open-ended works that ask for | interpretation on the part of the reader or, at the least, don't | explicitly spell out what they want the reader to walk away with. | (certainly some books here fit the bill, e.g. Infinite Jest, | Pride & Prejudice, etc.). Again, interests are subjective. | | In light of this, book recommendations? | themodelplumber wrote: | Personally I wouldn't recommend others' books to someone who is | left unfulfilled by such a huge list. I would rather recommend | writing or other subjectively-pinned activities, to hold the | subject accountable and help them stay out of the critic zone | long enough to find their way into more fulfilling growth. | awillen wrote: | I think that's just the nature of pulling books from HN | comments - a lot of those comments are trying to convince | people of a viewpoint, so it seems unsurprising that this is | the kind of list you'd end up with. | | Not good or bad, just a function of where they're coming from. | | And as for book recommendations, Children of Time by Adrian | Tchaikovsky. | figassis wrote: | This is amazing, thank you. | rustmachine wrote: | Cool project, and cool results. As an anthropologist who reads | HN as a way to keep abreast of the tech community and tech | insights, it's interesting to see Atlas Shrugged as one of the | most often recommended books. Interesting and maybe slightly | disturbing. HN would make for quite interesting source material | for someone who wanted to study tech culture. | dang wrote: | I'd be careful about that generalization. This software seems | to be going more by mentions than by recommendations - e.g.
the | top reply to https://news.ycombinator.com/item?id=16323808 | ("Ask HN: Which are the most damaging books you've read?") is | being counted as a recommendation. | | Sentiment analysis is hard. In fact I've never seen it work | yet. | concernedctzn wrote: | Found it interesting that I couldn't find results for Knuth (The | Art of Computer Programming) or SICP on here. Maybe the casual | way we refer to these texts is hard to detect as a reference to a | book, or their importance is just implied community knowledge? | [deleted] | tracyhenry wrote: | If there is no search result for the book name then it just | means it's not in my current book database (which is limited). | supperburg wrote: | Surprised to see "A pattern language" on there. I've read most of | it in preparation for building my house. It's more of a | dictionary than a book but it's unbelievably useful. It's just a | huge list of little things that an architect would notice over | the span of his career. Little things that are important but not | obvious. If you're building a house, another really good book is | "what not to build." | | I also recommend "Islamic imperialism" from Yale, "the bomb in my | garden" by mahdi obeidi and "nothing to envy." | gjm11 wrote: | Most likely the main reason "A Pattern Language" is popular | here on HN is that it spawned a movement in software | engineering: | https://en.wikipedia.org/wiki/Software_design_pattern | | (Plus the fact that it's a good book on its own terms. At | least, it is so far as I can tell; I am not an architect and | maybe some of the advice in it is actually terrible. But it | _seems_ almost always reasonable and frequently insightful, and | it 's well written, and the "pattern language" idea that | software engineering borrowed from it is a nice one. (Though | the software-engineering borrowings don't generally amount to | actual pattern languages as opposed to miscellaneous grab-bags | of alleged patterns.) 
| amelius wrote: | Perhaps you can do the same for research papers. Would the code | need to be changed in any way? | tracyhenry wrote: | Not much - but it needs a new set of training data for research | papers. Btw - there seems to be an existing website for this | already: https://www.hackernewspapers.com/ Although it only | looks for posts. | | I'd assume that Arxiv links are often there. So it's a problem | that can be addressed with an easier solution (just looking for | Arxiv links). | personjerry wrote: | The problem with reading book lists like this is that nobody has | time to read all the books. There's a ton of crap out there and I | want HN to help me filter through them. | | Thus the problem with existing solutions is NOT "limited recall" | or "insufficient rules" or "no Amazon link". | | And the problem with this "solution" is that there is no | justification for why a book is great and applicable to my | circumstances, and people have to trust your black box. Otherwise | I'm likely to waste my time, just like reading books from any | other crappy recommendation engine. | | With a deep learning model reducing all the reviews to "book | names" you've successfully removed the value of the book | discussions themselves. Therefore, for me this engine and all | similar engines are strictly worse than simply going through the | actual big threads themselves, i.e. | https://news.ycombinator.com/item?id=21900498 | | Edit: I've just seen the embedded comments by switching to a | desktop browser. It's a nice addition. However, for me to make | sure I'm not wasting my time going through arbitrary books and | comments, I would need to know why a book is ranked highly | compared to other books. And I want to be sure that ranking is | tailored to me, at a very, very high accuracy. | adewinter wrote: | > With a deep learning model reducing all the reviews to "book | names" you've successfully removed the value of the book | discussions themselves.
| | It literally shows each comment in full that it extracted the | book name from. It also includes a link to the comment in the | original thread. What more could you possibly want? | personjerry wrote: | Oh, I was on mobile and could not see the comments section. | It's interesting for sure. But what I want in particular is | to learn why a book is ranked highly compared to other books. | And I want to be sure that ranking is tailored to me, at a | very, very high accuracy. | lbriner wrote: | They are ranked highly because of the number of times they | are recommended in a comment. | FredPret wrote: | There's no way around the black box element of a book review, | but Nassim Taleb suggests waiting a few decades and, if the | book is still well known, then reading it. | bachmeier wrote: | Wow. What a helpful piece of advice (I guess he's smarter | than the rest of us so it's hard to understand the genius of | his strategy). Any mention of the cost of missing out on the | content in the book for a few decades? | phgn wrote: | The idea behind reading older books is that they're already | proven to be useful - it's a filter for you to spend less | time on useless information. It's generally called the | "Lindy effect" | lifekaizen wrote: | Love this question. I could imagine him suggesting reading | academic papers for cutting edge things; like his 'barbell' | exercise strategy of mostly walking with occasional HIITs. | dwighttk wrote: | Just read older books until there aren't any and then move | on to newer ones. There are too many that are a couple | decades old for you to ever run out. | | Also occasionally break the rule for a book you want to | read. It isn't like that would kill you. | dhosek wrote: | I guess it's ok to read _The C Programming Language_ , then. | FredPret wrote: | Technical manuals are more like a journal in the sense that | you have to read all the new ones if you want to keep up.
| | Novels, philosophies, and histories are works that can | stand the test of time if they're good enough | tracyhenry wrote: | > I want HN to help me filter through them. | | The comments panel shows the actual recommendations. And the | books are ranked by number of recommendations. Is this not | enough? | deaddabe wrote: | Impressive work. | | What data source are you using for the books, authors and covers? | I looked at OpenLibrary [1] but the covers are not the same, so I | suppose it is something else? Maybe Amazon directly somehow? | | [1] https://openlibrary.org/search?q=zero+to+one&mode=everything | tracyhenry wrote: | I crawled about 20k books from Amazon. Thanks for pointing me | to openlibrary! | tinmandespot wrote: | This makes me happy | justinzollars wrote: | Wow! this is amazing. Nice work! :) made my day | leobg wrote: | Very cool. This one's wrong though: "Zero: The Biography of a | Dangerous Idea". Comments are talking about other books with | "zero" in the title, such as Thiel's "Zero To One". Perhaps parse | longer titles first, and eliminate them, before matching for | shorter titles? Great MVP. Had in fact been thinking about how | great it would be to gather book data from HN myself just | yesterday. So am really happy to see that someone actually made | it. Plus, it looks great and is fun to use. | tracyhenry wrote: | Thanks. In theory this is the model's fault: it's not learning | that "Zero to One" should be considered as a whole book. It's one | limitation I mentioned in my root comment. Should be fixable | with more training data! | awinter-py wrote: | it confused rationalist harry potter fanfic with 'harry potter | hogwarts hardcover journal and elder wand pen set', amazing | wizzwizz4 wrote: | To my knowledge, /jk Rowling doesn't allow people to sell Harry | Potter fanfic, even though she's fine with its existence. | begueradj wrote: | some are interesting | bonniejawker wrote: | are you planning to open source the app?
could you do one for | lobste.rs too? | GuB-42 wrote: | Interesting lack of 1984, even though it is mentioned way too | often. The lesser known "Animal Farm" and other dystopias like | "Brave New World" and "Fahrenheit 451" are here. | | Is it because it is a number? | tracyhenry wrote: | It's because my book database doesn't have it. In fact my model | identifies 132 mentions of 1984, some examples: | | https://news.ycombinator.com/item?id=20285306 | | https://news.ycombinator.com/item?id=12518804 | | https://news.ycombinator.com/item?id=22724495 | SquishyPanda23 wrote: | This is great thank you. | | On the topic of Brave New World, the site categorizes it as a | Reference. | SquishyPanda23 wrote: | The book title is "Nineteen Eighty-Four", but nobody spells it | that way. | | The app may need to special case it. | guidovranken wrote: | Nice. The Hacker News archive contains a wealth of great | information. I've previously performed similar extractions like | OP but with grep and SQL. I've also looked for people who have | accurately predicted the stock market (I did identify one pro | investor. He's now into NFTs). I've found so much cool stuff, | spending whole nights looking for interesting users and reading | their entire post histories and being blown away by many | insightful posts. I've been considering making a blog consisting | entirely of insightful HN posts that I come across. | moneywoes wrote: | Do you mind sharing what investor | air7 wrote: | Please do. That sounds super interesting. | Andrew_nenakhov wrote: | Ok #3 is Dune. It'll surely be super helpful in building my | interstellar empire. | | Step 1: make elites addicted to drugs | | Step 2: monopolize drug trade | | Step 3: install a religious fundamentalistic regime with yourself | at its head | | (All very logical until this point, but next step might be a | problem, can anyone offer advice) | | Step 4: transform into a worm | | ??! 
| robotresearcher wrote: | Don't forget the step of achieving prescience, which allows you | to figure out what the '??!' is. | Andrew_nenakhov wrote: | That's what drugs are for, no?! | defect0 wrote: | Noticed an issue. Some, but not all, comments referencing Strunk | and White's Elements of Style are showing up instead as Erin | Gates' Elements of Style: Designing a Home & a Life | tracyhenry wrote: | Good catch! This is the limitation mentioned in my root comment | - the algorithm will fail when two books have similar names. | The partial solution is to look at authors too when available. | Something to be included in the future. | leobg wrote: | BTW, going through that list, I see why I love the HN crowd. 70 % | of those books I've read myself, and did so before coming to HN. | There must be some strong personality type filtering going on. | reducesuffering wrote: | I think it's been quite obvious there's some personality type | filtering going on, as with most online communities. I'm quite | curious how it'd be quantified. Surely software engineers, | startup founders, ADHD, INTJ, and Myers-Briggs-is-bogus types | are overrepresented. Might tell us a bit more about | ourselves... | cinntaile wrote: | There's a strange error in there. The Art of War by Sun Tzu is | listed twice, why is that? Since it finds the right book and | author? | qwert12345887 wrote: | Can this be done to get a list of blogs posted here with topic | analysis? | baby wrote: | Interestingly it cannot differentiate between the different Harry | Potter recommendations (the original books, fanfics, and that book | on philosophy that mentions Harry Potter) | spookyuser wrote: | This is really incredible!
| | A while ago I created something adjacent to this that looks for | hacker news review of books on goodreads | (https://github.com/spookyuser/hacker-reads) | | So I'm very curious how you managed to find book titles, I ran | into a lot of issues trying to figure out, for example, with | "Clean Code" whether to search for "Clean Code" or "Clean Code: A | Handbook of Agile Software Craftsmanship" since people mentioning | the book used both instances. And of course someone mentioning | just "Clean Code" might be referring to the concept not the book. | I ended up settling on `${titleMinusColon} - ${author}` but I'd | love to know what your approach was given that you used deep | learning to search. | | EDIT: Just read your comment below on your approach, very | interesting! | jp42 wrote: | Slightly off from post. The best book recommendation I got is | from one of the following ways, dedicated recommendation service | or app never worked for me: | | - told by friend | | - someone I admire read book and commented on it | | - I'm working on some problem, during that exploration i came | across books. | | - random people mentioning book on platform like HN on a | topic/post of my interest. | rahimnathwani wrote: | The most life-changing book recommendation I got was from HN: | 'Teach your child to read in 100 easy lessons'. | jp42 wrote: | Thanks for the comment! My son is 3y8m. He knows letter and | many word. Looks like this book could be what he needs to get | on next level. | TakerofVita wrote: | Yeah, in my experience a lot of 'general' book reviews are | super critical and don't really try to hook you. Going through | several reviews, you just come away with the collected gripes | and nitpicks of what is otherwise a good book. | | I find that I get sold on a lot more when it is just a random | single comment on some thread somewhere that focuses on a | single aspect of a book. | | If you can find a hyper specific subreddit/forum/etc. 
for a | sub-genre you like, then you will spend more time reading books | than reviews... | spookyuser wrote: | > random people mentioning book on platform like HN on a | topic/post of my interest. | | Same! Some of my favorite book recommendations have especially | come from this one, I don't know why but a one line comment on | a HN thread of "what book changed your life" has become my | favorite way for discovering books. | ramraj07 wrote: | Great work but do note that the list basically looks slightly | better than an Amazon list (Atlas Shrugged lol). I think some | effort into more useful ranking (looking for metrics of | controversiality or maybe page rank) might make it more useful! | vavooom wrote: | I am also curious to know if the # of votes is integrated into | the ranking at all, possibly weighted. Could also attempt NLP | Text Sentiment analysis to influence the model as well. | | Regardless, fantastic work already! | tracyhenry wrote: | Right now the ranking is a simple combination of sentiment | and length. Including #vote definitely sounds useful! | FinanceAnon wrote: | Awesome idea and nice looking UI! I will definitely visit when I | am looking for new books to read. | | One thing that I've noticed is that when I select another book, | the scrollbar in the comment section doesn't automatically scroll | up. | tracyhenry wrote: | I know this but I wasn't able to fix it. Would love suggestions | on how to keep the scroll position in one div (for the books) | but not the other (for the comments) when doing client-side | navigation using Next.js... | zeristor wrote: | Can't find my recommendations for J. Scott Turner's "The Extended | Organism" | | To summarise: organisms evolving to change the environment around | them to their benefit. I went to Foyle's one day with buying a | book on Termite mounds in mind, that is one chapter in the book. | | I found out a year too late that UCL had hosted a talk by Dr | Turner.
| soheil wrote: | "You're delusional. Where did I ever recommend reading Atlas | Shrugged? Ayn Rand is nuts." | | Interesting that that's one of the recommendations. | the_arun wrote: | Thinking out loud here - what is the difference between Google Search | Algorithm & AI Based deep learning? They both are trying to do | the same I guess - that is structuring unstructured data? | mdp2021 wrote: | Suggestions: any way to notify you (your system) of book | recommendations your processing has missed? | | You could have a form to notify you of a post which seems to be | not processed, e.g. | "https://news.ycombinator.com/item?id=28549134", or "id=28591398" | etc. | | (BTW: very great work, and thank you for your invaluable service) | tracyhenry wrote: | Although viewable on mobile, this app is best viewed on larger | screens! :) | wombatmobile wrote: | I like this a lot. | | The longer extracts are more useful than the shorter extracts. | | For Brave New World, I noticed the first 100 - 200 comments are | short, and not useful as reviews so much as indicators of | preference. Then after that, the comments are longer, and hence, | more useful because they explain something. | | It would be useful to be able to filter word length so as to be | able to distinguish between Opinion Mode vs Review Mode. | srcreigh wrote: | You helped me spend $150 on books! Two comments | | 1. I regret you earned $0 for helping me spend so much on | books. Have you considered setting up affiliate links or a | donation button? Maybe affiliate links as a service will be your | next project. | | 2. The Amazon links are for Amazon.com, but I'm in Canada. Maybe | easy internationalized Amazon affiliate links will be your next | project. | xpe wrote: | I regret that so many people regret that other people are not | monetizing.
| lostgame wrote: | You know what, if commenter OP finds value in the services | offered; and wishes to compensate the author of the software | - just gonna say - I have no problem with that. | mihaic wrote: | You might not have a problem with that, but some of us | dislike knowing that monetization has to become | omnipresent, as it changes everything. | srcreigh wrote: | Affiliate programs are the most anti-big corp | monetization strategy ever. | | Considering I already buy books on Amazon, if there's | any way I can just find an affiliate (any affiliate), | Amazon gets 5.5% less revenue. | | For tracyhenry, they would get ~$8.25 CAD straight out of | Amazon's pocket for my $150 purchase. | | https://associates.amazon.ca/help/node/topic/GRXPHT8U84RA | YDX... | soperj wrote: | except it becomes much harder to find genuine | recommendations for things on the web. | scns wrote: | You can use Pi-hole's monetization link. | | (edit) just scroll down that page: | https://pi-hole.net/donate/#sponsorship | dublinben wrote: | Amazon would get even less revenue if you bought your | books somewhere else, like https://bookshop.org/ or | directly from an independent book store. | thaufeki wrote: | A Patreon/crypto address to make a donation to is the | compromise here, surely. | ijidak wrote: | How do you pay your bills? | | People should get paid for work. | | Whether that work is having a job. Or making a website. | | I don't see the difference... | | Someone who does useful work deserves wages. | MathCodeLove wrote: | I regret that some people seem to think that any sort of | compensation for services rendered or monetization in any way | is automatically bad or wrong somehow. | darwinwhy wrote: | I regret having read this entire comment chain. | amelius wrote: | I regret that you did not get compensated for your lost | time. | ijidak wrote: | Agree. | | People should get paid for work. | | Whether that work is having a job. Or making a website.
| | Someone who does useful work deserves wages. | | Even 2,000 years ago the Bible said: "the worker deserves | his wages." | | Most of the people who are against monetization are perfectly | happy to get paid by their employer. | | Is direct employment the only morally upright way to | receive payment for hard work? | gricardo99 wrote: | You helped me spend $150 on books! | | Check your local Library. Depending on where you are, it could | be a fantastic resource for books. | malshe wrote: | After reading such comments here on HN, last month I got | myself a local library card and it has turned out to be a | great decision! I am using Libby app to get digital books and | even audiobooks! Absolutely fantastic | rahimnathwani wrote: | For #2, there are services OP could use, that will | automatically switch links to the right country store, e.g. | https://geniuslink.com/how-it-works/for-affiliates | cweill wrote: | Great execution, and very neat app! | | But, what's wrong with using Amazon affiliate links? If | anything, monetizing would be great since it would give you more | incentive to maintain this wonderful application? And it doesn't | cost us users anything. | tracyhenry wrote: | Great point. I'm on a student visa which forbids any non-work | income. That's one reason why :) | nautilius wrote: | Amazing and super useful: If I start reading today, and I read a | book a day, it'll only take 112 years to finish, assuming that no | additional books will be recommended in the next century. | inanutshellus wrote: | I'm reminded of Goodhart's Law... So long as your project remains | secret it'll be valuable. Once someone sees money being made from | it, it'll kick off disingenuous recommendations... anyway... high | quality problem to have I guess! | bachmeier wrote: | Interesting idea, but this is _mentions_ of books, not | recommendations.
It includes comments by someone who's reading | the book, has it on their reading list, or read it and thought it | was terrible. | tracyhenry wrote: | The intention was to only show recommendations. But because of | limited training data (we hand labeled ~4000 comments), the | model wasn't able to filter out bad ones effectively. More | training data should be able to solve it. | zsmi wrote: | It's a really interesting project. And I am sure it's really | hard. | | I was curious how many times some common textbooks were mentioned | but didn't find them via the search, which could be user error. | To give a specific example, none of the books in this comment | thread were found: | | https://news.ycombinator.com/item?id=19893447 | | Comment text like this: "CMOS VLSI Design: A Circuits and Systems | Perspective (4th Edition)" by Weste and Harris | | should've been caught, right? | tracyhenry wrote: | It could be that I don't have this book in my book database. | nickthemagicman wrote: | Came here for the Warhammer, stayed for the book recs. | rahimnathwani wrote: | This is awesome. The best thing is that it's so fast to navigate. | I like how the HN comments are styled just like on HN. | | A couple of thoughts: | | * It would be great if each book were to have its own URL (for | sharing). | | * Consider allowing the search to accept author input, e.g. if I | want to find the book 'Who' by Geoff Smart, the single-word title | isn't specific enough to show that book at the top of the search | results. | soco wrote: | If I look for one single word and that single word _is_ the | answer, shouldn't that be the very first result? I mean that's | a 100% match right there... | rahimnathwani wrote: | If the dataset were perfect, maybe. But, if a book with a | single-word title has only a few comments, it's plausible that | most/all of those comments are false matches. | | In the case of the book I searched 'Who', showing it in 4th | position seemed about right.
| tracyhenry wrote: | Yes, the search can definitely be improved (e.g. to include | author). Right now it's SELECT * FROM books WHERE name LIKE | '%{search_string}%' | artursapek wrote: | this is awesome, thanks for making it | MarcScott wrote: | HN really likes Neal Stephenson. I've never read a book of his | that I didn't love, so will definitely be looking through more of | the recommended fiction from the community here. | samuel wrote: | REAMDE was crap, IMO, and I'm a Stephenson fan. | | And the problem with Stephenson is that he's rarely succinct so a | bad book from him turns into a huge loss of time. | macintux wrote: | I addressed that problem with _Seveneves_ by skimming about | 1/3rd of it. | samuel wrote: | This is amazing. Thank you! | | Does it take into account negative reviews/comments? I have seen | that Why We Sleep is being recommended in the 6 months tab, but, | while it was received with a lot of praise, it was soon | criticized by other researchers in the field and I would expect | that the HN crowd would have followed that trend. | tracyhenry wrote: | When I labeled the comments, I didn't label books that were | criticized. So in theory the model should filter out negative | reviews. But currently the training dataset is pretty limited | in size so you still can see some negative ones. I suspect that | with more training data this problem will go away. | jeron wrote: | 40k good books out there and I can only read like 24 a year if I | really push myself | ehutch79 wrote: | It has Atlas Shrugged in the top 10? | gjm11 wrote: | "Atlas Shrugged" is a polarizing book: people tend to either | love it or hate it. And the people who love it love to tell | other people how great it is, whereas many of the people who | hate it just don't talk about it (because there's generally | little need to talk about the badness of bad books).
| | I think a book list is more useful if it has some books in it | that some love and some hate, rather than only books that no | one minds very much. Maybe some of them will turn out to be | ones I love. | | (I happen not to be a Rand fan myself.) | [deleted] | Borlands wrote: | Brilliant | tracyhenry wrote: | Hi HN! | | I built this small app in my spare time to aggregate books | recommended on Hacker News. I personally find books recommended | on HN to be super helpful, so I think this is the way that I can | contribute back. | | This book aggregation idea is not new. A bunch of sites have done | similar things [1, 2, 3]. | | Yet one common limitation of those sites is that they have | limited recall (i.e. not able to get a comprehensive set of book | mentions), and thus don't paint an accurate picture of what the | top books are. They're all based on insufficient rules, e.g., | looking for Amazon links. As you can see from my app, people | often do not include Amazon links when recommending a book. | | I wondered, why can't we just match book names? Well, not so | easy. Some books have pretty short names, e.g. Meditations [4], | or Steve Jobs [5]. Some book names might as well be the name of a | movie, e.g. Ready Player One [6]. Simply matching the names of | the books would produce a whole lot of irrelevant results. | | This is where Deep Learning comes into play. Recent advances in | large NLP models (transformers and BERT in particular) have made | machine language understanding unprecedentedly accurate. It | enables me to fine-tune a BERT model on a couple thousand labeled | HN comments and predict accurately whether each word in a comment | is part of a book title or not - a task commonly termed | Named Entity Recognition (NER). | | As a result, my app is able to present a whole lot more results | while maintaining desirable accuracy. For example, NER works | pretty well on the tough examples I mentioned ([4, 5, 6]).
| Compared to prior sites, my app captures 9-50X more mentions and | thus presents a much more complete picture of what books are | recommended on HN. | | Furthermore, I've made sure that the comments are presented well | in the UI because the recommendations are just as useful as the | books. I highlighted the mentioned book name, and used a custom | NLP-based ranking function to sort the comments. These are | non-trivial improvements over prior sites, which I hope you can find | useful. | | Nevertheless, this app is not without limitations: 1) matching | book names would fail when two books have the same or similar | names; 2) although not often, this approach would wrongly | classify some short stop-word names [7]; and 3) sometimes NER | fails to see that the commenter actually hates the book. These | problems can be alleviated with more Deep Learning. For 1), one | can use BERT to learn the authors mentioned, which can be used as | a filtering criterion. 2) and 3) should be fixable with more | training data (currently there are only ~4,000 hand-labeled HN | comments). | | Lastly, I'd like to especially thank my gf who helped me label | ~1,000 comments, which boosted the model accuracy by 5 percent! I | also want to thank the people who create and maintain the | Hacker News BigQuery dataset [8]. And of course, thank everyone | on HN who recommends books to others. | | Hope you enjoy this app! Feedback and suggestions are welcome :) | | [1] https://news.ycombinator.com/item?id=15169611 | | [2] https://news.ycombinator.com/item?id=10924741 | | [3] https://news.ycombinator.com/item?id=12365693 | | [4] https://hacker-recommended-books.vercel.app/category/0/all-t... | | [5] https://hacker-recommended-books.vercel.app/category/1/all-t... | | [6] https://hacker-recommended-books.vercel.app/category/0/all-t... | | [7] https://hacker-recommended-books.vercel.app/category/12/past... | | [8] https://news.ycombinator.com/item?id=19304326 | | P.s.
The Amazon links are NOT sponsored. This app is free of | monetization. | oakfr wrote: | This is really cool stuff. Would be really nice to do the same | for movies :) | metalliqaz wrote: | A book that I and others have recommended doesn't show up in the | database. | | Animal, Vegetable, Junk: A History of Food, from Sustainable to | Suicidal by Mark Bittman | endofreach wrote: | Amazing. I appreciate that there are no affiliate links. But I | honestly think: you should add affiliate links. | | Also, if it makes sense, have a monthly list. | godmode2019 wrote: | This is very impressive, well done on deploying this. | | 95% of the books I have ever read or owned are in the first 20 | pages. | | It's almost as much fun to read the comment chain about each book. | | You must be independently wealthy because I know no one cares if | there is an affiliate link. I believe affiliates are always paid | to the last cookie you have. | oakfr wrote: | @tracyhenry: how does the system work exactly? I cannot find any | documentation on your website. | tracyhenry wrote: | hey, you can scroll down to find a long comment of mine | documenting the approach. | dang wrote: | https://news.ycombinator.com/item?id=28596207 | [deleted] ___________________________________________________________________ (page generated 2021-09-20 23:00 UTC)