[HN Gopher] Internet Archive Scholar: Search Millions of Researc...
___________________________________________________________________
Internet Archive Scholar: Search Millions of Research Papers

Author : bnewbold
Score  : 144 points
Date   : 2021-03-09 18:06 UTC (4 hours ago)

(HTM) web link (blog.archive.org)
(TXT) w3m dump (blog.archive.org)

| nathias wrote:
| archive.org is really one of the few things still good on the
| internet, while studying it has been invaluable for my studies, I
| can't imagine what the previous generations that could only
| access 5% of sources were even doing.

| 8bitsrule wrote:
| Oh yeah! Tried this on several specific topics I've looked at
| recently (2 years ago, 7ya, and 150ya) and the results were fast
| and on the mark. I'll certainly favor using Scholar over IA
| searches. Congratulations!

| BugsJustFindMe wrote:
| I couldn't find a list of what sources (like which journals)
| they're archiving from. Does anyone know where to find that? It
| would be nice to see what subject categories the archive covers.

| jahewson wrote:
| I took one look at that logo and concluded "this is not for me".

| throwaway8451 wrote:
| Here is an appropriate soundtrack for browsing the results:
|
| https://www.youtube.com/watch?v=x8gBfEDoEbY

| simonw wrote:
| I had the exact opposite reaction. That logo is fabulous.

| AnimalMuppet wrote:
| If you're going to judge it by the logo rather than by the
| search results, it almost certainly is not for you...

| betamaxthetape wrote:
| This is amazing. I had a play around with it whilst it was in
| beta, and was blown away by the variety of papers returned. On a
| whim I searched for a very obscure topic that I'd researched
| before (just for personal interest) in the past using worldcat /
| google scholar, and to my surprise was presented with several
| highly relevant papers I'd never come across before, that were
| _exactly_ what I was looking for.

| carbocation wrote:
| Interesting.
| For my field (cardiovascular genetics), the results
| weren't really what I was expecting. I think that my expectations
| probably fit pretty well with a PageRank graph of citations. So
| my guess is that the "relevancy" is semantic only?

| sundarurfriend wrote:
| (OffTopic) All this talk about the logo here made me check the
| page out, instead of moving on after reading just the comments as
| I might otherwise have done. Perhaps that's a HN strategy to use,
| to get people to actually click through - add a bikesheddy thing
| to the page that's likely to be divisive, but doesn't require
| thought. Gives us a cheap way to have an opinion, and thus an
| incentive to click!

| endisneigh wrote:
| I'm curious, how does the Internet Archive handle copyright with
| all of its services?

| marcodiego wrote:
| The internet archive is becoming an alternative good internet. It
| has a web archive, film archive, software archive, media
| archive... and now research papers archive. That is the internet
| as a giant library as we dreamed in early 90's.

| Black101 wrote:
| Way too centralized (Centranet?), but it is very nice for now.
| It's a bit like the library of Alexandria, so it could
| change/disappear at any time.

| dbrereton wrote:
| I'm sure they'd be willing to decentralize it if there was a
| good way to do that. Maybe this can be done with something
| like IPFS [0].
|
| [0] https://ipfs.io/

| zucker42 wrote:
| The amount of data is absolutely insane.

| Black101 wrote:
| Yes, they have very good intentions right now, but what if
| the leader gets hit by a bus?

| musicale wrote:
| Presumably it would be acquired, paywalled, and
| monetized by a private equity firm (or some suitably
| hostile intellectual property rightsholder organization)
| before going bankrupt and shutting down for good.
|
| Thanks for an incredible journey.

| puddingnomeat wrote:
| Is it easy to have a local copy?

| capableweb wrote:
| Internet Archive strikes again!
| I love Internet Archive, not just
| for archiving websites but for archiving everything and making it
| easily accessible. This is another great service that'll help a
| lot of researchers and hobby-researchers, which is lovely to see.
|
| Don't forget to donate if you also like Internet Archive, they
| need every penny: https://archive.org/donate/?origin=hn

| bnewbold wrote:
| This service was hinted at back in September, but is now formally
| announced and live at https://scholar.archive.org
|
| Related previous post:
| https://news.ycombinator.com/item?id=24485444
|
| Much of the catalog functionality can be accessed from the
| fatcat.wiki API (https://api.fatcat.wiki/redoc). Scholar adds a
| search index over the body content of papers, and we are still
| thinking through how to make this available through a public API
| without slowing down query latency even more.
|
| Folks here might also be interested in this CLI for interfacing
| with the catalog and making edits:
| https://gitlab.com/bnewbold/fatcat-cli

| breck wrote:
| I absolutely love everything about it (the logo <3).
|
| Super fast. All my test searches returned what I was looking
| for.
|
| What is your relationship with semantic scholar like?
|
| Any plans to integrate ranking signals like references, etc?
|
| I'm going to double my monthly donation. This is great.

| bnewbold wrote:
| Thank you for the kind words!
|
| We are friendly with Semantic Scholar, and have used their
| "open corpus" dumps as one of several URL seed lists for
| crawling in the past. Their search and discovery tech is more
| sophisticated than ours is likely to be any time soon
| (https://medium.com/ai2-blog/building-a-better-search-engine-...).
| We would love to get to the place where groups
| like AI2, which are primarily research-oriented, could build
| on an existing open catalog and corpus, and not need to
| duplicate time crawling, merging catalogs, cleaning metadata,
| etc.
| As of today Microsoft Academic (used by Semantic
| Scholar) might be a better option.
|
| Want to be thoughtful about ranking signals, and are deeply
| skeptical of journal impact factor, h-index, and most
| bibliometrics. "Has this been cited more than a handful of
| times" seems like a reasonable coarse boost. Hope to include
| more curated signals, like "won a paper prize", "journal in
| DOAJ and other reviewed indices", etc.
|
| Have been working on a citation graph, keep an eye out for
| something about that in coming months. One cool thing we hope
| to do with the citation graph is find "missing works" not yet
| in the catalog (eg, don't have a DOI, especially for pre-1990
| era).
___________________________________________________________________
(page generated 2021-03-09 23:00 UTC)