[HN Gopher] Pirate Library Mirror: Preserving 7TB of books (that...
___________________________________________________________________
Pirate Library Mirror: Preserving 7TB of books (that are not in Libgen)

Author : ValentineC
Score  : 243 points
Date   : 2022-07-03 20:39 UTC (2 hours ago)

(HTM) web link (pilimi.org)
(TXT) w3m dump (pilimi.org)

| krick wrote:
| So, if I get it right: first there was Libgen, which is mirrorable. Then Z-Library copied Libgen and added some more books, without making it mirrorable. The goal is to make these new books, which are not mirrorable -- mirrorable (i.e. to "preserve" them).
|
| So, why not just re-upload them to Libgen, then? I guess somebody will do that now anyway, but you could easily have done it in the first place, without making your own mirror, which is not a mirror of Libgen. Just upload them to Libgen and make a mirror of Libgen.
| facethewolf wrote:
| From their FAQ:
|
| > _Q: Should the Z-Library collection be added to Library Genesis?_
|
| > _A: Yes! However, it is tricky. Library Genesis splits out its collection between non-fiction and fiction. They also have relatively high quality standards. If you are interested in organizing all the books to meet their requirements, let us know._
| krick wrote:
| Oh, it didn't occur to me to navigate "backwards". Thanks. Actually, their whole FAQ is quite a bit more enlightening than just the linked page.
|
| http://pilimi.org/faq.html
| sacrosanct wrote:
| This really needs to be hosted on a Tor hidden service. Clearnet sites are easy to take down.
| fsflover wrote:
| Or, better, on torrents inside I2P: http://geti2p.net.
| progman32 wrote:
| The homepage has a link to the onion site.
| sacrosanct wrote:
| So why does it have a clearnet address? To have more reach? What's their threat model such that a clearnet presence could possibly out the people behind this?
| progman32 wrote:
| I don't know. I'd guess reach.
| Probably wouldn't be on HN if it were darknet only.
| ipaddr wrote:
| There is no SSL, so no cert fingerprint for Shodan matching to leak.
| metadat wrote:
| Please don't flag it this time. Folks deserve the option to at least be aware of it and decide for themselves if they wish to pursue it.
| AdmiralAsshat wrote:
| HTTP-only makes me weary of visiting a self-professed piracy site.
|
| They couldn't even spring for a Let's Encrypt cert?
| 1vuio0pswjnm7 wrote:
| https://web.archive.org/web/20220701204054if_/http://pilimi....
| mkl wrote:
| *wary
|
| For some reason I'm seeing this mistake more and more lately. https://en.wiktionary.org/wiki/weary vs https://en.wiktionary.org/wiki/wary
| nerdponx wrote:
| I've seen it too, and it's a weird mistake to me because I am certain that at least one of my friends who makes the mistake did not make it in the past.
|
| I wonder if there's a kind of memetic effect happening online, where people who lack confidence in their English spelling ability see somebody make this mistake and somehow think that it's correct, so they switch how they write it.
| gjvc wrote:
| cf. revert / reply
| jamiek88 wrote:
| People have mixed up leery and wary and it has memetically become 'weary'.
|
| I notice it more and more too.
| InCityDreams wrote:
| No room for auto-correct?
| brewdad wrote:
| Even so, there are enough ESL readers on here, along with native speakers who may not understand the difference, that it makes sense to point it out once in a while.
|
| Otherwise we end up in a lose/loose situation where I see more people use it wrongly than correctly.
| mkl wrote:
| Seems unlikely, as "wary" appears to be the more common word (71M Google results vs 55M). Maybe some keyboard mistakenly does it though.
| bbarnett wrote:
| Not sure why the count is relevant here. Both are valid words.
| wccrawford wrote:
| I've heard people _say_ it. People who I think should know better.
| It's really frustrating.
| quazar wrote:
| It's a read-only blog with 2 pages. What do you gain from putting this over HTTPS?
| ac0lyte wrote:
| Trust?
| na85 wrote:
| Malicious sites can still use Let's Encrypt.
| SquareWheel wrote:
| Do you really want your ISP to know which piracy sites you frequent? This is all being sent in plain text. Or they could change the content, insert a redirect, or inject ads without your knowledge. TLS is needed on all websites - not just those with interaction.
| cookiengineer wrote:
| I hate to break it to you, but why do you think ISPs override the DNS responses with TTL set to 0?
|
| TLS itself is only useful if you also rely on DNS over HTTPS/TLS. Well, setting the issues with TLS 1.2 and earlier aside.
| SquareWheel wrote:
| Those are problems too, but they aren't exploited nearly as often as MITMing cleartext has been historically. The solution you mention is already becoming widely-supported, as are newer protocols like QUIC that discourage snooping.
|
| There's no reason to ignore a good solution just because it's not 100% perfect.
| ars wrote:
| They still know which sites you visit even with https.
| kenniskrag wrote:
| Only the destination IP. TLS encryption is inside TCP and around the HTTP protocol.
| hombre_fatal wrote:
| You don't need to copy and paste your reply everywhere it's relevant on HN. Even us flea brains can carry your remarks in our head and apply them to similar comments.
| geoffeg wrote:
| And, unless you set up an appropriate DNS server instead of the default from your ISP, they also know that you looked up the site's hostname(s).
| generalizations wrote:
| https won't keep your ISP from knowing you visited the site. And the rest of those? For a text-only blog, they seem kinda trivial.
| simlevesque wrote:
| > https won't keep your ISP from knowing you visited the site
|
| If you use DoH, yes it does. Unless I'm mistaken.
| They only know the IP address of the remote server.
| kevin_thibedeau wrote:
| And nobody would ever think of keeping a reverse DNS index.
| Anunayj wrote:
| And the SNI. Until ECH is widely adopted, the SNI is leaked in plaintext when connecting to a server; it needs to be, because how else will the server know which TLS cert to reply with?
| cookiengineer wrote:
| > They only know the IP address of the remote server.
|
| It's the internet. Everyone can scrape links and measure which assets were on them to correlate likely visited websites.
|
| Especially if every web page these days is pretty unique in terms of what kind of assets (network streams) with what kind of byte size were loaded at which point in the document loading timeline.
|
| Now include the TLS fingerprint of your web browser and well, privacy went to shit.
|
| HTTP needs an upgrade with scattering and rerouting on the fly, otherwise these deanonymization techniques can never be fixed.
| 1vuio0pswjnm7 wrote:
| Not disagreeing but presenting a hypothetical:
|
| If the user requests the page from Internet Archive, Common Crawl or even Google Cache, how does the ISP know what the user requested? (NB. Neither IA nor Google Cache require sending SNI,^1 so the ISP may only see IP addresses).
|
| With IA, the IP address alone does not reveal which IA site or page the user is requesting. There is more to IA than only Wayback Machine.
|
| With Common Crawl, the user can send the Cloudfront domain name instead of a commoncrawl.org domain. Are all ISPs going to know that this is Common Crawl? Even if they expend the effort to learn, what benefit is achieved?
|
| With Google Cache, the IP address alone does not reveal which Google site the user is accessing. Needless to say, there are many, many domains using these IP addresses.
|
| There is nothing that requires any web user to retrieve web pages from a given host.
| The page may be mirrored at a number of hosts. Some of those hosts might offer HTTPS, support TLS1.3 and not require plaintext SNI/offer encrypted ClientHello.
|
| Even assuming an ISP can determine what domain name a customer is sending in a Host header or ClientHello packet, it would still be necessary to subpoena the archive/CDN/cache to figure out precisely what pages were being requested.
|
| 1. The same party is controlling all the server certificates. IA controls the certificates for all IA domains, Amazon (issues and) controls all the certificates for Cloudfront customers and Google controls all the certificates for Google domains. Perhaps there are web users commenting on HN who believe that ingress/egress traffic for a site saved/hosted/cached at an archive/CDN/cache is somehow private as against the company running the archive/CDN/cache in a meaningful way. I am not one of them.
|
| As for the question of an ISP modifying the contents of web pages, this is an issue that could be addressed contractually in a subscriber agreement. It stands to reason that if this was a serious issue and not merely a hypothetical one raised by nerds debating the merits of TLS then it would be addressed in such agreements.
|
| As for the "injection of advertising" issue as an argument in favour of the way TLS^2 is being administered on the web, IMO this is a bit silly since (a) it is trivial to filter such advertising (e.g., Javascript in the examples I saw) out of the page and/or block it from running/connecting/loading and (b) the amount of "tech" company-mediated advertising that web users endure in spite of using TLS is enormous. More likely than being seen as a threat to web users, the injection of advertising by ISPs was seen as a threat to the advertising revenue of "tech" companies.
| The latter are responsible for facilitating the injection of advertising (by their customers, not their competitors, i.e., ISPs), not preventing it.
|
| 2. By "TLS administration" I do not mean encryption as a concept nor certificates as a concept. I mean TLS administration measures designed to support "tech" companies first and web users second, if at all. A system where the questions of "threat model" and "trust" are both decided by "tech" companies, not users.
| robonerd wrote:
| > _For a text-only blog_
|
| If somebody MITMs it, they can serve you anything they want.
| enriquto wrote:
| > they can serve you anything they want.
|
| Great. More books!
|
| No really, I don't understand this argument. A static site served by plain http is perfectly appropriate. It's like a poster hanging on the wall for all to see. Of course people can paint over it, but it doesn't really matter.
| robonerd wrote:
| They could serve you javascript that exploits your browser. At the very least, they could replace that bitcoin donation address with their own. That's a tempting target if nothing else.
| SquareWheel wrote:
| And "they" isn't just your ISP. It's also that free wifi hotspot you connected to, or the hotel service, or your company's network. Even if you trust your ISP (and you probably shouldn't), there are other bad actors to be aware of.
| criddell wrote:
| HTTP connections can be used as a weapon against others. One example is China's Great Cannon.
|
| https://citizenlab.ca/2015/04/chinas-great-cannon/
| kenniskrag wrote:
| Only the destination IP. TLS encryption is inside TCP and around the HTTP protocol.
| Nextgrid wrote:
| Until ESNI becomes mainstream (and browsers offer the ability to enforce it), the domain name is also sent out in plaintext.
| syntheticcorp wrote:
| ESNI has been dropped; a new spec alters how it works and renames it Encrypted Client Hello (ECH).
|
| https://blog.mozilla.org/security/2021/01/07/encrypted-clien...
| btdmaster wrote:
| ECH looks quite interesting, but isn't it quite easy to do a reverse DNS lookup for most domains?
| notriddle wrote:
| The same thing you always get. Assurance that your free Wi-Fi hotspot isn't tampering with the page.
| krick wrote:
| This, and also a MITM (like an ISP) needs to make their own request to this site to know what I read. And, technically, they cannot really be sure it's what I've read, since nothing says that this site is static.
|
| I'm not that offended, and torrents are only available via TOR anyway, but I do actually appreciate the sentiment. There's no reason not to be using TLS.
| uniqueuid wrote:
| It's really funny to think about how the advances of technology keep changing how we perceive books.
|
| 7TB is even a commodity disk these days. And it's a lot less than the torrent of scientific papers that floated around some time ago (that was ~18TB IIRC).
| nonrandomstring wrote:
| I foresee storage density reaching the point that for most ordinary people "online" becomes rather unimportant. What would be the effects of technology when computers behave as in early science fiction, as stand-alone oracles? [1]
|
| [1] https://www.timeshighereducation.com/opinion/2048-informatio...
| themodelplumber wrote:
| How would the appeal of streamers and live data/content settle out in that case? Sometimes context is available in the moment that makes it easier for all parties to consume and analyze in that moment as well.
|
| Since transient, ethereal meme culture is also basically emergent culture now, it's difficult not to also foresee a greater cultural divide in such a case. This is saying nothing of live data tools as well, even weather data...
| ars wrote:
| It's 7TB compressed.
| If it's text you'd need about 70TB to decompress it. It's probably mostly images though, so probably not quite that bad.
| emj wrote:
| I've tried to do lossy compression of epubs with a few lines of bash script, i.e. removing the images and fonts that were not needed. Many epubs could be downsized to a third of their size, but then I found a book that needed the supplied fonts and gave up. When doing lossy compression you can't have that kind of bug.
|
| What I also found was that many of the images in the epubs were already unusable and nothing like their counterparts in physical books.
| solarmist wrote:
| I don't understand this. Are they epubs of comics or something? Epubs are already compressed (zip).
| robonerd wrote:
| It's not terribly uncommon to find an epub with several megabytes of cover art and a few hundred kilobytes of text.
| generalizations wrote:
| I wonder if there are any search engines dedicated to indexing these kinds of libraries. I know there's a decent one just for scihub, but it would be awesome if I could do a Google-style search that returned the contents of books, magazines and journal articles instead of just websites.
| delusional wrote:
| Wasn't that what Google Books was supposed to be?
| voisin wrote:
| I assume Google abandoned this along with all their earlier mission statements in favour of building another chat app.
| bbarnett wrote:
| I can't imagine a life at google. So much promise, all turned to ash.
| lupire wrote:
| Could you take a moment to check if Google Books search still exists?
|
| I'll give you a hint: https://books.google.com/?hl=en
|
| > Search the world's most comprehensive index of full-text books.
| voisin wrote:
| I know it exists, but it has appeared to languish for years. They rely on third parties now for inclusion of books, whereas in the early days they innovated on their own with specialized scanning technology.
| They seemed quite proud of it a decade ago. When was the last time Google touted their books project? Have they even integrated searching books into their main search (which was supposed to catalogue and make searchable all the world's information)?
|
| It hasn't been killed but it is clearly a zombie.
| londons_explore wrote:
| It was paralyzed by legal disputes with book publishers.
|
| In the years the lawsuits were going on, nearly everyone left the project. And then the lawyers put in so many red lines that it's nearly impossible to make any changes to it.
| zozbot234 wrote:
| Book metadata is widely available via sites like e.g. Open Library. With good metadata, full text search is not as relevant.
| sacrosanct wrote:
| There is the Imperial Library of Trantor: https://trantor.is/
|
| They offer a clearnet site and a hidden service .onion in case you don't want ISPs blocking access to it.
| lupire wrote:
| books.google.com
| 2OEH8eoCRo0 wrote:
| Your movie recommendations suck, ValentineC ;)
| ars wrote:
| 7TB of compressed text? I don't think humanity has generated that many written words in its entire existence. Although it would be an interesting Fermi problem to estimate (and don't forget just how well text compresses).
|
| This has to be a lot of duplicates or bad formats (images). This would be far more useful to people with some curating.
| khazhoux wrote:
| These are most likely PDFs, not plaintext.
| Larrikin wrote:
| Is there an index of what's included? 7 TB is a lot to ask for simply upholding an ideal.
| robonerd wrote:
| It apparently comes with an index in the form of a MySQL database that contains title, author, description, and filetype.
| robonerd wrote:
| > _We will release the data in stages, as we are still processing the files. Right now the metadata file and a few of the torrents are available.
| Note that the torrent files are only available through our TOR mirror._
|
| Presently, only the first four of several dozen parts are available.
| dkjaudyeqooe wrote:
| To be fair, Z-Library doesn't charge unless you want to download more than 10 books per 24 hour period. That's per account, and although they ask you not to open multiple accounts they don't seem to do anything to stop you.
| PostOnce wrote:
| What kind of fairness can there be in charging for stolen books?
|
| I believe in free access to education, but charging for these books they have no rights to is a whole other thing.
| mdp2021 wrote:
| Running costs.
| Sparkle-san wrote:
| It would depend on how they use the funds. I wouldn't be surprised if bandwidth expenses made up a majority of what it costs to run Z-Library and that money has to come from somewhere.
| II2II wrote:
| At this point it is worth noting that there are reputable sources for free books, such as public libraries and Project Gutenberg.
|
| I realize that neither source will satisfy many of the people on HN, simply because there is a need for current technical books.
| fancyfredbot wrote:
| You don't have to pay them if you go steal them yourself. If you find it more convenient to pay another thief to do it for you then I don't think that's significantly less fair.
| UmbertoNoEco wrote:
| All those India/Malaysia/Guatemala kids stealing a 200 USD PDE book from Pearson should go to jail... in America... Assange style
| KMnO4 wrote:
| They're not charging for the books; they're charging for the bandwidth.
| dkjaudyeqooe wrote:
| The "to be fair" is against the claim in the article that they charge for books.
|
| It's a figure of speech, not a comment on what they do.
| jl6 wrote:
| Yeah, grey hat shenanigans easily become black hat as soon as money is involved.
| mgaunard wrote:
| Technically, they're not charging for the books, they're charging for the bandwidth.
| krick wrote:
| I don't quite agree. I mean, they provide a useful service, and it costs money to run it. It's ok that they earn (even if it's actually making a profit, not just covering the costs).
|
| That being said, 10 downloads/day feels a bit restrictive to me. I'd get it if it was 100, or 50, heck, maybe even 20. I mean, I don't appreciate that it's not mirrorable in the first place, but maybe they cannot afford it, I don't know... But 10 feels like less than somebody researching a new topic might need to access in a day, even if he won't read them all immediately.
|
| ...That being said as well, it has a really nice UI. I wish somebody did it for Libgen.
| lupire wrote:
| You read more than 10 books per day?
| krick wrote:
| Read -- no, I don't. Download to skim and see the contents -- yes (even though I don't do it every day, obviously). In fact, I rarely download fewer than 4 books at once, except for occasions when it's a new book by my favorite writer (in which case I can as well just buy it). Instead, there is some topic, some reason why I need these books, and I somehow can gather a dozen recommendations, maybe more, then I need to actually get a look inside them, to see what I'll be reading (if anything). It also happens that I kinda know the book, but not precisely enough, because some authors really like to milk the topic by publishing 5 books kinda the same as the first successful one, and if they are technical they can have 5 revisions each. I may not read them at all, or I may be reading them during the whole next year, but I'll need to get them all at once at first.
|
| And if we also count papers, which this site provides too -- easily.
| swayvil wrote:
| >What kind of fairness...?
|
| Well they are providing a great public service and their system takes money to maintain. It's just a small fee.
___________________________________________________________________
(page generated 2022-07-03 23:00 UTC)