[HN Gopher] How to Download All of Wikipedia onto a USB Flash Drive
___________________________________________________________________

Author : bubblehack3r
Score  : 69 points
Date   : 2022-10-06 21:06 UTC (1 hour ago)

(HTM) web link (planetofthepaul.com)
(TXT) w3m dump (planetofthepaul.com)

| PaulDavisThe1st wrote:
| Can someone explain what the role of Kiwix is in all this, please?

| [deleted]

| londons_explore wrote:
| Note that it's possible to make Wikipedia substantially smaller
| if you're happy to use more aggressive compression algorithms.
|
| Kiwix divides the data into chunks and adds various indexes and
| stuff to allow searching data and fast access, even on slow CPU
| devices. But if you can live with slow loading, you can probably
| halve the storage space required, or maybe more.

| bombcar wrote:
| Kiwix is great - I have a collection of various things from their
| library https://library.kiwix.org/?lang=eng downloaded for when
| I'm on a plane or the internet is otherwise unavailable.
|
| That and the TeXlive PDF manuals can get me through anything.

| daneel_w wrote:
| I second Kiwix. I found out about it not too long ago on the
| topic of portable Wikipedia readers. It really stands out as
| the best software part of such a solution.

| 23B1 wrote:
| I third Kiwix. Immensely useful when I was deployed without
| internet.

| barbs wrote:
| Is there a portable version of Kiwix? Would be cool if you could
| plug the USB into any computer and start reading Wikipedia
| without having to install anything.

| tehnicaorg wrote:
| Yes. You download a zip archive. Unpack from 121MB to 263MB,
| and start the exe. (assuming you're using Windows)

| orliesaurus wrote:
| Oh wow, I thought this was gonna be a REALLY large file, but only
| 95GB, not bad, some worthless videogames are larger haha

| bscphil wrote:
| I was curious how they achieve this.
| It looks like the underlying file format uses LZMA, or optionally
| Zstd, compression. Both achieve pretty high compression ratios
| against plain text and markup.
|
| > Its file compression uses LZMA2, as implemented by the xz-utils
| library, and, more recently, Zstandard. The openZIM project is
| sponsored by Wikimedia CH, and supported by the Wikimedia
| Foundation.
|
| https://en.wikipedia.org/wiki/ZIM_(file_format)

| keepquestioning wrote:
| I remember the era of stupidly large games.

| aendruk wrote:
| Circa 2003 I carried around a pared down copy on a Pocket PC.
| Dropping a few chosen categories (who needs Sports?) allowed it
| to barely fit on a 1-GB SD card.

| FeistySkink wrote:
| People going back in time need sports. An almanac of some
| kind.

| yieldcrv wrote:
| protip: you need to download wikipedia in other languages as well
|
| they are not translations, they are completely different articles
| under the name brand and platform of Wikipedia
|
| an entry that may be just a blurb in English may be one of the
| most comprehensive and fully fleshed out and researched entries
| on the site in German, for example

| thakoppno wrote:
| Somewhere around the original ipad era, I believe there was a
| curated subset of wikipedia articles that may have been called
| something like Educator's Edition.
|
| It worked offline and had images and I traveled to Peru with it
| and learned so much. Does anyone remember this sort of thing?
|
| I've tried wix formatted copies and they do work but the
| experience on an offline ipad was simply better. Thanks in
| advance.

| Rediscover wrote:
| Yes, I remember - I had a copy on an SD card on my OLPC.
|
| I believe it morphed into "Wikipedia for Schools" ^0 -
| possibly this ^1 is a comment about it?
|
| 0: https://en.m.wikipedia.org/wiki/Wikipedia:Wikipedia_for_Scho...
|
| 1: https://www.speedofcreativity.org/2008/11/11/wikipedia-to-go...
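
On the compression point above (ZIM files use LZMA2 via xz-utils, and more recently Zstandard), here is a minimal sketch of how LZMA behaves on repetitive markup. It uses Python's standard lzma module as a stand-in for xz-utils. The sample text is made up for illustration, zlib is included only as a familiar baseline, and Zstandard is left out because it is not in the standard library on most Python versions. Real article text is far less repetitive than this sample, so expect much lower ratios in practice.

```python
import lzma
import zlib

# Hypothetical sample: repetitive wiki-style markup standing in for article text.
sample = (
    "== Heading ==\n"
    "'''Wikipedia''' is a [[free content]] encyclopedia "
    "written by [[volunteer]]s.\n"
) * 2000
raw = sample.encode("utf-8")

# LZMA2 inside an .xz container, at the highest standard preset.
xz_bytes = lzma.compress(raw, preset=9)
# DEFLATE at its highest level, for comparison only.
deflate_bytes = zlib.compress(raw, 9)


def ratio(compressed: bytes) -> float:
    """Return how many times smaller the compressed data is than the input."""
    return len(raw) / len(compressed)


print(f"raw:  {len(raw)} bytes")
print(f"xz:   {len(xz_bytes)} bytes ({ratio(xz_bytes):.0f}x smaller)")
print(f"zlib: {len(deflate_bytes)} bytes ({ratio(deflate_bytes):.0f}x smaller)")
```

This only illustrates the general idea; it says nothing about how Kiwix chunks or indexes the data inside a ZIM file.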
| thehours wrote:
| Tangent - I've noticed a lot more comments like this using
| the "^0" syntax for citations vs the traditional "[0]" one
| I've become accustomed to seeing on HN. Is there a real shift
| happening here and, if so, why?

| ashraful wrote:
| maybe:
| https://github.blog/changelog/2021-09-30-footnotes-now-suppo...

| teh_klev wrote:
| Checking to see if supported on HN [^1]
|
| Edit: nope :)
|
| [^1]:
| https://github.blog/changelog/2021-09-30-footnotes-now-suppo...

| teh_klev wrote:
| It's a bit non-standard, and if it's trying to follow the
| wikipedia citation style then it's the wrong way round.

| pupppet wrote:
| Can anyone recommend a hardy device for viewing the content? As
| nutty as it sounds, in some post-apocalyptic world it would sure
| be nice to have. I'd keep it under the bed just in case..

| bryanlarsen wrote:
| There used to be one, maybe you can find one somewhere.
|
| https://en.wikipedia.org/wiki/WikiReader

| bombcar wrote:
| Honestly a generic PC would probably be best, because it may be
| a bit harder to find power, etc, but you will have infinite
| amounts of replacement parts.

| c7b wrote:
| Have you looked at e-Ink readers?

| IggleSniggle wrote:
| Print it out on paper, small but legible font.

| teh_klev wrote:
| Someone did actually print out and bind Wikipedia in 2015:
|
| https://en.wikipedia.org/wiki/Print_Wikipedia

| SahAssar wrote:
| If you follow the logic that anything is at about half its
| life that would probably be an older ThinkPad laptop, like an
| x61 or x200. If you are willing to spend the money on something
| newer perhaps a Toughbook. I have a modded Kobo ebook reader (I
| upgraded mine to 256GB storage and have Project Gutenberg,
| Wikipedia and a few other things on it) with a good solar
| powerbank.

| bscphil wrote:
| > If you follow the logic that anything is at about half its
| life
|
| I don't think that makes any sense.
| By that logic any currently working device should be assumed to
| last another $currentlifetime. My 20 year old car is not gonna
| last another 20 years. My 10 year old laptop won't last another
| 10. If my car somehow _did_ last another 20 years, it would
| not then make sense to assume it would still be running in
| another 40.
|
| Makes more sense to look at all objects of the same class. If
| 75% of laptops are dead in 10 years and 95% are dead in 15,
| and your laptop is 10 years old, you can infer that 5 out of
| 25 surviving laptops will make it another 5 years, or 20%.
| (These numbers completely made up, just an example.)

| ScottEvtuch wrote:
| I think the idea of "everything is about half its life" is
| to account for survivorship bias in longevity. The only
| units that make it to the 95th percentile lifetimes clearly
| got luckier with parts and can reasonably be expected to
| last longer.

| sgerenser wrote:
| Reliability of most complicated devices (cars,
| electronics) is usually thought to follow a "bathtub
| curve." Some early mortality due to defective parts or
| manufacturing defects, a long trough of reliability from
| say, 1-10 years, then a rapid rise in failures due to
| aging. "Everything at half life" is a pretty bad
| approximation of this.

| seba_dos1 wrote:
| https://en.wikipedia.org/wiki/WikiReader ? ;)

| colordrops wrote:
| Is there a way to keep a mirror that stays in sync?

| doomrobo wrote:
| It looks like Kiwix uses the ZIM file format, which appears to
| have diffing support [0] (see zimdiff and zimpatch). That said,
| it doesn't look like Kiwix actually publishes those diffs.
|
| [0] https://github.com/openzim/zim-tools/tree/master/src

| blue1 wrote:
| Does it include the images or is it just the text?

| 0x073 wrote:
| Yes, with images but only English
|
| All possible dumps:
| https://dumps.wikimedia.org/other/kiwix/zim/wikipedia/

| [deleted]

| sprash wrote:
| Is there something similar for Stack Overflow?
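
bscphil's laptop estimate earlier in the thread is a conditional probability: of the units still alive at 10 years, what share are still alive at 15? A small sketch of that arithmetic, using the same made-up figures from that comment (75% dead by year 10, 95% dead by year 15). The function name is hypothetical.

```python
def conditional_survival(dead_by_now: float, dead_by_later: float) -> float:
    """Probability of surviving to the later age, given survival to the current age.

    Both arguments are cumulative failure fractions for the whole class of device.
    """
    alive_now = 1.0 - dead_by_now
    alive_later = 1.0 - dead_by_later
    if alive_now <= 0.0:
        raise ValueError("no units survive to the current age")
    return alive_later / alive_now


# Made-up figures from the comment: 75% dead at 10 years, 95% dead at 15.
p = conditional_survival(0.75, 0.95)
print(f"{p:.0%}")  # prints 20%
```

The same function works with any cumulative failure table, which is the point of the comment: the estimate comes from the whole class of devices, not from the age of one unit.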
| Jun8 wrote:
| https://library.kiwix.org/?lang=eng&category=stack_exchange

| [deleted]

| ankaAr wrote:
| Kiwix can do that also. You need to specify the ZIM file and
| it works:
|
| https://wiki.kiwix.org/wiki/Content_in_all_languages
|
| Why do I know that? I wanted to travel as a system administrator
| to some Antarctica base with a whole copy of Stack Overflow with
| me.

| sqrt_1 wrote:
| Article mentions to format to exFAT as NTFS has a 4GB limit - I
| don't think that is true.

| Wingman4l7 wrote:
| It's not -- FAT32 is the one with the 4GB limit. NTFS has much
| less native support on Macs than exFAT, though.

| aaron695 wrote:
| [deleted]

| kloch wrote:
| I wonder if there is an offline backup of Wikipedia on the ISS?
| There should be. And on every manned space mission.

| Dig1t wrote:
| Why not just every space mission, period?

| vorpalhex wrote:
| Well the robots don't read too well..

| Rebelgecko wrote:
| How much would the science capabilities of a telescope like
| JWST be reduced if 1/3 of its SSD was repurposed for storing
| the latest Wikipedia dump (that 1/3 number is assuming it's
| only English, compressed, and without images)? To me that
| seems like an easy cost/benefit analysis.

| bagels wrote:
| Why should there be?

| mhh__ wrote:
| The next Apollo 13 will probably be a software problem,
| doesn't hurt if they can read up about it

| tablespoon wrote:
| > The next Apollo 13 will probably be a software problem,
| doesn't hurt if they can read up about it
|
| What good would an "offline backup of Wikipedia" do in that
| situation?
|
| Wikipedia is good for one thing, and one thing only:
| getting some cursory knowledge on a topic you're unfamiliar
| with. It's the tourist map to the "sum of all human
| knowledge." If you expect to use it for anything else,
| you're asking too much of it.

| PaulDavisThe1st wrote:
| So, stackoverflow, not wikipedia, then?
___________________________________________________________________
(page generated 2022-10-06 23:00 UTC)