[HN Gopher] How to Download All of Wikipedia onto a USB Flash Drive
       ___________________________________________________________________
        
       How to Download All of Wikipedia onto a USB Flash Drive
        
       Author : bubblehack3r
       Score  : 69 points
       Date   : 2022-10-06 21:06 UTC (1 hours ago)
        
 (HTM) web link (planetofthepaul.com)
 (TXT) w3m dump (planetofthepaul.com)
        
       | PaulDavisThe1st wrote:
       | Can someone explain what the role of Kiwix is in all this, please?
        
       | [deleted]
        
       | londons_explore wrote:
       | Note that it's possible to make wikipedia substantially smaller
       | if you're happy to use more aggressive compression algorithms.
       | 
       | Kiwix divides the data into chunks and adds various indexes and
       | stuff to allow searching data and fast access, even on slow CPU
       | devices. But if you can live with slow loading, you can probably
       | halve the storage space required, or maybe more.
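
Editorial sketch (not from the thread): the trade-off described above can be illustrated with Python's standard lzma module. The chunk size, function names, and synthetic sample text are all assumptions made for this illustration, and this is not a description of how Kiwix or the ZIM format is actually implemented. Compressing independent chunks lets a reader decompress only the chunks that cover the bytes it wants, at the cost of a larger total size than compressing everything as one stream.

```python
import lzma


def compress_solid(data: bytes) -> bytes:
    """Compress everything as one stream: smallest output, no random access."""
    return lzma.compress(data)


def compress_chunked(data: bytes, chunk_size: int) -> list:
    """Compress fixed-size chunks independently so each can be read alone."""
    return [
        lzma.compress(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def read_range(chunks: list, chunk_size: int, offset: int, length: int) -> bytes:
    """Return data[offset:offset + length], decompressing only the chunks needed."""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    buf = b"".join(lzma.decompress(c) for c in chunks[first:last + 1])
    start = offset - first * chunk_size
    return buf[start:start + length]


if __name__ == "__main__":
    # Synthetic, highly repetitive "articles" (an assumption for the demo).
    text = b"".join(
        f"Article {i}: the quick brown fox jumps over the lazy dog.\n".encode()
        for i in range(2000)
    )
    solid = compress_solid(text)
    chunks = compress_chunked(text, 4096)
    print("raw bytes:    ", len(text))
    print("solid bytes:  ", len(solid))
    print("chunked bytes:", sum(len(c) for c in chunks))
    print(read_range(chunks, 4096, 10000, 50))
```

Larger chunks move the result toward the solid size but make each lookup decompress more data, which is the slow-loading cost the comment mentions.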
        
       | bombcar wrote:
       | Kiwix is great - I have a collection of various things from their
       | library https://library.kiwix.org/?lang=eng downloaded for when
       | I'm on a plane or the internet is otherwise unavailable.
       | 
       | That and the TeXlive PDF manuals can get me through anything.
        
         | daneel_w wrote:
         | I second Kiwix. I found out about it not too long ago on the
         | topic of portable Wikipedia readers. It really stands out as
         | the best software part of such a solution.
        
           | 23B1 wrote:
           | I third Kiwix. Immensely useful when I was deployed without
           | internet.
        
       | barbs wrote:
       | Is there a portable version of Kiwix? Would be cool if you could
       | plug the USB into any computer and start reading Wikipedia
       | without having to install anything.
        
         | tehnicaorg wrote:
         | Yes. You download a zip archive. Unpack from 121MB to 263MB,
         | and start the exe. (assuming you're using Windows)
        
       | orliesaurus wrote:
       | Oh wow, I thought this was gonna be a REALLY large file, but only
       | 95GB not bad, some worthless videogames are larger haha
        
         | bscphil wrote:
         | I was curious how they achieve this. It looks like the
         | underlying file format uses LZMA, or optionally Zstd,
         | compression. Both achieve pretty high compression ratios
         | against plain text and markup.
         | 
         | > Its file compression uses LZMA2, as implemented by the xz-
         | utils library, and, more recently, Zstandard. The openZIM
         | project is sponsored by Wikimedia CH, and supported by the
         | Wikimedia Foundation.
         | 
         | https://en.wikipedia.org/wiki/ZIM_(file_format)
        
         | keepquestioning wrote:
         | I remember the era of stupidly large games.
        
         | aendruk wrote:
         | Circa 2003 I carried around a pared down copy on a Pocket PC.
         | Dropping a few chosen categories (who needs Sports?) allowed it
         | to barely fit on a 1-GB SD card.
        
           | FeistySkink wrote:
           | People going back in time need sports. An almanac of some
           | kind.
        
       | yieldcrv wrote:
       | protip: you need to download wikipedia in other languages as well
       | 
       | they are not translations, they are completely different articles
       | under the name brand and platform of Wikipedia
       | 
       | an entry that may be just a blurb in English may be one of the
       | most comprehensive and fully fleshed out and researched entries
       | on the site in German, for example
        
       | thakoppno wrote:
       | Somewhere around the original ipad era, I believe there was a
       | curated subset of wikipedia articles that may have been called
       | something like Educator's Edition.
       | 
       | It worked offline and had images and I traveled to Peru with it
       | and learned so much. Does anyone remember this sort of thing?
       | 
       | I've tried wix formatted copies and they do work but the
       | experience on an offline ipad was simply better. Thanks in
       | advance.
        
         | Rediscover wrote:
         | Yes, I remember - I had a copy on an SD card on my OLPC.
         | 
         | I believe it morphed into "Wikipedia for Schools" ^0 -
         | possibly this ^1 is a comment about it?
         | 
         | 0:
         | https://en.m.wikipedia.org/wiki/Wikipedia:Wikipedia_for_Scho...
         | 
         | 1: https://www.speedofcreativity.org/2008/11/11/wikipedia-to-
         | go...
        
           | thehours wrote:
           | Tangent - I've noticed a lot more comments like this using
           | the "^0" syntax for citations vs the traditional "[0]" one
           | I've become accustomed to seeing on HN. Is there a real shift
           | happening here and, if so, why?
        
             | ashraful wrote:
             | maybe: https://github.blog/changelog/2021-09-30-footnotes-
             | now-suppo...
        
               | teh_klev wrote:
               | Checking to see if supported on HN [^1]
               | 
               | Edit: nope :)
               | 
               | [^1]: https://github.blog/changelog/2021-09-30-footnotes-
               | now-suppo...
        
             | teh_klev wrote:
             | It's a bit non-standard, and if it's trying to follow the
             | wikipedia citation style then it's the wrong way round.
        
       | pupppet wrote:
       | Can anyone recommend a hardy device for viewing the content? As
       | nutty as it sounds, in some post-apocalyptic world it would sure
       | be nice to have. I'd keep it under the bed just in case..
        
         | bryanlarsen wrote:
         | There used to be one, maybe you can find one somewhere.
         | 
         | https://en.wikipedia.org/wiki/WikiReader
        
         | bombcar wrote:
         | Honestly a generic PC would probably be best, because it may be
         | a bit harder to find power, etc, but you will have infinite
         | amounts of replacement parts.
        
         | c7b wrote:
         | Have you looked at e-Ink readers?
        
         | IggleSniggle wrote:
         | Print it out on paper, small but legible font.
        
           | teh_klev wrote:
           | Someone did actually print out and bind Wikipedia in 2015:
           | 
           | https://en.wikipedia.org/wiki/Print_Wikipedia
        
         | SahAssar wrote:
         | If you follow the logic that anything is at about half its life
         | that would probably be an older ThinkPad laptop, like an x61 or
         | x200. If you are willing to spend the money on something newer
         | perhaps a Toughbook. I have a modded Kobo ebook reader (I
         | upgraded mine to 256GB storage and have Project Gutenberg,
         | wikipedia and a few other things on it) with a good solar
         | powerbank.
        
           | bscphil wrote:
           | > If you follow the logic that anything is at about half its
           | life
           | 
           | I don't think that makes any sense. By that logic any
           | currently working device should be assumed to last another
           | $currentlifetime. My 20 year old car is not gonna last
           | another 20 years. My 10 year old laptop won't last another
           | 10. If my car somehow _did_ last another 20 years, it would
           | not then make sense to assume it would still be running in
           | another 40.
           | 
           | Makes more sense to look at all objects of the same class. If
           | 75% of laptops are dead in 10 years and 95% are dead in 15,
           | and your laptop is 10 years old, you can infer that 5 out of
           | 25 surviving laptops will make it another 5 years, or 20%.
           | (These numbers completely made up, just an example.)
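
Editorial sketch (not from the thread): the inference above is a conditional probability, P(alive at 15 | alive at 10) = P(alive at 15) / P(alive at 10). The function name is hypothetical, and the percentages are the commenter's own made-up figures, reused only to check the arithmetic.

```python
def conditional_survival(dead_by_now: float, dead_by_later: float) -> float:
    """Chance that a unit alive now is still alive at the later time.

    dead_by_now:   fraction of all units dead by the current age
    dead_by_later: fraction of all units dead by the later age
    """
    alive_now = 1.0 - dead_by_now
    alive_later = 1.0 - dead_by_later
    if alive_now <= 0.0:
        raise ValueError("no units survive to the current age")
    return alive_later / alive_now


if __name__ == "__main__":
    # 75% dead at 10 years, 95% dead at 15 years (illustrative numbers).
    p = conditional_survival(0.75, 0.95)
    print(f"{p:.0%}")  # 5 survivors out of 25
```

With these inputs, 25 of every 100 laptops reach year 10 and 5 reach year 15, so the ratio is 5/25, matching the 20% in the comment.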
        
             | ScottEvtuch wrote:
             | I think the idea of "everything is about half its life" is
             | to account for survivorship bias in longevity. The only
             | units that make it to the 95th percentile lifetimes clearly
             | got luckier with parts and can reasonably be expected to
             | last longer.
        
               | sgerenser wrote:
               | Reliability of most complicated devices (cars,
               | electronics) is usually thought to follow a "bathtub
               | curve." Some early mortality due to defective parts or
               | manufacturing defects, a long trough of reliability from
               | say, 1-10 years, then a rapid rise in failures due to
               | aging. "Everything at half life" is a pretty bad
               | approximation of this.
        
         | seba_dos1 wrote:
         | https://en.wikipedia.org/wiki/WikiReader ? ;)
        
       | colordrops wrote:
       | Is there a way to keep a mirror that stays in sync?
        
         | doomrobo wrote:
         | It looks like Kiwix uses the ZIM file format, which appears to
         | have diffing support [0] (see zimdiff and zimpatch). That said,
         | it doesn't look like Kiwix actually publishes those diffs.
         | 
         | [0] https://github.com/openzim/zim-tools/tree/master/src
        
       | blue1 wrote:
       | Does it include the images or is it just the text?
        
         | 0x073 wrote:
         | Yes, with images, but only English
         | 
         | All possible dumps:
         | https://dumps.wikimedia.org/other/kiwix/zim/wikipedia/
        
         | [deleted]
        
       | sprash wrote:
       | Is there something similar for Stack Overflow?
        
         | Jun8 wrote:
         | https://library.kiwix.org/?lang=eng&category=stack_exchange
        
         | [deleted]
        
         | ankaAr wrote:
         | Kiwix can do that also. You need to specify the ZIM file and
         | it works:
         | 
         | https://wiki.kiwix.org/wiki/Content_in_all_languages
         | 
         | How do I know that? I wanted to travel as a system administrator
         | to an Antarctic base with a whole copy of Stack Overflow with me.
        
       | sqrt_1 wrote:
       | The article says to format as exFAT because NTFS has a 4GB
       | limit - I don't think that is true.
        
         | Wingman4l7 wrote:
         | It's not -- FAT32 is the one with the 4GB limit. NTFS has much
         | less native support on Macs than exFAT, though.
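
Editorial sketch (not from the thread): FAT32 records each file's size in a 32-bit field, so a single file can be at most 2^32 - 1 bytes (one byte under 4 GiB). The check below uses a hypothetical helper name and treats the roughly 95GB download mentioned elsewhere in the thread as 95 * 10^9 bytes, which is an assumption about the units.

```python
# Largest file FAT32 can store: the size field is 32 bits wide.
FAT32_MAX_FILE_BYTES = 2**32 - 1


def fits_on_fat32(size_bytes: int) -> bool:
    """Return True if a single file of this size can be stored on FAT32."""
    if size_bytes < 0:
        raise ValueError("size must be non-negative")
    return size_bytes <= FAT32_MAX_FILE_BYTES


if __name__ == "__main__":
    wikipedia_zim_bytes = 95 * 10**9  # approximate size quoted in the thread
    print(fits_on_fat32(wikipedia_zim_bytes))
    print(fits_on_fat32(FAT32_MAX_FILE_BYTES))
```

To check a real file, the size can be obtained with os.path.getsize(path) and passed to the same function.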
        
         | aaron695 wrote:
        
         | [deleted]
        
       | kloch wrote:
       | I wonder if there is an offline backup of Wikipedia on the ISS? There
       | should be. And on every manned space mission.
        
         | Dig1t wrote:
         | Why not just every space mission, period?
        
           | vorpalhex wrote:
           | Well the robots don't read too well..
        
           | Rebelgecko wrote:
           | How much would the science capabilities of a telescope like
           | JWST be reduced if 1/3 of its SSD was repurposed for storing
           | the latest wikipedia dump (that 1/3 number is assuming it's
           | only English, compressed, and without images)? To me that
           | seems like an easy cost/benefit analysis.
        
         | bagels wrote:
         | Why should there be?
        
           | mhh__ wrote:
           | The next Apollo 13 will probably be a software problem,
           | doesn't hurt if they can read up about it
        
             | tablespoon wrote:
             | > The next Apollo 13 will probably be a software problem,
             | doesn't hurt if they can read up about it
             | 
             | What good would an "offline backup of Wikipedia" do in that
             | situation?
             | 
             | Wikipedia is good for one thing, and one thing only:
             | getting some cursory knowledge on a topic you're unfamiliar
             | with. It's the tourist map to the "sum of all human
             | knowledge." If you expect to use it for anything else,
             | you're asking too much of it.
        
             | PaulDavisThe1st wrote:
             | So, stackoverflow, not wikipedia, then?
        
       ___________________________________________________________________
       (page generated 2022-10-06 23:00 UTC)