[HN Gopher] The Overflow Offline project
       ___________________________________________________________________
        
       The Overflow Offline project
        
       Author : donutshop
       Score  : 209 points
       Date   : 2022-10-20 13:13 UTC (9 hours ago)
        
 (HTM) web link (stackoverflow.blog)
 (TXT) w3m dump (stackoverflow.blog)
        
       | maw wrote:
       | To me this basically seems like boat programming made
       | respectable.
       | 
       | Of course, if you asked me, it always was. You couldn't assume
       | great connectivity then and you often still can't today.
        
         | Spivak wrote:
         | My favorite consultant I ever worked with was a boat
         | programmer. You hired him for super specialized MySQL magics so
         | he, well his company, charged a pretty substantial hourly rate
         | and he apparently had enough revenue/leverage to get his
         | company to foot the bill for two separate satellite internet
         | connections on his boat. I feel like I would get lonely but
         | it's definitely a vibe.
        
         | pjmlp wrote:
         | To me this is programming during the first 10 years, when the
         | "Internet" was local BBSes, magazines wrote about CompuServe and
         | Prodigy, and connection rates were prohibitive, so we had to
         | get by with what came in magazines and from the local library.
        
       | jhgkjhlkhjkljk wrote:
       | I assume this is so people can train AI on it. It's just hard to
       | say that outright because some people don't like the idea.
        
         | speedgoose wrote:
         | It has been possible to download dumps for a long time.
         | 
         | https://archive.org/details/stackexchange
        
       | mkathuri wrote:
       | Nice to find Kiwix again. Shameless plug, I made my own Kiwix
       | alternative for macOS: https://github.com/technusm1/kiwings
        
         | orblivion wrote:
         | So this is a desktop app, but it uses the server as part of it?
         | The normal Kiwix desktop client doesn't do that, right?
         | 
         | I'll throw in my own shameless plug: Self-host your Stack
         | Overflow, Wikipedia etc on Sandstorm:
         | https://apps.sandstorm.io/app/5uh349d0kky2zp5whrh2znahn27gwh...
         | It uses kiwix-serve as well, of course. The app is 3 years old;
         | I need to make a better clip for updating it.
        
       | ComodoHacker wrote:
       | They could actually try to build a Copilot competitor off their
       | data. /s
        
         | VoidWhisperer wrote:
         | It would be interesting to see how often a Copilot competitor
         | trained on it gave correct code vs. wrong code for a given
         | case.
        
           | mdaniel wrote:
           | I suspect that would depend on whether it was trained on the
           | question's code or on the accepted answer's (or the most
           | upvoted answer's) code.
        
         | mdaniel wrote:
         | I see the "/s" but I actually do wonder if integrating the
         | "prompt" behavior into the _question box_ would help cut down
         | on the absolutely staggering number of duplicate questions.
         | Regrettably, I'm not enough of a GPT expert to know what
         | percentage of the time it would generate gibberish, thus making
         | the duplicate question problem _worse_.
        
       | cee_el123 wrote:
       | This is an amazing dose of humility.
        
       | gragundier wrote:
       | I've always wondered if we could force web apps into some sort of
       | "default" offline mode with something like offline://url.here.
       | Very cool of Stack Overflow.
        
         | throwoutway wrote:
         | I like this idea; I wonder if there's a way to get Firefox to
         | support this via the settings. There's already support for
         | file:/// ftp:// etc
        
       | txtai wrote:
       | There was a recent HN post about codequestion, which builds an
       | offline semantic index (using https://github.com/neuml/txtai) on
       | the archive.org Stack Overflow dumps -
       | https://news.ycombinator.com/item?id=33110219
       | 
       | GitHub: https://github.com/neuml/codequestion
       | 
       | Article: https://medium.com/neuml/find-answers-with-
       | codequestion-2-0-...
        
       | xd1936 wrote:
       | Love this. Reminds me of the other Kiwix projects to make
       | MediaWiki services like Wikipedia available offline[1]. The
       | entirety of English Wikipedia is ~50GB of text and ~100GB of
       | images.
       | 
       | 1. https://wiki.kiwix.org/
        
         | 7373737373 wrote:
         | I feel like their homepage could be greatly improved. It
         | doesn't make obvious what a great capability it provides.
        
       | jokoon wrote:
       | I've already downloaded documentation, like the Python API docs
       | or the cppreference website, as PDF or HTML archives.
       | 
       | I don't know if the same is available for HTML, JS, CSS, or
       | OpenGL.
        
         | they4kman wrote:
         | https://devdocs.io/ exposes a huge catalog of indexed and
         | searchable collections of documentation for a wide variety of
         | languages, libraries, and subjects, including HTML, JS, and CSS
         | - though, the only GL I see is WebGL - and _all_ of it can be
         | downloaded to an IndexedDB for offline use.
         | 
         | It's been a very handy tool in my toolbelt.
        
         | eternauta3k wrote:
         | You should check out Zeal. It's an offline documentation
         | browser with existing documentation packages for HTML and a
         | whole bunch of other things.
         | 
         | https://zealdocs.org/
        
       | 7373737373 wrote:
       | This is great! Too many services today become completely unusable
       | when they encounter technical problems, are hacked or are just
       | lost over time. Having an easily accessible offline copy is
       | always reassuring, showing that their survivability does not
       | depend on just a few people and the projects are fundamentally
       | about the information, not an organization.
        
       | sytse wrote:
       | You can also run FreeCodeCamp locally
       | https://github.com/freeCodeCamp/freeCodeCamp/blob/main/docs/...
       | 
       | And I funded the work to run that on an Android phone:
       | https://play.google.com/store/apps/details?id=space.atrailin...
        
       | iib wrote:
       | I remember being able to use certain Stack Exchange sites with
       | Kiwix before, as well as the Arch Wiki, Wikipedia without images,
       | and some other great resources. It is nice to see that they
       | actually pay attention to this use case, and I look forward to
       | updated workflows with Kiwix or similar tools in the future.
       | Latency is much better that way, even with good and stable
       | internet. OpenZIM[1] is also useful for converting any page for
       | use with Kiwix.
       | 
       | I also have great memories from a University exam where we were
       | allowed to have laptops that were not connected to the internet.
       | 
       | [1] https://wiki.openzim.org/wiki/OpenZIM
        
         | benpopper1 wrote:
         | What was your test score? ;)
        
       | throwoutway wrote:
       | This is awesome! At first I thought this only supported Stack
       | Overflow and not the other 170+ StackExchange forums, but it
       | looks like it does (or will?). From the blog:
       | 
       | > "We built the Sotoki (Stack Overflow to Kiwix) scraper in such
       | a way that it can capture each and every one of the 180 Stack
       | Exchange websites."
       | 
       | Unclear to me if "can" means "does" or "will soon" or just
       | "could"
        
         | benpopper1 wrote:
         | It already does - everything from the technical stack exchanges
         | to the sites on cooking and gardening :)
        
       | polarix wrote:
       | This has been available for a while but it's great to see some
       | acknowledgement especially since the most recent data set was
       | stuck in 2019 for a while.
       | 
       | Here are the datasets:
       | http://download.kiwix.org/zim/stack_exchange/
       | 
       | It's not clear to me why the data set shrank between the 2019-03
       | and 2022-06 releases. Was something excluded? Compression
       | improvements?
       | 
       | > stackoverflow.com_en_all_2019-02.zim 2019-03-12 19:53 134G
       | 
       | > stackoverflow.com_en_all_2022-05.zim 2022-06-17 12:36 75G
        
         | FinnLeSueur wrote:
         | The article states:
         | 
         | > ... to ensure that an up-to-date version of our dataset is
         | easily available for those who need it, and will work to
         | improve its readability and reduce its size so there is less
         | friction for end users...
        
         | gernb wrote:
         | The data isn't stuck. The data is available here:
         | 
         | https://archive.org/details/stackexchange
         | 
         | It's the "official" place to get the data.
         | 
         | I've downloaded it several times and extracted my own
         | contributions.
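A minimal sketch of extracting your own contributions from that archive, assuming each site's archive unpacks to a Posts.xml whose self-closing <row> elements carry an OwnerUserId attribute. The function name iter_user_posts and the sample user ID 42 are hypothetical. It streams the file with xml.etree.ElementTree.iterparse so a multi-gigabyte Posts.xml never has to fit in memory at once:

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Dict, Iterator, Union


def iter_user_posts(source: Union[str, IO[bytes]], user_id: str) -> Iterator[Dict[str, str]]:
    """Yield the attributes of every <row> in a Posts.xml owned by user_id.

    Assumes (hypothetically) the dump layout
    <posts><row Id="..." OwnerUserId="..." .../>...</posts>.
    """
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the start of the root <posts> element; keep a handle to it.
    _event, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == "row":
            if elem.get("OwnerUserId") == user_id:
                # Copy the attributes before the row is discarded below.
                yield dict(elem.attrib)
            # Drop finished rows from the tree so memory stays bounded.
            root.clear()


if __name__ == "__main__":
    sample = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" OwnerUserId="42" />
  <row Id="2" PostTypeId="2" OwnerUserId="7" />
  <row Id="3" PostTypeId="2" OwnerUserId="42" />
</posts>"""
    mine = list(iter_user_posts(io.BytesIO(sample), "42"))
    print([post["Id"] for post in mine])  # ['1', '3']
```

Questions (PostTypeId 1) and answers (PostTypeId 2) both come through, so filter on PostTypeId as well if you only want one kind.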
        
       ___________________________________________________________________
       (page generated 2022-10-20 23:00 UTC)