[HN Gopher] The Overflow Offline project
___________________________________________________________________
The Overflow Offline project

Author : donutshop
Score : 209 points
Date : 2022-10-20 13:13 UTC (9 hours ago)

(HTM) web link (stackoverflow.blog)
(TXT) w3m dump (stackoverflow.blog)

| maw wrote:
| To me this basically seems like boat programming made
| respectable.
|
| Of course, if you asked me, it always was. You couldn't assume
| great connectivity then and you often still can't today.
| Spivak wrote:
| My favorite consultant I ever worked with was a boat
| programmer. You hired him for super specialized MySQL magics so
| he, well, his company, charged a pretty substantial hourly rate
| and he apparently had enough revenue/leverage to get his
| company to foot the bill for two separate satellite internet
| connections on his boat. I feel like I would get lonely but
| it's definitely a vibe.
| pjmlp wrote:
| To me this is programming during the first 10 years, when the
| "Internet" was local BBSes, magazines spoke about Compuserve and
| Prodigy, and the connection rates were impossible, so we had
| to get by with what came in magazines and the local library.
| jhgkjhlkhjkljk wrote:
| I assume this is so people can train AI on it. It's just hard to
| say that outright because some people don't like the idea.
| speedgoose wrote:
| It has already been possible to download dumps for a long time.
|
| https://archive.org/details/stackexchange
| mkathuri wrote:
| Nice to find Kiwix again. Shameless plug, I made my own Kiwix
| alternative for macOS: https://github.com/technusm1/kiwings
| orblivion wrote:
| So this is a desktop app, but it uses the server as part of it?
| The normal Kiwix desktop client doesn't do that, right?
|
| I'll throw in my own shameless plug: Self-host your Stack
| Overflow, Wikipedia, etc. on Sandstorm:
| https://apps.sandstorm.io/app/5uh349d0kky2zp5whrh2znahn27gwh...
| Obviously uses kiwix-serve as well.
| 3 years old, I need to make
| a better clip for updating it.
| ComodoHacker wrote:
| They could actually try to build a Copilot competitor off their
| data. /s
| VoidWhisperer wrote:
| It would be interesting to see how many times a Copilot
| competitor trained off it gave correct code vs wrong code for a
| given case.
| mdaniel wrote:
| I would suspect that would differ depending on whether it was
| trained on the question's code versus any accepted answer's (or
| most-upvoted?) code.
| mdaniel wrote:
| I see the "/s" but I actually do wonder if integrating the
| "prompt" behavior into the _question box_ would help cut down
| on the absolutely staggering number of duplicate questions.
| Regrettably, I'm not enough of a GPT expert to know what
| percentage of the time it would generate gibberish, thus making
| the duplicate question problem _worse_.
| cee_el123 wrote:
| This is an amazing dose of humility.
| gragundier wrote:
| I've always wondered if we could force web apps into some sort of
| "default" offline mode with like some offline://url.here. Very
| cool of Overflow.
| throwoutway wrote:
| I like this idea; I wonder if there's a way to get Firefox to
| support this via the settings. There's already support for
| file:/// ftp:// etc.
| txtai wrote:
| There was a recent HN post for codequestion, which builds an
| offline semantic index (using https://github.com/neuml/txtai) on
| the archive.org Stack Overflow dumps -
| https://news.ycombinator.com/item?id=33110219
|
| GitHub: https://github.com/neuml/codequestion
|
| Article:
| https://medium.com/neuml/find-answers-with-codequestion-2-0-...
| xd1936 wrote:
| Love this. Reminds me of the other Kiwix projects to make
| MediaWiki services like Wikipedia available offline[1]. The
| entirety of English Wikipedia is ~50GB of text and ~100GB of
| images.
|
| 1. https://wiki.kiwix.org/
| 7373737373 wrote:
| I feel like their homepage could be greatly improved.
| It doesn't really make obvious what great capability it provides.
| jokoon wrote:
| I have already downloaded documentation, like the Python API or
| the cppreference website, as a PDF or HTML archive.
|
| I don't know if it's available for HTML or JS or CSS, or OpenGL.
| they4kman wrote:
| https://devdocs.io/ exposes a huge catalog of indexed and
| searchable collections of documentation for a wide variety of
| languages, libraries, and subjects, including HTML, JS, and CSS
| - though the only GL I see is WebGL - and _all_ of it can be
| downloaded to an IndexedDB for offline use.
|
| It's been a very handy tool in my toolbelt.
| eternauta3k wrote:
| You should check out Zeal; it's an offline documentation
| browser with existing documentation packages for HTML and a
| whole bunch of other things.
|
| https://zealdocs.org/
| 7373737373 wrote:
| This is great! Too many services today become completely unusable
| when they encounter technical problems, are hacked or are just
| lost over time. Having an easily accessible offline copy is
| always reassuring, showing that their survivability does not
| depend on just a few people and the projects are fundamentally
| about the information, not an organization.
| sytse wrote:
| You can also run FreeCodeCamp locally
| https://github.com/freeCodeCamp/freeCodeCamp/blob/main/docs/...
|
| And I funded the work to run that on an Android phone
| https://play.google.com/store/apps/details?id=space.atrailin...
| iib wrote:
| I remember already being able to use certain Stack Exchange sites
| with Kiwix before, as well as the Arch Wiki, Wikipedia without
| images, and some other great resources. It is nice to see that
| they actually pay attention to this use case, and I look forward
| to updated workflows with Kiwix or similar in the future. Latency
| is way better that way, even with good and stable internet.
| OpenZIM[1] is also useful for converting any page for use with
| Kiwix.
|
| I also have great memories from a university exam where we were
| allowed to have laptops that were not connected to the internet.
|
| [1] https://wiki.openzim.org/wiki/OpenZIM
| benpopper1 wrote:
| What was the test score ;)
| throwoutway wrote:
| This is awesome! At first I thought this only supported Stack
| Overflow and not the other 170+ StackExchange forums, but it
| looks like it does (or will?). From the blog:
|
| > "We built the Sotoki (Stack Overflow to Kiwix) scraper in such
| a way that it can capture each and every one of the 180 Stack
| Exchange websites."
|
| Unclear to me if "can" means "does" or "will soon" or just
| "could".
| benpopper1 wrote:
| It already does - everything from the technical Stack Exchanges
| to the sites on cooking and gardening :)
| polarix wrote:
| This has been available for a while, but it's great to see some
| acknowledgement, especially since the most recent data set was
| stuck in 2019 for a while.
|
| Here are the datasets:
| http://download.kiwix.org/zim/stack_exchange/
|
| It's not clear to me why the data set shrank between 2019/3 and
| 2022/6; was something excluded? Compression improvements?
|
| > stackoverflow.com_en_all_2019-02.zim 2019-03-12 19:53 134G
|
| > stackoverflow.com_en_all_2022-05.zim 2022-06-17 12:36 75G
| FinnLeSueur wrote:
| The article states:
|
| > ... to ensure that an up-to-date version of our dataset is
| easily available for those who need it, and will work to
| improve its readability and reduce its size so there is less
| friction for end users...
| gernb wrote:
| The data isn't stuck. The data is available here:
|
| https://archive.org/details/stackexchange
|
| It's the "official" place to get the data.
|
| I've downloaded it several times and extracted my own
| contributions.
___________________________________________________________________
(page generated 2022-10-20 23:00 UTC)