[HN Gopher] WorldBrain's Memex: Bookmarking for the power users ...
___________________________________________________________________

WorldBrain's Memex: Bookmarking for the power users of the web

Author : lelf
Score : 138 points
Date : 2020-05-18 19:22 UTC (3 hours ago)

(HTM) web link (getmemex.com)
(TXT) w3m dump (getmemex.com)

| haaaris wrote:
| I'm currently building a similar tool, but for groups and teams.
| Would appreciate any feedback if anyone's keen on checking it out
| :) https://www.inverse.network/

| cparsons3000 wrote:
| I've literally used over 10 bookmark managers in the last 10
| years and Bookmark OS (https://bookmarkos.com) is the only one
| that I've stuck with

| dr_dshiv wrote:
| Wow, I have been looking for just this tool. First, the ability
| to highlight and save interesting passages on the web. Second,
| something to give me value from my own browsing history. Third,
| an honest, open, paid service that aspires to the vision of the
| original Memex. I really hope this succeeds.

| ohlookabird wrote:
| I really like my self-hosted Wallabag for this. There are browser
| extensions for Firefox and Chromium (and possibly others) and it
| works well on my Android phone and online. It's a nice layout and
| most websites work well with it. I use it both for bookmarking
| and as a read-it-later tool. Kudos to the devs!

| fastball wrote:
| What is WorldBrain?

| severine wrote:
| More details:
|
| https://community.worldbrain.io/t/data-sovereignty-and-priva...
|
| https://medium.com/bettersharing/steward-ownership-is-capita...
|
| edit: _corrected 1st link_

| fudged71 wrote:
| I'm excited for this resurgence of archiving, searching,
| highlighting, bookmarking, note-taking, etc

| donmcronald wrote:
| I want a self-hosted version of something like this. I
| currently use historio.us, which is one of the only services I
| pay for, but I'd much rather have a good self hosted option.
| I've been looking for years.

| anotheryou wrote:
| This one stores locally (if you don't use sync)

| kstrauser wrote:
| My first impression was "oh, another Pinboard competitor" (which
| historically don't fare well). What's the elevator pitch for why
| I should use Memex instead of that?

| anotheryou wrote:
| Full-text search across everything you have ever seen (not just
| bookmarked).

| kstrauser wrote:
| Ah, thanks! That's a good summary.

| karlicoss wrote:
| also, highlighting/annotations

| [deleted]

| valbaca wrote:
| I've been using Pocket for free since 2011: getpocket.com It's
| not great or perfect but it's good enough for "to read later" and
| keeping a running "grimoire"
|
| I've tried other methods: chrome bookmarks, evernote, plain-text,
| etc but nothing provides:
|
| 1. Ubiquity with just one login
|
| With Pocket, everywhere I browse I can add to pocket, including
| at work. I don't want to ever use my Google login at work b/c I
| don't want my work Chrome bookmarks (which are basically
| work-internal websites) to conflict with my personal ones.
|
| Pocket is available on my phone, iPad, browser, and work browser
| quickly and easily.
|
| 2. Has tags.
|
| I stick with about one tag per item. I don't need it to be fully
| tagged out, but just a general one. Typically by programming
| language or topic.
|
| One special tag is "someday" which is how I get very long items
| (like online books) out of my short "To Read" queue.
|
| 3. exports
|
| I haven't needed it but it's nice to know that I can easily
| export my bookmarks, with tags, to html. From there I can convert
| to something else if I want.
|
| I've tried GTD and other "universal" systems and my current
| system is a bit of a mess (mostly because of the work-life
| dichotomy), but at least my "save to read later" flow is simple:
|
| 1. Go to hacker news 2. send to pocket 3.
| when I've got time, scroll through my to-read and pick one that
| packs into the amount of time I have
|
| It does one thing and does it well enough for me.

| loughnane wrote:
| Is a good way to think about this as Memex = Evernote + Genius +
| Privacy?

| greenice wrote:
| Does WorldBrain Memex save any data about the sites I bookmark?
|
| I've been using Onenote for the past 10 years to bookmark or save
| websites.
|
| It had worked OK to share from mobile but my Onenote notebook is
| now approaching 10 GB in size.
|
| And I have a pretty bad experience with syncing as it doesn't
| reliably sync in the background if I don't regularly open the app
| on mobile (especially on iOS).

| lrpublic wrote:
| This is an interesting perhaps meta-relevant topic for HN.
|
| How many of us bookmark or otherwise record interesting posts
| from here and elsewhere?
|
| How many of us ever refer to that accumulated digital memory?
|
| I have about 7,000 links with notes accumulated over the last few
| decades.
|
| I've read a lot of them, but the hard to acknowledge reality is
| that even with a refined workflow, recording my links in a near
| perfect taxonomy, to a repository with full text search and
| spaced repetition reminder cards, the things I remember are those
| that I took the time to read.
|
| I suspect most people here have a comparable metric to share.
|
| Maybe the best bookmark repository is nul:

| avolcano wrote:
| The site is weirdly ambiguous about this but: I am assuming by
| "offline-first" they mean that the "full-text history capture"
| never leaves my device, right? Or does it get synced optionally?
| Or only synced to other devices I have?
|
| It's baffling to me that they put "privacy-centric" front and
| center and then do not in any way explain what that actually
| means.

| karlicoss wrote:
| Yep, the sync is optional (and the only thing they take money
| for, which makes sense)

| severine wrote:
| https://community.worldbrain.io/t/what-happens-with-my-priva...

| fit2rule wrote:
| This is still not as powerful as my one, simple trick to handle
| all bookmarks, ever: Print to PDF.
|
| I've been doing it since last century, and I have tens of
| thousands of PDFs of every single web page I've ever found
| interesting, sitting right there in a directory on my computer.
| It's indexable, searchable, grok'able, available off-line, allows
| me to harvest data without fuss, and gives me access to anything
| I can remember about the article, almost instantaneously.
|
| $ ls -l ~/PDFArchive/ | grep -i "bookmark" | grep -i "manage" | wc -l
|
| = I've seen 20 other bookmark management 'solution' articles in
| 20 years
|
| .. nothing beats print-to-PDF. Its just awesome.

| danielecook wrote:
| What sorts of web pages do you do this for? What if the pdf
| version is difficult to read?

| fit2rule wrote:
| Every web page I'll ever want to refer to, ever again. There
| are no good reasons for exceptions to this technique, imho.
|
| If the PDF version is difficult to read - which it rarely is,
| by the way - all I need to do is open the PDF and use the
| links in the page header to go visit the site again - all the
| details about the page are still there in the PDF, links are
| still clickable, etc.
|
| And if it's really important, and I've taken the time, before
| moving to my PDF Archive, to verify that the site is not
| readable due to some layout inconsistency in the conversion
| to PDF (I do sometimes suspect this with the fancier laid out
| pages), I Print-to-PDF again after enabling Reader mode/view
| (Safari/Firefox): problem solved.
|
| But really, there are very few web pages that don't survive
| the PDF conversion. And anyway, I mostly pipe the .PDF output
| through something like pdf2text for further grok/grep'ing...
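
The filename search fit2rule shows earlier in this subthread can be wrapped in a small shell function. This is only a minimal sketch, not fit2rule's actual setup: the name `count_matches` is hypothetical, and it assumes the archive is a flat directory whose file names carry the page titles. It lists names with `ls -1` instead of `ls -l`, so a keyword cannot match an owner or date column by accident.

```shell
# count_matches DIR WORD...
# Hypothetical helper: count files in DIR whose names contain every WORD,
# ignoring case. Mirrors the chained `grep -i` calls in the thread.
count_matches() {
  dir=$1
  shift
  # One name per line; plain names only, no `ls -l` metadata columns.
  names=$(ls -1 "$dir")
  for word in "$@"; do
    # Keep only the names that also contain this word.
    names=$(printf '%s\n' "$names" | grep -i -- "$word" || true)
  done
  # Count the non-empty lines that survived every filter.
  printf '%s\n' "$names" | grep -c . || true
}
```

Under these assumptions, `count_matches ~/PDFArchive bookmark manage` would play the role of the one-liner quoted in the thread. File names containing newlines are not handled.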

| cortesoft wrote:
| > There are no good reasons for exceptions to this
| technique, imho
|
| Dynamic web sites? A PDF of my bank website isn't going to
| help me much.

| fit2rule wrote:
| If the intention is to save data from your bank website,
| you're probably going to have to jump through hoops
| anyway, assuming your bank is doing its job. (Or just
| remember to use Reader mode first..)
|
| However, if the intention is to just save a link to the
| bank website for future reference, my technique still
| works since every page in the PDF produced contains a
| header with the URL - just like a normal bookmark.

| nojito wrote:
| Most if not all web browsers have a reader mode which trims
| away nonsense.

| filoleg wrote:
| I would be careful with using this method and check the
| generated PDF versions with your eyes before writing them off
| as "all is good, it is archived now".
|
| I recently got bitten by that, when I was trying to print out
| some page in Chrome, and it was rendering as a bunch of white
| space surrounded by some elements from the page, but without
| any actual content I cared about. Turns out, my situation isn't
| that uncommon for pages that are heavily JS-dependent
|
| Note: I am not saying JS=bad. This has nothing to do with JS
| itself and everything to do with how JS is used to
| generate/render the page. A lot of pages just don't bother with
| doing it the right way that doesn't screw up generated PDFs.

| fit2rule wrote:
| I've since learned in this thread that Chrome and Firefox are
| not as good as Safari for this technique - it hasn't impacted
| me much since I only use Chrome/Firefox for development,
| mostly.
|
| And although I do occasionally check the produced PDFs, the
| layout doesn't matter to me at all since I use a cmd-line
| grep or combination of 'pdftotext' to find the page, open the
| PDF, and click the link to go to the original web page if I
| need to ..
| haven't found a single dud PDF in the collection
| in a randomised sampling, but then again in 20,000+ files,
| there's bound to be one that didn't make it through the
| rendering pipeline, but so far, hasn't been an issue.

| throwawaysea wrote:
| Are you manually printing each page to PDF? I would love to
| have an automated way to do this but haven't figured out how to
| deal with logging into subscription-based sites and all that.
|
| There is also some degree of messiness even with printing to
| PDF. For example let's say I want to save an HN or Reddit
| discussion along with the comments - I would need to make sure
| I capture all the comments that overflow to "More" on HN or are
| behind a "load more comments" link on Reddit. Is there any
| elegant way to traverse all that and capture it?

| cocktailpeanuts wrote:
| you should productize this idea. I would use it.
|
| It's a bit of a hassle to print to PDF every time, but if the
| barrier is low enough, it would be useful.

| fit2rule wrote:
| There's nothing to productize. You just use Cmd-P (or Control-P),
| select "Print to PDF", save to a relevant folder, and off you
| go. If that's too many steps, use Automator or whatever the
| equivalent is on your OS to make a shorter hotkey. (I had a
| Hammerspoon script for this once, but reverted to just doing
| it manually, since my muscle memory on the keystrokes is
| sufficiently well trained that it supplants my desire to find
| the .lua files somewhere to pass to Hammerspoon..)
|
| The entire point is that there is absolutely no need for a
| third party to get involved in organising your web browsing
| history or remembering your bookmarks. Use the shell. Very
| few third-party services will be able to match the power of
| this tooling, for the reasons I gave above. My history = my
| data, for my own private purposes.

| GuiA wrote:
| That's the software economy we're in. Everyone's thinking
| in terms of "productization" and "features" and "user
| journeys".
| Only old grumpy hackers care about minimal,
| composable tools anymore. Sigh.

| jannes wrote:
| Do you use a Chromium-based browser? Chrome/ium's Print-To-PDF
| uses quite a different (better) method for generating the PDF
| compared to the OS-level Print-To-PDF.
|
| The OS-level PDF converter can lose a lot of information.
| Especially hyperlinks are not present in the PDF when it's
| generated through a print driver.
|
| Unfortunately, this is one of the few times when it sucks to be
| a Firefox user, because it doesn't have a builtin Print-To-PDF.

| fit2rule wrote:
| I use Safari and Firefox mostly, and haven't yet run into any
| of the issues you bring up - pdftotext gives me full text
| search with ease. All the links still work, PDFs are easy to
| read (assuming the page doesn't do weird layout tricks), and
| for the worst case, I at least can search my "bookmark" PDF
| archive and go back to the original live web page if needed.

| jannes wrote:
| Looks like Apple's implementation is a bit better then. I
| just tried on Windows and I definitely don't get any
| clickable links when printed with the OS print-to-pdf
| driver. (I do get them with Chrome's save-to-pdf feature
| which circumvents "printing" and generates the PDF itself.)

| fit2rule wrote:
| Good to know for future discussions about this technique.
| I wonder if there is a way to make Chrome on Windows
| behave better in this regard .. seems quite shortsighted
| to me that they'd remove the links, although maybe a
| security boffin has made the case for it.
|
| Either way, haven't used Windows in decades, so it's a
| non-issue, but it is interesting to note that this isn't
| something I'd be doing if I did switch.

| andrepd wrote:
| How is this "not as powerful"? What is this tool lacking that
| save to pdf provides? I can see at least one way it is vastly
| inferior, in that you break formatting by converting to a
| horrendous paper page-based format.

| gregsadetsky wrote:
| It sounds like a really good idea (in addition to images being
| part of this single file PDF "archive" and thus won't go
| missing), but the PDFs being searchable depends on how the PDF
| is made, no?
|
| I printed to PDF this HN thread in Chrome (I assume that the
| PDF printing was done on the system level by OSX -- EDIT yes,
| from the file: "/Producer (macOS Version 10.15.2 \\(Build
| 19C57\\) Quartz PDFContext)"), and none of the page's strings
| appear as ascii or utf-8 in the document. grep is unable to
| find any string in that file.
|
| Do you have a specific print to PDF setup? Or a PDF-aware
| grep..?
|
| EDIT: Seeing the command-line you're using, the search you do
| is over the files' names, correct? The PDF/(original web page)
| text content is not indexed, right? Just to make sure I
| understand correctly.

| fit2rule wrote:
| I just use the PDF defaults from whatever browser I'm using
| at the time. Nothing special involved, just the defaults.
|
| I do use 'pdftotext' to do more fine-grained searching if I
| need to - but for the most part I find that a simple
| "ls -l | grep <search>" suffices, since this method preserves
| page title text too ..
|
| I did the same thing for this thread and had no issues with
| this command, whatsoever:
|
| $ pdftotext WorldBrain\'s\ Memex:\ Bookmarking\ for\ the\ power\ users\ of\ the\ web\ \|\ Hacker\ News.pdf - | grep -i "Print-to-pdf"
|
| Results:
|
| ".. nothing beats print-to-PDF. Its just awesome."
| "fancier laid out pages), I Print-to-PDF again after enabling Reader"

| gregsadetsky wrote:
| Got it, makes sense.

| snazz wrote:
| > EDIT: Seeing the command-line you're using, the search
| you do is over the files' names, correct? The
| PDF/(original web page) text content is not indexed,
| right? Just to make sure I understand correctly.
|
| pdftotext gets the actual text from the PDF.
| I don't do this, but I'm sure that you could automate the
| process of generating a text file for each PDF in a directory
| with pdftotext and then ripgrep the text files when it's time
| to search the contents. That would be doable with a
| makefile or a couple of shell scripts.

| fit2rule wrote:
| Yeah, my computer is fast enough that I can just do
| "find . -name '*.pdf' -exec pdftotext {} - \; | grep -i someSearchTerm"
| and come back later. Bonus points that it
| stays in my Terminal for reference later in the day as
| needed.

| wazoox wrote:
| I discovered Zotero for this use. I don't have any use of its
| bibliographical abilities, but it stores web pages and PDF
| articles fine, and is searchable, etc.

| ramraj07 wrote:
| This is amazingly simple, but what about in mobile? There's pdf
| printing here too but it's not as simple as command P! Any
| ideas?

| fit2rule wrote:
| If I need to save a page I've read on mobile, I mail myself
| the link to my desktop and print it there, where the PDF
| Archive lives. It's muscle memory at this point.
|
| Still, would be nice if the Browser vendors would cotton on
| to how powerful this is, and make the whole thing a bit more
| seamless for the mobile/desktop bridge, or just make
| Print-to-PDF work more smoothly for this case on mobile.
|
| Either way, I also have a list of every mail I've ever sent
| myself containing a URL from mobile, which is handy in and of
| itself at times, hehe ..

| WanderPanda wrote:
| On iOS you can easily create a shortcut for the share-sheet
| to create pdf prints fyi

| fit2rule wrote:
| Yeah, I should probably set that up some time, but I've
| become reliant on my muscle memory for my
| forwarding-to-desktop flow ..

| printtopdf wrote:
| Sounds like a product idea. I can imagine a service that uses
| google puppeteer on a server somewhere to print a pdf of a
| URL and then emails it to the user.
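
snazz's suggestion earlier in this subthread (extract each PDF to a text file once, then search the text files) could look roughly like the two shell functions below. This is a sketch under stated assumptions, not anyone's actual workflow: the names `refresh_cache` and `search_cache` and the separate cache directory are hypothetical, `pdftotext` must be installed for the first function, and plain `grep` stands in for ripgrep.

```shell
# refresh_cache ARCHIVE CACHE
# Hypothetical helper: write CACHE/<name>.txt for every ARCHIVE/<name>.pdf
# that has no cached text yet or is newer than its cached text.
refresh_cache() {
  archive=$1
  cache=$2
  mkdir -p "$cache"
  for pdf in "$archive"/*.pdf; do
    # With no PDFs present the glob stays literal; skip it.
    [ -e "$pdf" ] || continue
    txt="$cache/$(basename "$pdf" .pdf).txt"
    if [ ! -e "$txt" ] || [ "$pdf" -nt "$txt" ]; then
      pdftotext "$pdf" "$txt"
    fi
  done
}

# search_cache CACHE TERM
# Hypothetical helper: list cached text files that mention TERM,
# ignoring case.
search_cache() {
  grep -il -- "$2" "$1"/*.txt
}
```

Compared with running `pdftotext` over the whole archive on every search, the extraction cost is paid once per file, at the price of keeping a second directory in sync.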

| imperialdrive wrote:
| Kudos - I'll be practicing this later today :)

| ropeladder wrote:
| It's too bad browsers don't have an easy way to print to
| browser-page-sized PDF. Standard 8.5x11/A4 paper sized PDFs of
| webpages tend to look pretty terrible.
|
| I used to use the Scrapbook plugin for Firefox but I realized
| for the most part just plaintext might be best. So I'm in the
| process of setting up a workflow that will save articles in
| markdown in one click and sync between my phone and my
| computer.

| fit2rule wrote:
| Reader View is the solution to the A4- problem, imho. But I
| honestly don't mind the rendering issue - this is just a
| reference repository, after all. If I really need the cleaner
| page, I either Reader-View it beforehand, or just open it up
| on the Web again - links are preserved in PDF.

| jannes wrote:
| Links are only preserved if you use Chrome's PDF export...
| This is not true for Firefox. (At least on Windows. I
| haven't tried Firefox on macOS.)

| fit2rule wrote:
| Safari for the win! ;)
|
| (Seems like a bug in Firefox to me.. maybe this should be
| a config option..)

| wlesieutre wrote:
| Safari on iPhone can do this:
|
| https://images.macrumors.com/t/zbOsBhKGQj6VvA9oq8KaZkLxXUc=/...
|
| (note the scrollable preview at right edge of screen, the
| main preview is only showing a small fraction of the
| document)

| jannes wrote:
| You could try the awesome SingleFile extension:
| https://github.com/gildas-lormeau/SingleFile
|
| It might be a good compromise between PDF and plain text.
| It's pretty nice because it essentially serialises a snapshot
| of the current DOM tree, so it works with all kinds of
| JS-generated pages.
|
| The files should be relatively grep-able, because it's normal
| HTML. Of course, you might want to strip HTML tags for more
| sophisticated searching.
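
jannes's point about stripping tags before searching saved HTML can be illustrated with a crude shell sketch. The function name `search_html` is hypothetical, and the `sed` expression is a deliberate simplification: it only removes tags that open and close on the same line and does not skip the contents of `<script>` or `<style>` elements, so a real HTML-to-text converter would do better.

```shell
# search_html TERM FILE...
# Hypothetical helper: print each FILE whose visible text mentions TERM
# (case-insensitive), ignoring matches that occur only inside tags.
search_html() {
  term=$1
  shift
  for file in "$@"; do
    # Replace every single-line <...> tag with a space, then search.
    if sed -e 's/<[^>]*>/ /g' "$file" | grep -qi -- "$term"; then
      printf '%s\n' "$file"
    fi
  done
}
```

The difference from a plain `grep` shows up with words that appear only in markup: a page containing `<p class="bookmark">` is not reported for the term `bookmark` unless the word is also in the text.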

| ropeladder wrote:
| SingleFile is a really great extension, but I wanted
| something a bit more pared down that I could easily use on
| both mobile and desktop and sync between them using
| Syncthing. So I'm trying to copy some of SingleFile's UI
| and graft it on to Markdown-Clipper.[1] And also add the
| ability to save the images that get picked up by
| Readability (which Markdown-Clipper uses).
|
| [1] https://github.com/enrico-kaack/markdown-clipper

| pvg wrote:
| _nothing beats print-to-PDF_
|
| What's the advantage over browser's built-in 'save entire page'
| option? Print to PDF loses formatting and obscures the URL you
| got the thing from.

| fit2rule wrote:
| $ pdftotext | grep sometext
|
| ^^ first advantage
|
| Also: a) Formatting is not lost, it just changes to fit the
| default paper size I've got selected (A4) but doesn't really
| make much difference, since it's a snapshot, and b) URL is
| right there in the Header of the PDF, and is clickable, so no
| - not really an issue. This archive also functions as a
| bookmark collection as well as an offline copy for future
| reference ..
|
| (Disclaimer: may be that your browser is borking the PDFs.
| Not the case with Safari, anyway, but ymmv..)

| pvg wrote:
| You can grep the saved archives and they often save working
| copies of local interactive content in a way PDF doesn't.
| Internal structure and annotation is also preserved. I'm
| not sure I understand the formatting comment, you seem to
| be saying formatting is not lost and supporting that with
| an example of how formatting is lost. Don't get me wrong,
| it should definitely be easier to save, index and otherwise
| manipulate web pages. But out of the trivial methods,
| 'print to PDF' is one of the poorer methods.

| fit2rule wrote:
| It depends on the site - but I haven't found 'lost
| formatting' to be an issue at all - since, when I want to
| do a granular search I'm using 'pdftotext' to search on
| plaintext, and when I find a PDF of interest, I open it
| and can go directly back to the web page from which it
| was printed by way of the footer/header which contains a
| clickable URL.
|
| Most of the time though, the formatting isn't an issue.
| It depends on the site though - some authors produce
| stuff that doesn't look good as PDF, even if the content
| is still there. That doesn't bug me much.

| pvg wrote:
| Ok, so we seem to agree print-to-pdf loses formatting. I
| share your interest in and fascination with this (weirdly
| irksome and edgecasey) problem but just about any modern
| browser provides better facilities for saving web pages
| with higher fidelity than 'print to pdf'. Print to pdf is
| so easy to beat, you'd have to go out of your way to find
| a way to not-beat it - say, saving just 'page source'.

| ben509 wrote:
| Grep is not much of an advantage when Mac, Windows and
| Linux all have as-you-type full-text search of common
| formats like HTML and PDF.

| NikolaeVarius wrote:
| I guess this possibly competes with pocket/wallaby/similar?
|
| I see mozilla as a contributor.

| symplee wrote:
| Looks great. Would be awesome if it had a hook to easily generate
| Anki flashcards from text selection.

| carapace wrote:
| I tried this for a couple of months but the search results were
| disappointing.
|
| I think something like "Stealth" (
| https://github.com/cookiengineer/stealth ) will prove to be a
| better strategy.

| ollo wrote:
| Why should I use this instead of Zotero?

| dig1 wrote:
| My tools of choice for advanced bookmarking and offline read:
|
| * org-mode [1].
|
| * org-board [2] for offline archiving.
|
| * Org Capture [3] for getting links or text chunks from browser.
|
| * git repo for tracking history.
|
| With org-mode I can create really complex connections between
| articles and citations, add tags, have TODO lists and many more.
| To visualize things and connections, org-mind-map [4] can be
| useful. Because everything is text, grep, ripgrep, ag, xapian and
| other similar tools work without problems.
|
| I'm aware this setup isn't for everyone (you need to be an Emacs
| user), but I still need to find a proper alternative with this
| amount of flexibility, keeping everything in plain text format.
|
| [1] https://orgmode.org/
|
| [2] https://github.com/scallywag/org-board
|
| [3] https://chrome.google.com/webstore/detail/org-capture/kkkjlf...
|
| [4] https://github.com/the-humanities/org-mind-map

| wakkaflokka wrote:
| I keep wanting to use this because I love the idea, but the
| implementation last time I tried didn't seem to jive with me. I
| navigate the web with Tridactyl, and I think some of the
| keybindings were interfering - which would be my fault.
|
| With that being said - I love the idea, and will continue to
| check every so often on the status of the project :)

| anotheryou wrote:
| I also disabled all keybindings and overlays and it can still
| be useful for search.

| jalopy wrote:
| Does it capture the web content I view? Or just index it to
| retrieve the web at its original URL later?

| [deleted]

| anotheryou wrote:
| For full text search it has to save the text obviously, but
| right now you sadly can't retrieve it.
|
| It's on top of the roadmap though :)
| https://www.notion.so/Release-Notes-Roadmap-262a367f7a2a48ff...

| rollinDyno wrote:
| Something I noticed when I use "Read Later" style applications to
| save pages is that I will, most of the time, forget about how I
| arrived at a certain page. This is important to me because it
| gives me the context to decide a perspective on the page.
|
| If I was able to save pages while also knowing where I found them
| and maybe make a comment about why I found it interesting, then I
| would be able to organize my knowledge in a way that mirrors my
| train of thought.
|
| Are there any tools capable of doing this?

| karlicoss wrote:
| I'm working on a tool which can do exactly this (and it's only
| one of the features!):
| https://github.com/karlicoss/promnesia#readme
|
| Here's a link demonstrating the usecase you want (40 seconds
| video): https://karlicoss.github.io/promnesia-demos/how_did_i_get_he...
|
| I discovered Worldbrain Memex way into the development
| (unfortunately), but in the near future I will try to evaluate
| to which extent it's possible to mutually benefit, i.e. base
| Promnesia extension & backend on Worldbrain's, or contribute
| some of Promnesia's features to them (maybe even merge
| completely?)

| rollinDyno wrote:
| The webm file is broken

| karlicoss wrote:
| sigh.. thanks, it works in Firefox, but apparently not in
| Chrome. I added a link to mp4 version.
|
| upd: in case it would save someone else some pain in the
| future -- direct webm links don't work on
| raw.githubusercontent.com, but do work if you publish your
| repo as github pages -- then it ends up hosted on a proper
| CDN.

| kirubakaran wrote:
| https://histre.com/ has tree-style web history, taking notes on
| those web pages, and more. Disclaimer: I'm the founder.
|
| It automatically creates a knowledge base for you. The paths
| you took to arrive at a piece of information are just one part
| of the puzzle that it puts together for you.

| ramraj07 wrote:
| How are you planning to attack mobile use? More than half my
| browsing happens on mobile now!

| Groxx wrote:
| Not sure if their plugin works for it or not, but: Firefox
| has had extensions on Android for years. Should work fine.

| unqueued wrote:
| Hmm, I thought I was the only one who thought like that.
| I've just been exporting entire browser trees from Tree Style Tabs
| (with hierarchy) at once and attaching them to a page in my
| Zettelkasten or another part of my knowledge base.
|
| It is great to have the entire context of my browsing session
| to go back to.

___________________________________________________________________
(page generated 2020-05-18 23:00 UTC)