[HN Gopher] Rga: Ripgrep, but also search in PDFs, E-Books, Offi...
       ___________________________________________________________________
        
       Rga: Ripgrep, but also search in PDFs, E-Books, Office documents,
       zip, tar.gz
        
       Author : angrygoat
       Score  : 451 points
       Date   : 2020-12-02 15:38 UTC (7 hours ago)
        
 (HTM) web link (phiresky.github.io)
 (TXT) w3m dump (phiresky.github.io)
        
       | awinter-py wrote:
       | thanks but it's way faster to have my stuff in G drive
       | 
       | that way I can open a browser tab, wait 5 seconds for it to load,
       | locate the new screen location of the search bar, click it, wait
       | for javascript to finish loading so I can click the search bar,
       | click it for real this time, mistype because there's some kind of
       | contenteditable event jank, wait 5 seconds for my results to come
       | up, fix the typo, and just have my results waiting for me
       | 
       | I'm not going to learn a new tool when web is fine
        
         | ffpip wrote:
         | If you're using Duckduckgo, just search ''!drive search-term''
         | or ''search term !drive'' or ''search !drive term''
         | 
         | More !operators here - https://duckduckgo.com/bang
        
           | [deleted]
        
           | spappal wrote:
           | Firefox supports custom search engines, the most bang for the
           | buck custom search engine must be
           | https://duckduckgo.com/?q=%s, with the keyword being the letter d.
           | Then you get all these 13000+ bangs without having to
           | configure the custom search engines. E.g. write "d !drive
           | term" in url bar. And "d !w hacker news" sends you directly
           | to https://en.wikipedia.org/wiki/Hacker_News
        
             | codethief wrote:
             | Or you just set DDG as your default search engine and then
             | you don't even have to type the "d" anymore. :)
        
         | Tokkemon wrote:
         | Can't tell if this is sarcasm.
        
           | awinter-py wrote:
           | not being sarcastic
           | 
           | if god wanted me to access my files in less than 15 seconds,
           | they wouldn't have commanded google to package the search bar
           | as a separate JS bundle that only gets downloaded when you
           | focus the search bar
           | 
           | I'm no frontend dev but I know a thing or two about HTML +
           | there's no built-in way to input text into a box -- this is
           | the best we can do and we'll just have to wait for 5G +
           | moore's law to solve this
        
             | durnygbur wrote:
             | Laugh all you want but try looking for a Fullstack/Frontend
             | role in today's job market. What do they want? AnGuLaRr
             | with oBsErVaBlEs! Why do they want it? Because Google can't
             | be wrong.
        
               | awinter-py wrote:
               | wait actually? my sense is that react is leading
        
             | chrisweekly wrote:
             | > "I'm no frontend dev but I know a thing or two about HTML
             | + there's no built-in way to input text into a box"
             | 
             | hahaha, nice one (continued)
        
           | murermader wrote:
           | I don't know if you are joking, but this is clearly sarcasm.
        
             | read_if_gay_ wrote:
             | You're saying you can't tell it's sarcasm that he can't
             | tell it's sarcasm?
        
               | enriquto wrote:
               | This means it's good sarcasm!
               | 
               | The best sarcasm lies on a ridge: you cannot tell if it's
               | sarcasm or not.
        
         | hs86 wrote:
         | If you use Chrome, this might help:
         | https://www.androidpolice.com/2019/12/04/chrome-omnibox-will...
         | 
         | For GSuite/Workspace this needs to be enabled by an admin:
         | https://support.google.com/a/answer/9121487?hl=en
        
         | stjohnswarts wrote:
         | A lot of us don't want our stuff on G-drive for privacy and
         | security concerns. Tools like this are valuable to us. It's an
         | old problem and there are plenty of indexers out there; this
         | more real-time scanner is more than welcome to join the bunch,
         | of course.
        
       | gopty wrote:
       | Sounds like a poor man's version of recoll
       | 
       | https://www.lesbonscomptes.com/recoll/
       | 
       | A PDF in a Zip file, in an email attachment. recoll can index it
       | and do OCR if you like
        
       | SamuelAdams wrote:
       | Any advantages to this over something like Agent Ransack?
       | 
       | https://www.mythicsoft.com/agentransack/
        
         | tfigment wrote:
         | Command line, linux, and open source immediately come to mind
        
         | fnord123 wrote:
         | Works on non-Windows. ripgrep is notoriously fast. Command line
         | interface. Not comically priced at 59.95 USD.
        
           | cpach wrote:
           | Why is there an expectation that every application should be
           | free or cheap? IMHO $60 is very reasonable for a program that
           | can save a lot of time for the user. And developers also have
           | to eat, and might want to some day retire.
        
             | fnord123 wrote:
             | I'm not the boss of you. If you want to spend 60 USD on a
             | program that is mostly built into Finder and Nautilus, fill
             | your boots.
        
             | michaelcampbell wrote:
             | $60 may not be much for you; it may be worth it for you.
             | For others it may not be.
        
       | maxioatic wrote:
       | This is great. I have 100+ ebooks/PDFs of programming and
       | textbooks from which I've been extracting the index pages. My
       | intention was always to make some sort of search index out of
       | them. I will definitely be trialing this (initial few searches
       | seem promising!)
        
         | skanga wrote:
         | Great idea. Please update on whether this use case works or
         | not! And other tips, examples, etc.
        
       | diimdeep wrote:
       | Does anyone prefer a search tool other than Spotlight on macOS?
        
         | cpach wrote:
         | I like Spotlight, and its CLI companion mdfind.
        
         | qppo wrote:
         | I use ripgrep on macos
        
         | michaelcampbell wrote:
         | ripgrep in emacs for me.
        
       | ghoomketu wrote:
       | On a related note, there is one program that I absolutely miss
       | on Linux: Everything (on Windows).
       | 
       | The closest I can find is mlocate, but it does not have a GUI
       | and, more importantly, it does not index my Windows (NTFS)
       | drives.
       | 
       | Would appreciate any suggestions if someone knows something like
       | 'everything' for Ubuntu.
        
         | shscs911 wrote:
         | fd (https://github.com/sharkdp/fd) is the best command line
         | search utility IMO. It's crazy fast and always finds what I'm
         | looking for. If you want a GUI alternative, check out Drill
         | (https://github.com/yatima1460/Drill). Although the development
         | seems stalled, it works well for normal usecases.
        
         | opan wrote:
         | Haven't used this, but heard of it years ago, and it aims to be
         | similar. https://github.com/dotheevo/angrysearch/
        
           | ghoomketu wrote:
           | Thanks! That's super nice and very close to what I was
           | looking for all this time.
           | 
           | I just learned how to mount all my Windows drive under /mnt
           | using (using the `disks` software), so hopefully this should
           | index those files too.
        
         | pabs3 wrote:
         | BTW, mlocate is obsoleted by plocate, which is much faster and
         | is actually maintained.
         | 
         | https://plocate.sesse.net/
        
         | captn3m0 wrote:
         | Seriously - I miss it too. But my access patterns have changed.
         | I spend more time in the terminal, and with autojump, the
         | alternatives (with similar features) on Linux aren't really
         | that useful for my usage.
        
         | mrlala wrote:
         | 'Everything' is a LIFE SAVER.
         | 
         | Hmm.. I seem to remember creating an excel file for this client
         | a while back.. open Everything -> filter _client_.xlsx .. boom.
         | Or maybe I didn't name it properly, at all? Well, still just a
         | simple '*.xlsx' and sort by date, I can generally find anything
         | this way. As long as you let Everything open on windows
         | startup, it will be instant after use.
        
         | ronjouch wrote:
         | Everything is great on Windows for picking files/folders.
         | 
         | From the linux command-line, I like fzf (
         | https://github.com/junegunn/fzf ), that you can instruct to use
         | the faster fd ( https://github.com/junegunn/fzf#environment-
         | variables ). Fzf even offers keybindings for your shell. For
         | example, it binds Alt+C to fuzzy-finding a directory, and Enter
         | cds to it ( https://github.com/junegunn/fzf#key-bindings-for-
         | command-lin... ).
         | 
         | Fzf is great for other things too; here is a fish function to
         | bind Alt+G to fuzzy-pick a Git branch and jump to it:
         | 
         |     function fish_user_key_bindings
         |         bind \eg 'test -d .git; or git rev-parse --git-dir > /dev/null 2>&1; and git checkout (string trim -- (git branch | fzf)); and commandline -f repaint'
         |         bind \eG 'test -d .git; or git rev-parse --git-dir > /dev/null 2>&1; and git checkout (string trim -- (git branch --all | fzf)); and commandline -f repaint'
         |     end
        
         | demosito666 wrote:
         | There are about a dozen dedicated file indexers [1] on Linux
         | (some are gui, some are not) and also DE-integrated ones like
         | Baloo for KDE and Tracker for Gnome.
         | 
         | [1]
         | https://lmgtfy.app/?q=Is+there+a+file+search+engine+like+%E2...
        
         | pkaye wrote:
         | For mlocate you can edit /etc/updatedb.conf to specify what to
         | index. One trick I use is "locate -Ai" that lets you search for
         | multiple patterns and makes it case insensitive. So you can use
         | "locate -Ai linux .pdf" to search for all pdf files related to
         | Linux.
         | 
         | Also for Gnome there is Tracker, which does search and indexing
         | built into the system. I think by default it's set for minimal
         | use, but it can be configured via the Settings/Search panel to
         | index many locations. I haven't played with it much recently
         | though.
        
         | RMPR wrote:
         | To traverse my files I use the combo ranger + autojump. It has
         | no GUI and you need to visit a directory at least once before
         | jumping to it automatically, but I just wanted to mention it.
         | Another CLI tool that seems to do what you want is fzf.
        
       | dang wrote:
       | If curious see also
       | 
       | 2019 https://news.ycombinator.com/item?id=20196982
        
       | durnygbur wrote:
       | No ripgrep-all through the package manager:
       | 
       |     $ sudo dnf install -y ripgrep-all
       |     [...]
       |     No match for argument: ripgrep-all
       |     Error: Unable to find a match: ripgrep-all
       | 
       | Rust's package manager fails:
       | 
       |     $ cargo install ripgrep_all
       |     [...]
       |     failed to select a version for the requirement `cachedir = "^0.1.1"`
       |     candidate versions found which didn't match: 0.2.0
       |     location searched: crates.io index
       |     required by package `ripgrep_all v0.9.6`
       | 
       | Quick search on the web shows that more people have problems with
       | cachedir version.
        
         | ChrisSD wrote:
         | It looks like cachedir yanked version 0.1.1. This is usually
         | only done when a very serious issue is discovered, though I
         | don't know what the reason is in this case.
         | 
         | https://crates.io/crates/cachedir
        
         | est31 wrote:
         | You can do `cargo install --locked ripgrep_all` as a
         | workaround. It uses the lockfile that's part of the
         | ripgrep_all package, so you miss out on some package updates,
         | but you also get the cachedir version required.
         | 
         | There is a GitHub issue to make this the default behaviour of
         | cargo, but since you'd miss out on updates which might fix
         | security bugs, the cargo team is unwilling to change the
         | default.
         | 
         | https://github.com/rust-lang/cargo/issues/7169
        
       | edm0nd wrote:
       | Big fan of ripgrep. Use it on Windows to search through 100s of
       | GBs of data really quickly.
        
       | akavel wrote:
       | The "Integration with fzf" example looks really cool:
       | 
       | https://github.com/phiresky/ripgrep-all#integration-with-fzf
        
       | kovek wrote:
       | How could I use Rga to search my browsing history?
        
         | simonw wrote:
         | Your browser history (if you use Chrome or Firefox at least) is
         | stored in a SQLite database.
         | 
         | It looks like rga can handle SQLite out of the box, so just
         | making sure your history .db file is visible to rga may be all
         | you need.
         | 
         | You can also use my Datasette tool to get a web UI against your
         | history, see
         | https://docs.datasette.io/en/stable/getting_started.html#usi...
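Since the history file is plain SQLite, a quick way to check what rga would see is to query it directly with the sqlite3 CLI. A minimal sketch (the Firefox profile path and the `moz_places` table are standard Firefox, but the exact profile directory name varies per machine; the browser locks the live file, hence the copy):

```shell
# Search a browser-history SQLite file by page title.
# Works on a copy, since the browser locks the live database.
history_search() {
  db="$1"; term="$2"
  cp "$db" /tmp/places-copy.sqlite
  sqlite3 /tmp/places-copy.sqlite \
    "SELECT url, title FROM moz_places WHERE title LIKE '%${term}%' LIMIT 10;"
}

# Example (Firefox; the profile directory name varies):
#   history_search ~/.mozilla/firefox/*.default-release/places.sqlite hacker
```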
        
       | hiq wrote:
       | It would be nice to have a direct comparison with ugrep. In the
       | case of rg the benchmarks are already enough to switch. Why
       | should I use rga instead of ugrep?
        
         | burntsushi wrote:
         | I've called the ugrep benchmarks into question, and I
         | elaborated on it here (and this includes a frustrating exchange
         | between myself and the ugrep author):
         | https://old.reddit.com/r/rust/comments/i6pfb2/ugrep_new_ultr...
         | 
         | I've also re-run my original set of benchmarks[1] with ugrep
         | included:
         | https://github.com/BurntSushi/ripgrep/blob/master/benchsuite...
         | 
         | [1] - https://blog.burntsushi.net/ripgrep/
        
           | hiq wrote:
           | Just to be clear, I meant that I had switched to ripgrep
           | because its speed was convincing enough on its own (so I did
           | not even need extra features to switch).
           | 
           | I'm currently not using any of ugrep or rga, although I have
           | used pdfgrep in the past. It'd be nice for casual users like
           | me to know more about why I should use rga over ugrep (or
           | vice-versa).
        
       | cb321 wrote:
       | NOTE: ripgrep already has --pre. (No pre-built indexing, of
       | course.)
        
         | burntsushi wrote:
         | That's exactly what ripgrep-all uses to implement this. There's
         | a lot of integration work required to make this nice. The --pre
         | flag is just a small hook. More info on it here:
         | https://github.com/BurntSushi/ripgrep/blob/master/GUIDE.md#p...
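For reference, a bare-bones --pre hook can be sketched like this (an illustration only, not rga's actual dispatcher; the pdftotext branch assumes poppler is installed):

```shell
# preproc: ripgrep invokes the --pre command with the file path as its
# single argument and searches the command's stdout instead of the file.
preproc() {
  case "$1" in
    *.pdf) pdftotext "$1" - ;;  # extract text from PDFs
    *)     cat "$1" ;;          # pass everything else through unchanged
  esac
}

# Saved as an executable script (e.g. ./preproc.sh), it would be used as:
#   rg --pre ./preproc.sh 'some pattern' docs/
```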
        
           | cb321 wrote:
           | Yup.
           | 
           | Something perhaps more helpful but so far unmentioned (and
           | somewhat OS-specific) is that statically linked executables
           | usually fork & exec (especially exec) much faster than
           | dynamically linked ones. This difference is usually only like
           | 50..150 us vs 500..3000 us but can multiply up over thousands
           | of files.
           | 
           | This only matters on the first run of `rga`, of course. While
           | the dispatched-to decoder is likely mostly out of one's
           | linking control, this overhead can be saved for the dispatch
           | _er_ , at least. So, I would suggest `rga-preproc` should
           | have a static linking option/suggestion, at least on Linux.
           | 
           | Of course, this overhead may also fall below the noise of
           | PDF/ebook/etc. parsing, but maybe not the decompression of
           | small files in some dark horse format. :-)
        
       | vmchale wrote:
       | Wonderful! pdfgrep is good but slow.
        
         | phiresky wrote:
         | pdfgrep has had a --cache option for a while now :) Not sure
         | why they don't enable it by default. Still, this is much faster.
        
       | globular-toast wrote:
       | I have mixed feelings about these kinds of tools.
       | 
       | I can understand it might be nice to have a personal library of
       | PDF books and searching in them. I can't think of a time I've
       | ever wished I could search my bookshelf in that way, but you
       | never know.
       | 
       | Obviously I use tools like ripgrep for searching codebases and
       | the like.
       | 
       | But the extreme flexibility of this one in particular (and
       | others like macOS Spotlight) makes it seem more like a data
       | recovery tool to me. If my directory structures and databases ever
       | completely failed for some reason I might need to search through
       | everything to find the data again. It's good to know such tools
       | exist, I suppose.
       | 
       | But my fear is that tools like this teach people to not worry
       | about organisation of data and to just fill up their disks with
       | no structure at all. I think that unless something goes terribly
       | wrong, nobody should ever need a tool like this. Once you rely
       | on it, you're out of luck if it ever fails you. What if you just
       | can't remember a single searchable phrase from some document, but
       | you just _know_ it must exist somewhere?
       | 
       | It's similar to what Google has done to the web. When I was
       | growing up it used to be a skill to use the web. People used
       | tools like bookmarks and followed links from one place to
       | another. Now it's just type it into Google and if Google doesn't
       | know, it doesn't exist.
        
         | nojito wrote:
         | Hierarchical organization of data is not a productive way to
         | organize, simply due to how much information people accumulate
         | and how often structures break down.
         | 
         | It's more intuitive to simply search for the thing you are
         | looking for and click it.
         | 
         | I haven't used a folder organization structure in many many
         | years. Other than the defaults for my cloud folders and a
         | separation between Personal + Work.
        
         | durnygbur wrote:
         | There is nothing wrong with the original Google's postulate.
         | Your local search results are less likely to be hijacked by
         | entities bidding for your attention. I agree with the argument
         | for organizing the data anyway.
        
         | armoredkitten wrote:
         | I mean, I understand what you mean when it comes to Google --
         | the web essentially becomes locked into a particular
         | proprietary solution to finding information. I definitely still
         | have hundreds (maybe into the thousands?) of bookmarks of sites
         | that store information I care about.
         | 
         | But I don't think this tool deserves the same sort of mixed
         | feelings. I don't think this replaces structure -- there's
         | still value to having a conceptual mapping of where documents
         | are stored, and for grouping sets of documents together. It's
         | just that having a structure doesn't help if you don't know
         | where in the structure something is stored. This sort of tool
         | is a bottom-up approach for the times when the top-down
         | approach doesn't work very well.
         | 
         | Do you have similarly mixed feelings if sometimes, even with my
         | carefully-crafted set of bookmarks with all their nested
         | folders, I use the search tool to find the bookmark I'm looking
         | for? It's the same idea. Sometimes a top-down structure is
         | beneficial. But sometimes things get misclassified, or you
         | forget about some piece of the structure, or you aren't
         | familiar with some new structure, and in those cases, having
         | bottom-up tools is immensely useful. There's no risk of vendor
         | lock-in here. It's just a difference of approach in information
         | retrieval.
        
       | chris_st wrote:
       | Curious why this isn't a pull request to ripgrep? Maybe it was,
       | and rejected? It'd be nice to just have one tool, and this
       | doesn't feel like it's a stretch to add to ripgrep.
        
         | burntsushi wrote:
         | It's a stretch. A big one.
         | 
         | I answered this a while back:
         | https://old.reddit.com/r/rust/comments/c1bjw4/rga_ripgrep_bu...
        
       | antegamisou wrote:
       | I always found something along the lines of
       | 
       |     pdftotext -layout file.pdf | grep -E ...
       | 
       | useful for PDFs; good to see a Swiss Army knife utility for all
       | sorts of files, though!
        
         | phiresky wrote:
         | rga uses pdftotext (from poppler) internally for PDFs, but
         | wraps it in parallelization and a very fast cache layer, since
         | you usually want to do multiple queries per file :)
        
       | hobofan wrote:
       | Big fan of rga! I use it almost every day for the academic part
       | of my life, when I want to know the location of some specific
       | keywords in my lecture slides, books or papers I've been reading.
       | Even for single ebooks, it is often more useful than the search
       | in Acrobat Reader.
        
         | durnygbur wrote:
         | > search in Acrobat Reader
         | 
         | The search in PDF viewers is an anti-feature in terms of UI and
           | performance. Their advantage is that they allow you to scroll
           | to and highlight the found phrase in the document.
        
           | solstice wrote:
           | The search in Tracker Software's PDF X-Change Viewer/Editor
           | is really great. Effective and easy to use
        
           | mssdvd wrote:
           | The search in most applications is an anti-feature.
        
       | faitswulff wrote:
       | I noticed that you can use Tesseract as an OCR adapter for rga.
       | Tesseract is written in python, IIRC, and in the OP it comes with
       | a warning that it's slow and not enabled by default. Are there
       | any other fast, reliable OCR libs out there? Or any rust OCR
       | backends?
        
         | mouldysammich wrote:
         | https://github.com/tesseract-ocr/tesseract seems to be written
         | in C++, not Python
        
           | faitswulff wrote:
           | Ah, my mistake then.
        
         | hobofan wrote:
         | I don't think the problem necessarily is that Tesseract is
         | slow, but that the whole process of rendering a PDF to a series
         | of PNGs on which you can then run OCR is slow (which is what it
         | does in the background).
        
           | undebuggable wrote:
           | The process of converting all pages to raster images and
           | then OCR-ing each one takes hours for PDFs hundreds of pages
           | long. This workflow is not suitable for instant search. For
           | non-OCR-ed PDFs, it's worth pregenerating the text.
        
             | hobofan wrote:
             | That's why rga comes with a cache. I've occasionally used
             | the Tesseract adapter with good success (results-wise), and
             | after the initial rendering and indexing, it's fast enough
             | to use.
        
       | alexruf wrote:
       | The idea behind Rga is cool. Anyway, I tried it on Mac and
       | installed it via Homebrew. The formula already says it depends
       | on ripgrep
       | (that's fine since I have ripgrep already installed and use it
       | regularly). I still was surprised when I executed Rga for the
       | first time and got an error message that 'pdftotext' was not
       | found. Since pdftotext has been officially discontinued, I am not
       | sure if I want to install an old version just to make Rga work on
         | my machine. I don't think it's a good idea to rely on a
         | project which is not actively maintained.
        
         | phiresky wrote:
         | Yeah, in my opinion poppler should be a dependency of rga in
         | homebrew (since it's kinda useless without having the default
         | adapters), but I don't maintain that package.
        
         | there_the_and wrote:
         | I don't see any indication that pdftotext has been discontinued
         | [1]. It looks like a Mac-specific installer available via
         | Homebrew Cask has been discontinued [2], but pdftotext is still
         | available through the normal poppler formula [3].
         | 
         | 1. https://poppler.freedesktop.org/releases.html
         | 
         | 2. https://formulae.brew.sh/cask/pdftotext
         | 
         | 3. https://formulae.brew.sh/formula/poppler
        
         | burntsushi wrote:
         | > Since pdftotext has been officially discontinued
         | 
         | Do you have a link for that? That's news to me.
        
           | alexruf wrote:
           | brew info pdftotext
           | 
           | https://formulae.brew.sh/cask/pdftotext#default
        
       | root_axis wrote:
       | This is really great.
        
       | fock wrote:
       | Can it produce links to open the file yet? (I don't know Rust,
       | so I can't add a PR easily.) At least gnome-terminal supports
       | that (and normally it should also support opening a specific
       | PDF page)!
        
         | mcintyre1994 wrote:
         | Not sure if the implementation is in rg or zsh, but that
         | combination produces cmd-clickable file names for me.
        
       | soferio wrote:
       | Can it (or any tool) perform proximity searches on scanned PDFs?
       | E.g word1 within 20 words of word2, on scanned PDFs? (I think
       | this is non trivial but very useful.)
        
         | phiresky wrote:
         | Scanned PDFs only work well if they already have an OCR layer.
         | There's some optional integration of rga with tesseract, but
         | it's pretty slow and worse than external OCR tools.
         | 
         | ripgrep-all can do the same regexes as rg on any filetype it
         | supports. So you could do something like --multiline and
         | foo(\w+[\s\n]+){,20}bar
         | 
         | It won't work exactly like this, but something similar should
         | do it:
         | 
         | * --multiline enables multiline matching
         | 
         | * foo searches for foo
         | 
         | * \w+ searches for at least one word character
         | 
         | * [\W]+ searches for at least one space/nonword character,
         | like sentence marks
         | 
         | * {,20} searches for at most 20 iterations of the word-space
         | combination
         | 
         | * bar searches for bar
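As a plain-grep illustration of the proximity idea (the pattern is a variant of the above, assuming GNU grep's \w/\W extensions; with rga you would pass a pattern like this together with --multiline):

```shell
# Match "foo" followed by "bar" with at most 20 whole words in between.
proximity() {
  grep -E -o 'foo(\W+\w+){0,20}\W+bar'
}
```

For example, `printf 'foo one two bar\n' | proximity` prints the whole matched span.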
        
         | ballmerspeak wrote:
         | If it's a scanned PDF (essentially a collection of one image per
         | page), there would need to be an OCR step to get some text out
         | first. Tesseract would work for this.
         | 
         | Once that's done, you have all the options available to perform
         | that search. But I don't know of a search tool that does the
         | OCR for you. I did read a blog post of someone uploading PDFs
         | to google drive (they OCR them on upload) as an easy way to do
         | this.
        
       | ssivark wrote:
       | I love that we're seeing fast & flexible solutions for personal
       | search.
       | 
       | I've recently been playing with Recoll for full-text-search on
       | content. Since it indexes content up front, the search is pretty
       | fast. It can also easily accommodate tag metadata on files.
       | 
       | It would be interesting to consider how ripgrep-based tools can
       | fit into generically broad "search your database of content"
       | workflows (as opposed to remembering or walking through your
       | file system paths).
        
         | simias wrote:
         | FZF + ripgrep is really killer for me. I don't even bother
         | organizing my notes anymore, I just throw all my markdown
         | files into a flat directory and then I have a script that uses
         | FZF + ripgrep to search through it when I need it. I search by
         | "last modified first" so unless I'm digging for something very
         | old the results are instant. Code snippets, finances, TODO
         | lists, cake recipes... It's all in there.
         | 
         | I use the same system in Vim to browse source code. It's very
         | powerful, very fast, works with any language and requires zero
         | configuration.
        
           | rshm wrote:
           | Can you share your script?
        
             | maxioatic wrote:
             | I'd love some more info as well!
        
             | simias wrote:
             | This is the main one (that actually only uses FZF, not
             | ripgrep): https://gist.github.com/simias/b1d8356469d2a9386d
             | eeb7c45984b...
             | 
             | You'll need to set NOTES_DIR in your environment to
             | wherever you want your notes to be stored. Then you can
             | write `note something` to create or open
             | $NOTES_DIR/something.md with your $EDITOR.
             | 
             | If you type "note" without parameter you'll start a search
             | on all the note names, ordered by last use. If you type
             | "note -f" it starts a full text search.
             | 
             | For best results you should have fzf.vim's preview.sh
             | somewhere in your fs; otherwise it'll use "cat", which
             | won't be as good looking (see FZF_PREVIEW in the script).
             | 
             | Hopefully despite being shell it should be readable enough
             | to tweak to your liking.
             | 
             | Note that it was written and used exclusively on Linux, but
             | I did try to avoid GNU-isms so hopefully it should work on
             | BSDs and maybe even on MacOS with a bit of luck.
        
           | tasuki wrote:
           | About a year ago, I discovered it was very helpful for me to
           | have git branches ordered by "recently modified first":
           | 
           | From my `~/.gitconfig`:
           | 
           |     [alias]
           |     brt = "!git for-each-ref refs/heads --color=always --sort=-committerdate --format='%(HEAD)%(color:reset);%(color:yellow)%(refname:short)%(color:reset);%(contents:subject);%(color:green)(%(committerdate:relative))%(color:blue);<%(authorname)>' | column -t -s ';'"
           | 
           | I always spent a lot of time being confused about branches,
           | and never realised how easy the solution was.
        
             | simias wrote:
              | Oh that's a great idea, I'll definitely be stealing
              | that, thanks!
        
           | polyrand wrote:
            | I couldn't agree more. I wrote a bash function that
            | searches through my notes folder with fzf + rg, and it
            | works perfectly.
           | 
           | Also, I have a specific pattern to write some tags inside
           | files that I can parse with ripgrep.
        
         | rjzzleep wrote:
          | rga also indexes them when you search. To be honest I like
          | that approach a lot more, since it saves space and I
          | generally know where I'm looking for things:
          | 
          |     $ ls -sh ~/.cache/rga/
          |     total 336M
          |     336M data.mdb  4.0K lock.mdb
        
           | ssivark wrote:
            | That kind of caching is an interesting way to build the
            | database incrementally instead of spending hours indexing
            | up front, so the tool is ready for immediate use. Quite
            | nifty :-)
        
           | curious_tenet wrote:
           | Wow that is so cool!
        
       | nikisweeting wrote:
       | Aww hell yeah we should definitely use this in place of ripgrep
       | for the new ArchiveBox.io full-text search backend.
       | 
       | https://github.com/ArchiveBox/ArchiveBox/pull/543
        
       | phiresky wrote:
        | Developer of the tool here :) Glad to see it posted here; I
        | still actively use it myself. Also check out the fzf
        | integration in the
       | README: https://github.com/phiresky/ripgrep-
       | all/blob/master/doc/rga-...
       | 
        | Currently the main branch is undergoing a refactor to add
        | support for custom extractors (calling out to other tools) and
        | for more flexible extractor chains.
       | 
        | Ripgrep itself has integrated functionality to call custom
        | extractors via the `--pre` flag, but by adding it here we can
        | retain the benefits of the rga wrapper (more accurate file-type
        | matching, caching, recursion into archives, adapter chaining,
        | no slow shell scripts in between, etc.).
       | 
        | Sadly, while rewriting it to allow this, I got hung up and
        | couldn't figure out how to design that cleanly in Rust. I'd be
        | really glad if a Rust expert could help me out here:
       | 
       | In the currently stable version, the main interface of each
       | "adapter" is `fn(Read, Write) -> ()`. To allow custom adapter
       | chaining I have to change it to be `fn(Read) -> Read` where each
       | chained adapter wraps the read stream and converts it while
       | reading. But then I get issues with how to handle threading etc,
       | as well as a random deadlock that I haven't figured out how to
       | solve so far :/
        
         | maximz wrote:
          | Love this. I appreciate your building on ripgrep versus my
          | own bulky Lucene-based approach a while back
          | (https://github.com/maximz/sift), and that you don't require
          | pre-indexing but build up a cache as you go.
        
         | burntsushi wrote:
         | > In the currently stable version, the main interface of each
         | "adapter" is `fn(Read, Write) -> ()`. To allow custom adapter
         | chaining I have to change it to be `fn(Read) -> Read` where
         | each chained adapter wraps the read stream and converts it
         | while reading. But then I get issues with how to handle
         | threading etc, as well as a random deadlock that I haven't
         | figured out how to solve so far :/
         | 
         | I don't quite grok the problem here. If you file an issue
         | against ripgrep proper with code links and some more details, I
         | can try to assist.
         | 
         | Taken literally, ripgrep uses that exact same approach. There
         | are potentially multiple adapters being used. Each adapter is
         | just defined to wrap a `std::io::Read` implementation, and the
         | adapter in turn implements `std::io::Read` so that it can be
         | composed with others. The part that I'm missing is why this has
         | anything to do with threading or deadlocks. I/O adapters
         | shouldn't be having anything to do with synchronization. So I'm
         | probably misunderstanding your problem.
        
           | phiresky wrote:
           | > If you file an issue against ripgrep proper with code links
           | and some more details
           | 
            | Sorry, I don't think I explained my issue very well. In
            | general it has nothing to do with the interaction with
            | ripgrep; that works fine.
           | 
           | It's that each adapter (e.g. zip -> list of file streams)
           | needs to have an interface of fn(Read) -> Iter<ReadWithMeta>
           | 
           | But then if there's a PDF within the zip, I have to give the
           | returned ReadWithMeta to the PDF adapter - but it can't take
           | ownership, because the Archive file iterators only give
            | borrowed reads. I may have worked around this by creating
            | a wrapper type [3] and adding an unsafe block [2], but
            | something currently deadlocks when adapting zip files.
           | 
           | Also, for external programs, I have to copy the data from the
           | Read into a Write (stdin of the program) - which needs to
           | happen in a separate thread, otherwise the stdout is never
           | read [1], but some Reads I have aren't Send since they come
           | from e.g. zip-rs, so they can't be passed to a thread.
           | 
           | [1] https://github.com/phiresky/ripgrep-
           | all/blob/baca166fdab3d24...
           | 
           | [2] https://github.com/phiresky/ripgrep-
           | all/blob/baca166fdab3d24...
           | 
           | [3] https://github.com/phiresky/ripgrep-
           | all/blob/baca166fdab3d24...
        
         | one-punch wrote:
         | The integration with fzf seems nice.
         | 
         | Any plans to integrate with skim, a Rust implementation of fzf?
         | 
         | https://github.com/lotabout/skim
        
         | cb321 wrote:
         | One possibility is the almost dirt-simple solution wherein you
         | just have a "make"/"Makefile" (or your favorite other build
         | system) maintain a shadow tree of parallel pre-translated
         | files. You get parallelism via `make -j$(nproc)` or its
         | equivalent.
         | 
          | Every name in the shadow is built from the name in the
          | origin, but maybe with ".txt" added (or ".txt.gz" if you
          | want to keep them compressed with whatever is the fastest
          | decompressor built into ripgrep as a library, not called as
          | a program). Untranslated names can just be symbolic/hard
          | links back to the origin. Build rules become as flexible as
          | your build system.
         | 
         | This also scales to deployments that have more disk space than
         | memory. Admittedly, in that case, the whole procedure probably
         | becomes disk-IO bound, but maybe not. Maybe some translations
         | cannot even keep up with disk IO - NVMe storage is pretty fast,
         | for example. Or available memory may vary dynamically a lot,
         | sometimes allowing the shadow to be fully in the buffer cache,
         | other times not. It strikes me as less presumptuous to assume
         | you can find disk space vs. having that much memory available.
          | (EDIT2: though I may be confused about how `rga` operates -
          | your doc says "memory cache".)
         | 
          | On the pro side, apart from updating the shadows from their
          | origins, the user could even just run `rg` from within the
          | shadow and translate filenames "in their head", although
          | stripping an always-present prefix is obviously trivial.
          | Indeed, you wouldn't need `rg --pre` at all, and the grep
          | itself could become pluggable. I doubt any of your other
          | `fzf`/etc. integrations would be made more complicated by
          | this design, either.
         | 
         | This all strikes me as simple/nice enough that someone has
         | probably already done it...EDIT1: Oh, I see from thumbs ups and
         | other comments over at [1] and [2] that @phiresky is probably
         | already aware of this design idea, but maybe some HN person
         | knows of an existing solution along these lines.
         | 
         | [1] https://github.com/BurntSushi/ripgrep/issues/978 [2]
         | https://github.com/BurntSushi/ripgrep/pull/981
        
         | aembleton wrote:
          | AUR has both a ripgrep-all [1] and a ripgrep-all-bin [2]
          | package. Both were added by you, and the bin package has a
          | newer version. What is the difference between the two?
         | 
         | 1. https://aur.archlinux.org/packages/ripgrep-all/
         | 
         | 2. https://aur.archlinux.org/packages/ripgrep-all-bin
        
       ___________________________________________________________________
       (page generated 2020-12-02 23:00 UTC)