[HN Gopher] Pipe Viewer ___________________________________________________________________ Pipe Viewer Author : 0x45696e6172 Score : 141 points Date : 2022-10-18 09:15 UTC (1 day ago) (HTM) web link (www.ivarch.com) (TXT) w3m dump (www.ivarch.com) | londons_explore wrote: | It would be nice to indicate if the upstream or the downstream is | the 'limiting' factor in speed. | | I.e. within pv, is it reading the input stream or writing the | output stream that is blocking most of the time? | ketralnis wrote: | It's open source, be the change you want to see in the world | kotlin2 wrote: | Having maintained an open source library, it's actually | really helpful to see features people want. Not everyone | needs to contribute directly to the code base. User feedback | is valuable, too. | bingaling wrote: | it's instantaneous, but the -T (transfer buffer % full display) | is sometimes useful for that. (0% full -> source limited, 100% | full -> sink limited) | Twirrim wrote: | Oh wow, I'd completely missed that -T flag. That's some | useful data. Thanks for mentioning it! | senjin wrote: | This would be a genius addition | heinrich5991 wrote: | `progress` is also a nice tool to see the progress of programs | operating linearly on a single file. A lot of tools do that! | sigmonsays wrote: | i've consistently lost and found this tool over and over again | for over 20 years | pbhjpbhj wrote: | Same, `apropos $keyword` helps, but strangely in this case | doesn't find `progress` from `apropos progress`. | sneak wrote: | part of my default install. | torgard wrote: | There are countless times when I would have found this | incredibly helpful. Just 10 minutes ago, I wanted this exact | tool. | | Thanks! | derefr wrote: | As a person who runs a lot of ETL-like commands at work, I never | find myself using pv(1). I love the idea of it, but the commands | I most want to measure the progress of always seem to be either: | | 1.
things where I'd be paranoid about pv(1) itself becoming the | bottleneck in the pipeline -- e.g. dd(1) of large disks where | I've explicitly set a large blocksize and set | iflag=direct/oflag=direct, to optimize throughput. | | 2. things where the program has some useful cleverness I rely on | that requires being fed by a named file argument, but behaves a | lot less intelligently when being fed from stdin -- e.g. feeding | SQL files into psql(1). | | 3. things where the program, even while writing to stdout, also | produces useful "sampled progress" informational messages on | stderr, which I'd like to see; where pv(1) and this output | logging would fight each other if both were running. | | 4. things where there's no clean place to insert pv(1) anyway -- | mostly, this comes up for any command that manages jobs itself in | order to do things in parallel, e.g. any object-storage-client | mass-copy, or any parallel-rsync script. (You'd think these | programs would also report global progress, but they usually | don't!) | | I could see pv(1) being fixed to address case 3 (by e.g. drawing | progress while streaming stderr-logged output below it, using a | TUI); but the other cases seem to be fundamental limitations. | | Personally, when I want to observe progress on some sort of | operation that's creating files (rsync, tar/untar, etc), here's | what I do instead: I run the command-line, and then, in a | separate terminal connected to the machine the files are being | written/unpacked onto, I run this: # for files | watch -n 2 -- ls -lh $filepath # for directories | watch -n 4 -- du -h -d 0 $dirpath | | If I'm in a tmux(1) session, I usually run the file-copying | command in one pane, and then create a little three-vertical-line | pane below it to run the observation command.
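The watch(1)-in-a-second-pane approach above can also be scripted as a plain polling loop, e.g. on machines without watch(1). This is a minimal sketch; the path, interval, and iteration count are illustrative, not from the thread:

```shell
# Poll the size-so-far of a directory being unpacked into.
# The path, interval, and iteration count are hypothetical.
dir=/tmp/unpack-target
for i in 1 2 3 4 5; do
    [ -d "$dir" ] || break          # stop if the target disappears
    # du -s prints only the total (same idea as `du -d 0` above);
    # cut keeps the size column, dropping the path.
    printf '%s  %s\n' "$(date +%T)" "$(du -sh "$dir" | cut -f1)"
    sleep 2
done
```

As with the watch(1) commands, this only reports size-so-far, not a percentage.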
| | Doing things this way doesn't give you a percentage progress, but | I find that with most operations I already know what the target's | goal size is going to be, so all I really need to know is the | size-so-far. (And pv(1) can't tell you the target size in many | cases anyway.) | MayeulC wrote: | I usually fix 3. by redirecting the intermediate program to | stderr before piping to pv. | | My main use-case is netcat (nc). | | As an aside, I prefer the BSD version, which I find is superior | (IPv6 support, SOCKS, etc). "GNU Netcat" isn't even part of the | GNU project, AFAIK. I also discovered Ncat while writing this, | from the Nmap project; I'll give it a try. | derefr wrote: | I don't quite understand what you mean -- by default, most | Unix-pipeline-y tools that produce on stdout, if they log at | all, already write their logs to stderr (that being why | stderr exists); and pv(1) already _also_ writes to stderr (as | if it wrote its progress to stdout, you wouldn't be able to | use it in a pipe!) | | But pv(1) is just blindly attempting to emit "\r[progress bar | ASCII-art]\n" (plus a few regular lines) to stderr every | second; and interleaving that into your PTY buffer along with | actual lines of stderr output from your producer command, | will just result in mush -- a barrage of new progress bars on | new lines, overwriting any lines emitted directly before | them. | | Having two things both writing to stderr, where one's trying | to do something TUI-ish, and the other is attempting to write | regular text lines, is the _problem statement_ of 3, not the | solution to it. | | A _solution_, AFAICT, would look more like: enabling pv(1) | to (somehow) capture the stderr of the entire command-line, | and manage it, along with drawing the progress bar.
Probably | by splitting pv(1) into two programs -- one that goes inside | the command-line, watches progress, and emits progress logs | as specially-tagged little messages (think: the UUID-like | heredoc tags used in MIME-email binary-embeds) without any | ANSI escape codes; and another, which _wraps_ your whole | command line, parsing out the messages emitted by the inner | pv(1) to render a progress bar on the top/bottom of the PTY | buffer, while streaming the regular lines across the rest of | the PTY buffer. (Probably all on the PTY secondary buffer, | like less(1) or a text editor.) | | Another, probably simpler, solution would be to have a flag | that tells pv(1) to log progress "events" (as JSON or | whatever) to a named-FIFO filepath it would create (and then | delete when the pipeline is over) -- or to a | loopback-interface TCP port it would listen on -- and otherwise be | silent; and then to have another command you can run | asynchronously to your command-line, to open that named | FIFO/connect to that port, and consume the events from it, | rendering them as a progress bar; which would also quit when | the FIFO gets deleted / when the socket is closed by the | remote. Then you could run _that_ command, instead of | watch(1), in another tmux(1) pane, or wherever you like. | gpderetta wrote: | You could redirect each pipeline stage's stderr to a fifo and | tail it from another terminal. A bit annoying to do it by | hand though. | gpderetta wrote: | IIRC pv uses splice internally and simply tells the kernel to | move pipe buffers from one pipe to the other, so it is very | unlikely to be a bottleneck. | derefr wrote: | In the dd(1) case, we're talking about "having any pipe | involved at all" vs "no pipe, just copying internal to the | command." The Linux kernel pipe buffer size is only 64KB, | while my hand-optimized `bs` usually lands at ~2MB.
There's a | _big_ performance gap introduced by serially copying tiny | (non-IO-queue-saturating) chunks at a time -- it can | literally be a difference of minutes vs. hours to complete a | copy. Especially when there's high IO _latency_ on one end, | e.g. on IaaS network disks. | invalidator wrote: | Try using "pv -d <pid>". It will monitor open files on the | process and report progress on them. | | 1) this gets it out of the pipeline. 2) the program gets to | have the named arguments. 3) pv's output is on a separate | terminal. 4) your job never needs to know. | | Downside: it only sees the currently open files, so it doesn't | work well for batch jobs. Still, it's handy to see which file | it's on, and how fast the progress is. | | Also, for rsync: "--info=progress2 --no-i-r" will show you the | progress for a whole job. | prmoustache wrote: | Sometimes you prefer predictability and information over sheer | speed. If I do a very large transfer that could take hours, I'd | rather trade a bit of speed to know the progress and make sure | nothing is stuck than launch in the blind and then repeat | slow and expensive du commands to know where I am in the | transfer or have to strace the process. | derefr wrote: | > slow and expensive du commands | | You'd be surprised how cheap these du(1) runs can be when you're | running the _same_ du(1) command over and over. Think of it | like running the same SQL query over and over -- the first | time you do it, the DBMS takes its time doing IO to pull the | relevant disk pages into the disk cache; but the Nth>=2 time, | the query is entirely over "hot" data. Hot filesystem | metadata pages, in this case. (Plus, for the file(s) that | were just written by your command, the query is hot because | those pages are still in memory from being recently dirty.) | | I regularly unpack tarballs containing 10 million+ files; and | periodic du(1) over these takes only a few milliseconds of | wall-clock time to complete.
| | (The other bottleneck with du(1), for deep file hierarchies, | is printing all the subdirectory sizes. Which is why the | `-d 0` -- to only print the total.) | | You might be worried about something else thrashing the disk | cache, but in my experience I've never needed to run an | ETL-like job on a system that's _also_ running some other | completely orthogonal IO-heavy prod workload. Usually such | jobs are for restoring data onto new systems, migrating data | between systems, etc.; where if there _is_ any prod workload | running on the box, it's one that's touching all the _same_ | data you're touching, and so keeping disk-cache coherency. | leni536 wrote: | For rsync to get reliable global progress there is --no-i-r | --info=progress2 . --no-i-r adds a bit of upfront work, but | it's well worth it IMO. | derefr wrote: | Thanks for that! (I felt like I _had_ to be missing | something, with how useless rsync progress usually was.) | TT-392 wrote: | But... pipe-viewer was already a command-line youtube browser | dima55 wrote: | pv predates youtube itself | trabant00 wrote: | I've used it mostly to measure events per second with something | like: tail -f /some/log | grep something | pv -lr > /dev/null | or: tcpdump expression | pv -lr > /dev/null | JayGuerette wrote: | pv is a great tool. One of its lesser-known features is | throttling; transfer a file without dominating your bandwidth: | | pv -L 200K < bigfile.iso | ssh somehost 'cat > bigfile.iso' | | Complete with a progress bar, speed, and ETA. | dspillett wrote: | Similarly, though useful less often these days, using | -B/--buffer-size to increase the amount that it can buffer. If | reading data from traditional hard drives, piping that data | through some process, and writing the result back to the same | drives, this option can increase throughput significantly by | reducing head movements. It can help on other storage systems | too, but usually not so much so.
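A self-contained sketch of the -L (rate limit) and -B (buffer size) flags discussed above, using a local copy in place of the ssh transfer so it runs anywhere pv is installed; the /tmp paths and sizes are hypothetical:

```shell
# Create a 4 MiB test file (hypothetical path).
head -c 4M /dev/zero > /tmp/bigfile.bin

# Copy it through pv, capped at 1 MiB/s (-L) with a 2 MiB
# transfer buffer (-B); -q suppresses the progress display,
# drop it to watch the bar/ETA. Takes roughly 4 seconds.
pv -q -L 1M -B 2M /tmp/bigfile.bin > /tmp/copy.bin

# The throttled copy is byte-identical to the source.
cmp /tmp/bigfile.bin /tmp/copy.bin && echo "copy intact"
```

The same -L cap slots into the ssh pipeline from the comment above: pv -L 200K < bigfile.iso | ssh somehost 'cat > bigfile.iso'.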
| smcl wrote: | Oh damn, that's neat! I never thought to use `ssh` directly when | transferring a file; I always used | `scp bigfile.iso name@server.org:path/in/destination` | tyingq wrote: | A similar trick that's nice is piping tar through ssh. Handy | if you don't have rsync or something better around. Even | handy for one file, since it preserves permissions, etc. | | tar -cf - some/dir | ssh remote 'cd /place/to/go && tar -xvf -' | Twirrim wrote: | I love this trick. I was dealing with some old Solaris | boxes something like 15 years ago when I learned you could | do this. I couldn't rsync, and had started off SCP'ing | hundreds of thousands of files across but it was going to | take an insane length of time. Asked one of the other | sysadmins if they knew a better way and they pointed out | you can pipe stuff into ssh for the other side too. Every | now and then this technique proves useful in unexpected | ways :) | fbergen wrote: | Also see | `scp -l 200 bigfile.iso name@server.org:path/in/destination` | | from the man page: | | -l limit | | Limits the used bandwidth, specified in Kbit/s. | MayeulC wrote: | Also, you probably shouldn't use scp. rsync and sftp have | mostly the same semantics. | rsync --bwlimit=200K bigfile.iso name@server.org:path/in/destination | sftp -l 200 bigfile.iso name@server.org:path/in/destination | | Although it seems that scp is becoming a wrapper around | sftp these days: | | https://www.redhat.com/en/blog/openssh-scp-deprecation- | rhel-... | | https://news.ycombinator.com/item?id=25005567 | michaelmior wrote: | Probably my favorite non-POSIX tool that I insert into my | pipelines whenever anything takes more than a few seconds. I find | it super helpful to avoid premature optimization. If I can | quickly see that my hacked-together pipeline will run in a few | minutes and I only ever need to do that once, I'll probably just | let it finish. If it's going to take a few hours, I might decide | it's worth optimizing.
| | It also helps me optimize my time. If something is going to | finish in a few minutes, I probably won't context switch to | another major task. However, if something is going to take a few | hours then I'll probably switch to work on something different, | knowing approximately when I can go back and check on results. | systems_glitch wrote: | Same, one of the first utilities I install on a new system. | dang wrote: | Related: | | _PV (Pipe Viewer) - add a progress bar to most command-line | programs_ - https://news.ycombinator.com/item?id=23826845 - July | 2020 (2 comments) | | _A Unix Utility You Should Know About: Pipe Viewer_ - | https://news.ycombinator.com/item?id=8761094 - Dec 2014 (1 | comment) | | _Pipe Viewer_ - https://news.ycombinator.com/item?id=5942115 - | June 2013 (1 comment) | | _Pipe Viewer_ - https://news.ycombinator.com/item?id=4020026 - | May 2012 (26 comments) | | _A Unix Utility You Should Know About: Pipe Viewer_ - | https://news.ycombinator.com/item?id=462244 - Feb 2009 (63 | comments) | est wrote: | pv was the tool I reached for when I discovered that some VPSes | only have ~10Gbps memory-copy speed. | cwillu wrote: | pv -d $(pidof xz):1 is great for when you realize too late that | something is slow enough that you want a progress indication, and | definitely do not want to restart from scratch. | dspillett wrote: | Another good option for that, which works in a number of other | useful circumstances too, is progress: | https://github.com/Xfennec/progress | xuhu wrote: | How does `pv -d` work? Does it use perf probes or attach to the | target PID? | remram wrote: | It finds the file using /proc/<pid>/fd/<num> and watches its | size grow. It doesn't work with pipes, devices, a file being | overwritten (not appended to), or anything whose size doesn't | grow. | cwillu wrote: | It appears to monitor the contents of /proc/<pid>/fdinfo/<fd> ___________________________________________________________________ (page generated 2022-10-19 23:00 UTC)