[HN Gopher] Pipe Viewer
       ___________________________________________________________________
        
       Pipe Viewer
        
       Author : 0x45696e6172
       Score  : 141 points
       Date   : 2022-10-18 09:15 UTC (1 day ago)
        
 (HTM) web link (www.ivarch.com)
 (TXT) w3m dump (www.ivarch.com)
        
       | londons_explore wrote:
       | It would be nice to indicate if the upstream or the downstream is
       | the 'limiting' factor in speed.
       | 
       | Ie. within pv, is it the reading the input stream or the writing
       | the output stream that is blocking most of the time?
        
         | ketralnis wrote:
         | It's open source, be the change you want to see in the world
        
           | kotlin2 wrote:
           | Having maintained an open source library, it's actually
           | really helpful to see features people want. Not everyone
           | needs to contribute directly to the code base. User feedback
           | is valuable, too.
        
         | bingaling wrote:
         | it's instantaneous, but the -T (transfer buffer % full display)
         | is sometimes useful for that. (0% full -> source limited, 100%
         | full -> sink limited)
        
           | Twirrim wrote:
           | Oh wow, I'd completely missed that -T flag. That's some
           | useful data. Thanks for mentioning it!
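         
        The -T reading described above can be sketched in one pipeline (a
        hedged example: it assumes pv is installed and skips itself
        otherwise, and the /tmp path is just for the demo):

```shell
# -T adds a column showing how full pv's transfer buffer is.
# Near 0% full: pv is starved by the reader (source-limited).
# Near 100% full: the writer can't drain it (sink-limited).
command -v pv >/dev/null || exit 0      # skip if pv is absent
head -c 1M /dev/zero | pv -T 2>/dev/null | wc -c > /tmp/pv_bytes
cat /tmp/pv_bytes                       # all 1048576 bytes pass through
```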
        
         | senjin wrote:
         | This would be a genius addition
        
       | heinrich5991 wrote:
       | `progress` is also a nice tool to see progress of programs
       | operating linearly on a single file. A lot of tools do that!
        
       | sigmonsays wrote:
       | i've consistently lost and found this tool over and over again
       | for over 20 years
        
         | pbhjpbhj wrote:
         | Same, `apropos $keyword` helps, but strangely in this case
         | doesn't find `progress` from `apropos progress`.
        
       | sneak wrote:
       | part of my default install.
        
       | torgard wrote:
       | There are countless times where I would have found this
       | incredibly helpful. Just 10 minutes ago, I wanted this exact
       | tool.
       | 
       | Thanks!
        
       | derefr wrote:
       | As a person who runs a lot of ETL-like commands at work, I never
       | find myself using pv(1). I love the idea of it, but for the
       | commands I most want to measure progress of, they always seem to
       | be either:
       | 
       | 1. things where I'd be paranoid about pv(1) itself becoming the
       | bottleneck in the pipeline -- e.g. dd(1) of large disks where
       | I've explicitly set a large blocksize and set
       | conv=idirect/odirect, to optimize throughput.
       | 
       | 2. things where the program has some useful cleverness I rely on
       | that requires being fed by a named file argument, but behaves a
       | lot less intelligently when being fed from stdin -- e.g. feeding
       | SQL files into psql(1).
       | 
       | 3. things where the program, even while writing to stdout, also
       | produces useful "sampled progress" informational messages on
       | stderr, which I'd like to see; where pv(1) and this output
       | logging would fight each-other if both were running.
       | 
       | 4. things where there's no clean place to insert pv(1) anyway --
       | mostly, this comes up for any command that manages jobs itself in
       | order to do things in parallel, e.g. any object-storage-client
       | mass-copy, or any parallel-rsync script. (You'd think these
       | programs would also report global progress, but they usually
       | don't!)
       | 
       | I could see pv(1) being fixed to address case 3 (by e.g. drawing
       | progress while streaming stderr-logged output below it, using a
       | TUI); but the other cases seem to be fundamental limitations.
       | 
       | Personally, when I want to observe progress on some sort of
       | operation that's creating files (rsync, tar/untar, etc), here's
       | what I do instead: I run the command-line, and then, in a
       | separate terminal connected to the machine the files are being
        | written/unpacked onto, I run this:
        | 
        |     # for files
        |     watch -n 2 -- ls -lh $filepath
        | 
        |     # for directories
        |     watch -n 4 -- du -h -d 0 $dirpath
       | 
       | If I'm in a tmux(1) session, I usually run the file-copying
        | command in one pane, and then create a little three-line-tall
        | pane below it to run the observation command.
       | 
       | Doing things this way doesn't give you a percentage progress, but
       | I find that with most operations I already know what the target's
       | goal size is going to be, so all I really need to know is the
       | size-so-far. (And pv(1) can't tell you the target size in many
       | cases anyway.)
        
         | MayeulC wrote:
         | I usually fix 3. by redirecting the intermediate program to
         | stderr before piping to pv.
         | 
         | My main use-case is netcat (nc).
         | 
         | As an aside, I prefer the BSD version, which I find is superior
         | (IPv6 support, SOCKS, etc). "GNU Netcat" isn't even part of the
         | GNU project, AFAIK. I also discovered Ncat while writing this,
         | from the Nmap project; I'll give it a try.
        
           | derefr wrote:
           | I don't quite understand what you mean -- by default, most
           | Unix-pipeline-y tools that produce on stdout, if they log at
           | all, already write their logs to stderr (that being why
            | stderr exists); and pv(1) already _also_ writes to stderr
            | (since if it wrote its progress to stdout, you wouldn't be
            | able to use it in a pipe!)
           | 
           | But pv(1) is just blindly attempting to emit "\r[progress bar
           | ASCII-art]\n" (plus a few regular lines) to stderr every
           | second; and interleaving that into your PTY buffer along with
           | actual lines of stderr output from your producer command,
           | will just result in mush -- a barrage of new progress bars on
           | new lines, overwriting any lines emitted directly before
           | them.
           | 
           | Having two things both writing to stderr, where one's trying
           | to do something TUI-ish, and the other is attempting to write
           | regular text lines, is the _problem statement_ of 3, not the
           | solution to it.
           | 
           | A _solution_ , AFAICT, would look more like: enabling pv(1)
           | to (somehow) capture the stderr of the entire command-line,
           | and manage it, along with drawing the progress bar. Probably
           | by splitting pv(1) into two programs -- one that goes inside
           | the command-line, watches progress, and emits progress logs
           | as specially-tagged little messages (think: the UUID-like
           | heredoc tags used in MIME-email binary-embeds) without any
           | ANSI escape codes; and another, which _wraps_ your whole
           | command line, parsing out the messages emitted by the inner
            | pv(1) to render a progress bar on the top/bottom of the PTY
           | buffer, while streaming the regular lines across the rest of
           | the PTY buffer. (Probably all on the PTY secondary buffer,
           | like less(1) or a text editor.)
           | 
           | Another, probably simpler, solution would be to have a flag
           | that tells pv(1) to log progress "events" (as JSON or
           | whatever) to a named-FIFO filepath it would create (and then
           | delete when the pipeline is over) -- or to a loopback-
           | interface TCP port it would listen on -- and otherwise be
           | silent; and then to have another command you can run
           | asynchronously to your command-line, to open that named
           | FIFO/connect to that port, and consume the events from it,
           | rendering them as a progress bar; which would also quit when
           | the FIFO gets deleted / when the socket is closed by the
            | remote. Then you could run _that_ command, instead of
            | watch(1), in another tmux(1) pane, or wherever you like.
        
             | gpderetta wrote:
             | You could redirect each pipeline stage stderr to a fifo and
             | tail it from another terminal. A bit annoying to do it by
             | hand though.
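         
              The FIFO approach described above can be sketched like this
              (hedged: the "producer" here is a stand-in sh -c command,
              and the /tmp paths are invented for the demo):

```shell
# Give one pipeline stage's stderr its own channel (a named FIFO), so its
# log lines don't fight a progress display for the same terminal.
fifo=/tmp/stage.err
mkfifo "$fifo"
# Stand-in producer: logs to stderr, emits data on stdout.  Real use:
#   producer 2>"$fifo" | pv | consumer > out
sh -c 'echo "starting up" >&2; echo data' 2>"$fifo" | cat > /tmp/out &
# From another terminal you would `tail -f /tmp/stage.err`; here we just
# drain it once so the demo terminates.
cat "$fifo" > /tmp/stage.log
wait
```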
        
         | gpderetta wrote:
          | IIRC pv uses splice internally and simply tells the kernel to
          | move pipe buffers from one pipe to the other, so it is very
         | unlikely to be a bottleneck.
        
           | derefr wrote:
           | In the dd(1) case, we're talking about "having any pipe
           | involved at all" vs "no pipe, just copying internal to the
           | command." The Linux kernel pipe buffer size is only 64KB,
           | while my hand-optimized `bs` usually lands at ~2MB. There's a
           | _big_ performance gap introduced by serially copying tiny
           | (non-IO-queue-saturating) chunks at a time -- it can
           | literally be a difference of minutes vs. hours to complete a
            | copy. Especially when there's high IO _latency_ on one end,
           | e.g. on IaaS network disks.
        
         | invalidator wrote:
         | Try using "pv -d <pid>". It will monitor open files on the
         | process and report progress on them.
         | 
         | 1) this gets it out of the pipeline. 2) the program gets to
          | have the named arguments. 3) pv's output is on a separate
         | terminal. 4) your job never needs to know.
         | 
         | Downside: it only sees the currently open files, so it doesn't
         | work well for batch jobs. Still, it's handy to see which file
         | it's on, and how fast the progress is.
         | 
         | Also, for rsync: "--info=progress2 --no-i-r" will show you the
         | progress for a whole job.
        
         | prmoustache wrote:
         | Sometimes you prefer predictability and information over sheer
          | speed. If I do a very large transfer that could take hours, I'd
          | rather trade a bit of speed to know the progress and make sure
          | nothing is stuck than launch it blind and then repeat slow and
          | expensive du commands to find out where I am in the transfer,
          | or have to strace the process.
        
           | derefr wrote:
           | > slow and expensive du commands
           | 
            | You'd be surprised how cheap these du(1) runs can be when
            | you're running the _same_ du(1) command over and over. Think
            | of it like running the same SQL query repeatedly -- the first
            | time you do it, the DBMS takes its time doing IO to pull the
            | relevant disk pages into the disk cache; but from the second
            | time on, the query runs entirely over "hot" data. Hot
            | filesystem metadata pages, in this case. (Plus, for the
            | file(s) that were just written by your command, the query is
            | hot because those pages are still in memory from being
            | recently dirty.)
           | 
           | I regularly unpack tarballs containing 10 million+ files; and
           | periodic du(1) over these takes only a few milliseconds of
           | wall-clock time to complete.
           | 
           | (The other bottleneck with du(1), for deep file hierarchies,
           | is printing all the subdirectory sizes. Which is why the `-d
           | 0` -- to only print the total.)
           | 
           | You might be worried about something else thrashing the disk
           | cache, but in my experience I've never needed to run an ETL-
           | like job on a system that's _also_ running some other
           | completely orthogonal IO-heavy prod workload. Usually such
           | jobs are for restoring data onto new systems, migrating data
           | between systems, etc.; where if there _is_ any prod workload
           | running on the box, it 's one that's touching all the _same_
           | data you 're touching, and so keeping disk-cache coherency.
        
         | leni536 wrote:
         | For rsync to get reliable global progress there is --no-i-r
         | --info=progress2 . --no-i-r adds a bit of upfront work, but
         | it's well worth it IMO.
        
           | derefr wrote:
           | Thanks for that! (I felt like I _had_ to be missing
           | something, with how useless rsync progress usually was.)
        
       | TT-392 wrote:
        | But... pipe-viewer was already a command-line YouTube browser
        
         | dima55 wrote:
         | pv predates youtube itself
        
       | trabant00 wrote:
        | I've used it mostly to measure events per second with something
        | like:
        | 
        |     tail -f /some/log | grep something | pv -lr > /dev/null
        | 
        | or
        | 
        |     tcpdump expression | pv -lr > /dev/null
        
       | JayGuerette wrote:
        | pv is a great tool. One of its lesser-known features is
       | throttling; transfer a file without dominating your bandwidth:
       | 
       | pv -L 200K < bigfile.iso | ssh somehost 'cat > bigfile.iso'
       | 
       | Complete with a progress bar, speed, and ETA.
        
         | dspillett wrote:
         | Similarly, though useful less often these days, using
         | -B/--buffer-size to increase the amount that it can buffer. If
         | reading data from traditional hard drives, piping that data
         | through some process, and writing the result back to the same
         | drives, this option can increase throughput significantly by
         | reducing head movements. It can help on other storage systems
         | too, but usually not so much so.
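         
          As a hedged illustration of -B (assumes pv is installed and
          skips itself otherwise; the 64M figure and /tmp path are
          arbitrary choices for the demo):

```shell
command -v pv >/dev/null || exit 0   # skip if pv is absent
# -B sets pv's transfer buffer size; a large buffer lets reads from and
# writes to the same spinning disk happen in bigger, less interleaved
# bursts.  -q suppresses the display for this non-interactive run.
head -c 2M /dev/zero | pv -q -B 64M > /tmp/pv_buf_out
```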
        
         | smcl wrote:
          | Oh damn, that's neat. I never thought to use `ssh` directly
          | when transferring a file; I always used `scp bigfile.iso
          | name@server.org:path/in/destination`
        
           | tyingq wrote:
           | A similar trick that's nice is piping tar through ssh. Handy
           | if you don't have rsync or something better around. Even
           | handy for one file, since it preserves permissions, etc.
           | 
           | tar -cf - some/dir | ssh remote 'cd /place/to/go && tar -xvf
           | -'
        
             | Twirrim wrote:
             | I love this trick. I was dealing with some old solaris
             | boxes something like 15 years ago when I learned you could
             | do this. I couldn't rsync, and had started off SCP'ing
             | hundreds of thousands of files across but it was going to
             | take an insane length of time. Asked one of the other
             | sysadmins if they knew a better way and they pointed out
             | you can pipe stuff in to ssh for the other side too. Every
             | now and then this technique proves useful in unexpected
             | ways :)
        
           | fbergen wrote:
           | Also see `scp -l 200 bigfile.iso
           | name@server.org:path/in/destination`
           | 
           | from man page:
           | 
           | -l limit
           | 
           | Limits the used bandwidth, specified in Kbit/s.
        
             | MayeulC wrote:
              | Also you probably shouldn't use scp. rsync and sftp have
              | mostly the same semantics.
              | 
              |     rsync --bwlimit=200K bigfile.iso \
              |         name@server.org:path/in/destination
              | 
              |     sftp -l 200 bigfile.iso \
              |         name@server.org:path/in/destination
             | 
             | Although it seems that scp is becoming a wrapper around
             | sftp these days:
             | 
             | https://www.redhat.com/en/blog/openssh-scp-deprecation-
             | rhel-...
             | 
             | https://news.ycombinator.com/item?id=25005567
        
       | michaelmior wrote:
       | Probably my favorite non-POSIX tool that I insert into my
        | pipelines whenever anything takes more than a few seconds. I find
       | it super helpful to avoid premature optimization. If I can
       | quickly see that my hacked together pipeline will run in a few
       | minutes and I only ever need to do that once, I'll probably just
       | let it finish. If it's going to take a few hours, I might decide
       | it's worth optimizing.
       | 
       | It also helps me optimize my time. If something is going to
       | finish in a few minutes, I probably won't context switch to
       | another major task. However, if something is going to take a few
       | hours then I'll probably switch to work on something different
       | knowing approximately when I can go back and check on results.
        
         | systems_glitch wrote:
         | Same, one of the first utilities I install on a new system.
        
       | dang wrote:
       | Related:
       | 
       |  _PV (Pipe Viewer) - add a progress bar to most command-line
       | programs_ - https://news.ycombinator.com/item?id=23826845 - July
       | 2020 (2 comments)
       | 
       |  _A Unix Utility You Should Know About: Pipe Viewer_ -
       | https://news.ycombinator.com/item?id=8761094 - Dec 2014 (1
       | comment)
       | 
       |  _Pipe Viewer_ - https://news.ycombinator.com/item?id=5942115 -
       | June 2013 (1 comment)
       | 
       |  _Pipe Viewer_ - https://news.ycombinator.com/item?id=4020026 -
       | May 2012 (26 comments)
       | 
       |  _A Unix Utility You Should Know About: Pipe Viewer_ -
       | https://news.ycombinator.com/item?id=462244 - Feb 2009 (63
       | comments)
        
       | est wrote:
        | pv was the tool I was using when I discovered that some VPSes
        | only have ~10Gbps memory-copy speed.
        
       | cwillu wrote:
       | pv -d $(pidof xz):1 is great for when you realize too late that
       | something is slow enough that you want a progress indication, and
       | definitely do not want to restart from scratch.
        
         | dspillett wrote:
         | Another good option for that, which works in a number of other
         | useful circumstances too, is progress:
         | https://github.com/Xfennec/progress
        
         | xuhu wrote:
          | How does `pv -d` work? Does it use perf probes, or attach to
          | the target PID?
        
           | remram wrote:
           | It finds the file using /proc/<pid>/fd/<num> and watches its
           | size grow. It doesn't work with pipes, devices, a file being
           | overwritten (not appended to), or anything whose size doesn't
           | grow.
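         
            The /proc mechanism described above can be sketched in a few
            lines of Linux-only shell (an illustration of the general
            approach, not pv's actual code; fd 9 and the /tmp path are
            invented for the demo):

```shell
# Resolve a process's fd to the file behind it via /proc, then poll that
# file's size -- the same information `pv -d` reports.
: > /tmp/demo.log             # the file our "watched" process appends to
exec 9>>/tmp/demo.log         # open it on fd 9 in this shell
target=$(readlink "/proc/$$/fd/9")
echo "hello" >&9              # the watched process makes progress...
stat -c %s "$target"          # ...and the observer sees the size grow
```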
        
           | cwillu wrote:
           | It appears to monitor the contents of /proc/<pid>/fdinfo/<fd>
        
       ___________________________________________________________________
       (page generated 2022-10-19 23:00 UTC)