[HN Gopher] Computer Latency: 1977-2017
___________________________________________________________________
Computer Latency: 1977-2017
Author : 2pEXgD0fZ5cF
Score : 122 points
Date : 2022-11-20 17:51 UTC (5 hours ago)
(HTM) web link (danluu.com)
(TXT) w3m dump (danluu.com)
| haberman wrote:
| When I saw this page a few years back I had an idea for a
| project. I want to create the lowest-latency typing terminal I
| possibly can, using an FPGA and an LED array. My initial results
| suggest that I can drive a 64x32 pixel LED array at 4.88kHz, for
| a roughly 0.2ms latency.
|
| For the next step I want to make it capable of injecting
| artificial latency, and then do A/B testing to determine (1) the
| smallest amount of latency I can reliably perceive, and (2) the
| smallest amount of latency that actually bothers me.
|
| This idea was also inspired by this work from Microsoft Research,
| where they do a similar experiment with touch screens:
| https://www.youtube.com/watch?v=vOvQCPLkPt4
| walrus01 wrote:
| Something I recently observed is that cutting-edge, current-
| generation gaming-marketed x86-64 motherboards for single-socket
| CPUs, both Intel and AMD, still come with a single PS/2 mouse
| port on the rear I/O plate.
|
| I read something about this being intended for use with high-end
| wired gaming mice, where the end-to-end latency between mouse and
| cursor movement is theoretically lower if the signal doesn't go
| through the USB bus on the motherboard, but rather through
| whatever legacy PS/2 interface is talking to the equivalent-of-
| northbridge chipset.
| pjkundert wrote:
| Using emacs on an SGI Iris in 1988 was ... sublime.
|
| Every computer system since then has been a head-shaking
| disappointment, latency-wise.
| still_grokking wrote:
| Cynical comment ahead, beware!
|
| ---
|
| Does this actually even matter today, when every click or key-
| press triggers dozens of fat network requests going around the
| globe on top of a maximally inefficient protocol?
|
| Or to summarize what we see here: we've built layers of madness.
| Now we just have to deal with the fallout...
|
| The result is in no way surprising, given we haven't refactored
| our systems for over 50 years and have just put new things on top.
| retrac wrote:
| If you aren't familiar, check out Winning Run [1]. A 3D arcade
| racing game from 1988, about the best possible with custom
| hardware at the time. Graphics quality is primitive by modern
| standards. But make sure to watch the video at 60 fps. If
| there are any hiccups, it's your device playing the video. Smooth
| and continuous 60 frames per second rendering, with some tens
| of milliseconds of delay to respond to game inputs. It's still very
| hard to pull that off today, yet it's fundamental to that type
| of game's overall quality.
|
| [1] https://youtu.be/NBiD-v-YGIA?t=85
| alpaca128 wrote:
| WipEout HD on the PS3 managed to get super stable 60FPS at
| 1080p. It dynamically scales the horizontal rendering
| resolution for every frame and then scales it to 1920 pixels
| using hardware. So the resolution might vary a bit, but at
| that framerate and such speeds in races it's not noticeable.
| The controls were super smooth at any speed; only achievement
| popups caused the whole game to freeze for half a second.
| _trampeltier wrote:
| I guess keyboard latency is also the biggest problem if you
| play old games in emulators. I feel it is often very difficult
| to play old action games, because you can't hit the buttons
| precisely enough.
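mrob's reply just below argues that an 8 kHz USB keyboard beats PS/2 on latency. A back-of-the-envelope comparison of the delay a keystroke can pick up at the keyboard interface supports that ordering; the polling rates and PS/2 clock below are typical nominal values rather than measurements of any particular device, and the sketch is purely illustrative:

    # Rough added delay at the keyboard interface for a few typical report
    # mechanisms: worst-case polling wait for USB, wire-transfer time for PS/2.
    # All figures are nominal, not measured devices.
    delays_ms = {
        "USB HID @ 125 Hz polling": 1000 / 125,   # 8 ms worst-case polling wait
        "USB HID @ 1 kHz polling":  1000 / 1000,  # 1 ms
        "USB HID @ 8 kHz polling":  1000 / 8000,  # 0.125 ms
        # PS/2 is interrupt-driven rather than polled, but the serial link is
        # slow: ~11 bits per byte at a nominal ~12.5 kHz clock, one byte per
        # ordinary make code.
        "PS/2 scancode transfer":   11 / 12.5,    # ~0.9 ms on the wire
    }
    for name, delay_ms in delays_ms.items():
        print(f"{name:26} ~{delay_ms:.3f} ms")

On these numbers the 8 kHz keyboard's interface overhead sits well below the PS/2 wire time, though both are small next to the end-to-end figures measured in the article.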
| mrob wrote:
| I'm using a Razer Huntsman V2 keyboard, which has 8kHz
| polling and optical switches. I do not notice any obvious
| latency from it, and the specification claims sub-
| millisecond latency from the switch activation point. This is
| better performance than is possible from a PS/2 keyboard,
| because the PS/2 interface is bottlenecked by the slow
| serial link.
| drewtato wrote:
| This video is not a steady 60FPS. Lots of frames are
| duplicated or torn. Maybe this was originally 60FPS and got
| mangled by the recording process.
| yakubin wrote:
| That inefficient network has better latency than your computer
| when trying to show you a pixel:
| <http://newstmobilephone.blogspot.com/2012/05/john-carmack-
| ex...>
| still_grokking wrote:
| Except that such a network call can't replace the pixel output.
|
| It just adds to the overall latency.
|
| Also, real latency of web pages is measured in _seconds_ these
| days. People are happy when they're able to serve a request
| in under 0.2 sec.
| jiggawatts wrote:
| Fifteen years ago I used to target 15ms as seen in the
| browser F12 network trace (not as recorded on the server!),
| and if I mention such a thing these days people are
| flabbergasted.
|
| For example, I had a support call with Azure asking them
| why the latency between Azure App Service and Azure SQL was
| as high as 13ms, and they asked me if my target user base
| was "high frequency traders" or somesuch.
|
| They just could not believe that I was expecting sub-1ms
| latencies as a normal thing for a database response.
| still_grokking wrote:
| > if I mention such a thing these days people are
| flabbergasted
|
| I think I'm just learning this the hard way, given the
| downvotes of the initial comment. :-)
|
| Maybe people really don't see the issue with adding
| layer after layer of stuff, and that we've reached, no,
| surpassed even, some tragicomic point? Computers are
| thousands of times faster, yet the end-user experience
| becomes more sluggish with every passing year. We have an
| issue, I would say. And it's actually not even funny any
| more.
| giantdude wrote:
| I always thought that the Apple ][+ was as good as it gets. It's
| been downhill from there, for Apple and for the rest of us.
| 13of40 wrote:
| I was shocked to see the TI-99/4a so high up. Just listing a
| BASIC program on a TI-99 is about as slow as a 300 baud modem.
|
| Example: https://youtu.be/ls-PxqRQ35Q?t=178
| jamiek88 wrote:
| Once I got good at typing on it, my Acorn Electron (we couldn't
| afford the whizzy BBC Master!) was an extension of my brain.
|
| Instant response. A full reboot was a Ctrl-Break away.
| Instant access to the interpreter. Easy assembly access.
|
| I thought, it executed.
|
| I remember our school moving from the networked BBCs to the
| PCs, and it was a huge downgrade for us as kids. Computer class
| became operating a word processor or learning Win 3.11, just
| more drudgery, rather than the exciting and sometimes adversarial
| (remote messaging other terminals, spoofing, etc.) system that
| made us want to learn.
| snoot wrote:
| I agree with all of this except for one point:
|
| Having an ordinary key on the keyboard that would effectively
| kill -9 the current program _and_ clear the screen was a
| crazy design decision, especially for a machine where saving
| data meant using a cassette tape!
| jamiek88 wrote:
| The Break key, unless you held down Control, was only a soft
| break though.
|
| Your program would still be in memory with an >OLD
| command.
|
| As long as it was a BASIC prog, that is; machine code loaded
| with *RUN was lost and had to be reloaded from tape, yes.
|
| A pain for games, but I don't really recall accidentally
| pressing the Break key much; it was out of the way, up at the
| top right.
|
| I could talk about this all day!
| snoot wrote:
| It's true that you could get a BASIC program back with OLD.
|
| But any data was lost, and I saw Break get pressed
| accidentally fairly often at school and amongst friends.
| jamiek88 wrote:
| Fair. Not everyone spent 12 hours per day on their
| computer like me! They probably had friends and stuff. :)
| preinheimer wrote:
| Global Ping Data - https://wondernetwork.com/pings
|
| We've got servers in 200+ cities around the world, and ask them
| to ping each other every hour. Currently it takes our servers in
| Tokyo and London about 226ms to ping each other.
|
| We've got some downloadable datasets here if you want to play
| with them: https://wonderproxy.com/blog/a-day-in-the-life-of-the-
| intern...
| jiggawatts wrote:
| The fundamental physical limit to latency caused by the speed
| of light is gleefully ignored by many web "application"
| architects. Apps that feel super snappy when hosted in the same
| region run like molasses from places like Australia. Unless the
| back-end is deployed in every major region, a significant
| fraction of your userbase will always think of your app as
| sluggish, irrespective of how much optimisation work goes into
| it.
|
| Some random examples:
|
| Azure Application Insights can be deployed to any Azure region,
| making it feel noticeably snappier than most cloud-hosted
| competitors such as New Relic or logz.io.
|
| ESRI ArcGIS has a cloud version that is "quick and easy" to use
| compared to the hosted version... and is terribly slow for
| anyone outside of the US.
|
| Our timesheet app is hosted in the US and is barely usable.
| Our managers complain that engineers "don't like timesheets".
| Look... we don't _mind_ timesheets, but having to... wait...
| seconds... for.... each... click... is just torture, especially
| at 4:55pm on a Friday afternoon.
| emj wrote:
| I've used your ping data before; it was useful to know where to
| place my servers, and how nice of you to publish a dump as
| well! If I can wish for more data: min-median-max client
| latencies for all those servers would be swell, but I can see
| that you might not want to publish the results of that, maybe on
| a per-month basis? Just a couple of thousand packets every hour
| should be enough: tcpdump -w stats.pcap -c5000 "tcp[tcpflags] &
| (tcp-syn|tcp-ack) != 0"
| drewtato wrote:
| PowerShell isn't a terminal (it's a shell, obviously), so the
| Windows results were most likely tested in conhost. If it's on
| Windows 11 it might be Windows Terminal, but conhost may be more
| likely since I think cmd is still the default on Windows 10.
| 13of40 wrote:
| It might still be a valid test, because PowerShell needs to
| have a bunch of code in the stack between the keypress event
| and the call into the console API that actually displays the
| character. Among other things, the entire command line is
| getting lexically parsed every time you press a key.
| jiggawatts wrote:
| If you think "parsing the command line" should or does take
| appreciable time on a human timescale when executed by a
| modern superscalar processor, then your mental model of
| computer performance is "off" by at least 4 or 5 orders of
| magnitude. Not four or five times incorrect, but many
| thousands of times incorrect.
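To calibrate the orders of magnitude being argued about here, the same style of micro-benchmark is easy to sketch in Python against its own parser. This is purely illustrative and says nothing about PowerShell's internals (13of40 gives real PowerShell figures in the next comment); the snippet string is an arbitrary stand-in for a short command line:

    # Time 10,000 parses of a short command-like string, analogous to the
    # per-keypress reparse being discussed. Python's ast.parse stands in for
    # the PowerShell parser, so only the rough scale carries over.
    import ast
    import timeit

    snippet = 'ls("-la", path="/tmp")'   # a short, command-ish expression
    elapsed = timeit.timeit(lambda: ast.parse(snippet), number=10_000)
    print(f"10K parses: {elapsed * 1000:.0f} ms total, "
          f"{elapsed / 10_000 * 1e6:.1f} us per parse")

For short input lines the per-keypress cost stays in the microseconds, which is invisible next to a 16.7 ms frame budget; the figures quoted below suggest it can climb to tens of milliseconds per parse once the input gets pathological.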
| 13of40 wrote:
| Just for context, I worked on the feature team for a lot of
| early versions of PowerShell, so I kind of know where the
| bodies are buried. Here are some empirical numbers, just
| for the parsing part:
|
| 10K iterations of $null=$null took 38 milliseconds on my
| laptop.
|
| 10K parses of the letter "a" took 110 milliseconds.
|
| 10K parses of a 100K character comment took 7405
| milliseconds.
|
| 10K parses of a complex nested expression took just over
| six minutes.
|
| You're probably imagining a lexer written in C that
| tokenizes a context-free language and does nothing else. In
| PowerShell, you can't run the tokenizer directly, you have
| to use the parser, which also builds an AST. The language
| itself is a blend of two different paradigms, so a token
| can have a totally different meaning depending on whether
| it's part of an expression or a command, meaning more state
| to track during the tokenizer pass.
|
| On top of that, while it was being developed, language
| performance wasn't a priority until around version 3 or 4,
| and the main perf advancement then was to compile from AST
| to dynamic code for code blocks that get run a minimum
| number of times. The parser itself was never subject to any
| deep perf testing, IIRC.
|
| Plus it does a bunch of other stuff when you press a key,
| not just the parsing. All of the host code that listens for
| the keyboard event and ultimately puts the character on the
| screen, for example, is probably half a dozen layers of
| managed and unmanaged abstractions around the Win32 console
| API.
| drewtato wrote:
| The test is valid for any combo of shell and terminal; it's
| just a matter of figuring out which methodology was used so
| it can be better understood.
|
| But yeah, I agree with the other comment that PowerShell is
| likely adding less than 1ms.
| fnordpiglet wrote:
| Yet more proof we should have just stopped with the SGI Indy.
| killjoywashere wrote:
| I really found this valuable, particularly the slide at the top
| that enables you to visualize low-level latency times (Jeff Dean
| numbers) over the years. tl;dr: not much has changed in the
| processor hardware numbers since 2012. So everything right of the
| processor is where the action is. And it sounds like people are
| starting to actually make progress.
|
| https://colin-scott.github.io/personal_website/research/inte...
| nikanj wrote:
| On my state-of-the-art desktop PC, Visual Studio has very
| noticeable cursor and scrolling lag. My C64 had the latter as
| well, but I used to assume the cursor moved as fast as I could
| type / tap the arrow keys.
| egberts1 wrote:
| So much added sluggishness, and they still cannot bring
| themselves to show us a current dynamic keyboard mapping to
| this day.
| mhh__ wrote:
| What's a current dynamic keyboard mapping?
| egberts1 wrote:
| Things (a dialog/popup box?) that let you see what each key
| is mapped to, based on the current window and/or the mouse
| position.
| Smoosh wrote:
| Has anyone else used an IBM mainframe with a hardware 327x
| terminal?
|
| They process all normal keystrokes locally, and only send back to
| the host when Enter and function keys are pressed. This means
| very low latency for typing and most keystrokes. But much longer
| latency when you press Enter or page up/down, as the mainframe
| then processes all the on-screen changes and sends back the
| refreshed screen (yes, you are looking at a page at a time; there
| is no scrolling).
|
| Of course, these days people use emulators instead of hardware
| terminals, so you get the standard GUI delays and the worst of
| both worlds.
| userbinator wrote:
| I'd like to see older MS-DOS and Windows on there for comparison;
| I remember dual-booting 98SE and XP for a while in the early
| 2000s, and the former was noticeably more responsive.
|
| Another comparative anecdote I have is between Windows XP and OS
| X on the same hardware, wherein the latter was less responsive.
| After seeing what GUI apps on a Mac actually involve, I'm not too
| surprised: https://news.ycombinator.com/item?id=11638367
| gtrevorjay wrote:
| An anecdote that will probably sway no one: I was in a family-
| friendly barcade and noticed, inexplicably, a gaggle of kids,
| all 8-14, gathered around the Pong. Sauntering up so I could
| overhear their conversation, it was all excited variants of "It's
| just a square! But it's real!", "You're touching it!", or "The
| knobs _really_ move it."
|
| If you wonder why we no longer have "twitch" games, this is why.
| Old-school games had a tactile aesthetic lost in the blur of
| modern lag.
| veloxo wrote:
| FWIW, a quick ballpark test shows <30 ms minimum keyboard latency
| on my M1 Max MacBook, which has a 120 Hz display.
|
| Sublime Text: 17-29 ms
| iTerm (zsh4humans): 25-54 ms
| Safari address bar: 17-38 ms
| TextEdit: 25-46 ms
|
| Method: Record 240-fps slo-mo video. Press keyboard key. Count
| frames from key depress to first update on screen, inclusive.
| Repeat 3x for each app.
| still_grokking wrote:
| Previous discussions:
|
| https://news.ycombinator.com/item?id=25290118 (December 3, 2020
| -- 454 points, 259 comments)
|
| https://news.ycombinator.com/item?id=16001407 (December 24, 2017
| -- 588 points, 161 comments)
| LastTrain wrote:
| That is because latency on its own is an often useless metric.
| FartyMcFarter wrote:
| Going through the list of what happens on iOS:
|
| > UIKit introduced 1-2 ms event processing overhead, CPU-bound
|
| I wonder if this is correct, and what's happening there if so - a
| modern CPU (even a mobile one) can do a _lot_ in 1-2 ms. That's
| 6 to 12% of the per-frame budget of a game running at 60 fps,
| which is pretty mind-boggling for just processing an event.
| still_grokking wrote:
| I guess you can waste any amount of time with "a few" layers of
| strictly unnecessary indirection.
|
| Speaking of games: just the other day I had the realization
| that we should look at software design in games if we want
| proper architectures for GUI applications.
|
| What we do today instead is "layers of madness". At least that's
| what I would call it.
| ilyt wrote:
| Games have the privilege of controlling everything from the input
| device to the GPU pipeline. Nothing on the desktop is going to be
| that vertically integrated easily.
| still_grokking wrote:
| > Nothing on the desktop is going to be that vertically
| integrated easily
|
| Why? Are there any technical reasons?
|
| I think this is a pure framework / system-API question.
| drewtato wrote:
| The only things I can think of are that, for windowed apps, you
| have to wait for the OS to hand you mouse events, since the
| mouse may be on another window, and you have to render to a
| window instead of directly to the framebuffer.
| still_grokking wrote:
| Which brings us back to "system APIs".
| amluto wrote:
| I wonder if a compositor, and possibly an entire compositing
| system designed around adaptive sync, could perform substantially
| better than current compositors.
|
| Currently, there is a whole pile of steps to update a UI. The
| input system processes an event, some decision is made as to when
| to rerender the application, then another decision is made as to
| when to composite the screen, and hopefully this all finishes
| before a frame is scanned out, but not too far before, because
| that would add latency. It's heuristics all the way down.
|
| With adaptive sync, there is still a heuristic decision as to
| whether to process an input event immediately or to wait to
| aggregate more events into the same frame. But once that is done,
| an application can update its state, redraw itself, and trigger
| an _immediate_ compositor update. The compositor will render as
| quickly as possible, but it doesn't need to worry about missing
| scanout -- scanout can begin as soon as the compositor finishes.
|
| (There are surely some constraints on the intervals between
| frames sent to the display, but this seems quite manageable while
| still scanning out a frame immediately after compositing it
| nearly 100% of the time.)
| mrob wrote:
| Adaptive sync can only delay drawing, never make it happen
| sooner. This means it can only harm average latency of response
| to unpredictable events, such as human interaction. (Individual
| events may have lower latency purely by luck, because latency
| depends on the position of the raster scan relative to the part
| of the screen that needs to be updated, and adaptive sync will
| perturb this, but this effect is just as likely to make things
| worse.) The lowest average latency is always achieved by
| running the monitor at maximum speed all the time and
| responding to events immediately.
|
| Adaptive sync is beneficial for graphically intensive games
| where you can't always render fast enough, but IMO this should
| never be true for a GUI on modern hardware.
| drewtato wrote:
| With text content, most frames are exactly the same. So what
| adaptive sync can do is delay a refresh until just after the
| content has been updated. At a minimum, it can delay a
| refresh when an update is currently being drawn, which would
| lower the max latency.
| mrob wrote:
| The time taken to update text should be negligible on any
| modern hardware.
| wmf wrote:
| The point of the article is that it is not negligible.
| amluto wrote:
| > Adaptive sync can only delay drawing, never make it happen
| sooner.
|
| That's a matter of perspective. If your goal is to crank out
| frames at exactly 60 Hz (or 120 Hz or whatever), then, sure,
| you can't send frames early and you want to avoid being late.
| But this seems like a goal of dubious necessity in a
| continuously rendered game and a completely useless goal in a
| desktop UI. So instead the goal can be to be slightly late
| for every single frame, and then if you're less late than
| intended, fine.
|
| Alternatively, one could start compositing _at_ the target
| time. If it takes 0.5ms, then the frame is 0.5ms late. If it
| goes over and takes 1ms, then the frame is 1ms late.
| dtx1 wrote:
| Uhm... aren't you basically describing Wayland?
|
| This Xorg dude did exactly the tuning you want on Wayland:
| https://artemis.sh/2022/09/18/wayland-from-an-x-apologist.ht...
___________________________________________________________________
(page generated 2022-11-20 23:00 UTC)