[HN Gopher] Computer Latency: 1977-2017
___________________________________________________________________
Computer Latency: 1977-2017
Author : 2pEXgD0fZ5cF
Score : 122 points
Date : 2022-11-20 17:51 UTC (5 hours ago)
(HTM) web link (danluu.com)
(TXT) w3m dump (danluu.com)
| haberman wrote:
| When I saw this page a few years back I had an idea for a
| project. I want to create the lowest-latency typing terminal I
| possibly can, using an FPGA and an LED array. My initial results
| suggest that I can drive a 64x32 pixel LED array at 4.88kHz, for
| a roughly 0.2ms latency.
|
| For the next step I want to make it capable of injecting
| artificial latency, and then do A/B testing to determine (1) the
| smallest amount of latency I can reliably perceive, and (2) the
| smallest amount of latency that actually bothers me.
|
| This idea was also inspired by this work from Microsoft Research,
| where they do a similar experiment with touch screens:
| https://www.youtube.com/watch?v=vOvQCPLkPt4
| walrus01 wrote:
| Something I recently observed is that cutting-edge, current-
| generation gaming-marketed x86-64 motherboards for single-socket
| CPUs, both Intel and AMD, still come with a single PS/2 mouse
| port on the rear I/O plate.
|
| I read something about this being intended for use with high-end
| wired gaming mice, where the end-to-end latency between mouse and
| cursor movement is theoretically lower if the signal doesn't go
| through the USB bus on the motherboard, but rather through
| whatever legacy PS/2 interface is talking to the equivalent-of-
| northbridge chipset.
| pjkundert wrote:
| Using emacs on an SGI Iris in 1988 was ... sublime.
|
| Every computer system since then has been a head-shaking
| disappointment, latency-wise.
| still_grokking wrote:
| Cynical comment ahead, beware!
|
| ---
|
| Does this actually even matter today, when every click or key-
| press triggers dozens of fat network requests going around the
| globe on top of a maximally inefficient protocol?
|
| Or to summarize what we see here: we've built layers of madness.
| Now we just have to deal with the fallout...
|
| The result is in no way surprising, given we haven't refactored
| our systems for over 50 years and have just put new things on top.
| retrac wrote:
| If you aren't familiar, check out Winning Run [1]. A 3D arcade
| racing game from 1988, about the best possible with custom
| hardware at the time. Graphics quality is primitive by modern
| standards. But make sure to watch the video at 60 fps. If
| there are any hiccups, it's your device playing the video. Smooth
| and continuous 60 frames per second rendering, with some tens
| of milliseconds of delay to respond to game inputs. It's still very
| hard to pull that off today, yet it's fundamental to that type
| of game's overall quality.
|
| [1] https://youtu.be/NBiD-v-YGIA?t=85
| alpaca128 wrote:
| WipEout HD on the PS3 managed to get super stable 60FPS at
| 1080p. It dynamically scales the horizontal rendering
| resolution for every frame and then scales it to 1920 pixels
| using hardware. So the resolution might vary a bit, but at
| that framerate and such speeds in races it's not noticeable.
| The controls were super smooth at any speed; only achievement
| popups caused the whole game to freeze for half a second.
| _trampeltier wrote:
| I guess keyboard latency is also the biggest problem if you
| play old games in emulators. I feel it is often very difficult
| to play old action games, because you can't hit the buttons
| precisely enough.
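mrob's reply just below argues that an 8 kHz USB keyboard beats PS/2 on latency. A back-of-the-envelope comparison of the delay a keystroke can pick up at the keyboard interface supports that ordering; the polling rates and PS/2 clock below are typical nominal values rather than measurements of any particular device, and the sketch is purely illustrative:

    # Rough added delay at the keyboard interface for a few typical report
    # mechanisms: worst-case polling wait for USB, wire-transfer time for PS/2.
    # All figures are nominal, not measured devices.
    delays_ms = {
        "USB HID @ 125 Hz polling": 1000 / 125,   # 8 ms worst-case polling wait
        "USB HID @ 1 kHz polling":  1000 / 1000,  # 1 ms
        "USB HID @ 8 kHz polling":  1000 / 8000,  # 0.125 ms
        # PS/2 is interrupt-driven rather than polled, but the serial link is
        # slow: ~11 bits per byte at a nominal ~12.5 kHz clock, one byte per
        # ordinary make code.
        "PS/2 scancode transfer":   11 / 12.5,    # ~0.9 ms on the wire
    }
    for name, delay_ms in delays_ms.items():
        print(f"{name:26} ~{delay_ms:.3f} ms")

On these numbers the 8 kHz keyboard's interface overhead sits well below the PS/2 wire time, though both are small next to the end-to-end figures measured in the article.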
| mrob wrote:
| I'm using a Razer Huntsman V2 keyboard, which has 8kHz
| polling and optical switches. I do not notice any obvious
| latency from it, and the specification claims sub-
| millisecond latency from the switch activation point. This is
| better performance than is possible from a PS/2 keyboard,
| because the PS/2 interface is bottlenecked by the slow
| serial link.
| drewtato wrote:
| This video is not a steady 60FPS. Lots of frames are
| duplicated or torn. Maybe this was originally 60FPS and got
| mangled by the recording process.
| yakubin wrote:
| That inefficient network has better latency than your computer
| when trying to show you a pixel:
| <http://newstmobilephone.blogspot.com/2012/05/john-carmack-
| ex...>
| still_grokking wrote:
| Except that such a network call can't replace the pixel output.
|
| It just adds to the overall latency.
|
| Also, real latency of web pages is measured in _seconds_ these
| days. People are happy when they're able to serve a request
| in under 0.2 sec.
| jiggawatts wrote:
| Fifteen years ago I used to target 15ms as seen in the
| browser F12 network trace (not as recorded on the server!),
| and if I mention such a thing these days people are
| flabbergasted.
|
| For example, I had a support call with Azure asking them
| why the latency between Azure App Service and Azure SQL was
| as high as 13ms, and they asked me if my target user base
| was "high frequency traders" or somesuch.
|
| They just could not believe that I was expecting sub-1ms
| latencies as a normal thing for a database response.
| still_grokking wrote:
| > if I mention such a thing these days people are
| flabbergasted
|
| I think I'm just learning this the hard way, given the
| downvotes of the initial comment. :-)
|
| Maybe people really don't see the issue with adding
| layer after layer of stuff, and that we've reached, no,
| surpassed even, some tragicomic point? Computers are
| thousands of times faster, yet the end-user experience
| becomes more sluggish with every passing year. We have an
| issue, I would say. And it's actually not even funny any
| more.
| giantdude wrote:
| I always thought that the Apple ][+ was as good as it gets. It's
| been downhill from there, for Apple and for the rest of us.
| 13of40 wrote:
| I was shocked to see the TI-99/4a so high up. Just listing a
| BASIC program on a TI-99 is about as slow as a 300 baud modem.
|
| Example: https://youtu.be/ls-PxqRQ35Q?t=178
| jamiek88 wrote:
| Once I got good at typing on it, my Acorn Electron (we couldn't
| afford the whizzy BBC Master!) was an extension of my brain.
|
| Instant response. A full reboot was a Ctrl-Break away.
| Instant access to the interpreter. Easy assembly access.
|
| I thought, it executed.
|
| I remember our school moving from the networked BBCs to the
| PCs, and it was a huge downgrade for us as kids. Computer class
| became operating a word processor or learning Win 3.11, just
| more drudgery, rather than the exciting and sometimes adversarial
| (remote messaging other terminals, spoofing, etc.) system that
| made us want to learn.
| snoot wrote:
| I agree with all of this except for one point:
|
| Having an ordinary key on the keyboard that would effectively
| kill -9 the current program _and_ clear the screen was a
| crazy design decision, especially for a machine where saving
| data meant using a cassette tape!
| jamiek88 wrote:
| The Break key, unless you held down Control, was only a soft
| break though.
|
| Your program would still be in memory with an >OLD
| command.
|
| As long as it was a BASIC prog, that is; machine code loaded
| with *RUN was lost and had to be reloaded from tape, yes.
|
| A pain for games, but I don't really recall accidentally
| pressing the Break key much; it was out of the way, up at the
| top right.
|
| I could talk about this all day!
| snoot wrote:
| It's true that you could get a BASIC program back with OLD.
|
| But any data was lost, and I saw Break get pressed
| accidentally fairly often at school and amongst friends.
| jamiek88 wrote:
| Fair. Not everyone spent 12 hours per day on their
| computer like me! They probably had friends and stuff. :)
| preinheimer wrote:
| Global Ping Data - https://wondernetwork.com/pings
|
| We've got servers in 200+ cities around the world, and ask them
| to ping each other every hour. Currently it takes our servers in
| Tokyo and London about 226ms to ping each other.
|
| We've got some downloadable datasets here if you want to play
| with them: https://wonderproxy.com/blog/a-day-in-the-life-of-the-
| intern...
| jiggawatts wrote:
| The fundamental physical limit to latency caused by the speed
| of light is gleefully ignored by many web "application"
| architects. Apps that feel super snappy when hosted in the same
| region run like molasses from places like Australia. Unless the
| back-end is deployed in every major region, a significant
| fraction of your userbase will always think of your app as
| sluggish, irrespective of how much optimisation work goes into
| it.
|
| Some random examples:
|
| Azure Application Insights can be deployed to any Azure region,
| making it feel noticeably snappier than most cloud-hosted
| competitors such as New Relic or logz.io.
|
| ESRI ArcGIS has a cloud version that is "quick and easy" to use
| compared to the hosted version... and is terribly slow for
| anyone outside of the US.
|
| Our timesheet app is hosted in the US and is barely usable.
| Our managers complain that engineers "don't like timesheets".
| Look... we don't _mind_ timesheets, but having to... wait...
| seconds... for.... each... click... is just torture, especially
| at 4:55pm on a Friday afternoon.
| emj wrote:
| I've used your ping data before; it was useful to know where to
| place my servers, and how nice of you to publish a dump as
| well! If I can wish for more data: min-median-max client
| latencies for all those servers would be swell, but I can see
| that you might not want to publish the results of that, maybe on
| a per-month basis? Just a couple of thousand packets every hour
| should be enough: tcpdump -w stats.pcap -c5000 "tcp[tcpflags] &
| (tcp-syn|tcp-ack) != 0"
| drewtato wrote:
| PowerShell isn't a terminal (it's a shell, obviously), so the
| Windows results were most likely tested in conhost. If it's on
| Windows 11 it might be Windows Terminal, but conhost may be more
| likely since I think cmd is still the default on Windows 10.
| 13of40 wrote:
| It might still be a valid test, because PowerShell needs to
| have a bunch of code in the stack between the keypress event
| and the call into the console API that actually displays the
| character. Among other things, the entire command line is
| getting lexically parsed every time you press a key.
| jiggawatts wrote:
| If you think "parsing the command line" should or does take
| appreciable time on a human timescale when executed by a
| modern superscalar processor, then your mental model of
| computer performance is "off" by at least 4 or 5 orders of
| magnitude. Not four or five times incorrect, but many
| thousands of times incorrect.
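To calibrate the orders of magnitude being argued about here, the same style of micro-benchmark is easy to sketch in Python against its own parser. This is purely illustrative and says nothing about PowerShell's internals (13of40 gives real PowerShell figures in the next comment); the snippet string is an arbitrary stand-in for a short command line:

    # Time 10,000 parses of a short command-like string, analogous to the
    # per-keypress reparse being discussed. Python's ast.parse stands in for
    # the PowerShell parser, so only the rough scale carries over.
    import ast
    import timeit

    snippet = 'ls("-la", path="/tmp")'   # a short, command-ish expression
    elapsed = timeit.timeit(lambda: ast.parse(snippet), number=10_000)
    print(f"10K parses: {elapsed * 1000:.0f} ms total, "
          f"{elapsed / 10_000 * 1e6:.1f} us per parse")

For short input lines the per-keypress cost stays in the microseconds, which is invisible next to a 16.7 ms frame budget; the figures quoted below suggest it can climb to tens of milliseconds per parse once the input gets pathological.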
| 13of40 wrote:
| Just for context, I worked on the feature team for a lot of
| early versions of PowerShell, so I kind of know where the
| bodies are buried. Here are some empirical numbers, just
| for the parsing part:
|
| 10K iterations of $null=$null took 38 milliseconds on my
| laptop.
|
| 10K parses of the letter "a" took 110 milliseconds.
|
| 10K parses of a 100K character comment took 7405
| milliseconds.
|
| 10K parses of a complex nested expression took just over
| six minutes.
|
| You're probably imagining a lexer written in C that
| tokenizes a context-free language and does nothing else. In
| PowerShell, you can't run the tokenizer directly, you have
| to use the parser, which also builds an AST. The language
| itself is a blend of two different paradigms, so a token
| can have a totally different meaning depending on whether
| it's part of an expression or a command, meaning more state
| to track during the tokenizer pass.
|
| On top of that, while it was being developed, language
| performance wasn't a priority until around version 3 or 4,
| and the main perf advancement then was to compile from AST
| to dynamic code for code blocks that get run a minimum
| number of times. The parser itself was never subject to any
| deep perf testing, IIRC.
|
| Plus it does a bunch of other stuff when you press a key,
| not just the parsing. All of the host code that listens for
| the keyboard event and ultimately puts the character on the
| screen, for example, is probably half a dozen layers of
| managed and unmanaged abstractions around the Win32 console
| API.
| drewtato wrote:
| The test is valid for any combo of shell and terminal; it's
| just a matter of figuring out which methodology was used so
| it can be better understood.
|
| But yeah, I agree with the other comment that PowerShell is
| likely adding less than 1ms.
| fnordpiglet wrote:
| Yet more proof we should have just stopped with the SGI Indy.
| killjoywashere wrote:
| I really found this valuable, particularly the slide at the top
| that enables you to visualize low-level latency times (Jeff Dean
| numbers) over the years. tl;dr: not much has changed in the
| processor hardware numbers since 2012. So everything right of the
| processor is where the action is. And it sounds like people are
| starting to actually make progress.
|
| https://colin-scott.github.io/personal_website/research/inte...
| nikanj wrote:
| On my state-of-the-art desktop PC, Visual Studio has very
| noticeable cursor and scrolling lag. My C64 had the latter as
| well, but I used to assume the cursor moved as fast as I could
| type / tap the arrow keys.
| egberts1 wrote:
| So much added sluggishness, and they still cannot bring
| themselves to show us a current dynamic keyboard mapping to
| this day.
| mhh__ wrote:
| What's a current dynamic keyboard mapping?
| egberts1 wrote:
| Things (a dialog/popup box?) that let you see what each key
| is mapped to, based on the current window and/or the mouse
| position.
| Smoosh wrote:
| Has anyone else used an IBM mainframe with a hardware 327x
| terminal?
|
| They process all normal keystrokes locally, and only send back to
| the host when Enter and function keys are pressed. This means
| very low latency for typing and most keystrokes. But much longer
| latency when you press Enter or page up/down, as the mainframe
| then processes all the on-screen changes and sends back the
| refreshed screen (yes, you are looking at a page at a time; there
| is no scrolling).
|
| Of course, these days people use emulators instead of hardware
| terminals, so you get the standard GUI delays and the worst of
| both worlds.
| userbinator wrote:
| I'd like to see older MS-DOS and Windows on there for comparison;
| I remember dual-booting 98SE and XP for a while in the early
| 2000s, and the former was noticeably more responsive.
|
| Another comparative anecdote I have is between Windows XP and OS
| X on the same hardware, wherein the latter was less responsive.
| After seeing what GUI apps on a Mac actually involve, I'm not too
| surprised: https://news.ycombinator.com/item?id=11638367
| gtrevorjay wrote:
| An anecdote that will probably sway no one: I was in a family-
| friendly barcade and noticed, inexplicably, a gaggle of kids,
| all 8-14, gathered around the Pong. Sauntering up so I could
| overhear their conversation, it was all excited variants of "It's
| just a square! But it's real!", "You're touching it!", or "The
| knobs _really_ move it."
|
| If you wonder why we no longer have "twitch" games, this is why.
| Old-school games had a tactile aesthetic lost in the blur of
| modern lag.
| veloxo wrote:
| FWIW, a quick ballpark test shows <30 ms minimum keyboard latency
| on my M1 Max MacBook, which has a 120 Hz display.
|
| Sublime Text: 17-29 ms
| iTerm (zsh4humans): 25-54 ms
| Safari address bar: 17-38 ms
| TextEdit: 25-46 ms
|
| Method: Record 240-fps slo-mo video. Press keyboard key. Count
| frames from key depress to first update on screen, inclusive.
| Repeat 3x for each app.
| still_grokking wrote:
| Previous discussions:
|
| https://news.ycombinator.com/item?id=25290118 (December 3, 2020
| -- 454 points, 259 comments)
|
| https://news.ycombinator.com/item?id=16001407 (December 24, 2017
| -- 588 points, 161 comments)
| LastTrain wrote:
| That is because latency on its own is an often useless metric.
| FartyMcFarter wrote:
| Going through the list of what happens on iOS:
|
| > UIKit introduced 1-2 ms event processing overhead, CPU-bound
|
| I wonder if this is correct, and what's happening there if so - a
| modern CPU (even a mobile one) can do a _lot_ in 1-2 ms. That's
| 6 to 12% of the per-frame budget of a game running at 60 fps,
| which is pretty mind-boggling for just processing an event.
| still_grokking wrote:
| I guess you can waste any amount of time with "a few" layers of
| strictly unnecessary indirection.
|
| Speaking of games: just the other day I had the realization
| that we should look at software design in games if we want
| proper architectures for GUI applications.
|
| What we do today instead is "layers of madness". At least that's
| what I would call it.
| ilyt wrote:
| Games have the privilege of controlling everything from the input
| device to the GPU pipeline. Nothing on the desktop is going to be
| that vertically integrated easily.
| still_grokking wrote:
| > Nothing on the desktop is going to be that vertically
| integrated easily
|
| Why? Are there any technical reasons?
|
| I think this is a pure framework / system-API question.
| drewtato wrote:
| The only things I can think of are that, for windowed apps, you
| have to wait for the OS to hand you mouse events, since the
| mouse may be on another window, and you have to render to a
| window instead of directly to the framebuffer.
| still_grokking wrote:
| Which brings us back to "system APIs".
| amluto wrote:
| I wonder if a compositor, and possibly an entire compositing
| system designed around adaptive sync, could perform substantially
| better than current compositors.
|
| Currently, there is a whole pile of steps to update a UI. The
| input system processes an event, some decision is made as to when
| to rerender the application, then another decision is made as to
| when to composite the screen, and hopefully this all finishes
| before a frame is scanned out, but not too far before, because
| that would add latency. It's heuristics all the way down.
|
| With adaptive sync, there is still a heuristic decision as to
| whether to process an input event immediately or to wait to
| aggregate more events into the same frame. But once that is done,
| an application can update its state, redraw itself, and trigger
| an _immediate_ compositor update. The compositor will render as
| quickly as possible, but it doesn't need to worry about missing
| scanout -- scanout can begin as soon as the compositor finishes.
|
| (There are surely some constraints on the intervals between
| frames sent to the display, but this seems quite manageable while
| still scanning out a frame immediately after compositing it
| nearly 100% of the time.)
| mrob wrote:
| Adaptive sync can only delay drawing, never make it happen
| sooner. This means it can only harm average latency of response
| to unpredictable events, such as human interaction. (Individual
| events may have lower latency purely by luck, because latency
| depends on the position of the raster scan relative to the part
| of the screen that needs to be updated, and adaptive sync will
| perturb this, but this effect is just as likely to make things
| worse.) The lowest average latency is always achieved by
| running the monitor at maximum speed all the time and
| responding to events immediately.
|
| Adaptive sync is beneficial for graphically intensive games
| where you can't always render fast enough, but IMO this should
| never be true for a GUI on modern hardware.
| drewtato wrote:
| With text content, most frames are exactly the same. So what
| adaptive sync can do is delay a refresh until just after the
| content has been updated. At a minimum, it can delay a
| refresh when an update is currently being drawn, which would
| lower the max latency.
| mrob wrote:
| The time taken to update text should be negligible on any
| modern hardware.
| wmf wrote:
| The point of the article is that it is not negligible.
| amluto wrote:
| > Adaptive sync can only delay drawing, never make it happen
| sooner.
|
| That's a matter of perspective. If your goal is to crank out
| frames at exactly 60 Hz (or 120 Hz or whatever), then, sure,
| you can't send frames early and you want to avoid being late.
| But this seems like a goal of dubious necessity in a
| continuously rendered game and a completely useless goal in a
| desktop UI. So instead the goal can be to be slightly late
| for every single frame, and then if you're less late than
| intended, fine.
|
| Alternatively, one could start compositing _at_ the target
| time. If it takes 0.5ms, then the frame is 0.5ms late. If it
| goes over and takes 1ms, then the frame is 1ms late.
| dtx1 wrote:
| Uhm... aren't you basically describing Wayland?
|
| This Xorg dude did exactly the tuning you want on Wayland:
| https://artemis.sh/2022/09/18/wayland-from-an-x-apologist.ht...
___________________________________________________________________
(page generated 2022-11-20 23:00 UTC)