[HN Gopher] FFmpeg lands CLI multi-threading as its "most comple...
___________________________________________________________________
FFmpeg lands CLI multi-threading as its "most complex refactoring"
in decades
Author : worble
Score  : 429 points
Date   : 2023-12-12 15:15 UTC (7 hours ago)
(HTM) web link (www.phoronix.com)
(TXT) w3m dump (www.phoronix.com)
| qrush wrote:
| Is there a recording of this talk from VDD@Dublin? Can't easily
| find it on the author's site or here
| https://www.videolan.org/videolan/events/vdd23/
|
| Update: Found here!
| https://www.youtube.com/watch?v=Z4DS3jiZhfo&t=1221s
| motoboi wrote:
| It's nuts to think that in the near future an LLM will be able to
| do that refactoring in seconds. All we need is a large enough
| context window.
| bsdpufferfish wrote:
| Why would you think this is possible?
| motoboi wrote:
| Based on my current experience with GPT-4. Have you tried
| some sort of refactoring in it? Because I have been routinely
| turning serial scripts into parallel ones with success.
|
| Couldn't do the same with larger codebases because the
| context is not enough for the input code and output
| refactoring.
| thfuran wrote:
| Yeah, it's nuts to think that.
| motoboi wrote:
| I'm genuinely confused about your point of view. Have you
| tried refactoring with GPT-4?
|
| I have been refactoring code using GPT-4 for some months now
| and the limiting factor has been the context size.
|
| GPT-4 Turbo now has a 128k context and I can provide it with
| larger portions of the code base for the refactors.
|
| When we have millions of tokens of context, based on what I'm
| experiencing now, I can see that a refactoring like the one
| made in ffmpeg would be possible. Or not? What am I missing
| here?
| rocqua wrote:
| Refactoring is really rather well defined. It's "just
| transformations that are invariant w.r.t. the outcome". The
| reason they are hard to automate is that 'invariant w.r.t.
| the outcome' is a lot more lenient than most semantic models
| can handle. But this kind of well-defined task with a slight
| amount of nuance (and decently checkable) seems pretty
| well-suited to an LLM.
| nolist_policy wrote:
| At least for the Linux kernel, qemu and other large C
| projects, this is a solved problem with Coccinelle[1].
| Compared to AI, it has the added benefit of not making
| incorrect changes, hallucinating, falling for prompt
| injection, etc.
|
| I guess you could use AI to help create a Coccinelle
| semantic patch.
|
| [1] https://en.wikipedia.org/wiki/Coccinelle_(software)
| dataangel wrote:
| The part Coccinelle does is the part GPT is good at; the
| problem is neither of them actually reasons about the code.
| bigbillheck wrote:
| Why on earth would you possibly think that?
| ctoth wrote:
| Just as your human intelligence led to you writing the same
| darn comment as another human above you, AI can often write
| the same code as a human would, without having to even bring
| creativity into it! For those of us who write code, this can
| be useful!
| motoboi wrote:
| I'm quite confused by the answers I got from this thread.
| Haven't you tried refactoring with GPT-4 yet?
| bigbillheck wrote:
| > Haven't you tried refactoring with GPT-4 yet?
|
| I most certainly have not. At work, I do greenfield
| development in a specialized problem domain, and I would
| not trust a model (or, for that matter, a junior developer)
| to do any kind of refactor in an acceptable manner. (That
| aside, there's no way I'm going to expose company code to
| any sort of outside party without the approval of upper
| management.)
|
| At home, I program for fun and self-improvement, and a big
| part of both is thinking hard about problems. Why would I
| want to wreck that by asking a model to do it for me?
| motoboi wrote:
| Oh, I understand.
|
| What if you have an idea and you tell the computer to
| implement it and then thoroughly check the code?
| | About the self-improvement part, I suppose you don't | operate your graphics card yourself, but delegate it to | your driver. | | The LLM is just another tool. | smabie wrote: | Oh yea your work is far too sophisticated for a llm, got | it | bigbillheck wrote: | Some of us do actual creative work, yes. | airstrike wrote: | Some people get significance from their ability to write | code. To them, admitting an LLM can (or will soon be able | to) do their work inflicts cognitive dissonance, so they | refuse to believe it. Some refuse to even try it--not | realizing that refusing to engage does nothing to hinder | the advancement of the tool they fear. | mrintegrity wrote: | Parent post is getting down voted to oblivion but it seems a | reasonable belief for someone who is not highly engaged with | AI. I have only the vaguest understanding of how it works (and | it's probably wrong) and to my layman mind it also seems like a | totally fair assumption, based on experience as a user and the | constant flood of news. Please explain why the suggestion that | a future AI / sufficiently advanced LLM could refactor a | complex codebase is so preposterous. | not_the_fda wrote: | Because refactoring requires understanding, which LLMs | completely lack. | ctoth wrote: | > Because refactoring requires understanding, which LLMs | completely lack. | | Interesting. | | Here's a sample question I just asked the AI: | | We are working on a React mapping application. I am going | to give you two components from this application. 
Our goal | is to extract functionality to deduplicate code | | We can make a single base component, extract code to hooks, | or any other strategy which you think is appropriate | | Here is travel.tsx: | | <snip> | | Here is heatmap.tsx: | | <snip> | | And here is the response: | | To deduplicate code and improve maintainability between the | `TravelMap` and `Heatmap` components in your React mapping | application, we can identify common functionalities and | extract them into reusable hooks or components. Here's a | strategy considering your code: | | 1. *Extract Common Hooks*: Both components use similar | hooks for managing markers, popups, and map references. We | can create custom hooks that encapsulate these | functionalities. | | 2. *Create Shared Utility Functions*: Functions like | `initialViewState`, which are identical in both components, | can be moved to a separate utility file and imported where | needed. | | 3. *Refactor Map Layers and Sources*: The way map layers | and sources are handled is quite similar in both | components. A custom hook or component to manage map layers | and sources would be beneficial. | | 4. *Generalize Popup Handling*: The popup handling in both | components can be abstracted into a custom hook or a | higher-order component. | | 5. *Combine Common Styling*: Shared styles, such as | `MAP_CONTAINER_STYLE`, can be moved to a common CSS module | or a styled-component that can be reused. | | 6. *Refactor Event Handlers*: Functions like `onMapClick`, | `onMouseMove`, and `onMouseLeave` have similar structures. | These can be part of a custom hook or a utility function | with configurable parameters. | | 7. *Legend and Live Region Components*: These components | are used in a similar way in both `TravelMap` and | `Heatmap`. Ensure they are designed to be reusable with | different props. | | 8. *Refactor Map Initialization*: The way the map is | initialized and configured is quite similar. 
This can be
| abstracted into a custom component or hook.
|
| ### Example Implementations:
|
| #### 1. Custom Hook for Map Markers:
|
| ```typescript
| // useMapMarkers.ts
| import { useState, useCallback } from 'react';
|
| export const useMapMarkers = (initialLat, initialLng, world) => {
|   // Similar marker logic from your components
|   // ...
|   return {
|     marker,
|     onMarkerDrag,
|     onMarkerDragEnd,
|     updateMarkerPosition,
|     // other marker related states and functions
|   };
| };
| ```
|
| #### 2. Shared Utility Function:
|
| ```typescript
| // utils/mapUtils.ts
| export const initialViewState = (map, mapType) => {
|   // Implementation
| };
| ```
|
| #### 3. Custom Hook for Map Layers and Sources:
|
| ```typescript
| // useMapLayers.ts
| import { useState, useEffect } from 'react';
|
| export const useMapLayers = (props) => {
|   // Logic to handle layers and sources
|   // ...
|   return {
|     dataLayer,
|     patternLayer,
|     // other layer related states and functions
|   };
| };
| ```
|
| #### 4. Generalized Popup Handling:
|
| ```typescript
| // useMapPopup.ts
| import { useState } from 'react';
|
| export const useMapPopup = () => {
|   // Popup logic from your components
|   // ...
|   return {
|     popupState,
|     openPopup,
|     closePopup,
|     setPopupData,
|     // other popup related states and functions
|   };
| };
| ```
|
| ### Integration:
|
| Once these hooks and utilities are created, integrate them
| into your `TravelMap` and `Heatmap` components. Replace the
| duplicated logic with calls to these shared resources. This
| will significantly reduce code duplication and make future
| maintenance easier.
|
| ### Testing and Refinement:
|
| After refactoring, thoroughly test both components to
| ensure that all functionalities work as expected. Watch out
| for any edge cases or specific behaviors that might be
| affected by the refactoring.
|
| For those suggestions, I might use five out of eight of
| them, and probably do one or two things differently.
But
| you cannot, with a straight face, say the model did not
| understand. It clearly did. It suggested reasonable
| refactors. If being able to refactor means understanding, I
| guess we have understanding!
|
| I could continue with this conversation, ask it to produce
| the full code for the hooks (I have in my custom prompt to
| provide outlines) and once the hooks are complete, ask it
| to rewrite the components using the shared code.
|
| Have you ever used one of these models?
| never_inline wrote:
| Eliminating duplication and cleaning code is a different
| type of refactoring than supporting concurrency, which is
| much, much harder.
|
| Cleaning up code also follows some well-established
| patterns; performance work is much less pattern-y.
|
| Codebases like FFmpeg are one of a kind. I bet you need
| 10 or 100 times more understanding than the React thing
| you mentioned above.
|
| One day maybe AI can do it, but it probably won't be an
| LLM. It would be something which can understand symbols
| and math.
| ctoth wrote:
| Ah, we're having some classic goalpost moving!
|
| > Because refactoring requires understanding, which LLMs
| completely lack.
|
| <demonstration that an LLM can refactor code>
|
| > Cleaning up code also follows some well-established
| patterns; performance work is much less pattern-y.
|
| Just as writing shitty React apps follows patterns,
| low-level performance and concurrency work also follows
| patterns. See [0] for a sample.
|
| > I bet you need 10 or 100 times more understanding
|
| Okay, so a 10 or 100 times larger model? Sounds like
| something we'll have next year, and certainly within a
| decade.
|
| > One day maybe AI can do it, but it probably won't be an
| LLM. It would be something which can understand symbols
| and math.
|
| You do understand that the reason some of the earlier
| GPTs had trouble with symbols and math was the
| tokenization scheme, completely separate from how they
| work in general, right?
|
| [0]: C++ Concurrency in Action: Practical Multithreading,
| 1st Edition https://www.amazon.com/C-Concurrency-Action-
| Practical-Multit...
| kcbanner wrote:
| > Because refactoring requires understanding, which LLMs
| completely lack.
|
| It's obvious from context here that the refactoring that
| was mentioned was specifically around concurrency, not
| simply cleaning up code.
| ctoth wrote:
| So if I show you an LLM implementing concurrency, will
| you concede the point? Is this your true objection?
|
| https://chat.openai.com/share/7c41f59a-c21c-4abd-876c-c95
| 647...
| malcolmgreaves wrote:
| Hope you're looking for good-faith discussion here. I'll
| assume that you're looking for a response where someone
| has taken the time to read through your previous messages
| and also the linked ChatGPT interaction logs.
|
| What you've shown is actually a great example of what
| folks mean when they say LLMs lack any sort of
| understanding. They're fundamentally predict-the-next-
| token machines; they regurgitate and mix parts of their
| training data in order to satisfy the token prediction
| loss function they were trained with.
|
| In the linked example you provided, *you* are the one
| that needs to provide the understanding. It's a rather
| lengthy back-and-forth to get that code into a somewhat
| usable state. Importantly, if you didn't tell it to fix
| things (sqlite connections over threads, etc.), it would
| have failed.
|
| And while it's concurrent, it's using threads, so it's
| not going to be doing any work in parallel. The example
| you have mixes some IO- and compute-bound looking
| operations.
|
| So, if your need was to refactor your original code to
| _actually be fast_, ChatGPT demonstrated it doesn't
| understand nearly enough to actually make this happen.
| This thread conversation got started around correcting | the misnomer that an LLM would actually ever be able to | possess enough knowledge to do actually valuable, complex | refactoring and programming. | | While I believe that LLMs can be good tools for a variety | of usecases, they have to be used in short bursts. Since | their output is fundamentally unreliable, someone always | has to read -- then comprehend -- its output. Giving it | too much context and then prompting it in such a way to | align its next token prediction with a complex outcome is | a highly variable and unstable process. If it outputs | millions of tokens, how is someone going to actually | review all of this? | | In my experience using ChatGPT, GPT4, and a few other | LLMs, I've found that it's pretty good at coming up with | little bits to jog one's own thinking and problem | solving. But doing an actual complex task with lots of | nuance and semantics-to-be-understood outright? The | technology is not quite there yet. | atrus wrote: | Chess requires understanding, which computers lack. Go | requires understanding, which computers lack. X requires Y | which _AI technology today_ lacks. AI is a constantly | moving goalpost it seems. | satvikpendem wrote: | > AI is a constantly moving goalpost it seems. | | alwayshasbeen.png | | > The AI effect occurs when onlookers discount the | behavior of an artificial intelligence program by arguing | that it is not "real" intelligence.[1] > Author Pamela | McCorduck writes: "It's part of the history of the field | of artificial intelligence that every time somebody | figured out how to make a computer do something--play | good checkers, solve simple but relatively informal | problems--there was a chorus of critics to say, 'that's | not thinking'."[2] Researcher Rodney Brooks complains: | "Every time we figure out a piece of it, it stops being | magical; we say, 'Oh, that's just a computation.'"[3] | | > "AI is whatever hasn't been done yet." 
|
| > --Larry Tesler
|
| https://en.wikipedia.org/wiki/AI_effect
| The_Colonel wrote:
| It was always clear that games like chess or go can be
| played well by computers, even with simple algorithms,
| because they are completely formalized. The only issue
| was with performance / finding more efficient algorithms.
|
| That's very different from code, which (perhaps
| surprisingly) isn't well formalized. The goals are often
| vague and it's difficult to figure out what is
| intentional and what is incidental behavior (esp. with
| imperative code).
| astrange wrote:
| The ffmpeg tests take a lot more than a few seconds to run,
| and an AI god is still going to have trouble debugging
| multithreaded code.
| dataangel wrote:
| AI is not very good at _single threaded_ code, which is widely
| regarded as much easier. The breathless demos don't
| generalize well when you truly test on data not in the
| training set; it's just that most people don't come up with
| good tests, because they take something from the internet,
| which is the training set. But the code most people need to
| write is for tasks that are bespoke to individual
| businesses/science-experiments/etc., not popular CS problems
| that there are 1000 tutorials online for. When you get into
| those areas it becomes apparent really quickly that the AI
| only gets the "vibes" of what code should look like; it
| doesn't have any mechanistic understanding.
| ctoth wrote:
| I see that you got some responses from people who may not
| have even used GPT-4 as a coding assistant, but I absolutely
| agree with you. A larger context window, a framework like
| Aider, slightly better tooling so the AI can do renames and
| other high-level actions without having to provide the entire
| changeset as patches, and tests. Lots of tests. Then you can
| just run the migration 15 times, pick the one which passes
| all the tests, run another integration pass to merge ideas
| from the other runs, rinse and repeat.
Of course the outer
| loops will themselves be automated.
|
| The trick to this is continuous iteration and feedback. It's
| remarkable how far I've gotten with GPT using these simple
| primitives, and I know I'm not the only one.
| beeboobaa wrote:
| If you think a large refactor is just renaming some stuff,
| then it makes sense that you think this.
| dataangel wrote:
| If you ask GPT to refactor a single-threaded program much
| smaller than the context window that is truly out of sample
| into a multithreaded program, it's often going to fail. GPT
| has trouble understanding bit masks in single-threaded code,
| let alone multiple threads.
| beeboobaa wrote:
| I have some snake oil to sell you
| PaywallBuster wrote:
| The PDF with the presentation requires a password.
| bsdpufferfish wrote:
| If I'm operating a cloud service like Netflix, then I'm already
| running thousands of ffmpeg processes on each machine. In other
| words, it's already a multi-core job.
| platzhirsch wrote:
| As multi-core as Python and Ruby, then.
| bsdpufferfish wrote:
| Yes. The kernel multiplies your efforts for you. It works
| great for web services.
| sanitycheck wrote:
| Curious, what would that many ffmpeg processes be doing at
| Netflix? I assume new VOD content gets encoded once per format,
| and the amount of new content added per day is not gigantic.
|
| Agree with the general premise, of course: if I've got 10
| different videos encoding at once then I don't need additional
| efficiency because the CPU's already maxed out.
| asveikau wrote:
| Probably a lot more than once when you consider that
| different devices have different capabilities, and that they
| might stream you different bitrates depending on conditions
| like your network capability, screen resolution, how much
| you've paid them...
|
| You could also imagine they might apply some kind of
| heuristic to decide to re-encode something based on some
| condition... like fine-tuning encoder settings when a title
| becomes popular.
No idea if they do that, just using some | imagination. | The_Colonel wrote: | I assume they re-compress for each resolution / format, quite | possibly they also have different bitrate levels per | resolution. Potentially even variants tweaked for certain | classes of device (in cases this is not already covered by | combination of format/resolution/bitrate). I would also | assume they re-compress with new advances in video processing | (things like HDR, improved compression). | | Also, their devs likely want fast feedback on changes - I | imagine they might have CI running changes on some standard | movies, checking various stats (like SNR) for regressions. | Everybody loves if their CI finishes fast, so you might want | to compress even a single movie in multiple threads. | sanitycheck wrote: | They'll be doing VBR encodes to DASH, HLS & (I guess still) | MSS which covers the resolutions & formats... DRM will be | what prevents high res content from working on some "less- | trusted" platforms so the same encodes should work. | | (Plus a couple more "legacy" encodes with PIFF instead of | CENC for ancient devices, probably.) | | New tech advances, sure, they probably do re-encode | everything sometimes - even knocking a few MB off the size | of a movie saves a measurable amount of $$ at that scale. | But are there frequent enough tech advances to do that more | than a couple of times a year..? The amount of difficult | testing (every TV model group from the past 10 years, or | something) required for an encode change is horrible. I'm | sure they have better automation than anyone else, but I'm | guessing it's still somewhat of a nightmare. | | Youtube, OTOH, I really can imagine having thousands of | concurrent ffmpeg processes. | canucker2016 wrote: | Why bring up assumptions/suppositions about Netflix's | encoding process? 
| | Their tech blog and tech presentations discuss many of | the requirements and steps involved for encoding source | media to stream to all the devices that Netflix supports. | | The Netflix tech blog: https://netflixtechblog.com/ or | https://netflixtechblog.medium.com/ | | Netflix seems to use AWS CPU+GPU for encoding, whereas | YouTube has gone to the expense of producing an ASIC to | do much of their encoding. | | 2015 blog entry about their video encoding pipeline: | https://netflixtechblog.com/high-quality-video-encoding- | at-s... | | 2021 presentation of their media encoding pipeline: | https://www.infoq.com/presentations/video-encoding- | netflix/ | | An example of their FFmpeg usage - a neural-net video | frame downscaler: https://netflixtechblog.com/for-your- | eyes-only-improving-net... | | Their dynamic optimization encoding framework - | allocating more bits for complex scenes and fewer bits | for simpler, quieter scenes: | https://netflixtechblog.com/dynamic-optimizer-a- | perceptual-v... and | https://netflixtechblog.com/optimized-shot-based-encodes- | now... | | Netflix developed an algorithm for determining video | quality - VMAF, which helps determine their encoding | decisions: https://netflixtechblog.com/toward-a- | practical-perceptual-vi..., | https://netflixtechblog.com/vmaf-the-journey- | continues-44b51..., https://netflixtechblog.com/toward-a- | better-quality-metric-f... | astrange wrote: | > Their dynamic optimization encoding framework - | allocating more bits for complex scenes and fewer bits | for simpler, quieter scenes: | https://netflixtechblog.com/dynamic-optimizer-a- | perceptual-v... and | https://netflixtechblog.com/optimized-shot-based-encodes- | now... | | This is overrated - of course that's how you do it, what | else would you do? | | > Mean-squared-error (MSE), typically used for encoder | decisions, is a number that doesn't always correlate very | nicely with human perception. 
|
| Academics, the reference MPEG encoder, and old
| proprietary encoder vendors like On2 (VP9) did make
| decisions this way because their customers didn't know
| what they wanted. But people who care about quality, i.e.
| anime and movie pirate college students with a lot of
| free time, didn't.
|
| It looks like they've run x264 in an unnatural mode to
| get an improvement here, because the default "constant
| ratefactor" and "psy-rd" always behaved like this.
| slhck wrote:
| > This is overrated - of course that's how you do it, what
| else would you do?
|
| That's not what has been done previously for adaptive
| streaming. I guess you are referring to what encoding
| modes like CRF do for an individual, entire file? Or
| where else has this kind of approach been shown before?
|
| In the early days of streaming you would've done constant
| bitrate for MPEG-TS, even adding zero bytes to pad "easy"
| scenes. Later you'd have selected 2-pass ABR with some
| VBV bitrate constraints to not mess up the decoding
| buffer. At the time, YouTube did something where they
| tried to predict the CRF they'd need to achieve a certain
| (average) bitrate target (can't find the reference
| anymore). With per-title encoding (which was also
| popularized by Netflix) you could change the target
| bitrates for an entire title based on a previous
| complexity analysis. It took quite some time for other
| players in the field to also hop on the per-title
| encoding train.
|
| Going to a per-scene/per-shot level is the novelty here,
| and exhaustively finding the best possible combination of
| QP/resolution pairs for an entire encoding ladder that
| also optimizes subjective quality - and not just MSE.
| astrange wrote:
| > exhaustively finding the best possible combination of
| QP/resolution pairs for an entire encoding ladder that
| also optimizes subjective quality - and not just MSE.
|
| This is unnecessary if the encoder is well-written.
It's | like how some people used to run multipass encoders 3 or | 4 times just in case the result got better. You only need | one analysis pass to find the optimal quality at a | bitrate. | slhck wrote: | Sure, the whole point of CRF is to set a quality target | and forget about it, or, with ABR, to be as good as you | can with an average bitrate target (under constraints). | But you can't do that across resolutions, e.g. do you | pick the higher bitrate 360p version, or the lower | bitrate 480p one, considering both coding artifacts and | upscaling degradation? | astrange wrote: | At those two resolutions you'd pick the higher resolution | one. I agree that generation of codec doesn't scale all | the way up to 4K and at that point you might need to make | some smart decisions. | | I think it should be possible to decide in one shot in | the codec though. My memory is that codecs (image and | video) have tried implementing scalable resolutions | before, but it didn't catch on simply because dropping | resolution is almost never better than dropping bitrate. | canucker2016 wrote: | You're letting the video codec make all the decisions for | bitrate allocation. | | Netflix tries to optimize the encoding parameters per | shot/scene. | | from the dynamic optimization article: | | - A long video sequence is split in shots ("Shots are | portions of video with a relatively short duration, | coming from the same camera under fairly constant | lighting and environment conditions.") | | - Each shot is encoded multiple times with different | encoding parameters, such as resolutions and qualities | (QPs) | | - Each encode is evaluated using VMAF, which together | with its bitrate produces an (R,D) point. 
One can convert
| VMAF quality to distortion using different mappings; we
| tested against the following two, linearly and inversely
| proportional mappings, which give rise to different
| temporal aggregation strategies, discussed in the
| subsequent section
|
| - The convex hull of (R,D) points for each shot is
| calculated. In the following example figures, distortion
| is the inverse of (VMAF+1)
|
| - Points from the convex hull, one from each shot, are
| combined to create an encode for the entire video
| sequence by following the constant-slope principle and
| building end-to-end paths in a Trellis
|
| - One produces as many aggregate encodes (final operating
| points) as necessary, by varying the slope parameter of
| the R-D curve, in order to cover a desired bitrate/quality
| range
|
| - The final result is a complete R-D or rate-quality (R-Q)
| curve for the entire video sequence
| banana_giraffe wrote:
| It's been reported in the past that Netflix encodes 120
| different variants of each video they have [1] for different
| bitrates and different devices' needs.
|
| And that was years ago; I wouldn't be surprised to learn it's
| a bigger number now.
|
| [1] https://news.ycombinator.com/item?id=4946275
| The_Colonel wrote:
| I guess it's irrelevant for Netflix then*. But it sounds great
| for the remaining 99.99%.
|
| * I would be very surprised if Netflix even uses vanilla ffmpeg
| bsdpufferfish wrote:
| > But it sounds great for the remaining 99.99%.
|
| I believe the vast majority of ffmpeg usages are web
| services, or one-off encodings.
| The_Colonel wrote:
| Well, this feature is awesome for one-off encoding by a
| home user.
|
| Subjectively, me compressing my holiday video is much more
| important than Netflix re-compressing a million of them.
| buu700 wrote:
| I use ffmpeg all the time, so this change is much
| appreciated. Well, not really _that_ often, but when I do
| encode video/audio it's generally with ffmpeg.
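The shot-based selection procedure outlined upthread can be sketched in a few lines of Python. This is a toy illustration only: the rate/distortion numbers are invented, and a simple Lagrangian (constant-slope) pick per shot stands in for the full Trellis path construction the blog posts describe.

```python
# Toy sketch of shot-based encode selection: per-shot (rate, distortion)
# points -> lower convex hull -> pick one encode per shot at a fixed
# R-D slope (Lagrange multiplier lam). All numbers are invented.

def _cross(o, a, b):
    # 2D cross product of vectors o->a and o->b.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def rd_hull(points):
    """Lower convex hull of (rate, distortion) points, sorted by rate."""
    pts = sorted(set(points))
    hull = []
    for p in pts:
        # Pop points that would make a non-left turn (dominated encodes).
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull

def pick_encodes(shot_hulls, lam):
    """Pick one hull point per shot minimizing d + lam * r, then sum.

    Sweeping lam traces out a rate-quality curve for the whole video:
    lam = 0 spends maximum bits, large lam favors low rate."""
    total_r = total_d = 0.0
    for hull in shot_hulls:
        r, d = min(hull, key=lambda p: p[1] + lam * p[0])
        total_r += r
        total_d += d
    return total_r, total_d

if __name__ == "__main__":
    shot_a = [(100, 50), (200, 30), (250, 40), (300, 25)]  # (250, 40) is dominated
    shot_b = [(80, 60), (150, 35), (260, 20)]
    hulls = [rd_hull(shot_a), rd_hull(shot_b)]
    for lam in (0.0, 0.1, 1.0):
        print(lam, pick_encodes(hulls, lam))
```

Each choice of `lam` yields one aggregate operating point; sweeping it produces the full R-Q curve for the sequence, which is the constant-slope principle from the list above.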
| kevincox wrote:
| Latency is still valuable. For example, YouTube (which IIRC uses
| ffmpeg) often takes hours to do transcodes. This is likely
| somewhat due to scheduling, but assuming they can get the
| same result doing 4x threads for 1/4 of the time, they would
| prefer that, as each job finishes faster. The only real question
| is at what efficiency cost the latency benefit stops being
| worth it.
| lelanthran wrote:
| I think that if you're operating at the scale of Google,
| using a single-threaded ffmpeg will finish your jobs in less
| time.
|
| If you have a queue of 100k videos to process and a cluster
| of 100 cores, assigning a video to each core as it becomes
| available is the most efficient way to process them, because
| you're skipping the thread-joining time.
|
| Anytime there is a queue of jobs, assigning the next job in
| the queue to the next free core is always going to be faster
| than assigning the next job to multiple cores.
| Thaxll wrote:
| YouTube does not use ffmpeg; at the scale at which they
| operate it would be too slow / expensive.
|
| They use custom hardware just for encoding.
|
| FYI, they have to transcode over 500h of video per minute. So
| multiply that by all the formats they support.
|
| They operate at an insane scale; Netflix looks like a garage
| project by comparison.
| astrange wrote:
| There's still decoding. If a service claims to support all
| kinds of weird formats (like a MOV or AVI from the 90s),
| that means ffmpeg is running.
| canucker2016 wrote:
| Google's use of ffmpeg:
| https://multimedia.cx/eggs/googles-youtube-uses-ffmpeg/
|
| For encoding, recently, they've built their own ASIC to
| deal with H264 and VP9 encoding (for 7-33x faster
| encoding compared to CPU-only):
| https://arstechnica.com/gadgets/2021/04/youtube-is-now-
| build...
| stevehiehn wrote:
| Okaaay, and if I'm not operating a cloud service like Netflix,
| and I'm not running thousands of ffmpeg processes?
In other
| words, it's not already a multi-core job?
| brucethemoose2 wrote:
| Meanwhile, I've been enjoying threaded filter processing in
| VapourSynth for nearly a decade.
|
| Not that this isn't great. It's fantastic. But TBH it's not
| really going to change my workflow of VapourSynth
| preprocessing + av1an encoding for "quality" video encodes.
| tetris11 wrote:
| I believe I have too, with gstreamer's pipe framework for
| threading, but ffmpeg's syntax has stuck in my mind far longer
| than any of the elaborate setups I built with gstreamer. I'm
| excited for this development.
| dylan604 wrote:
| FFmpeg does so much more than just video encoding. I use ffmpeg
| all day every day, and only a fraction of the time do I
| actually make a video.
| m3kw9 wrote:
| Like what do you do?
| andoma wrote:
| One can use it instead of cat to display text files. Easy
| syntax to remember.
|
|     ffmpeg -v quiet -f data -i file.txt -map 0:0 -c text -f data -
|
| DonHopkins wrote:
| Good thing it's now multi-threaded so it can process all
| those command line arguments in parallel!
| whalesalad wrote:
| I'm dying.
| nerpderp82 wrote:
| https://www.youtube.com/watch?v=9kaIXkImCAM
| whalesalad wrote:
| I'm glad we've reached a point where there is quality
| parody content online for our industry.
| ElijahLynn wrote:
| THIS!!! It was so refreshing!
| Rebelgecko wrote:
| Check out Krazam. I quote their Microservices video on a
| regular basis (https://youtu.be/y8OnoxKotPQ)
| danudey wrote:
| "Do you know ffmpeg supports OCR? I haven't found the
| command yet, but it does support it."
|
| This is probably 80% of my experience with ffmpeg, to be
| honest, but the other 20% is invaluable enough anyway.
| ElijahLynn wrote:
| That was one of the funniest things I've seen in a
| while!!!! I had to stop drinking my decaf for fear of
| spitting it all over my computer, I was laughing out loud
| so much!
| | (ps: and no, it's not Rick Astley/Never Gonna Give You | Up) | nerpderp82 wrote: | The artfully inserted, corrupted predicted frames were | :chefskiss: | dkjaudyeqooe wrote: | I bet ffmpeg special cases that combination of flags and | calls cat. | yoz wrote: | can I use ffmpeg to embed a gif in a Hacker News comment, | because I want that so much right now | jasomill wrote: | No, but you can use ffmpeg to create a GIF from ASCII art | embedded in a Hacker News comment: $ | ffmpeg -v quiet -codecs | egrep -i 'gif|ascii' | D.V.L. ansi ASCII/ANSI art DEV..S | gif CompuServe GIF (Graphics Interchange | Format) | | ("D" and "E" in the first field indicate support for | decoding and encoding) | faitswulff wrote: | Ah now I can replace all my useless uses of cat with | ffmpeg | fransje26 wrote: | I use dd for that. dd if=./file.txt | | Can you also format your drive with ffmpeg? I'm looking | for a more versatile dd replacement.. | jasomill wrote: | It can't create partition tables or filesystems, so no, | but ffmpeg -f data -i /dev/zero -map 0:0 | -c copy -f data - > /dev/sda | | is roughly equivalent to dd | status=progress if=/dev/zero of=/dev/sda | dylan604 wrote: | you might need a -disposition default type option, | otherwise, it introduces some abnormal behavior | jasomill wrote: | That doesn't work[1], but ffmpeg -v quiet | -f data -i file.txt -map 0:0 -c copy -f data - | | does. | | [1] "Encoder 'text' specified, but only '-codec copy' | supported for data streams" | polonbike wrote: | Besides video conversion/compression? 
Sound extraction or | processing, image processing, video casting or streaming, | anything related to image/multimedia format, basically | whalesalad wrote: | you can make gif's with it | starkparker wrote: | I've used it for video and audio concatenation of laserdisc | game segments, transcoding audio clips for gamedev, | programmatically generating GIFs of automatically generated | video clips from tests in a CI pipeline, ripping songs and | audio clips from YouTube videos to ogg/mp3, creating GIFs | from burst-shot and time-lapse photography (and decimating | them), excerpting clips from a video without re-encoding, | and compressing or transforming audio on remote servers | where VLC wasn't and couldn't be installed. | bbkane wrote: | Sounds like you already have a process for most of this, | but I found https://github.com/mifi/editly to be | incredibly helpful to run ffmpeg and make my little time | lapse video. Could be useful for others. | dylan604 wrote: | ffmpeg can produce an amazing amount of analysis | ThrowawayTestr wrote: | I use ffmpeg every time I download a YouTube video. | brucethemoose2 wrote: | Vapoursynth can be used for image processing too (albeit more | clumsily with variable size input), and it's also a great way | to hook into PyTorch. | Thaxll wrote: | Can it fix broken files? | PreachSoup wrote: | Can you run doom on it? | j1elo wrote: | Interesting! I'm among today's lucky 10,000 in learning for the | first time about VapourSynth. | | How come it only has 4 measly entries in HN, and none got any | traction? I've posted a new entry, just for the curiosity of | others. | aidenn0 wrote: | I'm guessing from context that VapourSynth is a frame-server in | the vein of avisynth? If so, does it run on Linux? Avisynth was | the single biggest thing I missed when moving to Linux about 20 | years ago. | | [edit] | | found the docs; it's available on Linux[1]. 
I'm definitely | looking into it tonight because it can't be _worse_ than | writing ffmpeg CLI filtergraphs! | | 1: http://www.vapoursynth.com/doc/installation.html#linux- | insta... | brucethemoose2 wrote: | Yep, and it's so much better than ffmpeg CLI that it's not even | funny. | | This is a pretty good (but not comprehensive) db of the | filters: https://vsdb.top/ | naikrovek wrote: | I don't understand why you would want to piggyback on this | story to say this. | | are people just itching for reasons to dive into show & tell or | to wax poetic about how _they_ have solved the problem for | _years_? I really don't understand people at all, because I | don't understand why people do this. and I'm sure I've done it, | too. | tetris11 wrote: | There is hype for FEAT. People who have achieved similar FEAT | perk up their heads but say nothing. | | Hype for FEAT is beyond sensibility. People with similar FEAT | are bristled by this and wish that their projects received | even a fraction of FEAT's hype. | | I think it's normal. | naikrovek wrote: | not gonna define FEAT, then? ok. | tetris11 wrote: | ...in this case, multi-threading. In other cases; AI | workflows that others commercialize, a new type system in | a language that already exists in another, a new sampling | algorithm that has already existed by another name for | decades, a permaculture innovation that farmers have been | using for aeons, the list goes on... | naikrovek wrote: | just say "feature". | | language is for communicating. don't impede that | communication by using unnecessary terms. | brucethemoose2 wrote: | Not gonna lie, I think VapourSynth has been flying under the | radar for far too long, and is an awesome, largely unused | alternative to ffmpeg filter chains in certain cases. I don't | see any harm in piggybacking on an ffmpeg story to bring it | up, especially if readers find it useful. | | It's been threaded since its inception, so it seems somewhat | topical. 
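For readers curious what the VapourSynth workflow praised above looks like in practice, here is a minimal sketch. It only writes a .vpy script to disk; the file names and filter choices are illustrative assumptions, and the ffms2 source plugin is assumed to be installed when the script is actually run.

```python
# Write a minimal VapourSynth script to disk (hypothetical filenames).
# VapourSynth evaluates its filter graph across multiple threads by default,
# which is the threaded filtering brucethemoose2 refers to.
from pathlib import Path

VPY_SCRIPT = """\
import vapoursynth as vs
core = vs.core
core.num_threads = 8                    # filter frames in parallel
clip = core.ffms2.Source("input.mkv")   # assumes the ffms2 plugin
clip = core.std.BoxBlur(clip)           # any filter chain goes here
clip.set_output()
"""

Path("blur.vpy").write_text(VPY_SCRIPT)
print("wrote blur.vpy")
```

The script would then typically be piped into ffmpeg for the final encode, e.g. `vspipe -c y4m blur.vpy - | ffmpeg -i - -c:v libx264 out.mkv` (the y4m flag spelling varies by VapourSynth version).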
| badrabbit wrote: | When I stream 4k from my laptop ffmpeg gets very intense about | cpu usage to the point fans are constantly at high speed and it's | distracting. I hope this helps in some way. I have a fairly | decently specced mid-tier laptop. | hereme888 wrote: | I believe ffmpeg can be compiled to support the GPU, if your | laptop has one. It works at least for CUDA-enabled GPUs | (https://docs.nvidia.com/video-technologies/video-codec- | sdk/1...) | | Talk with ChatGPT about it and see if you can do it. | isatty wrote: | ??? | | Just Google it. | badrabbit wrote: | Thanks, I don't think it has a separate gpu that is | cuda-enabled, but I have other systems that do, will look into | it. | kevincox wrote: | I've always wondered if better multi-core performance can come | from processing different keyframe segments separately. | | IIUC all current encoders that support parallelism work by | multiple threads working on the same frame at the same time. | Oftentimes the frame is split into regions and each thread | focuses on a specific region of the frame. This approach can have | a (usually small) quality/efficiency cost and requires per- | encoder logic to assemble those regions into a single frame. | | What if instead/additionally different keyframe segments are | processed independently? So if keyframes are every 60 frames | ffmpeg will read 60 frames, pass that to the first thread, the | next 60 to the next thread, ... then assemble the results | basically by concatenating them. It seems like this could be used | to parallelize any codec in a fairly generic way and it should be | more efficient as there is no thread-communication overhead or | splitting of the frame into regions which harms cross-region | compression. | | Off the top of my head I can only think of two issues: | | 1. Requires loading N*keyframe period frames into memory as well | as the overhead memory for encoding N frames. | | 2. 
Variable keyframe support would require special support as the | keyframe splits will need to be identified before passing the | video to the encoding threads. This may require extra work to be | performed upfront. | | But both of these seem like they won't be an issue in many cases. | Lots of the time I'd be happy to use tons of RAM and output with | a fixed keyframe interval. | | Probably I would combine this with intra-frame parallelization | such as process every frame with 4 threads and then run 8 | keyframe segments in parallel. This way I can get really good | parallelism but only minor quality loss from 4 regions rather | than splitting the video into 32 regions which would harm quality | more. | Hello71 wrote: | your idea also doesn't work with live streaming, and may also | not work with inter-frame filters (depending on | implementation). nonetheless, this exists already with those | limitations: av1an and I believe vapoursynth work more or less | the way you describe, except you don't actually need to load | every chunk into memory, only the current frames. as I | understand, this isn't a major priority for mainstream encoding | pipelines because gop/chunk threading isn't massively better | than intra-frame threading. | kevincox wrote: | It can work with live streaming, you just need to add N | keyframes of latency. With low-latency livestreaming | keyframes are often close together anyways so adding say 4s | of latency to get 4x encoding speed may be a good tradeoff. | bagels wrote: | 4s of latency is not acceptable for applications like live | chat | kevincox wrote: | As I said, "may be". "Live" varies hugely with different | use cases. Sporting events are often broadcast live with | 10s of seconds of latency. But yes, if you are talking to | a chat in real-time a few seconds can make a huge | difference. | mort96 wrote: | Well, you don't add 4s of latency for 4x encoding speed | though. 
You add 4s of latency for very marginal | quality/efficiency improvement and significant encoder | simplification, because the baseline is current frame- | parallel encoders, not sequential encoders. | | Plus, computers aren't quad cores any more, people with | powerful streaming rigs probably have 8 or 16 cores; and | key frames aren't every second. Suddenly you're in this | hellish world where you have to balance latency, CPU | utilization and encoding efficiency. 16 cores at a not-so- | great 8 seconds of extra latency means terrible efficiency | with a key frame every 0.5 second. 16 cores at good | efficiency (say, 4 seconds between key frames) means a | terrible 64 seconds of extra latency. | kevincox wrote: | > except you don't actually need to load every chunk into | memory, only the current frames. | | That's a good point. In the general case of reading from a | pipe you need to buffer it somewhere. But for file-based | inputs the buffering concerns aren't relevant, just the | working memory. | dbrueck wrote: | Actually, not only does it work with live streaming, it's not | an uncommon approach in a number of live streaming | implementations*. To be clear, I'm not talking about low | latency stuff like interactive chat, but e.g. live sports. | | It's one of several reasons why live streams of this type are | often 10-30 seconds behind live. | | * Of course it also depends on where in the pipeline they | hook in - some take the feed directly, in which case every | frame is essentially a key frame. | cudder wrote: | I know next to nothing about video encoders, and in my naive | mind I absolutely thought that parallelism would work just like | you suggested it should. It sounds absolutely wild to me that | they're splitting single frames into multiple segments. Merging | work from different threads for every single frame sounds | wasteful somehow. But I guess it works, if that's how everybody | does it. TIL! 
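The chunked scheme kevincox describes (and that tools like Av1an implement externally) can be approximated with stock ffmpeg options. Below is a rough sketch assuming a fixed keyframe interval; the ffmpeg flags are standard, but the file names, segment length, and encoder settings are illustrative assumptions.

```python
# Split at keyframes, encode chunks in parallel, concatenate the results.
import subprocess
from concurrent.futures import ThreadPoolExecutor

def split_cmd(src: str) -> list[str]:
    # Stream-copying with the segment muxer cuts only at keyframes.
    return ["ffmpeg", "-i", src, "-c", "copy", "-f", "segment",
            "-segment_time", "4", "chunk_%03d.mkv"]

def encode_cmd(chunk: str) -> list[str]:
    # Each chunk is an independent encode job, so no cross-thread state.
    return ["ffmpeg", "-i", chunk, "-c:v", "libx264", "-crf", "22",
            "enc_" + chunk]

def concat_cmd(listfile: str) -> list[str]:
    # The concat demuxer rejoins the encoded chunks without re-encoding.
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", listfile,
            "-c", "copy", "out.mkv"]

def encode_all(chunks: list[str], workers: int = 8) -> None:
    # Fan the independent chunk encodes out across a thread pool.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(
            lambda c: subprocess.run(encode_cmd(c), check=True), chunks))

print(encode_cmd("chunk_000.mkv")[-1])  # output name for the first chunk
```

Note this trades memory and (for live input) latency for parallelism, exactly as discussed above, and a fixed `-segment_time` presumes regular keyframe placement in the source.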
| astrange wrote: | Most people concerned about encoding performance are doing | livestreaming and so they can't accept any additional | latency. Splitting a frame into independent segments (called | "slices") doesn't add latency / can even reduce it, and it | recovers from data corruption a bit better, so that's usually | done at the cost of some compression efficiency. | seeknotfind wrote: | Video codecs often encode the delta from the previous frame, | and because this delta is often small, it's efficient to do it | this way. If each thread needed to process the frame | separately, you would need to make significant changes to the | codec, and I hypothesize it would cause the video stream to be | bigger in size. | keehun wrote: | The parent comment referred to "keyframes" instead of just | "frames". Keyframes--unlike normal frames--encode the full | image. That is done in case the "delta" you mentioned could | be dropped in a stream ending up with strange artifacts in | the resulting video output. Keyframes are where the codec | gets to press "reset". | seeknotfind wrote: | Oh right. For non realtime, if you're not IO bound, this is | better. Though I'd wonder how portable the codec code | itself would be. | actionfromafar wrote: | The encoder has a lot of freedom in _how_ it arrives at | the encoded data. | danielrhodes wrote: | Isn't that delta partially based on the last keyframe? I | guess it would be codec dependent, but my understanding is | that keyframes are like a synchronization mechanism where the | decoder catches up to where it should be in time. | astrange wrote: | In most codecs the entropy coder doesn't reset across | frames, so there is enough freedom that you can do | multithreaded decoding. ffmpeg has frame-based and slice- | based threading for this. | | It also has a lossless codec ffv1 where the entropy coder | doesn't reset, so it truly can't be multithreaded. 
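The I/P/B frame structure discussed in this subthread can be inspected directly with ffprobe. The ffprobe flags below are standard; the input file name is a placeholder and the summary helper is just for illustration.

```python
# Count frame types (I = keyframe, P/B = delta frames) reported by ffprobe.
def frame_types_cmd(src: str) -> list[str]:
    return ["ffprobe", "-v", "quiet", "-select_streams", "v:0",
            "-show_entries", "frame=pict_type", "-of", "csv=p=0", src]

def summarize(ffprobe_output: str) -> dict[str, int]:
    # Tally one line of output per decoded frame into {type: count}.
    counts: dict[str, int] = {}
    for line in ffprobe_output.splitlines():
        t = line.strip().strip(",")
        if t:
            counts[t] = counts.get(t, 0) + 1
    return counts

# With a real file you would run (requires ffprobe on PATH):
#   import subprocess
#   out = subprocess.run(frame_types_cmd("input.mkv"),
#                        capture_output=True, text=True).stdout
#   print(summarize(out))
print(summarize("I\nP\nB\nP\n"))  # {'I': 1, 'P': 2, 'B': 1}
```

A stream with frequent I frames recovers quickly from corruption but compresses worse, which is the trade-off the comments above describe.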
| 0x457 wrote: | Yes, key frames are fully encoded, and some delta frames | are based on the previous frame (which could be keyframe or | another delta frame). Some delta frames (b-frames) can be | based on next frame instead of previous. That's why | sometimes you could have a visual glitch and mess up the | image until the next key frame. | | I'd assume if each thread is working on its own key frame, | it would be difficult to make b-frames work? Live content | also probably makes it hard. | rokweom wrote: | There's already software that does this: | https://github.com/master-of-zen/Av1an Encoding this way should | indeed improve quality slightly. Whether that is actually | noticeable/measurable... I'm not sure. | rnnr wrote: | ffmpeg and x265 allow you to do this too. frame-threads=1 | will use 1 thread per frame addressing the issue OP | mentioned, without big perf penalty, in contrary to 'pools' | switch which sets the threads to be used for encoding. | jamal-kumar wrote: | I've messed around with av1an. Keep in mind the software used | for scene chunking, L-SMASH, is only documented in Japanese | [1], but it does the trick pretty well as long as you're not | messing with huge dimensions like HD VR where you have video | dimensions that do stuff like crash quicktime on a mac | | [1] http://l-smash.github.io/l-smash/ | PatronBernard wrote: | IIUC - International Islamic University Chittagong? | nolist_policy wrote: | IIUC - If I understand correctly. | KineticLensman wrote: | If I Understand Correctly | bmurphy1976 wrote: | This definitely happens. This is how videos uploaded to | Facebook or YouTube become available so quickly. The video is | split into chunks based on key frame, the chunks are farmed out | to a cluster of servers and encoded in parallel, and the | outputs are then re-assembled into the final file. | pier25 wrote: | So does this mean that FFMPEG will be able to use multiple cores | with all the included codecs? 
| | I'm using FFMPEG to encode MP3 with LAME for an audio hosting | service and it would be great to improve encode times for long | files. | pseudosavant wrote: | Doubtful. Many codecs like MP3 aren't well suited to efficient | multi-threaded encoding. | mrbuttons454 wrote: | Will this allow multiple machines to process the same file? If | not, is there anything out there that can? | muragekibicho wrote: | Shameless plug. I teach the FFmpeg C API here: | https://www.udemy.com/course/part-1-video-coding-with-ffmpeg... | 3abiton wrote: | That's such a niche topic to teach. What use cases? | muragekibicho wrote: | It's for engineers tired of memorizing long weird CLI | commands. I teach you the underlying C data structures so you | can get out of command line hell and make the most out of | your time! | ElijahLynn wrote: | This must have been quite the challenge to continually rebase the | ongoing changes coming in on the daily. Wow. Now that it is | actually in, it should be much easier to go forward. | | Big win too! This is going to really speed things up! | shp0ngle wrote: | I don't know anything about the ffmpeg codebase, but I just wonder... | how would I go about doing this _slowly_ without completely doing | a giant commit that changes everything? | | The presentation says it's 700 commits. Was that a separate | branch? Or was it slowly merged back to the project? | | Well I can look at github I guess | shp0ngle wrote: | It seems ffmpeg uses the mailing list patch way of doing "PRs", | which is... well it is what it is. It doesn't help me | understand the process unless I just go through all the mailing | list archives, I guess. | asylteltine wrote: | Ugh why? That is so old school | _joel wrote: | Like the linux kernel? | shp0ngle wrote: | I mean they might be used to doing that as ffmpeg | predates github. (and git.) | | as long as it works for them... 
| ajhai wrote: | This will hopefully improve the startup times for FFmpeg when | streaming from virtual display buffers. We use FFmpeg in LLMStack | (a low-code framework to build and run LLM agents) to stream | browser video. We use playwright to automate browser interactions | and provide that as a tool to the LLM. When this tool is invoked, | we stream the video of these browser interactions with FFmpeg by | streaming the virtual display buffer the browser is using. | | There is a noticeable delay booting up this pipeline for each | tool invocation right now. We are working on putting in some | optimizations but improvements in FFmpeg will definitely help. | https://github.com/trypromptly/LLMStack is the project repo for | the curious. | bane wrote: | A theory about this that may also affect other older solid | software: the assumptions made on where to optimally "split" a | problem for multi-threading/processing have likely changed over | time. | | It wasn't that long ago that reading, processing, and rendering | the contents of a single image took a noticeable amount of time. | But both hardware and software techniques have gotten | significantly faster. What may have made sense many years ago | (lots of workers on a frame) may not matter today when a single | worker can process a frame or a group of frames more efficiently | than the overhead of spinning up a bunch of workers to do the | same task. | | But where to move that split now? Ultra-low-end CPUs now ship | with multiple cores and you can get over 100 easily on high-end | systems, system RAM is faster than ever, interconnect moves | almost a TB/sec on consumer hardware, GPUs are in everything, and | SSDs are now faster than the RAM I grew up with (at least on | continuous transfer). Basically the systems of today are entirely | different beasts to the ones commonly on the market when FFmpeg | was created. 
| | This is tremendous work that requires lots of rethinking about | how the workload needs to be defined, scheduled, distributed, | tracked, and merged back into a final output. Kudos to the team | for being willing to take it on. FFmpeg is one of those "pinnacle | of open source" infrastructure components that civilizations are | built from. | MightyBuzzard wrote: | It's not the codecs that were multithreaded in this release. | Pretty much all modern codecs are already multithreaded. What | they decided to parallelize is ffmpeg itself. You know, the | filter graphs and such. They didn't do anything to the codecs | themselves. | vfclists wrote: | All without Rust? | 71a54xd wrote: | Random reach here but has anyone here managed to get FFmpeg to | render JS text over a video? I've been thinking about this | workflow and just haven't quite figured it out yet, only a | prototype in MoviePy but I'd like to move away from that. | sylware wrote: | I think this was not "basic" multi-threading: they were careful | about keeping latency as low as possible and some internal | modifications of ffmpeg libs had to be done. | | That said, I don't think we still get input buffering (for HLS). | Const-me wrote: | Intel Core Duo CPU was released in 2006. By then it was obvious | computationally intensive programs need multithreading and that | Unix-style processes are no longer adequate. | | I wonder why it took so long for FFmpeg? | | BTW, MS Media Foundation is a functional equivalent of FFmpeg. It | was released as a part of Windows Vista in 2006, and is heavily | multithreaded by design. | atif089 wrote: | Does it mean that my FFMPEG H264 encoding until now was single | threaded? ___________________________________________________________________ (page generated 2023-12-12 23:00 UTC)