[HN Gopher] FFmpeg lands CLI multi-threading as its "most comple...
       FFmpeg lands CLI multi-threading as its "most complex refactoring"
       in decades
       Author : worble
       Score  : 429 points
       Date   : 2023-12-12 15:15 UTC (7 hours ago)
 (HTM) web link (www.phoronix.com)
 (TXT) w3m dump (www.phoronix.com)
       | qrush wrote:
       | Is there a recording of this talk from VDD@Dublin? Can't easily
       | find it on the author's site or here
       | https://www.videolan.org/videolan/events/vdd23/
       | Update: Found here!
       | https://www.youtube.com/watch?v=Z4DS3jiZhfo&t=1221s
       | motoboi wrote:
       | It's nuts to think that in the near future LLMs will be able to do
       | that refactoring in seconds. All we need is a large enough context
       | window.
         | bsdpufferfish wrote:
         | Why would you think this is possible?
           | motoboi wrote:
           | based on my current experience with gpt-4. Have you tried
           | some sort of refactoring in it? Because I have been routinely
           | turning serial scripts into parallel ones with success.
           | Couldn't do the same with larger codebases because the
           | context is not enough for the input code and output
           | refactoring.
         | thfuran wrote:
         | Yeah, it's nuts to think that.
           | motoboi wrote:
           | I'm genuinely confused about your point of view. Have you
           | tried refactoring with GPT-4?
           | I have been refactoring code using gpt-4 for some months now
           | and the limiting factor has been the context size.
           | GPT-4 Turbo now has a 128k context and I can provide it with
           | larger portions of the code base for the refactors.
           | When we have millions of tokens of context, based on what I'm
           | experiencing now, I can see that a refactoring like the one
           | made in ffmpeg would be possible. Or not? What am I missing
           | here?
           | rocqua wrote:
           | Refactoring is really rather well defined. It's "just
           | transformations that are invariant w.r.t. the outcome". The
           | reason they are hard to automate is that 'invariant w.r.t.
           | the outcome' is a lot more lenient than most semantic models
           | can handle. But this kind of well-defined task with a slight
           | amount of nuance (and decently checkable) seems pretty well-
           | suited to an LLM.
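rocqua's framing (a refactoring is a transformation that leaves the outcome unchanged, and that is decently checkable) maps onto differential testing: run the original and refactored functions on many random inputs and report the first disagreement. A minimal sketch; the three `dedupe_*` functions are hypothetical examples, and random testing can only find counterexamples, never prove two functions equivalent.

```python
import random
from typing import Callable, List, Optional


def dedupe_original(xs: List[int]) -> List[int]:
    # Original: keep the first occurrence of each value, preserving order.
    out = []
    for x in xs:
        if x not in out:
            out.append(x)
    return out


def dedupe_refactored(xs: List[int]) -> List[int]:
    # Refactored: same behavior, but O(n) using a set for membership tests.
    seen = set()
    out = []
    for x in xs:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out


def dedupe_buggy(xs: List[int]) -> List[int]:
    # A "refactor" that changes the outcome: sorting loses first-occurrence order.
    return sorted(set(xs))


def find_counterexample(old: Callable, new: Callable, trials: int = 1000,
                        seed: int = 0) -> Optional[List[int]]:
    # Return the first random input on which old and new disagree, else None.
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(0, 9) for _ in range(rng.randint(0, 12))]
        if old(xs) != new(xs):
            return xs
    return None
```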
             | nolist_policy wrote:
             | At least for the Linux kernel, QEMU and other large C
             | projects, this is a solved problem with coccinelle[1].
             | Compared to AI, it has the added benefit of not doing
             | incorrect changes and/or hallucinating stuff or prompt
             | injections or ...
             | I guess you could use AI to help create a coccinelle
             | semantic patch.
             | [1] https://en.wikipedia.org/wiki/Coccinelle_(software)
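Coccinelle's semantic patches are written in its own SmPL language for C. As a loose analogue of the idea (a deterministic, syntax-aware rewrite rather than a textual search-and-replace), here is a sketch using Python's standard `ast` module. This is a swapped-in technique, not how Coccinelle works internally, and the `old_api` to `new_api` rename is hypothetical.

```python
import ast


class RenameCall(ast.NodeTransformer):
    """Rewrite every bare-name call to `old` into a call to `new`."""

    def __init__(self, old: str, new: str):
        self.old = old
        self.new = new

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)  # rewrite nested calls in the arguments first
        # Only plain calls like old(...) match; attribute calls like obj.old() do not.
        if isinstance(node.func, ast.Name) and node.func.id == self.old:
            node.func.id = self.new
        return node


def rewrite(source: str, old: str, new: str) -> str:
    tree = RenameCall(old, new).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+


if __name__ == "__main__":
    src = "x = old_api(old_api(1))\nold_api_name = 2\nprint('old_api')\n"
    print(rewrite(src, "old_api", "new_api"))
```

Because the rewrite operates on the syntax tree, a variable named `old_api_name` or the string `'old_api'` is left untouched, which a naive text substitution would not guarantee. Note that `ast.unparse` does not preserve comments or the original formatting.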
               | dataangel wrote:
               | The part coccinelle does is the part GPT is good at; the
               | problem is that neither of them actually reasons about the code.
         | bigbillheck wrote:
         | Why on earth would you possibly think that?
           | ctoth wrote:
           | Just as your human intelligence led you to write the same
           | darn comment as another human above you, AI can often write
           | the same code as a human would, without having to even bring
           | creativity into it! For those of us who write code, this can
           | be useful!
           | motoboi wrote:
           | I'm quite confused by the answers I got from this thread.
           | Haven't you tried refactoring with gpt-4 yet?
             | bigbillheck wrote:
             | > Haven't you tried refactoring with gpt-4 yet?
             | I most certainly have not. At work, I do greenfield
             | development in a specialized problem domain, and I would
             | not trust a model (or, for that matter, a junior developer)
             | to do any kind of refactor in an acceptable manner. (That
             | aside, there's no way I'm going to expose company code to
             | any sort of outside party without the approval of upper
             | management).
             | At home, I program for fun and self-improvement, and a big
             | part of both is thinking hard about problems. Why would I
             | want to wreck that with asking a model to do it for me?
               | motoboi wrote:
               | Oh, I understand.
               | What if you have an idea and you tell the computer to
               | implement it and then thoroughly check the code?
               | About the self-improvement part, I suppose you don't
               | operate your graphics card yourself, but delegate it to
               | your driver.
               | The LLM is just another tool.
               | smabie wrote:
               | Oh yea your work is far too sophisticated for a llm, got
               | it
               | bigbillheck wrote:
               | Some of us do actual creative work, yes.
             | airstrike wrote:
             | Some people get significance from their ability to write
             | code. To them, admitting an LLM can (or will soon be able
             | to) do their work inflicts cognitive dissonance, so they
             | refuse to believe it. Some refuse to even try it--not
             | realizing that refusing to engage does nothing to hinder
             | the advancement of the tool they fear.
         | mrintegrity wrote:
         | Parent post is getting downvoted to oblivion, but it seems a
         | reasonable belief for someone who is not highly engaged with
         | AI. I have only the vaguest understanding of how it works (and
         | it's probably wrong) and to my layman mind it also seems like a
         | totally fair assumption, based on experience as a user and the
         | constant flood of news. Please explain why the suggestion that
         | a future AI / sufficiently advanced LLM could refactor a
         | complex codebase is so preposterous.
           | not_the_fda wrote:
           | Because refactoring requires understanding, which LLMs
           | completely lack.
             | ctoth wrote:
             | > Because refactoring requires understanding, which LLMs
             | completely lack.
             | Interesting.
             | Here's a sample question I just asked the AI:
             | We are working on a React mapping application. I am going
             | to give you two components from this application. Our goal
             | is to extract functionality to deduplicate code
             | We can make a single base component, extract code to hooks,
             | or any other strategy which you think is appropriate
             | Here is travel.tsx:
             | <snip>
             | Here is heatmap.tsx:
             | <snip>
             | And here is the response:
             | To deduplicate code and improve maintainability between the
             | `TravelMap` and `Heatmap` components in your React mapping
             | application, we can identify common functionalities and
             | extract them into reusable hooks or components. Here's a
             | strategy considering your code:
             | 1. *Extract Common Hooks*: Both components use similar
             | hooks for managing markers, popups, and map references. We
             | can create custom hooks that encapsulate these
             | functionalities.
             | 2. *Create Shared Utility Functions*: Functions like
             | `initialViewState`, which are identical in both components,
             | can be moved to a separate utility file and imported where
             | needed.
             | 3. *Refactor Map Layers and Sources*: The way map layers
             | and sources are handled is quite similar in both
             | components. A custom hook or component to manage map layers
             | and sources would be beneficial.
             | 4. *Generalize Popup Handling*: The popup handling in both
             | components can be abstracted into a custom hook or a
             | higher-order component.
             | 5. *Combine Common Styling*: Shared styles, such as
             | `MAP_CONTAINER_STYLE`, can be moved to a common CSS module
             | or a styled-component that can be reused.
             | 6. *Refactor Event Handlers*: Functions like `onMapClick`,
             | `onMouseMove`, and `onMouseLeave` have similar structures.
             | These can be part of a custom hook or a utility function
             | with configurable parameters.
             | 7. *Legend and Live Region Components*: These components
             | are used in a similar way in both `TravelMap` and
             | `Heatmap`. Ensure they are designed to be reusable with
             | different props.
             | 8. *Refactor Map Initialization*: The way the map is
             | initialized and configured is quite similar. This can be
             | abstracted into a custom component or hook.
             | ### Example Implementations:
             | #### 1. Custom Hook for Map Markers:
             | ```typescript
             | // useMapMarkers.ts
             | import { useState, useCallback } from 'react';
             | export const useMapMarkers = (initialLat, initialLng, world) => {
             |   // Similar marker logic from your components
             |   // ...
             |   return {
             |     marker,
             |     onMarkerDrag,
             |     onMarkerDragEnd,
             |     updateMarkerPosition,
             |     // other marker related states and functions
             |   };
             | };
             | ```
             | #### 2. Shared Utility Function:
             | ```typescript
             | // utils/mapUtils.ts
             | export const initialViewState = (map, mapType) => {
             |   // Implementation
             | };
             | ```
             | #### 3. Custom Hook for Map Layers and Sources:
             | ```typescript
             | // useMapLayers.ts
             | import { useState, useEffect } from 'react';
             | export const useMapLayers = (props) => {
             |   // Logic to handle layers and sources
             |   // ...
             |   return {
             |     dataLayer,
             |     patternLayer,
             |     // other layer related states and functions
             |   };
             | };
             | ```
             | #### 4. Generalized Popup Handling:
             | ```typescript
             | // useMapPopup.ts
             | import { useState } from 'react';
             | export const useMapPopup = () => {
             |   // Popup logic from your components
             |   // ...
             |   return {
             |     popupState,
             |     openPopup,
             |     closePopup,
             |     setPopupData,
             |     // other popup related states and functions
             |   };
             | };
             | ```
             | ### Integration:
             | Once these hooks and utilities are created, integrate them
             | into your `TravelMap` and `Heatmap` components. Replace the
             | duplicated logic with calls to these shared resources. This
             | will significantly reduce code duplication and make future
             | maintenance easier.
             | ### Testing and Refinement:
             | After refactoring, thoroughly test both components to
             | ensure that all functionalities work as expected. Watch out
             | for any edge cases or specific behaviors that might be
             | affected by the refactoring.
             | For those suggestions, I might use five out of eight of
             | them, and probably do one or two things differently. But
             | you cannot, with a straight face, say the model did not
             | understand. It clearly did. It suggested reasonable
             | refactors. If being able to refactor means understanding, I
             | guess we have understanding!
             | I could continue with this conversation, ask it to produce
             | the full code for the hooks (I have in my custom prompt to
             | provide outlines) and once the hooks are complete, ask it
             | to rewrite the components using the shared code.
             | Have you ever used one of these models?
               | never_inline wrote:
               | Eliminating duplication and cleaning code is a different
               | type of refactoring than supporting concurrency, which is
               | much much harder.
               | Cleaning up code also follows some well established
               | patterns, performance work is much less pattern-y.
               | Codebases like FFMPEG are one of a kind. I bet you need
               | 10 or 100 times more understanding than the react thing
               | you mentioned above.
               | One day maybe AI can do it, but it probably won't be LLM.
               | It would be something which can understand symbols and
               | math.
               | ctoth wrote:
               | Ah, we're having some classic goalpost moving!
               | > Because refactoring requires understanding, which LLMs
               | completely lack.
               | <demonstration that an LLM can refactor code>
               | > Cleaning up code also follows some well established
               | patterns, performance work is much less pattern-y.
               | Just as writing shitty react apps follow patterns, low-
               | level performance and concurrency work also follow
               | patterns. See [0] for a sample.
               | > I bet you need 10 or 100 times more understanding
               | Okay, so a 10 or 100 times larger model? Sounds like
               | something we'll have next year, and certainly within a
               | decade.
               | > One day maybe AI can do it, but it probably won't be
               | LLM. It would be something which can understand symbols
               | and math.
               | You do understand that the reason some of the earlier
               | GPTs had trouble with symbols and math was the
               | tokenization scheme, completely separate from how they
               | work in general, right?
               | [0]: C++ Concurrency in Action: Practical Multithreading
               | 1st Edition https://www.amazon.com/C-Concurrency-Action-
               | Practical-Multit...
               | kcbanner wrote:
               | > Because refactoring requires understanding, which LLMs
               | completely lack.
               | It's obvious from context here that the refactoring that
               | was mentioned was specifically around concurrency, not
               | simply cleaning up code.
               | ctoth wrote:
               | So if I show you an LLM implementing concurrency, will
               | you concede the point? Is this your true objection?
               | https://chat.openai.com/share/7c41f59a-c21c-4abd-876c-c95
               | 647...
               | malcolmgreaves wrote:
               | Hope you're looking for good-faith discussion here. I'll
               | assume that you're looking for a response where someone
               | has taken the time to read through your previous messages
               | and also the linked ChatGPT interaction logs.
               | What you've shown is actually a great example of what
               | folks mean when they say LLMs lack any sort of understanding.
               | They're fundamentally predict-the-next-token machines;
               | they regurgitate and mix parts of their training data in
               | order to satisfy the token prediction loss function they
               | were trained with.
               | In the linked example you provided, *you* are the one
               | that needs to provide the understanding. It's a rather
               | lengthy back-and-forth to get that code into a somewhat
               | useable state. Importantly, if you didn't tell it to fix
               | things (sqlite connections over threads, etc.), it would
               | have failed.
               | And while it's concurrent, it's using threads, so it's
               | not going to be doing any work in parallel. The example
               | you have mixes some IO and compute-bound looking
               | operations.
               | So, if your need was to refactor your original code to
               | _actually be fast_, ChatGPT demonstrated it doesn't
               | understand nearly enough to actually make this happen.
               | This thread conversation got started around correcting
               | the misconception that an LLM would actually ever be able to
               | possess enough knowledge to do actually valuable, complex
               | refactoring and programming.
               | While I believe that LLMs can be good tools for a variety
               | of use cases, they have to be used in short bursts. Since
               | their output is fundamentally unreliable, someone always
               | has to read -- then comprehend -- its output. Giving it
               | too much context and then prompting it in such a way to
               | align its next token prediction with a complex outcome is
               | a highly variable and unstable process. If it outputs
               | millions of tokens, how is someone going to actually
               | review all of this?
               | In my experience using ChatGPT, GPT4, and a few other
               | LLMs, I've found that it's pretty good at coming up with
               | little bits to jog one's own thinking and problem
               | solving. But doing an actual complex task with lots of
               | nuance and semantics-to-be-understood outright? The
               | technology is not quite there yet.
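The threads point above depends on the runtime. Assuming the linked example is Python (the sqlite-connections-across-threads issue suggests so), standard CPython's global interpreter lock keeps threads from running Python bytecode in parallel, but a thread blocked on I/O releases the lock, so I/O waits can still overlap. A minimal sketch, with `time.sleep` standing in for a hypothetical network or disk wait:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(i: int) -> int:
    # Stand-in for an I/O wait (network, disk); sleep releases the GIL.
    time.sleep(0.2)
    return i


def timed(fn, n: int, workers: int) -> float:
    # Run fn over range(n) on a pool of `workers` threads; return wall-clock seconds.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fn, range(n)))
    assert results == list(range(n))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"1 thread:  {timed(fake_io, 8, 1):.2f}s")  # roughly 8 x 0.2 s
    print(f"8 threads: {timed(fake_io, 8, 8):.2f}s")  # roughly 0.2 s: the waits overlap
```

Replacing `fake_io` with a CPU-bound loop would show little or no speedup from the extra threads on standard CPython; `concurrent.futures.ProcessPoolExecutor` is the drop-in alternative for that case.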
             | atrus wrote:
             | Chess requires understanding, which computers lack. Go
             | requires understanding, which computers lack. X requires Y
             | which _AI technology today_ lacks. AI is a constantly
             | moving goalpost it seems.
               | satvikpendem wrote:
               | > AI is a constantly moving goalpost it seems.
               | alwayshasbeen.png
               | > The AI effect occurs when onlookers discount the
               | behavior of an artificial intelligence program by arguing
               | that it is not "real" intelligence.[1]
               | > Author Pamela McCorduck writes: "It's part of the history of the field
               | of artificial intelligence that every time somebody
               | figured out how to make a computer do something--play
               | good checkers, solve simple but relatively informal
               | problems--there was a chorus of critics to say, 'that's
               | not thinking'."[2] Researcher Rodney Brooks complains:
               | "Every time we figure out a piece of it, it stops being
               | magical; we say, 'Oh, that's just a computation.'"[3]
               | > "AI is whatever hasn't been done yet."
               | > --Larry Tesler
               | https://en.wikipedia.org/wiki/AI_effect
               | The_Colonel wrote:
               | It was always clear that games like chess or go can be
               | played by computers well, even with simple algorithms,
               | because they were completely formalized. The only issue
               | was with performance / finding more efficient algorithms.
               | That's very different from code which (perhaps
               | surprisingly) isn't well formalized. The goals are often
               | vague and it's difficult to figure out what is
               | intentional and what is incidental behavior (esp. with
               | imperative code).
           | astrange wrote:
           | The ffmpeg tests take a lot more than a few seconds to run,
           | and an AI god is still going to have trouble debugging
           | multithreaded code.
           | dataangel wrote:
           | AI is not very good at _single threaded_ code which is widely
           | regarded as much easier. The breathless demos don't
           | generalize well when you truly test on data not in the
           | training set, it's just that most people don't come up with
           | good tests because they take something from the internet,
           | which is the training set. But the code most people need to
           | write is to do tasks that are bespoke to individual
           | businesses/science-experiments/etc not popular CS problems
           | that there are 1000 tutorials online for. When you get into
           | those areas it becomes apparent really quickly that the AI
           | only gets the "vibes" of what code should look like, it
           | doesn't have any mechanistic understanding.
         | ctoth wrote:
         | I see that you got some responses from people who may have not
         | even used gpt-4 as a coding assistant, but I absolutely agree
         | with you. A larger context window, a framework like Aider, and
         | slightly-better tooling so the AI can do renames and other
         | high-level actions without having to provide the entire
         | changeset as patches, and tests. Lots of tests. Then you can
         | just run the migration 15 times, pick the one which passes
         | all the tests... run another integration pass to merge ideas
         | from the other runs, rinse and repeat. Of course the outer
         | loops will themselves be automated.
         | The trick to this is continuous iteration and feedback. It's
         | remarkable how far I've gotten with GPT using these simple
         | primitives and I know I'm not the only one.
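The generate-many, keep-what-passes loop ctoth describes is essentially best-of-N selection against a test suite. A minimal sketch: the `attempt_*` functions are hypothetical stand-ins for separate model runs (generating them with an LLM is not shown), and a candidate that raises is treated as failing rather than crashing the loop.

```python
from typing import Callable, List, Optional, Tuple

# A test case is (input, expected output).
TestCase = Tuple[int, int]


def passes_all(candidate: Callable[[int], int], tests: List[TestCase]) -> bool:
    # A candidate that raises counts as a failure.
    try:
        return all(candidate(x) == want for x, want in tests)
    except Exception:
        return False


def pick_candidate(candidates: List[Callable[[int], int]],
                   tests: List[TestCase]) -> Optional[Callable[[int], int]]:
    # Return the first candidate that passes every test, or None if none do.
    for cand in candidates:
        if passes_all(cand, tests):
            return cand
    return None


# Hypothetical "runs": three attempts at squaring an integer.
def attempt_1(n: int) -> int:
    return n * 2  # wrong


def attempt_2(n: int) -> int:
    if n < 0:
        raise ValueError("unhandled case")  # crashes on negative input
    return n * n


def attempt_3(n: int) -> int:
    return n * n  # correct


TESTS: List[TestCase] = [(0, 0), (3, 9), (-4, 16)]
```

As the replies point out, this is only as good as the test suite: a candidate can pass weak tests and still be wrong.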
           | beeboobaa wrote:
           | If you think a large refactor is just renaming some stuff
           | then it makes sense you think this.
           | dataangel wrote:
           | If you ask GPT to refactor a single threaded program much
           | smaller than the context window that is truly out of sample
           | into a multithreaded program, it's often going to fail. GPT
           | has trouble understanding bit masks in single threaded code,
           | let alone multiple threads.
         | beeboobaa wrote:
         | I have some snake oil to sell you
       | PaywallBuster wrote:
       | The PDF with the presentation requires a password.
       | bsdpufferfish wrote:
       | If I'm operating a cloud service like Netflix, then I'm already
       | running thousands of ffmpeg processes on each machine. In other
       | words, it's already a multi-core job.
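bsdpufferfish's approach, keeping every core busy by running many independent encoder processes instead of threading one, can be sketched with `subprocess` plus a bounded pool whose threads only wait on child processes. The jobs here are trivial Python commands so the sketch runs anywhere; the ffmpeg command in the comment is an illustrative assumption, not a recommended encode setting.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import List


def run_job(cmd: List[str]) -> int:
    # Each job is a separate OS process, so the kernel schedules them across cores.
    return subprocess.run(cmd, capture_output=True).returncode


def run_all(jobs: List[List[str]], max_parallel: int) -> List[int]:
    # Threads here only wait on child processes; the real work happens in the children.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_job, jobs))


if __name__ == "__main__":
    # A hypothetical encode job might look like:
    #   ["ffmpeg", "-i", "in.mp4", "-c:v", "libx264", "-crf", "23", "out.mp4"]
    jobs = [[sys.executable, "-c", f"print({i})"] for i in range(6)]
    print(run_all(jobs, max_parallel=3))  # exit codes, in job order
```

This saturates the machine only when there are enough concurrent jobs; a single encode still finishes no sooner, which is where threading inside one ffmpeg process helps.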
         | platzhirsch wrote:
         | As multi-core as Python and Ruby then.
           | bsdpufferfish wrote:
           | Yes. The kernel multiplies your efforts for you. It works
           | great for web services.
         | sanitycheck wrote:
         | Curious, what would that many ffmpeg processes be doing at
         | Netflix? I assume new VOD content gets encoded once per format,
         | and the amount of new content added per day is not gigantic.
         | Agree with the general premise, of course, if I've got 10
         | different videos encoding at once then I don't need additional
         | efficiency because the CPU's already maxed out.
           | asveikau wrote:
           | Probably a lot more than once when you consider that
           | different devices have different capabilities, and that they
           | might stream you different bitrates depending on conditions
           | like your network capability, screen resolution, how much
           | you've paid them.
           | You could also imagine they might apply some kind of
           | heuristic to decide to re-encode something based on some
           | condition... Like fine tune encoder settings when a title
           | becomes popular. No idea if they do that, just using some
           | imagination.
           | The_Colonel wrote:
           | I assume they re-compress for each resolution / format, quite
           | possibly they also have different bitrate levels per
           | resolution. Potentially even variants tweaked for certain
           | classes of device (in cases this is not already covered by
           | combination of format/resolution/bitrate). I would also
           | assume they re-compress with new advances in video processing
           | (things like HDR, improved compression).
           | Also, their devs likely want fast feedback on changes - I
           | imagine they might have CI running changes on some standard
           | movies, checking various stats (like SNR) for regressions.
           | Everybody loves if their CI finishes fast, so you might want
           | to compress even a single movie in multiple threads.
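The_Colonel's CI idea, tracking a quality statistic on standard clips, is commonly done with PSNR for video. A minimal sketch of PSNR between two 8-bit frames given as flat lists of pixel values; a real pipeline would compute it per plane and per frame (ffmpeg's `psnr` filter does this), and the 0.1 dB regression tolerance is a hypothetical choice.

```python
import math
from typing import Sequence


def psnr(ref: Sequence[int], test: Sequence[int], max_value: int = 255) -> float:
    # PSNR = 10 * log10(MAX^2 / MSE); identical inputs give infinity.
    if len(ref) != len(test) or not ref:
        raise ValueError("frames must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)


def is_regression(old_db: float, new_db: float, tolerance_db: float = 0.1) -> bool:
    # Hypothetical CI rule: flag a drop of more than tolerance_db decibels.
    return new_db < old_db - tolerance_db
```

For example, an all-zero frame compared with an all-ones frame has MSE 1, giving 20·log10(255), about 48.13 dB.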
             | sanitycheck wrote:
             | They'll be doing VBR encodes to DASH, HLS & (I guess still)
             | MSS which covers the resolutions & formats... DRM will be
             | what prevents high res content from working on some "less-
             | trusted" platforms so the same encodes should work.
             | (Plus a couple more "legacy" encodes with PIFF instead of
             | CENC for ancient devices, probably.)
             | New tech advances, sure, they probably do re-encode
             | everything sometimes - even knocking a few MB off the size
             | of a movie saves a measurable amount of $$ at that scale.
             | But are there frequent enough tech advances to do that more
             | than a couple of times a year..? The amount of difficult
             | testing (every TV model group from the past 10 years, or
             | something) required for an encode change is horrible. I'm
             | sure they have better automation than anyone else, but I'm
             | guessing it's still somewhat of a nightmare.
             | Youtube, OTOH, I really can imagine having thousands of
             | concurrent ffmpeg processes.
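Each rung of the resolution/bitrate ladder sanitycheck describes ends up as one variant in the streaming manifest. A minimal sketch that writes an HLS master playlist using the `#EXTM3U` and `#EXT-X-STREAM-INF` tags from RFC 8216; the ladder values and per-variant playlist paths are hypothetical, and a production manifest would also carry attributes such as `CODECS`.

```python
from typing import List, Tuple

# Hypothetical ladder: (width, height, peak bandwidth in bits/s).
LADDER: List[Tuple[int, int, int]] = [
    (640, 360, 800_000),
    (1280, 720, 3_000_000),
    (1920, 1080, 6_000_000),
]


def master_playlist(ladder: List[Tuple[int, int, int]]) -> str:
    lines = ["#EXTM3U"]
    for width, height, bandwidth in ladder:
        # One variant stream per rung; the URI line follows its EXT-X-STREAM-INF tag.
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={width}x{height}")
        lines.append(f"{height}p/index.m3u8")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(master_playlist(LADDER))
```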
               | canucker2016 wrote:
               | Why bring up assumptions/suppositions about Netflix's
               | encoding process?
               | Their tech blog and tech presentations discuss many of
               | the requirements and steps involved for encoding source
               | media to stream to all the devices that Netflix supports.
               | The Netflix tech blog: https://netflixtechblog.com/ or
               | https://netflixtechblog.medium.com/
               | Netflix seems to use AWS CPU+GPU for encoding, whereas
               | YouTube has gone to the expense of producing an ASIC to
               | do much of their encoding.
               | 2015 blog entry about their video encoding pipeline:
               | https://netflixtechblog.com/high-quality-video-encoding-
               | at-s...
               | 2021 presentation of their media encoding pipeline:
               | https://www.infoq.com/presentations/video-encoding-
               | netflix/
               | An example of their FFmpeg usage - a neural-net video
               | frame downscaler: https://netflixtechblog.com/for-your-
               | eyes-only-improving-net...
               | Their dynamic optimization encoding framework -
               | allocating more bits for complex scenes and fewer bits
               | for simpler, quieter scenes:
               | https://netflixtechblog.com/dynamic-optimizer-a-
               | perceptual-v... and
               | https://netflixtechblog.com/optimized-shot-based-encodes-
               | now...
               | Netflix developed an algorithm for determining video
               | quality - VMAF, which helps determine their encoding
               | decisions: https://netflixtechblog.com/toward-a-
               | practical-perceptual-vi...,
               | https://netflixtechblog.com/vmaf-the-journey-
               | continues-44b51..., https://netflixtechblog.com/toward-a-
               | better-quality-metric-f...
               | astrange wrote:
               | > Their dynamic optimization encoding framework -
               | allocating more bits for complex scenes and fewer bits
               | for simpler, quieter scenes:
               | https://netflixtechblog.com/dynamic-optimizer-a-
               | perceptual-v... and
               | https://netflixtechblog.com/optimized-shot-based-encodes-
               | now...
               | This is overrated - of course that's how you do it, what
               | else would you do?
               | > Mean-squared-error (MSE), typically used for encoder
               | decisions, is a number that doesn't always correlate very
               | nicely with human perception.
               | Academics, the reference MPEG encoder, and old
               | proprietary encoder vendors like On2 VP9 did make
               | decisions this way because their customers didn't know
               | what they wanted. But people who care about quality, i.e.
               | anime and movie pirate college students with a lot of
               | free time, didn't.
               | It looks like they've run x264 in an unnatural mode to
               | get an improvement here, because the default "constant
               | ratefactor" and "psy-rd" always behaved like this.
               | slhck wrote:
               | > This is overrated - of course that's how you do it,
               | what else would you do?
               | That's not what has been done previously for adaptive
               | streaming. I guess you are referring to what encoding
               | modes like CRF do for an individual, entire file? Or
               | where else has this kind of approach been shown before?
               | In the early days of streaming you would've done constant
               | bitrate for MPEG-TS, even adding zero bytes to pad "easy"
               | scenes. Later you'd have selected 2-pass ABR with some
               | VBV bitrate constraints to not mess up the decoding
               | buffer. At the time, YouTube did something where they
               | tried to predict the CRF they'd need to achieve a certain
               | (average) bitrate target (can't find the reference
               | anymore). With per-title encoding (which was also
               | popularized by Netflix) you could change the target
               | bitrates for an entire title based on a previous
               | complexity analysis. It took quite some time for other
               | players in the field to also hop on the per-title
               | encoding train.
               | Going to a per-scene/per-shot level is the novelty here, as is
               | exhaustively finding the best possible combination of
               | QP/resolution pairs for an entire encoding ladder that
               | also optimizes subjective quality - and not just MSE.
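The per-shot selection slhck describes is commonly framed as a Lagrangian trade-off: for each shot, choose the encode (resolution/QP pair) that minimizes D + λ·R, then vary λ to move along the total rate/distortion curve. A minimal sketch with invented (rate, distortion) numbers; it illustrates the textbook technique and is not claimed to be Netflix's Dynamic Optimizer implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical per-shot candidate encodes: label -> (rate in kbit, distortion).
Shot = Dict[str, Tuple[float, float]]

SHOTS: List[Shot] = [
    # A static, easy shot: extra bits buy little.
    {"360p_qp30": (100, 8.0), "720p_qp30": (200, 7.5), "1080p_qp26": (500, 7.2)},
    # A high-motion, complex shot: extra bits help a lot.
    {"360p_qp30": (150, 30.0), "720p_qp30": (400, 15.0), "1080p_qp26": (900, 6.0)},
]


def allocate(shots: List[Shot], lam: float) -> List[str]:
    # Independently per shot, pick the encode minimizing D + lam * R.
    # Because the cost is additive over shots, this also minimizes the total.
    return [min(shot, key=lambda k: shot[k][1] + lam * shot[k][0]) for shot in shots]


def totals(shots: List[Shot], picks: List[str]) -> Tuple[float, float]:
    # Total rate and total distortion of one choice of encodes.
    rate = sum(shot[p][0] for shot, p in zip(shots, picks))
    dist = sum(shot[p][1] for shot, p in zip(shots, picks))
    return rate, dist
```

With `lam = 0.01` the static first shot gets its cheapest encode while the high-motion second shot gets the 1080p one; at `lam = 0.1` both drop to 360p.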
               | astrange wrote:
               | > exhaustively finding the best possible combination of
               | QP/resolution pairs for an entire encoding ladder that
               | also optimizes subjective quality - and not just MSE.
               | This is unnecessary if the encoder is well-written. It's
               | like how some people used to run multipass encoders 3 or
               | 4 times just in case the result got better. You only need
               | one analysis pass to find the optimal quality at a
               | bitrate.
               | slhck wrote:
               | Sure, the whole point of CRF is to set a quality target
               | and forget about it, or, with ABR, to be as good as you
               | can with an average bitrate target (under constraints).
               | But you can't do that across resolutions, e.g. do you
               | pick the higher bitrate 360p version, or the lower
               | bitrate 480p one, considering both coding artifacts and
               | upscaling degradation?
               | astrange wrote:
               | At those two resolutions you'd pick the higher resolution
               | one. I agree that generation of codec doesn't scale all
               | the way up to 4K and at that point you might need to make
               | some smart decisions.
               | I think it should be possible to decide in one shot in
               | the codec though. My memory is that codecs (image and
               | video) have tried implementing scalable resolutions
               | before, but it didn't catch on simply because dropping
               | resolution is almost never better than dropping bitrate.
               | canucker2016 wrote:
               | You're letting the video codec make all the decisions for
               | bitrate allocation.
               | Netflix tries to optimize the encoding parameters per
               | shot/scene.
               | from the dynamic optimization article:
               | - A long video sequence is split in shots ("Shots are
               | portions of video with a relatively short duration,
               | coming from the same camera under fairly constant
               | lighting and environment conditions.")
               | - Each shot is encoded multiple times with different
               | encoding parameters, such as resolutions and qualities
               | (QPs)
               | - Each encode is evaluated using VMAF, which together
               | with its bitrate produces an (R,D) point. One can convert
               | VMAF quality to distortion using different mappings; we
               | tested against the following two, linearly and inversely
               | proportional mappings, which give rise to different
               | temporal aggregation strategies, discussed in the
               | subsequent section
               | - The convex hull of (R,D) points for each shot is
               | calculated. In the following example figures, distortion
               | is inverse of (VMAF+1)
               | - Points from the convex hull, one from each shot, are
               | combined to create an encode for the entire video
               | sequence by following the constant-slope principle and
               | building end-to-end paths in a Trellis
               | - One produces as many aggregate encodes (final operating
               | points) by varying the slope parameter of the R-D curve
               | as necessary in order to cover a desired bitrate/quality
               | range
               | - Final result is a complete R-D or rate-quality (R-Q)
               | curve for the entire video sequence
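A minimal Python sketch of the per-shot steps quoted above, under stated assumptions. Each encode is reduced to a (kilobits, distortion) point with distortion = 1/(VMAF+1), the lower convex hull is computed per shot, and one slope `lam` picks, in every shot, the hull point minimizing D + lam * R (the constant-slope principle). The shot data and helper names are invented for illustration. This is not Netflix's code, and it replaces the Trellis path search with a plain per-shot Lagrangian choice, which only fits the case where total distortion is a simple sum over shots.

```python
from typing import List, Tuple

# (kilobits spent on the shot, distortion); distortion = 1 / (VMAF + 1)
Point = Tuple[float, float]


def rd_point(kbits: float, vmaf: float) -> Point:
    return (kbits, 1.0 / (vmaf + 1.0))


def lower_convex_hull(points: List[Point]) -> List[Point]:
    """Lower convex hull of the (R, D) points (Andrew's monotone chain),
    trimmed to the part where distortion still falls as rate grows."""
    hull: List[Point] = []
    for p in sorted(set(points)):
        # Pop the last hull point while it lies on or above the segment hull[-2] -> p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross > 0:
                break
            hull.pop()
        hull.append(p)
    trimmed: List[Point] = []
    for p in hull:
        if not trimmed or p[1] < trimmed[-1][1]:
            trimmed.append(p)
    return trimmed


def pick_constant_slope(hulls: List[List[Point]], lam: float) -> List[Point]:
    """One point per shot minimizing D + lam * R, with the same lam (slope) for every shot."""
    return [min(h, key=lambda p: p[1] + lam * p[0]) for h in hulls]


# Hypothetical encodes (e.g. different resolution/QP pairs) of two shots.
shot_a = [rd_point(200, 80), rd_point(400, 90), rd_point(800, 94), rd_point(1600, 95)]
shot_b = [rd_point(200, 50), rd_point(400, 70), rd_point(800, 85),
          rd_point(900, 80), rd_point(1600, 92)]  # the 900 kbit encode is dominated
hulls = [lower_convex_hull(shot_a), lower_convex_hull(shot_b)]

# Each lam gives one aggregate operating point for the whole sequence.
for lam in (0.0, 1e-6, 1e-5, 1e-4):
    picks = pick_constant_slope(hulls, lam)
    print(f"lam={lam:g}: kbits per shot {[p[0] for p in picks]}, total {sum(p[0] for p in picks)}")
```

Sweeping `lam` upward from 0 walks from the highest-quality picks toward the cheapest ones, which is one way to produce the multiple aggregate operating points the last bullets describe.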
           | banana_giraffe wrote:
           | It's been reported in the past that Netflix encodes 120
           | different variants of each video they have [1] for different
           | bitrates and different devices' needs.
           | And that was years ago, I wouldn't be surprised to learn it's
           | a bigger number now.
           | [1] https://news.ycombinator.com/item?id=4946275
         | The_Colonel wrote:
         | I guess it's irrelevant for Netflix then*. But it sounds great
         | for the remaining 99.99%.
         | * I would be very surprised if Netflix even uses vanilla ffmpeg
           | bsdpufferfish wrote:
           | > But it sounds great for the remaining 99.99%.
           | I believe the vast majority of ffmpeg usages are web
           | services or one-off encodings.
             | The_Colonel wrote:
             | Well, this feature is awesome for one-off encoding by a
             | home user.
             | Subjectively, me compressing my holiday video is much more
             | important than Netflix re-compressing a million of them.
               | buu700 wrote:
               | I use ffmpeg all the time, so this change is much
               | appreciated. Well not really _that_ often, but when I do
               | encode video/audio it's generally with ffmpeg.
         | kevincox wrote:
         | Latency is still valuable. For example YouTube (which IIRC uses
         | ffmpeg) often takes hours to do transcodes. This is likely
         | somewhat due to scheduling but assuming that they can get the
         | same result doing 4x threads for 1/4 of the time they would
         | prefer that as each job finishes faster. The only real question
         | is at what efficiency cost the latency benefit stops being
         | worth it.
           | lelanthran wrote:
           | I think that if you're operating at the scale of Google,
           | using a single-threaded ffmpeg will finish your jobs in less time.
           | If you have a queue of 100k videos to process and a cluster
           | of 100 cores, assigning a video to each core as it becomes
           | available is the most efficient way to process them, because
           | you're skipping the thread joining time.
           | Anytime there is a queue of jobs, assigning the next job in
           | the queue to the next free core is always going to be faster
           | than assigning the next job to multiple cores.
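To make the throughput vs. latency trade-off in this subthread concrete, here is a back-of-the-envelope Python model. It assumes identical jobs and a fixed parallel efficiency (an invented 0.8) when one job is spread across all cores; real encoders scale far less predictably.

```python
import math
from typing import Tuple


def one_job_per_core(n_jobs: int, cores: int, job_s: float) -> Tuple[float, float]:
    """Every core runs whole jobs single-threaded.
    Returns (makespan for the whole queue, latency of a single job)."""
    makespan = math.ceil(n_jobs / cores) * job_s
    return makespan, job_s


def all_cores_per_job(n_jobs: int, cores: int, job_s: float,
                      efficiency: float) -> Tuple[float, float]:
    """Jobs run one at a time, each spread over every core.
    efficiency < 1 models threading overhead (an assumed constant)."""
    per_job = job_s / (cores * efficiency)
    return n_jobs * per_job, per_job


print(one_job_per_core(100_000, 100, 60.0))
print(all_cores_per_job(100_000, 100, 60.0, 0.8))
```

With 100k one-minute jobs on 100 cores, this gives a 60,000 s makespan and 60 s per-job latency for one job per core, versus 75,000 s and 0.75 s when every job uses all cores: better throughput on one side, far lower per-job latency on the other.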
           | Thaxll wrote:
           | YouTube does not use ffmpeg; at the scale at which they
           | operate it would be too slow / expensive.
           | They use custom hardware just for encoding.
           | fyi they have to transcode over 500h of videos per minute. So
           | multiply that by all the formats they support.
           | They operate at an insane scale, Netflix looks like a garage
           | project for comparison.
             | astrange wrote:
             | There's still decoding. If a service claims to support all
             | kinds of weird formats (like a MOV or AVI from the 90s)
             | that means ffmpeg is running.
               | canucker2016 wrote:
               | Google's use of ffmpeg:
               | https://multimedia.cx/eggs/googles-youtube-uses-ffmpeg/
               | For encoding, recently, they've built their own ASIC to
               | deal with H264 and VP9 encoding (for 7-33x faster
               | encoding compared to CPU-only):
               | https://arstechnica.com/gadgets/2021/04/youtube-is-now-
               | build...
         | stevehiehn wrote:
         | Okaaay, and if I'm not operating a cloud service like Netflix,
         | and I'm not running thousands of ffmpeg processes? In other
         | words, it's not already a multi-core job?
       | brucethemoose2 wrote:
       | Meanwhile, I've been enjoying threaded filter processing in
       | VapourSynth for nearly a decade.
       | Not that this isn't great. It's fantastic. But TBH it's not really
       | going to change my workflow of VapourSynth preprocessing + av1an
       | encoding for "quality" video encodes.
         | tetris11 wrote:
         | I believe I have too with gstreamer's pipe framework for
         | threading, but ffmpeg's syntax has stuck in my mind far longer
         | than any of the elaborate setups I built with gstreamer. I'm
         | excited for this development
         | dylan604 wrote:
         | FFMPEG does so much more than just video encoding. I use ffmpeg
         | all day every day, and only a fraction of the time do I
         | actually make a video.
           | m3kw9 wrote:
           | Like what do you do?
             | andoma wrote:
             | One can use it instead of cat to display text files. Easy
             | syntax to remember.
             |     ffmpeg -v quiet -f data -i file.txt -map 0:0 -c text -f data -
               | DonHopkins wrote:
               | Good thing it's now multi-threaded so it can process all
               | those command line arguments in parallel!
               | whalesalad wrote:
               | I'm dying.
               | nerpderp82 wrote:
               | https://www.youtube.com/watch?v=9kaIXkImCAM
               | whalesalad wrote:
               | I'm glad we've reached a point where there is quality
               | parody content online for our industry.
               | ElijahLynn wrote:
               | THIS!!! It was so refreshing!
               | Rebelgecko wrote:
               | Check out Krazam. I quote their Microservices video on a
               | regular basis (https://youtu.be/y8OnoxKotPQ)
               | danudey wrote:
               | "Do you know ffmpeg supports OCR? I haven't found the
               | command yet, but it does support it."
               | This is probably 80% of my experience with ffmpeg, to be
               | honest, but the other 20% is invaluable enough anyway.
               | ElijahLynn wrote:
               | That was one of the funniest things I've seen in a
               | while!!!! I had to stop drinking my decaf for fear of
               | spitting it all over my computer I was laughing out loud
               | so much!
               | (ps: and no, it's not Rick Astley/Never Gonna Give You
               | Up)
               | nerpderp82 wrote:
               | The artfully inserted, corrupted predicted frames was
               | :chefskiss:
               | dkjaudyeqooe wrote:
               | I bet ffmpeg special cases that combination of flags and
               | calls cat.
               | yoz wrote:
               | can I use ffmpeg to embed a gif in a Hacker News comment,
               | because I want that so much right now
               | jasomill wrote:
               | No, but you can use ffmpeg to create a GIF from ASCII art
               | embedded in a Hacker News comment:
               |     $ ffmpeg -v quiet -codecs | egrep -i 'gif|ascii'
               |      D.V.L. ansi                 ASCII/ANSI art
               |      DEV..S gif                  CompuServe GIF (Graphics Interchange Format)
               | ("D" and "E" in the first field indicate support for
               | decoding and encoding)
               | faitswulff wrote:
               | Ah now I can replace all my useless uses of cat with
               | ffmpeg
               | fransje26 wrote:
               | I use dd for that.
               |     dd if=./file.txt
               | Can you also format your drive with ffmpeg? I'm looking
               | for a more versatile dd replacement..
               | jasomill wrote:
               | It can't create partition tables or filesystems, so no,
               | but
               |     ffmpeg -f data -i /dev/zero -map 0:0 -c copy -f data - > /dev/sda
               | is roughly equivalent to
               |     dd status=progress if=/dev/zero of=/dev/sda
               | dylan604 wrote:
               | you might need a -disposition default type option;
               | otherwise, it introduces some abnormal behavior
               | jasomill wrote:
               | That doesn't work[1], but
               |     ffmpeg -v quiet -f data -i file.txt -map 0:0 -c copy -f data -
               | does.
               | [1] "Encoder 'text' specified, but only '-codec copy'
               | supported for data streams"
             | polonbike wrote:
             | Besides video conversion/compression? Sound extraction or
             | processing, image processing, video casting or streaming:
             | anything related to image/multimedia formats, basically.
             | whalesalad wrote:
             | you can make GIFs with it
             | starkparker wrote:
             | I've used it for video and audio concatenation of laserdisc
             | game segments, transcoding audio clips for gamedev,
             | programmatically generating GIFs of automatically generated
             | video clips from tests in a CI pipeline, ripping songs and
             | audio clips from YouTube videos to ogg/mp3, creating GIFs
             | from burst-shot and time-lapse photography (and decimating
             | them), excerpting clips from a video without re-encoding,
             | and compressing or transforming audio on remote servers
             | where VLC wasn't and couldn't be installed.
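For the "excerpting clips from a video without re-encoding" case, here is a sketch of a common stream-copy invocation, built as an argument list from Python. File names and times are placeholders. Because `-c copy` does no decoding or re-encoding, the cut can only start cleanly on a keyframe, so the actual start may land somewhat earlier than requested.

```python
import shlex
from typing import List


def excerpt_cmd(src: str, start_s: float, duration_s: float, dst: str) -> List[str]:
    """Build (not run) an ffmpeg argv that copies an excerpt without re-encoding."""
    return [
        "ffmpeg",
        "-ss", str(start_s),     # placed before -i: seek within the input
        "-i", src,
        "-t", str(duration_s),   # length of the excerpt to write
        "-c", "copy",            # stream copy: packets are remuxed, not re-encoded
        dst,
    ]


cmd = excerpt_cmd("input.mp4", 90, 30, "clip.mp4")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would actually run it.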
               | bbkane wrote:
               | Sounds like you already have a process for most of this,
               | but I found https://github.com/mifi/editly to be
               | incredibly helpful to run ffmpeg and make my little time
               | lapse video. Could be useful for others
             | dylan604 wrote:
             | ffmpeg can produce an amazing amount of analysis
             | ThrowawayTestr wrote:
             | I use ffmpeg every time I download a YouTube video.
           | brucethemoose2 wrote:
           | Vapoursynth can be used for image processing too (albeit more
           | clumsily with variable size input), and it's also a great way
           | to hook into PyTorch.
             | Thaxll wrote:
             | Can it fix broken files?
           | PreachSoup wrote:
           | Can you run doom on it?
         | j1elo wrote:
         | Interesting! I'm among today's lucky 10,000 in learning for the
         | first time about VapourSynth.
         | How come it only has 4 measly entries on HN, none of which got
         | any traction? I've posted a new entry, just for the curiosity of
         | others.
         | aidenn0 wrote:
         | I'm guessing from context that VapourSynth is a frame-server in
         | the vein of avisynth? If so, does it run on Linux? Avisynth was
         | the single biggest thing I missed when moving to Linux about 20
         | years ago.
         | [edit]
         | found the docs; it's available on Linux[1]. I'm definitely
         | looking into it tonight because it can't be _worse_ than
         | writing ffmpeg CLI filtergraphs!
         | 1: http://www.vapoursynth.com/doc/installation.html#linux-
         | insta...
           | brucethemoose2 wrote:
           | Yep, and it's so much better than ffmpeg CLI that it's not even
           | funny.
           | This is a pretty good (but not comprehensive) db of the
           | filters: https://vsdb.top/
         | naikrovek wrote:
         | I don't understand why you would want to piggyback on this
         | story to say this.
         | are people just itching for reasons to dive into show & tell or
         | to wax poetic about how _they_ have solved the problem for
         | _years_? I really don't understand people at all, because I
         | don't understand why people do this. and I'm sure I've done it,
         | too.
           | tetris11 wrote:
           | There is hype for FEAT. People who have achieved similar FEAT
           | perk up their heads but say nothing.
           | Hype for FEAT is beyond sensibility. People with similar FEAT
           | are bristled by this and wish that their projects received
           | even a fraction of FEAT's hype.
           | I think it's normal.
             | naikrovek wrote:
             | not gonna define FEAT, then? ok.
               | tetris11 wrote:
               | ...in this case, multi-threading. In other cases; AI
               | workflows that others commercialize, a new type system in
               | a language that already exists in another, a new sampling
               | algorithm that has already existed by another name for
               | decades, a permaculture innovation that farmers have been
               | using for aeons, the list goes on...
               | naikrovek wrote:
               | just say "feature".
               | language is for communicating. don't impede that
               | communication by using unnecessary terms.
           | brucethemoose2 wrote:
           | Not gonna lie, I think VapourSynth has been flying under the
           | radar for far too long, and is an awesome largely unused
           | alternative to ffmpeg filter chains in certain cases. I don't
           | see any harm in piggybacking on an ffmpeg story to bring it
           | up, especially if readers find it useful.
           | It's been threaded since its inception, so it seems somewhat
           | topical.
       | badrabbit wrote:
       | When I stream 4k from my laptop ffmpeg gets very intense about
       | cpu usage to the point fans are constantly at high speed and it's
       | distracting. I hope this helps in some way. I have a fairly
       | decently specced mid-tier laptop.
         | hereme888 wrote:
         | I believe ffmpeg can be compiled to support the GPU, if your
         | laptop has one. It works at least for CUDA-enabled GPUs
         | (https://docs.nvidia.com/video-technologies/video-codec-
         | sdk/1...)
         | Talk with ChatGPT about it and see if you can do it.
           | isatty wrote:
           | ???
           | Just Google it.
           | badrabbit wrote:
           | Thanks, I don't think it has a separate GPU that is
           | CUDA-enabled, but I have other systems that do; will look into
           | it.
       | kevincox wrote:
       | I've always wondered if better multi-core performance can come
       | from processing different keyframe segments separately.
       | IIUC all current encoders that support parallelism work by
       | multiple threads working on the same frame at the same time.
       | Often times the frame is split into regions and each thread
       | focuses on a specific region of the frame. This approach can have
       | a (usually small) quality/efficiency cost and requires per-
       | encoder logic to assemble those regions into a single frame.
       | What if instead/additionally different keyframe segments are
       | processed independently? So if keyframes are every 60 frames,
       | ffmpeg will read 60 frames and pass them to the first thread, the
       | next 60 to the next thread, ... then assemble the results,
       | basically by concatenating them. It seems like this could be used
       | to parallelize any codec in a fairly generic way and it should be
       | more efficient as there is no thread-communication overhead or
       | splitting of the frame into regions which harms cross-region
       | compression.
       | Off the top of my head I can only think of two issues:
       | 1. Requires loading N*keyframe period frames into memory as well
       | as the overhead memory for encoding N frames.
       | 2. Variable keyframe support would require special support as the
       | keyframe splits will need to be identified before passing the
       | video to the encoding threads. This may require extra work to be
       | performed upfront.
       | But both of these seem like they won't be an issue in many cases.
       | Lots of the time I'd be happy to use tons of RAM and output with
       | a fixed keyframe interval.
       | Probably I would combine this with intra-frame parallelization
       | such as process every frame with 4 threads and then run 8
       | keyframe segments in parallel. This way I can get really good
       | parallelism but only minor quality loss from 4 regions rather
       | than splitting the video into 32 regions which would harm quality
       | more.
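A rough Python sketch of the keyframe-segment idea, assuming a fixed 60-frame keyframe interval. `zlib` stands in for a real video encoder purely so the example runs; the point is the structure: cut at keyframes, encode each segment with no shared state, and concatenate the results in order. A thread pool is used for portability; a real tool would more likely run separate encoder processes per segment.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List

KEYFRAME_INTERVAL = 60  # assumed fixed keyframe interval, in frames


def split_at_keyframes(frames: List[bytes], interval: int = KEYFRAME_INTERVAL) -> List[List[bytes]]:
    """Cut the sequence into segments that each start on a keyframe."""
    return [frames[i:i + interval] for i in range(0, len(frames), interval)]


def encode_segment(segment: List[bytes]) -> bytes:
    """Stand-in encoder (zlib, NOT a video codec): no state crosses segment boundaries."""
    return zlib.compress(b"".join(segment))


def encode_parallel(frames: List[bytes], workers: int = 8) -> List[bytes]:
    segments = split_at_keyframes(frames)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so reassembly is plain concatenation.
        return list(pool.map(encode_segment, segments))


# 250 fake "frames" -> 5 segments (4 full ones and a 10-frame tail).
frames = [bytes([i % 256]) * 1000 for i in range(250)]
encoded = encode_parallel(frames)
print(len(encoded), "segments,", sum(map(len, encoded)), "bytes")
```

The memory cost mentioned in point 1 shows up here as `segments`: every segment handed to a worker has to be held in memory at once.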
         | Hello71 wrote:
         | your idea also doesn't work with live streaming, and may also
         | not work with inter-frame filters (depending on
         | implementation). nonetheless, this exists already with those
         | limitations: av1an and I believe vapoursynth work more or less
         | the way you describe, except you don't actually need to load
         | every chunk into memory, only the current frames. as I
         | understand, this isn't a major priority for mainstream encoding
         | pipelines because gop/chunk threading isn't massively better
         | than intra-frame threading.
           | kevincox wrote:
           | It can work with live streaming, you just need to add N
           | keyframes of latency. With low-latency livestreaming
           | keyframes are often close together anyways so adding say 4s
           | of latency to get 4x encoding speed may be a good tradeoff.
             | bagels wrote:
             | 4s of latency is not acceptable for applications like live
             | chat
               | kevincox wrote:
               | As I said, "may be". "Live" varies hugely with different
               | use cases. Sporting events are often broadcast live with
               | 10s of seconds of latency. But yes, if you are talking to
               | a chat in real-time a few seconds can make a huge
               | difference.
             | mort96 wrote:
             | Well, you don't add 4s of latency for 4x encoding speed
             | though. You add 4s of latency for very marginal
             | quality/efficiency improvement and significant encoder
             | simplification, because the baseline is current frame-
             | parallel encoders, not sequential encoders.
             | Plus, computers aren't quad cores any more, people with
             | powerful streaming rigs probably have 8 or 16 cores; and
             | key frames aren't every second. Suddenly you're in this
             | hellish world where you have to balance latency, CPU
             | utilization and encoding efficiency. 16 cores at a not-so-
             | great 8 seconds of extra latency means terrible efficiency
             | with a key frame every 0.5 seconds. 16 cores at good
             | efficiency (say, 4 seconds between key frames) means
             | a terrible 64 seconds of extra latency.
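The arithmetic in the comment above, written out under its simplifying assumption that each of the N cores needs one whole keyframe interval T_key of input buffered ahead of it (the time to receive and encode a segment is ignored):

```latex
L_{\text{extra}} \approx N_{\text{cores}} \cdot T_{\text{key}}
\qquad\Rightarrow\qquad
16 \cdot 4\,\text{s} = 64\,\text{s},
\qquad
T_{\text{key}} = \frac{8\,\text{s}}{16} = 0.5\,\text{s}
```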
           | kevincox wrote:
           | > except you don't actually need to load every chunk into
           | memory, only the current frames.
           | That's a good point. In the general case of reading from a
           | pipe you need to buffer it somewhere. But for file-based
           | inputs the buffering concerns aren't relevant, just the
           | working memory.
           | dbrueck wrote:
           | Actually, not only does it work with live streaming, it's not
           | an uncommon approach in a number of live streaming
           | implementations*. To be clear, I'm not talking about low
           | latency stuff like interactive chat, but e.g. live sports.
           | It's one of several reasons why live streams of this type are
           | often 10-30 seconds behind live.
           | * Of course it also depends on where in the pipeline they
           | hook in - some take the feed directly, in which case every
           | frame is essentially a key frame.
         | cudder wrote:
         | I know next to nothing about video encoders, and in my naive
         | mind I absolutely thought that parallelism would work just like
         | you suggested it should. It sounds absolutely wild to me that
         | they're splitting single frames into multiple segments. Merging
         | work from different threads for every single frame sounds
         | wasteful somehow. But I guess it works, if that's how everybody
         | does it. TIL!
           | astrange wrote:
           | Most people concerned about encoding performance are doing
           | livestreaming and so they can't accept any additional
           | latency. Splitting a frame into independent segments (called
           | "slices") doesn't add latency / can even reduce it, and it
           | recovers from data corruption a bit better, so that's usually
           | done at the cost of some compression efficiency.
         | seeknotfind wrote:
         | Video codecs often encode the delta from the previous frame,
         | and because this delta is often small, it's efficient to do it
         | this way. If each thread needed to process the frame
         | separately, you would need to make significant changes to the
         | codec, and I hypothesize it would cause the video stream to be
         | bigger in size.
           | keehun wrote:
           | The parent comment referred to "keyframes" instead of just
           | "frames". Keyframes--unlike normal frames--encode the full
           | image. That is done in case the "delta" you mentioned could
           | be dropped in a stream ending up with strange artifacts in
           | the resulting video output. Keyframes are where the codec
           | gets to press "reset".
             | seeknotfind wrote:
             | Oh right. For non realtime, if you're not IO bound, this is
             | better. Though I'd wonder how portable the codec code
             | itself would be.
               | actionfromafar wrote:
               | The encoder has a lot of freedom in _how_ it arrives at
               | the encoded data.
           | danielrhodes wrote:
           | Isn't that delta partially based on the last keyframe? I
           | guess it would be codec dependent, but my understanding is
           | that keyframes are like a synchronization mechanism where the
           | decoder catches up to where it should be in time.
             | astrange wrote:
             | In most codecs the entropy coder does reset across
             | frames, so there is enough freedom that you can do
             | multithreaded decoding. ffmpeg has frame-based and slice-
             | based threading for this.
             | It also has a lossless codec ffv1 where the entropy coder
             | doesn't reset, so it truly can't be multithreaded.
             | 0x457 wrote:
             | Yes, key frames are fully encoded, and some delta frames
             | are based on the previous frame (which could be keyframe or
             | another delta frame). Some delta frames (b-frames) can be
             | based on next frame instead of previous. That's why
             | sometimes you could have a visual glitch and mess up the
             | image until the next key frame.
             | I'd assume if each thread is working on its own key frame,
             | it would be difficult to make b-frames work? Live content
             | also probably makes it hard.
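To illustrate the B-frame point, here is a Python sketch that converts a display-order pattern of frame types into decode order. It assumes the simplest structure, where each B-frame references only the nearest I/P frame on either side (no B-pyramids). B-frames displayed right before a new I-frame get decoded after it. In other words, they depend on the next segment's keyframe, which a segment-per-thread encoder would have to avoid (for example by not letting B-frames reference across the segment boundary).

```python
from typing import List, Tuple


def decode_order(display: str) -> List[Tuple[int, str]]:
    """Map display-order frame types (e.g. "IBBPBBP") to decode order.
    Assumes each B-frame references only the nearest I/P frame on either side."""
    out: List[Tuple[int, str]] = []
    waiting_b: List[Tuple[int, str]] = []
    for idx, ftype in enumerate(display):
        if ftype == "B":
            waiting_b.append((idx, ftype))  # needs a later anchor, so it has to wait
        else:
            out.append((idx, ftype))        # the I/P anchor is decoded first...
            out.extend(waiting_b)           # ...then the B-frames displayed before it
            waiting_b.clear()
    out.extend(waiting_b)  # trailing B-frames with no later anchor go last
    return out


print(decode_order("IBBPBBP"))
print(decode_order("IBBPBBI"))  # frames 4 and 5 come after the *next* keyframe
```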
         | rokweom wrote:
         | There's already software that does this:
         | https://github.com/master-of-zen/Av1an Encoding this way should
         | indeed improve quality slightly. Whether that is actually
         | noticeable/measurable... I'm not sure.
           | rnnr wrote:
           | ffmpeg and x265 allow you to do this too. frame-threads=1
           | will use 1 thread per frame, addressing the issue OP
           | mentioned without a big perf penalty, contrary to the 'pools'
           | switch, which sets the threads to be used for encoding.
           | jamal-kumar wrote:
           | I've messed around with av1an. Keep in mind the software used
           | for scene chunking, L-SMASH, is only documented in Japanese
           | [1], but it does the trick pretty well as long as you're not
           | messing with huge dimensions like HD VR where you have video
           | dimensions that do stuff like crash quicktime on a mac
           | [1] http://l-smash.github.io/l-smash/
         | PatronBernard wrote:
         | IIUC - International Islamic University Chittagong?
           | nolist_policy wrote:
           | IIUC - If I understand correctly.
           | KineticLensman wrote:
           | If I Understand Correctly
         | bmurphy1976 wrote:
         | This definitely happens. This is how videos uploaded to
         | Facebook or YouTube become available so quickly. The video is
         | split into chunks based on key frame, the chunks are farmed out
         | to a cluster of servers and encoded in parallel, and the
         | outputs are then re-assembled into the final file.
       | pier25 wrote:
       | So does this mean that FFMPEG will be able to use multiple cores
       | with all the included codecs?
       | I'm using FFMPEG to encode MP3 with LAME for an audio hosting
       | service and it would be great to improve encode times for long
       | files.
         | pseudosavant wrote:
         | Doubtful. Many codecs like MP3 aren't well suited to efficient
         | multi-threaded encoding.
       | mrbuttons454 wrote:
       | Will this allow multiple machines to process the same file? If
       | not is there anything out there that can?
       | muragekibicho wrote:
       | Shameless plug. I teach the FFmpeg C api here :
       | https://www.udemy.com/course/part-1-video-coding-with-ffmpeg...
         | 3abiton wrote:
           | That's a very niche topic to teach. What use cases?
           | muragekibicho wrote:
           | It's for engineers tired of memorizing long weird CLI
           | commands. I teach you the underlying C data structures so you
           | can get out of command line hell and make the most out of
           | your time!
       | ElijahLynn wrote:
       | This must have been quite the challenge to continually rebase the
       | ongoing changes coming in on the daily. Wow. Now that it is
       | actually in, it should be much easier to go forward.
       | Big win too! This is going to really speed things up!
       | shp0ngle wrote:
       | I don't know anything about the ffmpeg codebase, but I just wonder...
       | how would I go about doing this _slowly_ without completely doing
       | a giant commit that changes everything?
       | The presentation says it's 700 commits. Was that a separate
       | branch? Or was it slowly merged back to the project?
       | Well I can look at github I guess
         | shp0ngle wrote:
         | It seems ffmpeg uses the mailing list patch way of doing "PRs",
         | which is... well it is what it is. It doesn't help me
         | understand the process unless I just go through all the mailing
         | list archives, I guess.
           | asylteltine wrote:
           | Ugh why? That is so old school
             | _joel wrote:
             | Like the linux kernel?
             | shp0ngle wrote:
             | I mean they might be used to doing that, as ffmpeg
             | predates GitHub (and git).
             | as long as it works for them...
       | ajhai wrote:
       | This will hopefully improve the startup times for FFmpeg when
       | streaming from virtual display buffers. We use FFmpeg in LLMStack
       | (low-code framework to build and run LLM agents) to stream
       | browser video. We use playwright to automate browser interactions
       | and provide that as tool to the LLM. When this tool is invoked,
       | we stream the video of these browser interactions with FFmpeg by
       | streaming the virtual display buffer the browser is using.
       | There is a noticeable delay booting up this pipeline for each
       | tool invoke right now. We are working on putting in some
       | optimizations but improvements in FFmpeg will definitely help.
       | https://github.com/trypromptly/LLMStack is the project repo for
       | the curious.
       | bane wrote:
       | A theory about this that may also affect other older solid
       | software: the assumptions made on where to optimally "split" a
       | problem for multi-threading/processing has likely changed over
       | time.
       | It wasn't that long ago that reading, processing, and rendering
       | the contents of a single image took a noticeable amount of time.
       | But both hardware and software techniques have gotten
       | significantly faster. What may have made sense many years ago
       | (lots of workers on a frame) may not matter today when a single
       | worker can process a frame or a group of frames more efficiently
       | than the overhead of spinning up a bunch of workers to do the
       | same task.
       | But where should that split move now? Ultra-low-end CPUs now ship
       | with multiple cores and you can easily get over 100 on high-end
       | systems, system RAM is faster than ever, interconnects move
       | almost a TB/sec on consumer hardware, GPUs are in everything, and
       | SSDs are now faster than the RAM I grew up with (at least for
       | sequential transfers). Basically, today's systems are entirely
       | different beasts from the ones commonly on the market when FFmpeg
       | was created.
       | This is tremendous work that requires lots of rethinking about
       | how the workload needs to be defined, scheduled, distributed,
       | tracked, and merged back into a final output. Kudos to the team
       | for being willing to take it on. FFmpeg is one of those "pinnacle
       | of open source" infrastructure components that civilizations are
       | built from.
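To make the "where to split" trade-off concrete, here is a minimal Python sketch, assuming frames can be processed independently of each other (real inter-coded video generally cannot). Instead of submitting one task per frame, it hands each worker one contiguous group of frames, so scheduling overhead scales with the worker count rather than the frame count. `split_into_groups`, `process_in_groups`, and the `work` callback are hypothetical names; this is not how FFmpeg partitions work.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split_into_groups(n_items: int, n_groups: int) -> List[range]:
    """Split indices 0..n_items-1 into at most n_groups contiguous ranges
    whose sizes differ by at most one."""
    n_groups = max(1, min(n_groups, n_items))
    base, extra = divmod(n_items, n_groups)
    groups: List[range] = []
    start = 0
    for g in range(n_groups):
        size = base + (1 if g < extra else 0)  # first `extra` groups get one more
        groups.append(range(start, start + size))
        start += size
    return groups


def process_in_groups(frames: Sequence[T], work: Callable[[T], R],
                      n_workers: int) -> List[R]:
    """Submit one task per contiguous group of frames rather than one per
    frame, so per-task overhead is paid about n_workers times."""
    groups = split_into_groups(len(frames), n_workers)

    def run(group: range) -> List[R]:
        return [work(frames[i]) for i in group]

    out: List[R] = []
    with ThreadPoolExecutor(max_workers=len(groups)) as pool:
        for chunk in pool.map(run, groups):  # map() preserves submission order
            out.extend(chunk)
    return out
```

In CPython, threads only run `work` in parallel if it releases the GIL; for pure-Python CPU-bound work a `ProcessPoolExecutor` would be the closer analogue.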
         | MightyBuzzard wrote:
         | It's not the codecs that were multithreaded in this release.
         | Pretty much all modern codecs are already multithreaded. What
         | they decided to parallelize is ffmpeg itself. You know, the
         | filter graphs and such. They didn't do anything to the codecs
         | themselves.
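As a rough mental model of parallelizing the tool itself rather than the codecs, here is a toy Python sketch in which each stage (think demux, decode, filter, encode) runs in its own thread and stages are connected by bounded queues, so a slow stage applies backpressure upstream. The stage functions and `run_pipeline` are hypothetical; this is an illustration of stage-level pipelining under those assumptions, not FFmpeg's actual C implementation.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List

_EOF = object()  # sentinel marking end of stream


def start_stage(fn: Callable[[Any], Any], inq: queue.Queue,
                outq: queue.Queue) -> threading.Thread:
    """Run fn on every item from inq in a dedicated thread, pushing results to outq."""
    def loop() -> None:
        while True:
            item = inq.get()
            if item is _EOF:
                outq.put(_EOF)  # propagate end-of-stream downstream, then exit
                return
            outq.put(fn(item))  # blocks if the next stage is behind (backpressure)

    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t


def run_pipeline(source: Iterable[Any], steps: List[Callable[[Any], Any]],
                 maxsize: int = 8) -> List[Any]:
    """Chain steps with bounded queues so every step runs concurrently.

    Toy model only: no error handling, so an exception in a step kills its
    thread and the pipeline then waits forever.
    """
    queues = [queue.Queue(maxsize=maxsize) for _ in range(len(steps) + 1)]
    workers = [start_stage(fn, queues[i], queues[i + 1])
               for i, fn in enumerate(steps)]

    def feed() -> None:  # separate feeder thread avoids deadlock on full queues
        for item in source:
            queues[0].put(item)
        queues[0].put(_EOF)

    feeder = threading.Thread(target=feed, daemon=True)
    feeder.start()

    results: List[Any] = []
    while (item := queues[-1].get()) is not _EOF:
        results.append(item)
    for t in [feeder, *workers]:
        t.join()
    return results


if __name__ == "__main__":
    packets = range(5)  # stand-in for demuxed packets
    out = run_pipeline(packets, [
        lambda p: ("frame", p),         # "decode"
        lambda f: (f[0], f[1] * 10),    # "filter"
        lambda f: f"pkt-{f[1]}",        # "encode"
    ])
    print(out)  # ['pkt-0', 'pkt-10', 'pkt-20', 'pkt-30', 'pkt-40']
```

Because each stage is a single thread reading a FIFO queue, output order matches input order; the gain comes from stages overlapping in time, not from splitting any one stage's work.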
       | vfclists wrote:
       | All without Rust?
       | 71a54xd wrote:
       | Random reach here, but has anyone managed to get FFmpeg to render
       | JS text over a video? I've been thinking about this workflow and
       | haven't quite figured it out yet; I only have a prototype in
       | MoviePy, and I'd like to move away from that.
       | sylware wrote:
       | I think this was not "basic" multi-threading: they were careful
       | to keep latency as low as possible, and some internal
       | modifications to the ffmpeg libs had to be made.
       | That said, I don't think we get input buffering (for HLS) yet.
       | Const-me wrote:
       | The Intel Core Duo CPU was released in 2006. By then it was
       | obvious that computationally intensive programs need
       | multithreading and that these Unix-style processes were no
       | longer adequate.
       | I wonder why it took so long for FFmpeg.
       | BTW, Microsoft Media Foundation is a functional equivalent of
       | FFmpeg. It was released as part of Windows Vista in 2006, and is
       | heavily multithreaded by design.
       | atif089 wrote:
       | Does this mean my FFmpeg H.264 encoding was single-threaded until
       | now?
       (page generated 2023-12-12 23:00 UTC)