hngopher.com

       [HN Gopher] YouTube-dl has an interpreter for a subset of JavaSc...
       ___________________________________________________________________
        
       YouTube-dl has an interpreter for a subset of JavaScript in 870
       lines of Python
        
       Author : yuuta
       Score  : 304 points
       Date   : 2022-09-10 18:12 UTC (4 hours ago)
        
 (HTM) web link (twitter.com)
 (TXT) w3m dump (twitter.com)
        
       | lolinder wrote:
       | To be clear, this is an _extremely_ tiny subset of JS. It looks
       | like they only implemented the features needed to run a very
       | specific function. For example, the only symbol allowed after
       | "new" is "Date", everything else throws an exception.
       | 
       | It's still fun that it's there, but it's not as big a deal as it
       | sounds from the tweet.
        
         | krab wrote:
         | It will only grow - as new scripts will need to be interpreted,
         | new features will be added.
        
           | lolinder wrote:
           | I would be horrified if this grew much further. It's
           | perfectly fine for its current scope, but the architecture
           | would not scale at all to a full interpreter without
           | essentially starting from scratch.
        
         | em-bee wrote:
         | if it's going to need much more than that then it probably
         | would make more sense to port the whole application to
         | javascript instead.
         | 
         | but then this could be turned into a commandline browser that
         | is able to interpret a whole web-page and save the resulting
         | html structure instead of the source as curl/wget would do.
        
       | mdaniel wrote:
       | I was expecting this to be about Duktape
       | <https://github.com/svaarala/duktape>, but heh, for sure no. I'd
       | bet $1 there's no way youtube-dl would switch, but I wonder if
       | yt-dlp would?
        
       | kristopolous wrote:
       | To understand why, I have a far simpler tool that focuses on a
       | subset of sites (adult content video aggregators)
       | 
       | https://github.com/kristopolous/tube-get
       | 
       | It too deals with this problem but does so in a way that'd be
       | easy to maliciously sabotage
       | 
       | Look right about here https://github.com/kristopolous/tube-
       | get/blob/master/tube-ge...
       | 
       | As to why this program exists, this was originally written
       | between about 2010-2015 or so, so it technically predates the
       | yt-* ecosystem.
       | 
       | The tool still works fine, and it's not a strict subset of yt-dlp
       | or YouTube-dl because it takes a different approach. Although
       | its overall site coverage is smaller, I've had it be a "second
       | try" system when yt-* fails, and it comes up with success maybe
       | about half the time
        
       | homarp wrote:
       | the tests for it: https://github.com/ytdl-org/youtube-
       | dl/blob/master/test/test...
        
       | lewisl9029 wrote:
       | Another really cool JS dialect I recently learned about is njs
       | from the nginx team: https://github.com/nginx/njs
       | 
       | This video goes into some of the design and tradeoffs:
       | https://www.youtube.com/watch?v=Jc_L6UffFOs
       | 
       | TL;DW: they optimized for fast creation/destruction of low-
       | footprint VMs with no JIT or garbage collection.
        
       | M30 wrote:
       | How should a programming noob interpret this? Be impressed at
       | what was achieved here? Be concerned about security implications
       | using the tool? Something else entirely?
        
         | Test0129 wrote:
         | > How should a programming noob interpret this?
         | 
         | Usually in a virtual machine.
        
         | smcl wrote:
         | All of the above, really.
        
         | tenebrisalietum wrote:
         | > How should a programming noob interpret this?
         | 
         | The browser is client-facing and everything there is possible
         | to reverse engineer and figure out. So if you design a web-
         | based application, and are depending on client-side Javascript
         | for any security or distribution enforcement, it can be
         | helpful, but can ultimately be unwound and cracked even if
         | obfuscated, etc.
         | 
         | > Be impressed at what was achieved here?
         | 
         | Yes. Try to download a YouTube video without it or an online
         | service which is probably using it internally.
        
         | rkangel wrote:
         | This is the compiler writer equivalent of parsing HTML with
         | regex:
         | 
         | It is technically wrong - it isn't a sufficiently rich and
         | powerful approach to handle all JS (HTML) that you might throw
         | at it. It'll work for a while until it eventually barfs when
         | you least expect it.
         | 
         | EXCEPT that if the inputs you are giving it come from some
         | understood source(s) that aren't likely to change, then a
         | simpler approach than the "all singing all dancing" correct one
         | may be appropriate and justified, e.g. because it might be
         | easier to write, easier to maintain and/or have less attack
         | surface.
        
         | lolinder wrote:
         | It's an extremely tiny subset of JS--as an example, the only
         | object that can be instantiated is Date. Anything other than
         | "Date" after "new" throws an exception.
         | 
         | It's definitely neat, but not especially useful outside of the
         | confines of its current application, and the security concerns
         | of such a tiny subset will be minimal.
        
           | petters wrote:
           | > Anything other than "Date" after "new" throws an exception
           | 
           | It's even very sensitive to white space.
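To illustrate the strictness described in this subthread, here is a minimal Python sketch of an evaluator that accepts only `new Date(<milliseconds>)` and rejects everything else, including extra whitespace. It is not youtube-dl's code; the names `eval_new` and `UnsupportedJS` and the exact pattern are assumptions made for illustration.

```python
import re
from datetime import datetime, timezone

# Hypothetical pattern: exactly one space after "new", only "Date",
# and a single integer argument (milliseconds since the Unix epoch).
_NEW_DATE = re.compile(r"new Date\((?P<ms>\d+)\)")


class UnsupportedJS(Exception):
    """Raised for any expression outside the tiny supported subset."""


def eval_new(expr: str) -> datetime:
    """Evaluate `new Date(<ms>)`; anything else raises UnsupportedJS."""
    match = _NEW_DATE.fullmatch(expr)
    if match is None:
        raise UnsupportedJS(f"unsupported expression: {expr!r}")
    return datetime.fromtimestamp(int(match.group("ms")) / 1000, tz=timezone.utc)
```

With this sketch, `eval_new("new Date(0)")` returns the Unix epoch, while `"new Map()"` and even `"new  Date(0)"` (two spaces) raise `UnsupportedJS`. A pattern-matching approach like this stays small but is brittle against harmless changes in the input.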
        
         | bjt2n3904 wrote:
         | The goal of youtube-dl is to download a video off of YouTube
         | for offline storage.
         | 
         | This isn't something YouTube particularly enjoys. They would
         | rather you keep coming back -- every visit is more ad revenue
         | for them. If you have an offline copy, you don't need to visit
         | YouTube anymore.
         | 
         | YouTube has an incentive, therefore, to make it more difficult
         | to download (or "scrape") their content.
         | 
         | I'm not particularly sure of the specific details, but
         | apparently YouTube has added JavaScript (a programming language
         | that executes in the browser) as a hurdle to jump over. A
         | simple python script doesn't have enough brains to execute
         | JavaScript, only enough to realize that it exists. (Clearly,
         | youtube-dl is sophisticated enough to have jumped over it.)
         | 
         | These are the conclusions I come to, having written software
         | for about a decade.
         | 
         | 1) Once you give information to someone, be it text, pictures,
         | sound, or video -- they will do whatever they want with it, and
         | you have no control. Oh, yes -- it may be illegal. Maybe
         | unethical. But the fact of the matter is you do not have
         | control over information once it leaves your hands.
         | 
         | 2) Adding hurdles to make it harder to access the information
         | does little to stop someone who is dedicated to accessing it.
         | 
         | 3) Implementing a subset of JavaScript in such an elegant and
         | tiny manner is quite impressive.
         | 
         | How you interpret these facts depends on your worldviews. If
         | you are a media and content creator, you will view these facts
         | differently than a politician, and a teenager.
         | 
         | As an engineer and amateur philosopher, I certainly support the
         | rights of content creators to be paid for their work. And yet,
         | I fear that more and more, content creators want to lease me a
         | right to listen to their music, instead of owning a copy of it.
         | 
         | I used to own CDs, DVDs, movies, and books. What happens if
         | Amazon or YouTube decides to not serve me anymore? Anything
         | I've "purchased" from them, I lose access to.
         | 
         | Furthermore, if I create a song, I used to be able to burn
         | copies onto CDs and distribute them on street corners. Now, you
         | have to sign up to stream on Spotify. This is a double edged
         | sword -- I get a wide audience, but Spotify will do whatever
         | they want with me.
         | 
         | This troubles me.
        
       | jraph wrote:
       | I do wonder why YouTube does not try harder to make it difficult
       | to do this computation meant to prove you are a legit YouTube web
       | client. Providing an easy-to-find, simple JS function
       | interpretable with 900 lines of Python is like they don't try at
       | all. They might as well do nothing.
       | 
       | Or is their goal just to make youtube-dl not 100% reliable? Or to
       | be able to say "look, you are running our code in a way we did
       | not intend, you can't do this because you are breaking the EULA"?
        
         | Cthulhu_ wrote:
         | I'm guessing the amount of people using it is low enough to not
         | bother with mitigation. Then again, there's a LOT of YT videos
         | that take clips from other videos (which in most cases falls
         | under fair use), which I can imagine would use this tool.
        
         | Arnavion wrote:
         | They do make it harder from time to time. In fact yt-dlp's
         | interpreter has been broken for a month or so now and the devs
         | finally gave up and told users to just install PhantomJS (which
         | itself hasn't been updated since 2016 and probably has bugs /
         | vulns of its own, but whatever).
         | 
         | https://github.com/yt-dlp/yt-dlp/issues/4635#issuecomment-12...
        
         | zuminator wrote:
         | I'd guess that their efforts to make it harder are limited by
         | the fact that they want YouTube to be able to play on thousands
         | of different low powered set top boxes and cheap phones. So
         | whatever obfuscated code they use has to be simple enough to be
         | run and periodically updated by all these different devices,
         | and that same simplicity makes it emulable.
        
       | rcarmo wrote:
       | Awesome. Even if it's likely incomplete, it might come in really
       | handy for some scraping I need to do...
        
       | Uptrenda wrote:
       | Anyone who has ever pulled a website from a script knows the pain
       | that is Javascript. Normally you want to just get some text and
       | work out the API actions but a lot of sites use horribly
       | obfuscated Javascript -- either because that's what modern web
       | development is (lolz) -- or because it's part of their 'security.'
       | That means if you want to write browser-based bots properly --
       | you ought to use a browser. There are special browsers that run
       | 'headlessly' or are designed mostly for bot use. Like
       | https://www.selenium.dev/ which plugs into a few different
       | 'browser engines.'
       | 
       | But now you have another problem. Your simple script goes from
       | being a small, simple, self-contained, and elegant gem, to
       | requiring a full browser, specialized drivers, and/or daemons
       | running just to work. If you're using something like Python you
       | just frankly don't have very good packaging. So it's hard to
       | string together all that into a solution and have it magically
       | work for everyone. What YouTube-dl have done is good engineering.
       | Even though it's not a full JS interpreter: they've kept their
       | software lean, self-contained, and easier to use.
        
         | eurasiantiger wrote:
         | Just npm install puppeteer.
        
           | lolinder wrote:
           | Puppeteer is cool, but it's exactly what OP is warning
           | against: it's a full browser that is downloaded and run
           | through npm. It's remarkably well packaged, but still far
           | more error prone than a simple HTTP request, and far more
           | likely to break on its own just with the passage of time.
        
       | haunter wrote:
       | The same in yt-dlp https://github.com/yt-dlp/yt-
       | dlp/blob/master/yt_dlp/jsinterp...
       | 
       | Interesting to see the diffcheck between the two
       | https://www.diffchecker.com/8EJGN27K
        
         | cheschire wrote:
         | Is yt-dlp's implementation being better the reason why I have
         | fewer throttling issues than with youtube-dl?
        
       | anony23 wrote:
       | What purpose does it serve?
        
         | throwaway0984 wrote:
         | IIRC it's used to extract/generate the signatures needed for
         | YouTube media URLs
        
         | oynqr wrote:
         | You need to run some obscured JS to get decent download speeds
         | from Youtube. Something along the lines of PoW.
        
           | db48x wrote:
           | It's not like proof of work at all. It's just a challenge and
           | response; youtube includes a random number in the webpage for
           | each video, and expects to see a request parameter with a
           | particular value calculated from that random number when you
           | request the video. If you don't do the arithmetic it
           | throttles you to 50kb/s.
           | 
           | Since the calculation of the response is done in JS, and they
           | occasionally change the formula, some download programs are
           | moving towards running the JS rather than trying to keep up
           | with the changes.
           | 
           | It's really just bullshit to make people's lives harder.
        
             | xg15 wrote:
             | Next step will probably be moving the calculation to
             | webassembly or requiring the script to fetch the result via
             | websocket or webrtc...
        
             | mistrial9 wrote:
             | .. pirate determination is a thing to behold, as is crazed-
             | repetitive digital grabs.. It's not a fair or accurate
             | characterization to dismiss it as "making people's lives
             | harder" .. it is remarkable that the Debian distros now
             | include ytdl; let's do what is reasonable to make it
             | continue
        
               | db48x wrote:
               | You can't exactly pirate a youtube video, since they're
               | all publicly available.
        
               | MiguelX413 wrote:
               | That's not really how piracy works. I say this as an
               | advocate of it.
        
         | rany_ wrote:
         | They need to run a JavaScript function to download YouTube
         | videos at normal speeds.
         | 
         | Edit: it's also required to download music, otherwise it will
         | just fail
         | 
         | Source:
         | 
         | - https://github.com/ytdl-org/youtube-
         | dl/issues/29326#issuecom...
         | 
         | - https://github.com/ytdl-org/youtube-
         | dl/blob/d619dd712f63aab1...
         | 
         | - https://github.com/ytdl-org/youtube-
         | dl/commit/cf001636600430...
        
           | ajkjk wrote:
           | Wow:
           | 
           | > Overview of the control flow (already known):
           | >
           | > - The Youtube API provides you with n - your video access
           | >   token
           | > - If their new changes apply to your client (they do for
           | >   "web") then it is expected your client will modify n based
           | >   on internal logic. This logic is inside player...base.js
           | > - n is modified by a cryptic function
           | > - Modified n is sent back to server as proof that we're an
           | >   official client. If you send n unmodified, the server will
           | >   eventually throttle you.
           | 
           | So they can always change the function to keep you on your
           | toes, hence you need to be able to run semi-arbitrary JS in
           | order to keep using the API.
           | 
           | Waste of human brainpower but I guess that energy is better
           | spent imagining a world where Google isn't in charge instead
           | of kvetching about what they're doing with their influence.
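The client-side half of the flow quoted above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not what youtube-dl does: `toy_transform` is a made-up placeholder for the function in the player script, `rewrite_n` is a hypothetical helper, and the URL in the usage note is invented.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def toy_transform(n: str) -> str:
    """Placeholder for the site's transform; the real one is unknown here."""
    return n[::-1]


def rewrite_n(url: str, transform=toy_transform) -> str:
    """Return `url` with its `n` query parameter replaced by transform(n).

    URLs without an `n` parameter come back with the same parameters.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    if "n" in query:
        # parse_qs maps each key to a list of values; transform the first one.
        query["n"] = [transform(query["n"][0])]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))
```

For example, `rewrite_n("https://example.invalid/videoplayback?id=1&n=abc")` yields a URL ending in `id=1&n=cba`. Because the transform is passed in as a parameter, only that one function needs replacing when the formula changes, which is the part a downloader would obtain by running the site's JavaScript.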
        
         | elaus wrote:
         | I'd have to read up on the specifics as well, but I think
         | basically Youtube uses a lot of obfuscated, rapidly and
         | automatically changing Javascript code to fetch the video data.
         | A project like youtube-dl has to run this code to be able to
         | download videos, because that's what's happening in the browser
         | as well.
        
           | temp_account_32 wrote:
           | For those interested further, in some of the past few weeks
           | youtube-dl had stopped working intermittently for multiple
           | hours at a time, and it was precisely related to this code.
           | 
           | We have a custom-made Discord music bot on our server which
           | uses ytdl to stream songs so we can listen together, and at
           | one point we were listening and suddenly got some obscure
           | JavaScript error.
           | 
           | We began joking that there's some bug in the code which
           | breaks it after 6PM, but later found out that Google had
           | changed some of the obfuscated JS and this basically broke
           | this part of code, which prevented us from fetching the song
           | information.
        
           | bitexploder wrote:
           | What is interesting is it seems to be constant cat and mouse.
           | I download a YT vid. It crawls. Update yt-dlp, it flies
           | again. I love yt-dlp and use it a lot.
        
           | londons_explore wrote:
           | If you start a youtube video and then pause it and resume a
           | few days later, you'll notice that the youtube page plays for
           | ~30 seconds (i.e. what's buffered) and then the page refreshes.
           | I'd guess this refresh is to pick up the new javascript and
           | any updates to the HTML code.
           | 
           | It's kinda annoying if you have a lot of youtube tabs open
           | for a long time and come back to them.
        
           | lupire wrote:
           | But why not just use a normal JS engine called from Python?
        
         | hadrien01 wrote:
         | It's used in the YouTube extractor: https://github.com/ytdl-
         | org/youtube-dl/blob/d619dd712f63aab1...
         | 
         | I believe YouTube limits your bitrate if you don't pass a
         | specific calculated value; it's possible youtube-dl has to
         | parse and eval JS to get it.
        
           | RicoElectrico wrote:
           | > I believe YouTube limits your bitrate if you don't pass a
           | specific calculated value
           | 
           | It's starting to become Widevine bullshit all over again.
        
             | kevin_thibedeau wrote:
             | It's their platform. They can do with it what they want.
        
               | vukgr wrote:
               | Just because they have the right to do it doesn't make it
               | right.
        
               | jraph wrote:
               | They've also chosen to be a monopoly.
        
               | uwuemu wrote:
        
               | MiguelX413 wrote:
               | There's a difference between combatting entitlement to a
               | platform and complaining about something not serving a
               | greater good. Leftists are also censored either way.
               | Those companies censor or don't censor according to what
               | would maximize profit. Monetization and reach are
               | probably mutually exclusive with freedom.
        
               | tsukikage wrote:
               | > you also think billionares should be taxed more
               | 
               | That's... quite a response in defense of a tool intended
               | for breaching TOS and performing copyright infringement.
               | Can you clarify exactly who it is and isn't OK to steal
               | from, again? I'm struggling here.
        
               | btdmaster wrote:
               | > As a matter of policy (as well as legality), youtube-dl
               | does not include support for services that specialize in
               | infringing copyright. As a rule of thumb, if you cannot
               | easily find a video that the service is quite obviously
               | allowed to distribute (i.e. that has been uploaded by the
               | creator, the creator's distributor, or is published under
               | a free license), the service is probably unfit for
               | inclusion to youtube-dl.
               | 
               | Does using a different User Agent instead of a typical
               | browser amount to copyright infringement in any
               | jurisdiction?
        
               | tsukikage wrote:
               | Copyright law only permits making copies of artistic
               | works when you have license to do so. Youtube only
               | permits use of content it serves in the specific
               | situations described in its terms of service. All other
               | use is prohibited.
               | 
               | You can see the terms of service here:
               | 
               | https://www.youtube.com/static?gl=GB&template=terms
               | 
               | In particular, the first three points in the "permissions
               | and restrictions" section explicitly prohibit tools like
               | youtube-dl. I've pasted these below:
               | 
               | > The following restrictions apply to your use of the
               | > Service. You are not allowed to:
               | >
               | > 1. access, reproduce, download, distribute, transmit,
               | >    broadcast, display, sell, license, alter, modify or
               | >    otherwise use any part of the Service or any Content
               | >    except: (a) as specifically permitted by the Service;
               | >    (b) with prior written permission from YouTube and, if
               | >    applicable, the respective rights holders; or (c) as
               | >    permitted by applicable law;
               | > 2. circumvent, disable, fraudulently engage, or otherwise
               | >    interfere with the Service (or attempt to do any of
               | >    these things), including security-related features or
               | >    features that: (a) prevent or restrict the copying or
               | >    other use of Content; or (b) limit the use of the
               | >    Service or Content;
               | > 3. access the Service using any automated means (such as
               | >    robots, botnets or scrapers) except: (a) in the case of
               | >    public search engines, in accordance with YouTube's
               | >    robots.txt file; (b) with YouTube's prior written
               | >    permission; or (c) as permitted by applicable law;
               | 
               | As a convenient figleaf, it is also possible to use
               | youtube-dl for some purposes that are not dubious. Of the
               | people I know who use the tool, none of them do that.
        
               | Dylan16807 wrote:
               | It's their platform but it's also a web site and that
               | comes with certain expectations of interoperability.
        
               | forchune3 wrote:
               | it's sort of an extension of the state / surveillance
        
               | RicoElectrico wrote:
               | Many channels would be more than happy to enable download
               | options, if possible.
               | 
               | Hell, how is the Creative Commons licence, which they totally
               | give you the option to select, supposed to work for videos
               | that can't be downloaded in any way?
        
               | londons_explore wrote:
               | But would the channel owner be happy to enable download
               | options if $0.09 per GB downloaded was subtracted from
               | their ad revenue?
        
       | sylware wrote:
       | Nowadays "javascript" refers to the scriptable, grotesquely and
       | absurdly complex and massive web engines, aka google financed
       | blink and gecko, then apple financed webkit, that with their SDK.
       | 
       | The currently obfuscated javascript media players will try to
       | break yt-dlp by leveraging the complexity and size of those
       | scripted web engines. They will make them out of reach to small
       | teams or individuals and it is even "better", it will force ppl
       | to use apple or google web engine, killing any attempt to provide
       | a real alternative.
       | 
       | A standalone javascript interpreter is actually some work, but
       | seems to stay in the "reasonable" realm: look at quickjs from M.
       | Bellard and friends (the guy who created qemu, ffmpeg, tinycc,
       | etc): plain and simple C (no need of a c++ compiler), doing the
       | job more than well enough.
       | 
       | That's why noscript/basic (x)html is so important.
        
         | dtx1 wrote:
         | > but seems to stay in the "reasonable" realm
         | 
         | > M. Bellard and friends
         | 
         | Choose one, that dude is a wizard wielding c like a brain
         | surgeon wields a scalpel.
        
         | randyrand wrote:
         | Chrome and Safari both have open source JS engines...
        
           | userbinator wrote:
           | That's beside the point. Open-source is not useful to the
           | smaller players if it is too complex to comprehend and
           | constantly churned.
        
         | olliej wrote:
         | Yeah I agree with almost all of this - the massive size and
         | complexity of commercial engines makes it seem like JS the
         | language must also be complex.
         | 
         | I also agree with the idea that these sites will probably be
         | able to/want to create JS that breaks these small/lightweight
         | engines requiring constant work :-/
         | 
         | This final point I disagree with entirely. You can't point to
         | Bellard doing something as evidence that it's reasonable. This
         | is a guy that wrote a program that generated a TV signal via a
         | VGA card. :D
        
         | oblak wrote:
         | ah, but quickjs is an actual js engine. I have tried a couple
         | of versions with real progress between them. This thing here is
         | not
        
         | languageserver wrote:
         | > That's why noscript/basic (x)html is so much important.
         | 
         | xhtml has been dead for a decade
        
       | esprehn wrote:
       | This isn't really JS, it's a purpose built evaluator that's only
       | for evaluating a particular script on YouTube, assuming a huge
       | list of things are true about how YouTube JS is written.
       | 
       | Ex. It's got a hard coded list of methods for String, and it
       | doesn't respect prototypes. It only supports creating Date
       | instances, and won't work if you override the global Date. It
       | parses with regexes and implements all operators with python's
       | operator module (which is the wrong type semantics) etc. Nearly
       | none of the semantics of JS are implemented.
       | 
       | It's sort of the sandwich categorization problem:
       | 
       | If I write a C# "interpreter" in perl that's only 200 lines and
       | just handles string.Join, string.Concat and Console.WriteLine,
       | and it doesn't actually try to implement C# syntax or semantics
       | at all and just uses perl semantics for those operations is it
       | actually C#? :P
       | 
       | I say "not a sandwich".
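The point about regex parsing and Python's operator module can be made concrete with a small sketch. This is not the youtube-dl code; `eval_binary`, `_literal`, and the pattern are assumptions chosen for illustration. It evaluates a single binary expression on integer or double-quoted string literals by dispatching straight to Python operators, so the results follow Python's type rules rather than JavaScript's.

```python
import operator
import re

# JS operator symbols mapped directly onto Python operators (Python semantics).
_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

# Hypothetical grammar: <literal> <op> <literal>, where a literal is an
# unsigned integer or a double-quoted string without embedded quotes.
_BINARY = re.compile(
    r'\s*(?P<lhs>"[^"]*"|\d+)\s*(?P<op>[+\-*])\s*(?P<rhs>"[^"]*"|\d+)\s*'
)


def _literal(token: str):
    """Convert a matched token into a Python str or int."""
    return token[1:-1] if token.startswith('"') else int(token)


def eval_binary(expr: str):
    """Evaluate one binary expression using Python's operator semantics."""
    match = _BINARY.fullmatch(expr)
    if match is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    op = _OPS[match.group("op")]
    return op(_literal(match.group("lhs")), _literal(match.group("rhs")))
```

Plain arithmetic agrees with JavaScript (`eval_binary("2 * 21")` gives 42), but mixed types do not: `'"3" * 2'` gives the string "33" here where JavaScript gives the number 6, and `'"1" + 1'` raises `TypeError` where JavaScript gives "11". That is harmless as long as the target script never relies on such coercions, which is the assumption described in the comment above.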
        
         | Test0129 wrote:
         | This really isn't fair. Just because it doesn't faithfully
         | implement whatever standard Javascript is on doesn't mean it
         | isn't an interpreter. All an interpreter is is something that
         | executes a script directly rather than requiring compilation.
         | It is a de facto interpreter for a subset of javascript. Nothing
         | more, nothing less. The title could be more clear, however.
        
           | baobabKoodaa wrote:
           | There's a huge difference between an interpreter for
           | "JavaScript" and an interpreter for a "subset of JavaScript".
        
             | Test0129 wrote:
             | Making a pedantic argument on what constitutes an
             | interpreter is silly. The title is bad. It is an
             | interpreter. I'll continue to eat downvotes on this because
             | of the pedantry of HN.
        
               | khazhoux wrote:
               | Technically, it's only the pedantry of a _subset_ of HN.
        
               | lupire wrote:
               | It's an interpretation of a subset of the pedantry on HN.
        
               | jraph wrote:
               | I didn't downvote, but I don't think esprehn is being
               | unfair. Their comment is very informative. They didn't
               | argue that what was implemented is not an interpreter,
               | they did explain why it's not a JavaScript interpreter
               | and not even an interpreter for a subset of JavaScript.
               | It's just a special purpose interpreter suitable for
               | YouTube's code that cannot be re-used for any code that
               | uses the subset that it seems to implement.
               | 
               | It's not pedantry (or I'm pedantic). It's a reaction to
               | the title that can lead people to believe that a complete
               | JavaScript interpreter has been written in less than a
               | thousand lines of Python. This reaction is perfectly
               | understandable.
        
               | chess_buster wrote:
               | I evaluated it with my Pedantic Interpreter which only
               | results in the `pedantic` token.
        
               | blondin wrote:
               | my vote is meaningless and i am sorry about that. but
               | just wanted to let you know that what you said made
               | sense. do not let people get to you.
               | 
               | most of us know that a thousand or so lines of code is
               | not a full JavaScript interpreter and cannot be the real
               | thing.
               | 
               | there is no argument or conversation to have about it.
        
               | baobabKoodaa wrote:
               | > Making a pedantic argument on what constitutes an
               | interpreter is silly. The title is bad. It is an
               | interpreter.
               | 
               | It's not a pedantic argument. Based on the title I
               | thought that somebody wrote something akin to V8 in 800
               | lines of Python. After reading the comments I realized
               | those 800 lines just interpret a particular JavaScript
               | function written by Youtube. Those things are different.
               | Pointing out the fact that they are different is not
               | pedantry. The title is misleading and the comments
               | pointing that out are helpful.
        
           | [deleted]
        
           | blast wrote:
           | esprehn didn't say it isn't an interpreter. They're saying it
           | _is_ an interpreter and what it's interpreting isn't (all
           | of) JS. That's also what you're saying, so you're agreeing
           | with esprehn.
           | 
           | Edit: You misunderstood baobabKoodaa in the same way. Nobody
           | is arguing about what constitutes an interpreter, except you.
           | The question is only what language is being interpreted.
           | 
           | Before accusing someone of pedantry, it would first be good
           | not to completely misread them.
        
         | blast wrote:
         | I suppose this means it would be easy for YouTube to fuck with
         | youtube-dl simply by throwing in more features of JS?
        
           | joshenders wrote:
           | Cat, meet mouse.
        
         | dang wrote:
         | Ok, we've changed this title to shrink the scope of the
         | interpreter.
         | 
         | Submitted title was "YouTube-dl has a JavaScript interpreter
         | written in 870 lines of Python".
        
         | jraph wrote:
         | And as a user of youtube-dl, I'm quite happy about this. This
         | probably allows a very safe, restricted "subset" of JS. Way
         | better than using a full JS engine. 900 lines is still small
         | and manageable.
        
           | jiggawatts wrote:
           | That's the exact same logic I hear from developers who say
           | things like:
           | 
           | Why do I need a full XML parser when I can just extract what
           | I need with regex?
           | 
           | And:
           | 
           | All that RPC IDL stuff is overcomplicated, REST is so much
           | easier because I can just write the client by hand.
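           | 
           | A minimal sketch of the XML half of that analogy, using only
           | Python's standard library. The sample document and both
           | function names are made up for illustration:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample input: one attribute and one entity are enough
# to trip up the naive pattern below.
DOC = '<feed><title lang="en">Tom &amp; Jerry</title></feed>'


def title_with_regex(xml_text):
    """Naive extraction: only matches a bare <title> tag, keeps raw entities."""
    match = re.search(r"<title>(.*?)</title>", xml_text)
    return match.group(1) if match else None


def title_with_parser(xml_text):
    """Let a real XML parser handle attributes and entity decoding."""
    return ET.fromstring(xml_text).findtext("title")


print(title_with_regex(DOC))   # the attribute defeats the pattern
print(title_with_parser(DOC))  # the parser decodes &amp; as well
```

           | The regex only works while the input keeps the exact shape
           | it was written against, which is the same trade-off as the
           | restricted interpreter.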
        
           | sebzim4500 wrote:
           | I'm trying to get the threat model here. Is the concern that
           | Youtube will inject JS into the payload which tries to break
           | out of the youtube-dl js sandbox using some zero day in
           | whatever js engine they would use instead?
        
             | rwmj wrote:
             | Google attempting zero days on client computers would be
             | something. It's not totally without precedent (Sony CD
             | rootkits -
             | https://en.wikipedia.org/wiki/Sony_BMG_copy_protection_rootk...)
             | but would still be major news.
        
             | [deleted]
        
             | loeg wrote:
             | youtube-dl targets a lot of websites other than Google
             | properties, many of which are a lot sketchier (think, uh,
             | NSFW streaming sites).
        
             | kevingadd wrote:
             | Embedding a whole js engine and then interopping with it
             | from python would be non trivial. Good luck fixing any bugs
             | or corner cases you hit that way. The V8 and spidermonkey
             | embedding apis are both c++ (iirc) and non trivial to use
             | correctly.
             | 
             | Having full control like this, plus simple code, is probably
             | lower risk and more maintainable, even if there's the
             | challenge of expanding the feature set if scripts change.
             | 
             | The alternative would be a console js shell, but those are
             | very different from browsers, so that poses its own
             | challenges.
        
               | esprehn wrote:
               | Fwiw there are python bindings for QuickJS and Duktape:
               | 
               | https://github.com/PetterS/quickjs
               | 
               | https://github.com/stefano/pyduktape
               | 
               | https://github.com/amol-/dukpy
               | 
               | I can't speak to the quality of those bindings, but they
               | do seem maintained.
        
               | em-bee wrote:
               | apparently yt-dlp is somehow calling out to a js engine
               | if available
        
             | jraph wrote:
             | Let's say they end up using Node. Node has a quite complete
             | standard library that lets you access files and everything.
             | 
             | Now if they do it right and only embed some bare JS
             | interpreter, it's still way harder to audit than these <
             | 900 lines, for which it is quite easy to convince oneself
             | that the interpreted script cannot do much.
        
               | geysersam wrote:
               | Nowadays they could probably use Deno. Without
               | permissions it doesn't allow network or file access etc.
        
           | mjevans wrote:
           | yt-dlp sometimes doesn't know how to evaluate the JavaScript
           | / ECMAScript and will call out to an optional dependency, a
           | real JavaScript interpreter, if installed.
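           | 
           | A rough sketch of that kind of fallback, not yt-dlp's actual
           | code: try a small built-in evaluator first and only look for
           | an external runtime when it gives up. Every name here
           | (`evaluate`, `find_js_runtime`, the `external_eval` hook, the
           | `node` default) is hypothetical:

```python
import shutil


def find_js_runtime(candidates=("node",)):
    """Return the path of the first candidate executable on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


def evaluate(script, builtin_eval, external_eval, candidates=("node",)):
    """Try the built-in interpreter first, then an optional external one.

    builtin_eval(script) is expected to raise NotImplementedError when it
    meets JS it does not support. external_eval(runtime_path, script) is a
    caller-supplied hook (an assumption of this sketch) that knows how to
    invoke whichever runtime was found.
    """
    try:
        return builtin_eval(script)
    except NotImplementedError:
        runtime = find_js_runtime(candidates)
        if runtime is None:
            # Optional dependency is not installed: surface the original error.
            raise
        return external_eval(runtime, script)
```

           | Keeping the external call behind a hook means the common
           | path never leaves the small, auditable interpreter.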
        
         | tra3 wrote:
         | It quacks like a duck at midnight, but it's actually a frog?
        
       | olliej wrote:
       | This is super cool.
       | 
       | Some of the stuff is _kind of_ questionable to me in the sense
       | that I could believe you could probably make some kind of
       | sufficiently wonky JS that this would do the "wrong" thing.
       | 
       | But it's super cool that they are able to do this as I think it
       | shows that claims of JS complexity based on the size of JS
       | engines are overlooking just how much of that size/complexity
       | comes from the "make it fast" drive vs. what the language
       | requires. Here you have a <1000LoC implementation of the core of
       | the JS language, removed from things like regex engines, GCs,
       | etc.
       | 
       | Mad props to them for even attempting it as well - it simply
       | would not have ever occurred to me to say "let's just write a
       | small JS engine" and I would have spent stupid amounts of time
       | attempting to use JSC* from python instead.
       | 
       | [* JSC appears to be the only JS engine with a pure C API, and
       | the API and ABI are stable so on iOS/macOS at least you can just
       | use the system one which reduces binary size+build annoyance. The
       | downside is that C is terrible, and C++ (differently terrible?
       | :D) APIs make for much more pleasant interfaces to the VM -
       | constructors+destructors mean that you get automatic lifetime
       | management so handles to objects aren't miserable, you can have
       | templates that allow your API to provide handles that have real
       | type information. JSC only has JSValueRef and JSObjectRef, and as
       | a JSObjectRef is a JSValueRef it's actually just a typedef to
       | const JSValueRef :D OTOH I do think JSC's partially
       | conservative GC is superior to Handles for stack/temporary
       | variables for the most part, but it's also absolutely
       | necessary to have an API that isn't absolutely wretched. The real
       | problem with JSC's API is that it has not got any love for many
       | many many .... many years so it doesn't have any way to handle or
       | interact with many modern features without some kludgy wrappers
       | where you push your API objects into JS and have the JS code wrap
       | them up. The API objects are also super slow, as they basically
       | get treated as "oh ffs" objects that obey no rules. I really do
       | wish it would get updated to something more pleasant and really
       | usable.]
        
         | esprehn wrote:
         | This doesn't actually implement any of the JS language though,
         | it just reuses all of Python's semantics and hard-codes a tiny
         | list of e.g. String methods.
         | 
         | I also assume you mean mainstream JS engine, but Duktape,
         | JerryScript and QuickJS are all C APIs.
         | 
         | They probably could have used e.g.
         | https://github.com/PetterS/quickjs instead of the hacks in the
         | OP linked file.
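         | 
         | To illustrate the general shape of "reuse Python's semantics
         | plus a hard-coded method table", here is a toy sketch. It is
         | not youtube-dl's code: the `METHODS` table, `run`, and the
         | statement grammar are all invented for this example, and the
         | naive splitting on `;` and `,` breaks on string literals that
         | contain those characters.

```python
import re


def _reverse(items):
    items.reverse()  # in place, then return the list as JS does
    return items


def _splice(items, start, count):
    removed = items[start:start + count]
    del items[start:start + count]
    return removed


def _split(text, sep):
    # JS "abc".split("") yields characters; Python's str.split("") raises.
    return list(text) if sep == "" else text.split(sep)


# The whole "standard library": anything not listed here is rejected.
METHODS = {
    "reverse": _reverse,
    "splice": _splice,
    "split": _split,
    "join": lambda items, sep: sep.join(items),
}

ASSIGN_RE = re.compile(r"^(\w+)\s*=\s*(.+)$")
CALL_RE = re.compile(r"^(\w+)\.(\w+)\((.*)\)$")


def _parse_arg(text, env):
    """Accept a double-quoted string, a non-negative integer, or a variable."""
    text = text.strip()
    if len(text) >= 2 and text[0] == text[-1] == '"':
        return text[1:-1]
    if text.isdigit():
        return int(text)
    return env[text]


def run(body, env):
    """Run `;`-separated statements of the form [return | x =] var.method(args)."""
    for stmt in (s.strip() for s in body.split(";")):
        if not stmt:
            continue
        is_return = stmt.startswith("return ")
        if is_return:
            stmt = stmt[len("return "):].strip()
        target = None
        assign = ASSIGN_RE.match(stmt)
        if assign:
            target, stmt = assign.groups()
        call = CALL_RE.match(stmt)
        if call is None or call.group(2) not in METHODS:
            raise ValueError(f"unsupported statement: {stmt!r}")
        var, method, raw_args = call.groups()
        args = [_parse_arg(a, env) for a in raw_args.split(",") if a.strip()]
        value = METHODS[method](env[var], *args)
        if target is not None:
            env[target] = value
        if is_return:
            return value
    return None


body = 'a = a.split(""); a.reverse(); a.splice(0, 2); return a.join("")'
print(run(body, {"a": "abcdef"}))
```

         | Anything outside the table, such as `new Date()`, raises
         | instead of being interpreted, which is what keeps the code
         | small and also why it is not a general JS subset.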
        
           | olliej wrote:
           | Ah, I only briefly scanned the implementation, and it looked
           | like it was doing actual work - is it mostly string replacing
           | to get approximately equivalent Python syntax? Regardless,
           | that's disappointing.
           | 
           | You are correct though that I was only thinking of the big
           | engines - bias on my part alas.
           | 
           | For your suggested alternate engines, JerryScript and QuickJS
           | seem more complete than Duktape but I can't quite work out
           | the GC strategy of JerryScript. Bellard says QuickJS has a
           | cycle detector but I'm generally dubious of them based on
           | prior experience.
           | 
           | If I were shipping software that had to actually include a JS
           | engine, and perf was not an issue, I would probably use
           | JerryScript or QuickJS, as I think binary size would be the
           | more critical factor.
        
       ___________________________________________________________________
       (page generated 2022-09-10 23:00 UTC)