[HN Gopher] YouTubeTranscript.com
       ___________________________________________________________________
        
       YouTubeTranscript.com
        
       Author : fragmede
       Score  : 174 points
       Date   : 2022-12-18 16:38 UTC (6 hours ago)
        
 (HTM) web link (youtubetranscript.com)
 (TXT) w3m dump (youtubetranscript.com)
        
       | EGreg wrote:
       | Who built this?
       | 
       | We want to partner with you on a tool that autogenerates clips of
       | any video based on the topic start and end
        
       | dukeofdoom wrote:
       | Something like this would be nice to be able to search local
       | videos for specific keywords spoken too.
        
       | breck wrote:
       | https://youtubetranscript.com/?v=DvxxdZpMFHg
       | 
       | "Error: transcripts disabled for that video"
       | 
       | Why?
        
         | arboles wrote:
         | Youtube didn't generate captions for that video
        
       | banana_giraffe wrote:
       | If you want a CLI version of a similar idea, you can use yt-dlp
       | and some simple jq to pull down the captions for a video:
       | 
       |     curl `yt-dlp -j "https://www.youtube.com/watch?v=aeWyp2vXxqA" | \
       |         jq -r '.automatic_captions.en[] | select(.ext=="json3") | .url'`
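A minimal Python sketch for turning the json3 captions fetched by the command above into plain text with start times. The json3 layout is undocumented, so the field names used here (`events`, `tStartMs`, `segs`, `utf8`) are assumptions based on what such files typically contain, and the helper name `json3_to_lines` is hypothetical.

```python
import json


def json3_to_lines(payload: str) -> list:
    """Return (start_seconds, text) pairs from a json3 caption payload.

    Assumed (undocumented) layout:
    {"events": [{"tStartMs": ..., "segs": [{"utf8": "..."}]}]}
    """
    data = json.loads(payload)
    lines = []
    for event in data.get("events", []):
        segs = event.get("segs")
        if not segs:
            continue  # events without segments only set up windows/styling
        text = "".join(seg.get("utf8", "") for seg in segs).strip()
        if text:  # skip events that contain only a newline
            lines.append((event.get("tStartMs", 0) / 1000.0, text))
    return lines
```

Feed it the body returned by `curl`, then print or search the resulting lines.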
        
         | ptspts wrote:
         | Not all YouTube videos with spoken text have automatic
         | captions.
        
           | arboles wrote:
           | https://news.ycombinator.com/item?id=34041455
        
       | modeless wrote:
       | A supremely useful site that searches YouTube transcripts is
       | https://youglish.com. It shows you pronunciations in context for
       | any word or name.
        
         | arboles wrote:
         | Thanks for the link! This site actually has a database of
         | youtube transcripts unlike OP. Shame you can't search fixed
         | strings, like two words in exact order. Though it seems
         | genuinely useful for learning pronunciation as advertised.
        
       | c7DJTLrn wrote:
       | Pretty nice. The sliver of content still worth watching on
       | YouTube doesn't have repetitive stuff or padding to make it to
       | the 10 min mark though.
       | 
       | If you go to the homepage with clear cookies it's just endless
       | amounts of utterly dogshit cookie cutter content. Same clickbait
       | thumbnails with a person pulling an idiotic expression. Even the
       | videos masquerading as educational are entertainment at best. If
       | I had kids I'd do everything in my power to keep them away from
       | YouTube.
        
       | SkeuomorphicBee wrote:
       | Why is it hard-coded to English? When I try to transcribe a video
       | in any other language it throws the error:
       | 
       | > No transcripts were found for any of the requested language
       | codes: ('en',) For this video ([...]) transcripts are available
       | in the following languages: [...]
       | 
       | It even knows which languages are available, so why not dump that
       | instead?
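The error message suggests the site only ever requests `'en'`. A sketch of the fallback being asked for, assuming you already have the list of language codes the video offers; `pick_language` is a hypothetical helper, not the site's actual code.

```python
from typing import Optional, Sequence


def pick_language(available: Sequence[str],
                  preferred: Sequence[str] = ("en",)) -> Optional[str]:
    """Return the first preferred language code the video has,
    otherwise whatever language it does have, or None if it has none."""
    for code in preferred:
        if code in available:
            return code
    return available[0] if available else None
```

For a German-only video, `pick_language(["de"])` returns `"de"` instead of raising an error.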
        
         | aardvarkr wrote:
         | Probably because it's a hackathon style project that was
         | slapped together and isn't intended to support every use case.
         | I'd recommend reaching out to the author with your feedback
        
       | darepublic wrote:
       | What I've wanted is to search the transcripts of past videos I've
       | watched. With something like this it seems reasonable to imagine
       | having a setup where every video you navigate to gets
       | transcribed and the text is indexed for later search.
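The "index for later search" part can be sketched as a tiny in-memory inverted index. This assumes each transcript is already available as `(start_seconds, text)` lines; the class and method names are hypothetical, and a real setup would persist the index (for example in SQLite full-text search) rather than keep it in memory.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


class TranscriptIndex:
    """Inverted index: word -> set of (video_id, start_seconds)."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, video_id, lines):
        """Index (start_seconds, text) lines of one video."""
        for start, text in lines:
            for word in tokenize(text):
                self._postings[word].add((video_id, start))

    def search(self, query):
        """Return sorted (video_id, start) hits whose line contains every query word."""
        words = tokenize(query)
        if not words:
            return []
        hits = set.intersection(*(self._postings.get(w, set()) for w in words))
        return sorted(hits)
```

Each hit carries a start time, so it can be turned into a link that jumps straight to the spoken line.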
        
       | amelius wrote:
       | How can this be so fast? I tried it with two random urls, and the
       | transcripts were instant, like less than 100ms.
        
         | charcircuit wrote:
         | YouTube already creates transcripts for accessibility and for
         | feeding into other ML models.
        
         | samanator wrote:
         | Likely cached. Try with a long video with few views.
         | 
         | Edit: after reading other comments it seems this may be using
         | an undocumented api to retrieve the data.
        
         | FinalDestiny wrote:
         | It appears to be using the YouTube auto-generated captions. The
         | output, spacing, and punctuation are identical.
        
       | seydor wrote:
       | This is great and works well. What is the copyright status of
       | transcripts?
        
         | kube-system wrote:
         | They are owned by the copyright owner of the underlying audio.
        
           | seydor wrote:
           | but for example, is it fair use to reproduce? what about
           | indexing?
        
             | kube-system wrote:
             | Depends on why it is being done.
        
         | wantlotsofcurry wrote:
         | Not sure on the transcript front, but the owner may want to
         | consider removing 'youtube' from their name.
        
       | arboles wrote:
       | This UI and Youtube's UI for transcripts are really nice. When
       | I'm looking for a particular piece of information I can just
       | Ctrl+F and click on the match to play from there. Youtube used to
       | auto-generate subtitles, now it also formats subtitles as
       | transcripts. I wish offline media players had this functionality,
       | if I get distracted for a few seconds I don't have to watch those
       | seconds again, I can speedread over the past couple lines.
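The "search, then click the match to play from there" workflow can be approximated outside the page by turning each match into a watch URL with YouTube's `t=` start-time parameter. A sketch, assuming the transcript is available as `(start_seconds, text)` lines; `find_with_links` is a hypothetical name.

```python
from urllib.parse import urlencode


def find_with_links(video_id, lines, needle):
    """Case-insensitive search over (start_seconds, text) lines.

    Each hit is paired with a watch URL that starts playback at that
    line via the t= parameter (whole seconds)."""
    needle = needle.lower()
    results = []
    for start, text in lines:
        if needle in text.lower():
            query = urlencode({"v": video_id, "t": f"{int(start)}s"})
            results.append((text, f"https://www.youtube.com/watch?{query}"))
    return results
```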
        
         | arboles wrote:
         | Call it "panoramic subtitles"
        
       | alpb wrote:
       | Fwiw YouTube already has a feature for this. Click the "..." next
       | to the share and click Show Transcript. There are also extensions
       | like https://chrome.google.com/webstore/detail/youtube-
       | captions-s... that make it easy to search them in a popup.
        
         | dobladov wrote:
         | They seem to have moved the functionality to the end of the
         | description, and there you can find the "Show captions" button.
         | 
         | The extension I made to export the transcript was based on this
         | YouTube functionality, I should update the instructions now.
         | 
         | https://chrome.google.com/webstore/detail/youtube2anki/boebb...
        
         | modeless wrote:
         | Regular "find in page" works to search the transcript on
         | YouTube. I use it often.
        
       | cavisne wrote:
       | This script for whisper.cpp works really well
       | 
       | https://github.com/ggerganov/whisper.cpp/blob/master/example...
       | 
       | for my purposes I changed the output from subtitles to txt (so I
       | could pipe the result into chatgpt)
        
         | codetrotter wrote:
         | > so I could pipe the result into chatgpt
         | 
         | Tell us more :)
        
           | cavisne wrote:
           | Nothing too exciting, just "summarize this" followed by the
           | transcript in quotes, it works very well
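A sketch of preparing that prompt. Long transcripts may not fit in one model context, so this splits the text into word-bounded chunks and wraps each one as "summarize this" followed by the chunk in quotes, as described above. The 2000-word chunk size is an arbitrary assumption, `summary_prompts` is a hypothetical name, and sending the prompts to a model is left out.

```python
def summary_prompts(transcript, max_words=2000):
    """Split a transcript into chunks of at most max_words words and wrap
    each as: summarize this "<chunk>"."""
    words = transcript.split()
    prompts = []
    for i in range(0, len(words), max_words):
        chunk = " ".join(words[i:i + max_words])
        prompts.append(f'summarize this "{chunk}"')
    return prompts
```

With several chunks, the per-chunk summaries can themselves be concatenated and summarized once more.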
        
       | gbertb wrote:
       | Is this utilizing whisper to transcribe?
        
         | arboles wrote:
         | Youtube already auto-generates transcripts that you can see in
         | the ... menu in most videos. This website just seems like an
         | alternative frontend?
        
           | EGreg wrote:
           | Or maybe it processes the video with its own backend? How can
           | you tell?
        
             | arboles wrote:
             | Just minutes ago, I compared two transcripts for the same
             | video and they were the exact same. Also on
             | YouTubeTranscript.com swearing was redacted with [_], which
             | is something I've only ever seen on youtube captions.
        
             | kristianheljas wrote:
             | The first indication is the processing speed: there's no known
             | machine in the world that could transcribe videos at such a
             | speed.
        
               | EGreg wrote:
               | How about a cluster in parallel?
        
               | codetrotter wrote:
               | The simplest explanation is often the most probable one.
               | 
               | Why would you reach for a cluster of machines working in
               | parallel, when you could retrieve the already auto-
               | created transcript from YouTube servers?
               | 
               | Also, other comments have pointed out that the
               | transcripts are identical with the ones created by
               | YouTube, which would be unlikely to happen if this
               | service was creating transcripts of their own.
        
       | 88stacks wrote:
       | this will be dead soon due to having youtube in the name
        
         | lukeasch21 wrote:
         | Don't worry, the website solved this issue: > "Probably Won't
         | Fail: Featuring the latest build of an undocumented API."
         | 
         | This will work as long as YouTube doesn't change anything. And
         | since when has YouTube changed anything?
        
         | seydor wrote:
         | People can switch domains
        
           | kristianheljas wrote:
           | Hehe, they might need to switch cloud providers as well. The
           | domain and the underlying content are currently served by none
           | other than Google Cloud.
        
       | arcturus17 wrote:
       | The copy on your website is pure fire my dude.
        
       | maybelsyrup wrote:
       | I've been dreaming about something like this for years. Huge deal
       | for me. Thank you for your work!
        
       | faikuygur wrote:
       | Here is how to extract Youtube video transcript to an Excel file
       | with Robomotion:
       | 
       | https://demo.robomotion.io/designer/shared/6j984jBCQqYVBCaQk...
        
       | joosters wrote:
       | My only complaint is with the layout of the site - could you
       | please make the transcripts span across the whole width of the
       | page, not just to the right of the video?
       | 
       | My one gripe with Youtube's own transcript box is that it is too
       | narrow, so it is a shame that a website designed to specifically
       | make the transcripts more readable _also_ displays the
       | transcripts in a narrow box.
        
       | is0tope wrote:
       | Maybe this is a bit off topic, but does anyone know the legal
       | footing of having a business with another business's name in it?
       | For instance, this tool uses the word "YouTube" in its name,
       | though it is used as only a part of it, and it is not a
       | competitor. I've always wondered how this works.
        
         | kube-system wrote:
         | Broadly speaking, it would be trademark infringement if it is
         | used in a way that may confuse others about the source of the
         | product. It doesn't necessarily have to be a specific product
         | that Alphabet has a direct competitor for.
        
         | thaumasiotes wrote:
         | https://en.wikipedia.org/wiki/Nominative_use
         | 
         | > is a legal doctrine that provides an affirmative defense to
         | trademark infringement as enunciated by the United States Ninth
         | Circuit, by which a person may use the trademark of another as
         | a reference to describe the other product, or to compare it to
         | their own.
        
         | chiefalchemist wrote:
         | Not sure about YouTube but WordPress does not allow the use of
         | the name. WP in your (e.g.) domain name is ok. WordPress is
         | not.
         | 
         | I'd imagine it's very similar for others. Often a company will
         | pursue a violation if only to be consistent in showing the
         | courts they actively defend their copyright.
        
           | thaumasiotes wrote:
           | > Not sure about YouTube but WordPress does not allow the use
           | of the name.
           | 
           | They may not like it, but they don't have the power to
           | disallow you from using their name to refer to them. That's
           | allowed.
        
             | chiefalchemist wrote:
             | Actually, they do. It's copyright. Plenty of legal
             | precedent. They defend WordPress, but are willing to allow
             | WP.
             | 
             | The law is on their side.
        
         | bdcravens wrote:
         | Most corporations regularly search for such domains, and submit
         | cease-and-desist. I received one related to an eBay-related
         | domain, but in my case, I hadn't built a business around it so
         | it was easy enough to just take the site offline.
        
       | johnlk wrote:
       | Take video > transcribe > ask gpt to summarize > be genius in 2
       | mins
        
       | janandonly wrote:
       | The burning hate I feel for all information to be locked away in
       | a YouTube video. This will solve that real world problem. I love
       | reading (or, skimming) through a long read.
        
         | xuhu wrote:
         | Just checked: Google also includes YouTube captions in search
         | results.
        
         | motoboi wrote:
         | Not sure if you know this, but YouTube has had a transcript
         | feature for years now. It's somewhat hidden in the interface,
         | but it lets you search the transcript with Ctrl-F (or
         | Command-F).
        
           | cratermoon wrote:
           | Yeah, this website just extracts the transcript that exists and
           | displays it alongside the video. It's nice, but it's not
           | doing the transcribing itself.
        
           | zbrozek wrote:
           | I use this for city council meetings to figure out who said
           | what. It's not easy, but it's better than nothing. YouTube
           | doesn't appear to do so well with multiple speakers.
        
         | alpb wrote:
         | > I feel for all information to be locked away in a YouTube
         | video.
         | 
         | Google Search actually indexes transcripts of a video and shows
         | you some YouTube results based on that even though the
         | title/description of the video doesn't match the search query.
        
         | RBerenguel wrote:
         | I had a huge backlog of tech videos, so I wrote this (also
         | to play a bit with Haskell, the base idea can be replicated
         | easily in any language though):
         | https://github.com/rberenguel/glancer
        
           | arboles wrote:
           | Heh, this basically makes a storyboard
        
         | Random_Person wrote:
         | I've published almost 1,800 video diaries and this is a game
         | changer for me. I've been wanting to do more with the back
         | catalog, but don't have transcripts.
        
         | thomassmith65 wrote:
         | The ratio of information to misinformation on Youtube seems
         | pretty bad.
         | 
         | Making transcripts easier to access might create more problems
         | than it solves.
         | 
         | Granted I can't make a bullet-proof argument; there's no clear
         | way to quantify that ratio.
        
       | neilv wrote:
       | Hook this up to a language model, and maybe a user could
       | instantly get the _one sentence worth of information_ that the
       | YouTube video creator buries in 10 minutes of monetized noise.
       | 
       | And also save yourself time when the creator teases that they
       | provide the info, but it turns out they don't, they're just
       | trying to get views.
        
         | greggsy wrote:
         | I put something like this together to collect transcripts for
         | uni videos. It dumps all transcripts into a directory, with
         | URL links, so I can just search the whole directory to find the
         | keyword I need.
         | 
         | Helped a lot with take home exams.
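A sketch of the "search the whole directory" step. It assumes a hypothetical layout that the comment does not specify: one `.txt` file per video, whose first line is the video URL and whose remaining lines are the transcript. `search_transcripts` is likewise a hypothetical name.

```python
from pathlib import Path


def search_transcripts(directory, keyword):
    """Case-insensitive grep over *.txt transcripts in a directory.

    Assumed layout: first line of each file is the video URL, the rest
    is transcript text. Returns (url, matching_line) pairs."""
    keyword = keyword.lower()
    hits = []
    for path in sorted(Path(directory).glob("*.txt")):
        lines = path.read_text(encoding="utf-8").splitlines()
        if not lines:
            continue  # skip empty files
        url, body = lines[0], lines[1:]
        for line in body:
            if keyword in line.lower():
                hits.append((url, line.strip()))
    return hits
```

Keeping the URL inside each file means a hit always points back to the source video.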
        
         | nostromo wrote:
         | YouTube created that problem by incentivizing longer videos.
         | And now we have videos with tons of fluff.
         | 
         | Similarly Google incentivizes longer webpages, so now we have
         | recipes that start with a novella about grandma's cooking
         | before showing the actual recipe.
         | 
         | It used to be nice to see a video's thumbs up to thumbs down
         | ratio to know if you've been click baited or not before
         | watching the whole video. But that signal has been removed now
         | too.
        
           | anticristi wrote:
           | As a
           | 
           | recipe reader
           | 
           | I want to
           | 
           | dismiss cookies, have a video ad follow me down the page, and
           | read why this cake conjures up memories of the author's
           | childhood, before reaching the actual recipe
           | 
           | so that
           | 
           | I feel connected to the author, before fully committing to
           | mixing ingredients
        
             | slipmagic wrote:
             | Tom Redman had this idea but he took feedback from Twitter.
             | https://digg.com/2021/one-main-character-tom-redman-
             | recipeas...
        
             | neilv wrote:
             | That "user story" is like a tragically misinterpreted
             | comment by someone at a prospective customer, speaking of a
             | special time with their grandmother, but garbled through N
             | layers of field sales, marketing, product managers,
             | engineering hierarchy, and Agile task management.
             | 
             | Including the part about declining more cookies offered (to
             | save room for grandma's lasagna).
        
           | 12907835202 wrote:
           | Are you sure Google prefers longer pages? I find (annoyingly)
           | that Google likes the search version of my page for lots of
           | things. E.g. for a page called "best x of the y", the page for
           | searching comments on that page, called "best x of y search",
           | where the only text is the title and a search input, will
           | rank really well.
        
             | kristianheljas wrote:
             | Try searching for recipes :) I also see long novels which
             | seem to disguise the ridiculous amount of ads, which Google
             | seems to like as well (these are mostly provided by none
             | other than Google itself!).
        
         | Topgamer7 wrote:
         | YouTube-dl had the ability to rip just subtitles. I once used
         | this to grep for some information I wanted after downloading
         | all of the transcripts.
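Subtitle files downloaded this way are often WebVTT (`.vtt`), which wraps the text in timing lines and inline tags that get in the way of grepping. A minimal sketch that reduces a `.vtt` file to `(start_seconds, text)` cues. It handles only the basics (optional hours, cue identifiers, inline tags such as `<c>`) and does not remove the duplicated rolling lines that auto-generated captions can contain; `parse_vtt` is a hypothetical name.

```python
import re

# Start time of a cue, hours optional: "00:01:02.500 --> ..." or "01:02.500 --> ..."
TIMING = re.compile(r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3}) --> ")


def parse_vtt(vtt_text):
    """Return (start_seconds, text) cues from WebVTT text, dropping inline tags."""
    cues = []
    blocks = re.split(r"\n\s*\n", vtt_text.replace("\r\n", "\n"))
    for block in blocks:
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            m = TIMING.match(line)
            if not m:
                continue  # header, cue identifier, or note line
            hours = int(m.group(1)) if m.group(1) else 0
            minutes, seconds, millis = (int(g) for g in m.groups()[1:])
            text = re.sub(r"<[^>]+>", "", " ".join(lines[i + 1:])).strip()
            if text:
                start = hours * 3600 + minutes * 60 + seconds + millis / 1000
                cues.append((start, text))
            break  # one timing line per cue block
    return cues
```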
        
         | InCityDreams wrote:
         | ...or, just follow decent creators.
         | 
         | No snark intended, but I just gave up on the dross. And even
         | some of them, of late, are getting a bit crafty. But, creators
         | get one chance from me now - give me decent content, or even
         | with the fancy chapters, you're not getting my eyeballs past
         | two minutes. What I have found is that leaving the decent stuff
         | on, what auto-plays after is 'generally' of similar quality. A
         | quick set of back-buttoning and bookmarking has fairly often
         | got me some interesting results.
        
           | neilv wrote:
           | Good idea, but I don't follow anyone on YouTube. I was
           | thinking about searching the Web for a bit of info, the
           | search hits include YouTube videos (but no finer resolution
           | than "this entire video").
           | 
           | A search engine could narrow in on the few sentences of AV in
           | the video that it thinks correspond to what I was searching
           | for, summarize them, and also link me to the AV start
           | timepoint in case I also want to watch the video.
           | 
           | This might change the economics of some YouTube video content
           | creation.
        
             | LelouBil wrote:
             | Google does exactly that, if a video shows up in the search
             | results, it shows you only the relevant small part.
        
               | neilv wrote:
               | I've never seen this before now, but I just got a Google
               | search result video page with a kind of table-of-contents
               | index on _one_ of the video hits just now. (These TOC
               | entries _don't_ correspond to the marked segments on the
               | timeline. I don't know whether this is something YouTube
               | is doing, or something the content creator did.)
               | 
               | Is this what you mean? (Pardon if I'm not familiar with
               | the latest Google Search features; I've mostly been using
               | DDG lately, so don't have occasion to see all the
               | features that exhibit only occasionally.)
        
       | svat wrote:
       | This is a great idea; I really enjoy all these "two channels
       | simultaneously" (side-by-side translations, video with subtitles,
       | and in this case video with a readable transcript, where you can
       | scroll in the video or scroll in the transcript, and be
       | synchronized).
       | 
       | I had done something like this a couple of years ago for some
       | specific set of videos (e.g.
       | https://shreevatsa.net/tex/program/videos/s10/ -- compare with
       | https://youtubetranscript.com/?v=_0Cv1G_s4gQ for the same video),
       | but never got around to making it general; glad someone has done
       | it. It takes just a few lines of Javascript, using the Youtube
       | API, to do this i.e. keeping the video and text in sync (just
       | view source on either page to see the JS at the bottom).
       | 
       | Something like this can also help with audio recordings
       | (generating the alignment automatically is called "forced
       | alignment" and there are tools like "aeneas" for this). In case
       | anyone's interested or wants to help (for Sanskrit texts): see
       | https://github.com/shreevatsa/web-align-audio-text deployed at
       | https://shreevatsa.net/ramayana/sarga/ and better version at
       | https://github.com/avinashvarna/audio_alignment deployed at
       | https://avinashvarna.github.io/audio_alignment/
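The core of keeping video and text in sync, as described above, is a lookup from the current playback time to the transcript line being spoken. The pages mentioned do this in JavaScript with the YouTube API (the IFrame Player API exposes `getCurrentTime()` and `seekTo()`); this is only a Python sketch of that lookup, assuming the line start times are sorted, with `active_line` as a hypothetical name.

```python
from bisect import bisect_right


def active_line(starts, current_time):
    """Index of the transcript line being spoken at current_time.

    starts must be the sorted start times (seconds) of each line;
    returns -1 if playback is before the first line."""
    return bisect_right(starts, current_time) - 1
```

A page would call this on each time update to highlight the line; the reverse direction (clicking a line) just seeks the player to that line's start time.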
        
       | unangst wrote:
       | Expect an email from Google lawyers early this week about the
       | domain name.
        
         | antman wrote:
         | I think "transcriptsforyoutube" would be passable? I remember
         | something about a case using "for" and being ok but not any
         | details.
        
           | bdcravens wrote:
           | They generally don't get into nuance. If someone's trademark
           | is in your domain name, expect a C&D.
        
       | TheCaptain4815 wrote:
       | Funny, was just looking for a tool like this.
       | 
       | Any chance timestamps could be added?
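If the transcript is available with per-line start times, adding timestamps is mostly formatting. A sketch, assuming `(start_seconds, text)` lines; both function names are hypothetical.

```python
def format_timestamp(seconds):
    """Format seconds as M:SS, or H:MM:SS once past an hour."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def with_timestamps(lines):
    """Render (start_seconds, text) lines as '[M:SS] text', one per line."""
    return "\n".join(f"[{format_timestamp(start)}] {text}" for start, text in lines)
```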
        
         | cm2187 wrote:
         | With youtube dl you can download the subtitle tracks which
         | should have timestamps. Though last time I checked they were
         | broken (showing the whole text on the first timestamp) but
         | perhaps they fixed it
        
         | chiefalchemist wrote:
         | For Power Point and screenshare based videos, a screenshot
         | every 15 seconds or so would be great.
         | 
         | Often enough I'd rather read than watch. Reading is faster.
         | Having corresponding visuals would be a big plus.
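For a locally downloaded video, periodic screenshots can be produced with ffmpeg by seeking to each time and grabbing a single frame. A sketch that only builds the command lines, assuming `ffmpeg` is installed and the video duration is known; the output file naming is a hypothetical choice that embeds the second offset, so each image can be matched to the transcript line spoken at that time.

```python
def screenshot_commands(video_path, duration_s, interval_s=15):
    """One ffmpeg invocation per screenshot: seek to t seconds, grab one frame."""
    commands = []
    for t in range(0, int(duration_s), interval_s):
        commands.append([
            "ffmpeg", "-ss", str(t), "-i", video_path,
            "-frames:v", "1", f"shot_{t:05d}.png",
        ])
    return commands
```

Each command can then be run with `subprocess.run(cmd, check=True)`.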
        
       | breck wrote:
       | This is amazing! The speed and simplicity makes me happy. Thank
       | you!
        
       ___________________________________________________________________
       (page generated 2022-12-18 23:00 UTC)