[HN Gopher] YouTubeTranscript.com
___________________________________________________________________
YouTubeTranscript.com

Author : fragmede
Score  : 174 points
Date   : 2022-12-18 16:38 UTC (6 hours ago)

(HTM) web link (youtubetranscript.com)
(TXT) w3m dump (youtubetranscript.com)

| EGreg wrote:
| Who built this?
|
| We want to partner with you on a tool that autogenerates clips of
| any video based on the topic start and end
| [deleted]
| dukeofdoom wrote:
| Something like this would be nice to be able to search local
| videos for specific keywords spoken too.
| breck wrote:
| https://youtubetranscript.com/?v=DvxxdZpMFHg
|
| "Error: transcripts disabled for that video"
|
| Why?
| arboles wrote:
| Youtube didn't generate captions for that video
| banana_giraffe wrote:
| If you want a CLI version of a similar idea, you can use yt-dlp
| and some simple jq to pull down the captions for a file:
|     curl `\
|       yt-dlp -j "https://www.youtube.com/watch?v=aeWyp2vXxqA" | \
|       jq -r '.automatic_captions.en[] | select(.ext=="json3") | .url'`
| ptspts wrote:
| Not all YouTube videos with spoken text have automatic
| captions.
| arboles wrote:
| https://news.ycombinator.com/item?id=34041455
| modeless wrote:
| A supremely useful site that searches YouTube transcripts is
| https://youglish.com. It shows you pronunciations in context for
| any word or name.
| arboles wrote:
| Thanks for the link! This site actually has a database of
| youtube transcripts unlike OP. Shame you can't search fixed
| strings, like two words in exact order. Though it seems
| genuinely useful for learning pronunciation as advertised.
| c7DJTLrn wrote:
| Pretty nice. The sliver of content still worth watching on
| YouTube doesn't have repetitive stuff or padding to make it to
| the 10 min mark though.
|
| If you go to the homepage with clear cookies it's just endless
| amounts of utterly dogshit cookie cutter content. Same clickbait
| thumbnails with a person pulling an idiotic expression.
| Even the
| videos masquerading as educational are entertainment at best. If
| I had kids I'd do everything in my power to keep them away from
| YouTube.
| SkeuomorphicBee wrote:
| Why is it hard-coded to English? When I try to transcribe a video
| in any other language it throws the error:
|
| > No transcripts were found for any of the requested language
| codes: ('en',) For this video ([...]) transcripts are available
| in the following languages: [...]
|
| It even knows what language is available, so why not dump that
| instead?
| aardvarkr wrote:
| Probably because it's a hackathon style project that was
| slapped together and isn't intended to support every use case.
| I'd recommend reaching out to the author with your feedback
| darepublic wrote:
| What I've wanted is search by transcript of past videos I've
| watched. With something like this it seems reasonable to imagine
| having a set up where every video you navigate to gets
| transcribed and text is indexed for later search
| amelius wrote:
| How can this be so fast? I tried it with two random urls, and the
| transcripts were instant, like less than 100ms.
| charcircuit wrote:
| YouTube already creates transcripts for accessibility and for
| feeding into other ML models.
| samanator wrote:
| Likely cached. Try with a long video with few views.
|
| Edit: after reading other comments it seems this may be using
| an undocumented api to retrieve the data.
| FinalDestiny wrote:
| It appears to be using the YouTube auto-generated captions. The
| output, spacing, and punctuation are identical.
| seydor wrote:
| This is great and works well. What is the copyright status of
| transcripts?
| kube-system wrote:
| They are owned by the copyright owner of the underlying audio.
| seydor wrote:
| but for example, is it fair use to reproduce? what about
| indexing?
| [deleted]
| kube-system wrote:
| Depends on why it is being done.
| wantlotsofcurry wrote:
| Not sure on the transcript front, but the owner may want to
| consider removing 'youtube' from their name.
| arboles wrote:
| This UI and Youtube's UI for transcripts are really nice. When
| I'm looking for a particular piece of information I can just
| Ctrl+F and click on the match to play from there. Youtube used to
| auto-generate subtitles, now it also formats subtitles as
| transcripts. I wish offline media players had this functionality,
| if I get distracted for a few seconds I don't have to watch those
| seconds again, I can speedread over the past couple lines.
| arboles wrote:
| Call it "panoramic subtitles"
| politelemon wrote:
| alpb wrote:
| Fwiw YouTube already has a feature for this. Click the "..." next
| to the share and click Show Transcript. There are also extensions
| like https://chrome.google.com/webstore/detail/youtube-captions-s...
| that make it easy to search them in a popup.
| dobladov wrote:
| They seem to have moved the functionality to the end of the
| description, and there you can find the "Show captions" button.
|
| The extension I made to export the transcript was based on this
| YouTube functionality, I should update the instructions now.
|
| https://chrome.google.com/webstore/detail/youtube2anki/boebb...
| modeless wrote:
| Regular "find in page" works to search the transcript on
| YouTube. I use it often.
| cavisne wrote:
| This script for whisper.cpp works really well
|
| https://github.com/ggerganov/whisper.cpp/blob/master/example...
|
| for my purposes I changed the output from subtitles to txt (so I
| could pipe the result into chatgpt)
| codetrotter wrote:
| > so I could pipe the result into chatgpt
|
| Tell us more :)
| cavisne wrote:
| Nothing too exciting, just "summarize this" followed by the
| transcript in quotes, it works very well
| gbertb wrote:
| Is this utilizing whisper to transcribe?
| arboles wrote:
| Youtube already auto-generates transcripts that you can see in
| the ...
| menu in most videos. This website just seems like an
| alternative frontend?
| EGreg wrote:
| Or maybe it processes the video with its own backend? How do
| you tell
| arboles wrote:
| Just minutes ago, I compared two transcripts for the same
| video and they were the exact same. Also on
| YouTubeTranscript.com swearing was redacted with [_], which
| is something I've only ever seen on youtube captions.
| kristianheljas wrote:
| First indication is the processing speed - there's no known
| machine in the world that could transcribe videos at such
| speed.
| EGreg wrote:
| How about a cluster in parallel?
| codetrotter wrote:
| The simplest explanation is often the most probable one.
|
| Why would you reach for a cluster of machines working in
| parallel, when you could retrieve the already auto-created
| transcript from YouTube servers?
|
| Also, other comments have pointed out that the
| transcripts are identical with the ones created by
| YouTube, which would be unlikely to happen if this
| service was creating transcripts of their own.
| 88stacks wrote:
| this will be dead soon due to having youtube in the name
| lukeasch21 wrote:
| Don't worry, the website solved this issue: > "Probably Won't
| Fail: Featuring the latest build of an undocumented API."
|
| This will work as long as YouTube doesn't change anything. And
| since when has YouTube changed anything?
| seydor wrote:
| People can switch domains
| kristianheljas wrote:
| Hehe, they might need to switch cloud provider as well. The
| domain and the underlying content is currently served by none
| other than google cloud.
| arcturus17 wrote:
| The copy on your website is pure fire my dude.
| maybelsyrup wrote:
| I've been dreaming about something like this for years. Huge deal
| for me. Thank you for your work!
| faikuygur wrote:
| Here is how to extract Youtube video transcript to an Excel file
| with Robomotion:
|
| https://demo.robomotion.io/designer/shared/6j984jBCQqYVBCaQk...
| joosters wrote:
| My only complaint is with the layout of the site - could you
| please make the transcripts span across the whole width of the
| page, not just to the right of the video?
|
| My one gripe with Youtube's own transcript box is that it is too
| narrow, so it is a shame that a website designed to specifically
| make the transcripts more readable _also_ displays the
| transcripts in a narrow box.
| is0tope wrote:
| Maybe this is a bit off topic, but does anyone know the legal
| footing of having a business with another business's name in it?
| For instance, this tool uses the word "YouTube" in its name,
| though it is used as only a part of it, and it is not a
| competitor. I've always wondered how this works.
| kube-system wrote:
| Broadly speaking, it would be trademark infringement if it is
| used in a way that may confuse others about the source of the
| product. It doesn't necessarily have to be a specific product
| that Alphabet has a direct competitor for.
| thaumasiotes wrote:
| https://en.wikipedia.org/wiki/Nominative_use
|
| > is a legal doctrine that provides an affirmative defense to
| trademark infringement as enunciated by the United States Ninth
| Circuit, by which a person may use the trademark of another as
| a reference to describe the other product, or to compare it to
| their own.
| tmpburning wrote:
| chiefalchemist wrote:
| Not sure about YouTube but WordPress does not allow the use of
| the name. WP in your (e.g.) domain name is ok. WordPress is
| not.
|
| I'd imagine it's very similar for others. Often a company will
| pursue a violation if only to be consistent in showing the
| courts they actively defend their copyright.
| thaumasiotes wrote:
| > Not sure about YouTube but WordPress does not allow the use
| of the name.
|
| They may not like it, but they don't have the power to
| disallow you from using their name to refer to them. That's
| allowed.
| chiefalchemist wrote:
| Actually, they do. It's copyright.
| Plenty of legal
| precedent. They defend WordPress, but are willing to allow
| WP.
|
| The law is on their side.
| bdcravens wrote:
| Most corporations regularly search for such domains, and submit
| cease-and-desist. I received one related to an eBay-related
| domain, but in my case, I hadn't built a business around it so
| it was easy enough to just take the site offline.
| johnlk wrote:
| Take video > transcribe > ask gpt to summarize > be genius in 2
| mins
| janandonly wrote:
| The burning hate I feel for all information to be locked away in
| a YouTube video. This will solve that real world problem. I love
| reading (or, skimming) through a long read.
| xuhu wrote:
| Just checked that google also includes youtube captions in
| search returns.
| motoboi wrote:
| Not sure if you know that, but YouTube has a transcript feature
| available for years now. It's somewhat hidden in the interface,
| but lets you search with ctrl-F (or command-F) in the
| transcript
| cratermoon wrote:
| Yeah this website just extracts the transcript that exists and
| displays it alongside the video. It's nice, but it's not
| doing the transcribing itself.
| zbrozek wrote:
| I use this for city council meetings to figure out who said
| what. It's not easy, but it's better than nothing. YouTube
| doesn't appear to do so well with multiple speakers.
| alpb wrote:
| > I feel for all information to be locked away in a YouTube
| video.
|
| Google Search actually indexes transcripts of a video and shows
| you some YouTube results based on that even though the
| title/description of the video doesn't match the search query.
| RBerenguel wrote:
| I had a huge backlog of tech videos, so I wrote me this (also
| to play a bit with Haskell, the base idea can be replicated
| easily in any language though):
| https://github.com/rberenguel/glancer
| arboles wrote:
| Heh, this basically makes a storyboard
| Random_Person wrote:
| I've published almost 1,800 video diaries and this is a game
| changer for me. I've been wanting to do more with the back
| catalog, but don't have transcripts.
| thomassmith65 wrote:
| The ratio of information to misinformation on Youtube seems
| pretty bad.
|
| To make transcripts easier to access might create more problems
| than it solves.
|
| Granted I can't make a bullet-proof argument; there's no clear
| way to quantify that ratio.
| neilv wrote:
| Hook this up to a language model, and maybe a user could
| instantly get the _one sentence worth of information_ that the
| YouTube video creator buries in 10 minutes of monetized noise.
|
| And also save yourself time when the creator teases that they
| provide the info, but it turns out they don't, they're just
| trying to get views.
| greggsy wrote:
| I put something like this together to collect transcripts for
| uni videos. It dumps all transcripts into a directory, with
| URL links, so I can just search the whole directory to find the
| keyword I need.
|
| Helped a lot with take home exams.
| nostromo wrote:
| YouTube created that problem by incentivizing longer videos.
| And now we have videos with tons of fluff.
|
| Similarly Google incentivizes longer webpages, so now we have
| recipes that start with a novella about grandma's cooking
| before showing the actual recipe.
|
| It used to be nice to see a video's thumbs up to thumbs down
| ratio to know if you've been click baited or not before
| watching the whole video. But that signal has been removed now
| too.
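
For anyone wanting to try the directory-of-transcripts approach greggsy describes above, here is a minimal sketch. It assumes the transcripts are already saved as plain .txt files in one directory; the function name search_transcripts and the file layout are made up for illustration and are not taken from greggsy's actual tool.

```python
from pathlib import Path


def search_transcripts(directory, keyword):
    """Return (file name, line number, line) for every case-insensitive match.

    Assumes one plain-text transcript per *.txt file in `directory`
    (a hypothetical layout chosen for this sketch).
    """
    needle = keyword.lower()
    hits = []
    for path in sorted(Path(directory).glob("*.txt")):
        # errors="replace" keeps the search going past badly encoded files.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                hits.append((path.name, number, line.strip()))
    return hits
```

Calling search_transcripts("transcripts", "eigenvalue") would list each file and line that mentions the word. A plain substring match is the simplest choice here; it will not find a phrase that is split across two lines.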
| anticristi wrote:
| As a
|
| recipe reader
|
| I want to
|
| dismiss cookies, have a video ad follow me down the page, and
| read why this cake conjures up memories of the author's
| childhood, before reaching the actual recipe
|
| so that
|
| I feel connected to the author, before fully committing to
| mixing ingredients
| slipmagic wrote:
| Tom Redman had this idea but he took feedback from Twitter.
| https://digg.com/2021/one-main-character-tom-redman-recipeas...
| neilv wrote:
| That "user story" is like a tragically misinterpreted
| comment by someone at a prospective customer, speaking of a
| special time with their grandmother, but garbled through N
| layers of field sales, marketing, product managers,
| engineering hierarchy, and Agile task management.
|
| Including the part about declining more cookies offered (to
| save room for grandma's lasagna).
| 12907835202 wrote:
| Are you sure Google prefers longer pages? I find (annoyingly)
| that Google likes the search version of my page for lots of
| things. E.g. a page called "best x of the y" the page for
| searching comments on that page called "best x of y search"
| where the only text is the title and a search input, will
| rank really well
| kristianheljas wrote:
| Try to search for recipes :) I also see long novels which
| seem to disguise the ridiculous amount of ads which google
| seems to like as well (these are mostly provided by none
| other than themselves!).
| Topgamer7 wrote:
| YouTube-dl had the ability to rip just subtitles. I once used
| this to grep for some information I wanted after downloading
| all of the transcripts.
| InCityDreams wrote:
| ...or, just follow decent creators.
|
| No snark intended, but i just gave up with the dross. And even
| some of them, of late, are getting a bit crafty. But, creators
| get one chance from me now - give me decent content, or even
| with the fancy chapters, you're not getting my eyeballs past
| two minutes.
| What I have found is that leaving the decent stuff
| on, what auto-plays after is 'generally' of similar quality. A
| quick set of back-buttoning and bookmarking has fairly often
| got me some interesting results.
| neilv wrote:
| Good idea, but I don't follow anyone on YouTube. I was
| thinking about searching the Web for a bit of info, the
| search hits include YouTube videos (but no finer resolution
| than "this entire video").
|
| A search engine could narrow in on the few sentences AV in
| the video that it thinks correspond to what I was searching
| for, and summarize that, and also link me to the AV start
| timepoint in case I also want to watch the video.
|
| This might change the economics of some YouTube video content
| creation.
| LelouBil wrote:
| Google does exactly that, if a video shows up in the search
| results, it shows you only the relevant small part.
| neilv wrote:
| I've never seen this before now, but I just got a Google
| search result video page with a kind of table-of-contents
| index on _one_ of the video hits just now. (These TOC
| entries _don't_ correspond to the marked segments on the
| timeline. I don't know whether this is something YouTube
| is doing, or something the content creator did.)
|
| Is this what you mean? (Pardon if I'm not familiar with
| the latest Google Search features; I've mostly been using
| DDG lately, so don't have occasion to see all the
| features that exhibit only occasionally.)
| svat wrote:
| This is a great idea; I really enjoy all these "two channels
| simultaneously" (side-by-side translations, video with subtitles,
| and in this case video with a readable transcript, where you can
| scroll in the video or scroll in the transcript, and be
| synchronized).
|
| I had done something like this a couple of years ago for some
| specific set of videos (e.g.
| https://shreevatsa.net/tex/program/videos/s10/ -- compare with
| https://youtubetranscript.com/?v=_0Cv1G_s4gQ for the same video),
| but never got around to making it general; glad someone has done
| it. It takes just a few lines of Javascript, using the Youtube
| API, to do this i.e. keeping the video and text in sync (just
| view source on either page to see the JS at the bottom).
|
| Something like this can also help with audio recordings
| (generating the alignment automatically is called "forced
| alignment" and there are tools like "aeneas" for this). In case
| anyone's interested or wants to help (for Sanskrit texts): see
| https://github.com/shreevatsa/web-align-audio-text deployed at
| https://shreevatsa.net/ramayana/sarga/ and better version at
| https://github.com/avinashvarna/audio_alignment deployed at
| https://avinashvarna.github.io/audio_alignment/
| unangst wrote:
| Expect an email from Google lawyers early this week about the
| domain name.
| antman wrote:
| I think "transcriptsforyoutube" would be passable? I remember
| something about a case using "for" and being ok but not any
| details.
| bdcravens wrote:
| They generally don't get into nuance. If someone's trademark
| is in your domain name, expect a C&D.
| TheCaptain4815 wrote:
| Funny, was just looking for a tool like this.
|
| Any chance timestamps could be added?
| cm2187 wrote:
| With youtube dl you can download the subtitle tracks which
| should have timestamps. Though last time I checked they were
| broken (showing the whole text on the first timestamp) but
| perhaps they fixed it
| chiefalchemist wrote:
| For Power Point and screenshare based videos, a screenshot
| every 15 seconds or so would be great.
|
| Often enough I'd rather read than watch. Reading is faster.
| Having corresponding visuals would be a big plus.
| breck wrote:
| This is amazing! The speed and simplicity makes me happy. Thank
| you!
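
On the timestamp request above: once caption segments with start times are in hand (for example from a downloaded subtitle track), adding timestamps to a plain transcript is mostly a formatting step. The sketch below assumes the segments have already been parsed into (start_seconds, text) pairs; that input shape and both function names are hypothetical, and nothing here reflects how YouTubeTranscript.com itself works.

```python
def format_timestamp(seconds):
    """Format a non-negative offset in seconds as MM:SS, or H:MM:SS past one hour."""
    total = int(seconds)  # drop fractional seconds
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def render_transcript(segments):
    """Render (start_seconds, text) pairs as one '[timestamp] text' line each."""
    return "\n".join(
        f"[{format_timestamp(start)}] {text}" for start, text in segments
    )
```

For instance, render_transcript([(0, "hello"), (75.4, "welcome back")]) yields two lines, one per segment, each prefixed with its start time. Truncating instead of rounding keeps every timestamp at or before the moment the line is actually spoken.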
___________________________________________________________________
(page generated 2022-12-18 23:00 UTC)