Show HN: Cleanvoice - Automated Podcast Editing Author: autoencoders Score: 160 points Date: 2021-11-20 14:58 UTC (8 hours ago) Web link: cleanvoice.ai | tokamak-teapot wrote: | "The algorithm can also work with accents from other countries, | such as Australian ones or Irish." | | Other than which country, though? Presumably an English-speaking | one - UK? New Zealand? Canada? US? | hs86 wrote: | Reminds me of https://auphonic.com/ | | Their pricing is also similar, but Auphonic allows both | subscription and prepaid "credits". | autoencoders wrote: | Yes, the plan is to add prepaid credits soon as well. | | Auphonic and Cleanvoice go well together. | | I guess the idea is to have your podcast edited by Cleanvoice | and then post-processed with Auphonic. | ghaff wrote: | Auphonic's volume equalization is almost a must-have for | podcasts. I used to spend a lot of time getting volumes | right. With Auphonic it's quick and easy. | | I definitely prefer pre-paid credits to a subscription given | my podcast production varies a lot. | xipho wrote: | Can anyone recommend similar for removing ums etc. in videos? | IIRC there is a workflow in some professional software, but being | able to train and throw the algorithm right at the video itself | (especially locally) would be useful. | autoencoders wrote: | For now, Descript would be the best option. You can still make | it work with the integrations, but it is a lot of effort. | | That will change in Q2, when I add support for video. | mijustin wrote: | Yes; Descript.com does this. | nickjj wrote: | > Can anyone recommend similar for removing ums etc. in videos? | | For single-camera floating head style videos where you're | continuously talking about 1 topic it's going to be very | jarring if you start cutting out filler words.
You'll end up | with a bunch of jump cuts where it looks like video frames are | dropped. | nickjj wrote: | As someone who has personally edited over a hundred 1-2 hour | podcasts with a new guest every time removing umms, ahhs, dead | air and filler words is soul crushing. It has gotten to the point | where after 2 years of running my podcast[0] I'm seriously | considering stopping the show because I'm getting burnt out from | editing and without sponsors it's not feasible to hire an editor, | but even with the show making no money I would happily pay triple | your asking price if I could click a button and have the problem | solved in a way that matched a human's ability to edit out filler | words. | | It really is the difference between being able to edit a 1 hour | episode in 1 real life hour (editing at 2x speed) vs literally | spending 5 hours to edit 1 hour when there's a lot of filler | words or ums. That's due to having to stop every few seconds, | think about when to cut it and perform the cut. This is using a | heavily optimized keyboard shortcut focused workflow too. | | I hope you don't mind constructive criticism but in my opinion | your "after" version doesn't sound natural. This isn't an attack | on your service specifically, because the outcome is the same | with all of the automated tools I've tried. I haven't tried them | all but I did play with a few of them. | | For example in your case the pause between "Removing" and | "filler" doesn't match the pace of the rest of the sentence and | the transition from "very" to "time" has a very hard cut. This is | also a 10 word clip that's about 6 seconds. If you listened to a | 1 hour podcast episode that was edited like this it would be much | more noticeable. | | There's so many intricate and subtle details around when and what | to cut to remove these things in a way where it's not noticeable. 
| Are there any paths moving forward in AI / ML that can lead to | this being indistinguishable from human editing? | | I debated deleting this comment before posting it because it's a | combination of feedback but also saying the service isn't | something I would buy in its current state, but I'd like to think | it's more beneficial to post this to show there is a real demand | for this service if it can be executed flawlessly. | | [0]: https://runninginproduction.com/ | moritonal wrote: | Meta, but your comment was (IMHO) a great example of | constructive criticism. Show HN is about that, not just staying | silent and letting the user's work die. | dannyeei wrote: | Funnily enough I was about to start building this, then found | Descript[1]. It transcribes the text and allows you to edit the | transcription, then export it as audio. | | [1] https://www.descript.com/ | [deleted] | autoencoders wrote: | The edit on the page is not the best, I agree! Mainly, if your | recording is unnatural (like that one), the edit is also | unnatural. However, the tool works better on an interview | podcast. I would strongly recommend just uploading a sample; | you would see a big difference. | | As for whether ML edits will become indistinguishable from human | edits: hard to tell. I think it will be like self-driving cars in the | future. 98% good edits, 2% bad edits. | qmmmur wrote: | What post-processing do you do already to catch the low hanging | fruit? Izotope? I reckon putting in 100 hours of editing and | not being able to get an hour down to sub an hour means there | is something which could be optimised out quite quickly. | nickjj wrote: | > What post-processing do you do already to catch the low | hanging fruit? | | None, everything is manual. | | I use DaVinci Resolve to do the editing where both the guest | and myself have separate tracks. Then I line up the tracks | (only takes a few seconds) and start playing things from the | beginning at 2x speed.
I stop to make cuts mostly to remove | filler content. | | Throughout this process of editing I'm also creating show | notes as I go. An example of the end result is here: | https://runninginproduction.com/podcast/103-great-question-m.... Basically every few minutes I recap what was | said into a 1 sentence bullet point with a timestamp. Along | the way I list out techs used as tags and list out reference | links / libraries into a Markdown document. Then once I'm | done editing the show I write a few paragraphs which serve as a | TL;DR of the episode. | | All in all, if the guest uses minimal filler words or noises | it takes about 1 real life hour per 1 hour of recorded | content to do all of the above. For context, the episode I | linked has someone who I would bucket into a category of | speaking very fluently with minimal filler content. I was | able to blaze through that one. | | I also have a 2560x1440 display and use the "always on top" | feature of most window managers to layer the Markdown | document and a preview of the page just above the waveform in | DaVinci Resolve so I can quickly make cuts and update the | notes with minimal mouse movement. Almost everything is | keyboard driven. | | What tools can be used to speed up that process? | simonbarker87 wrote: | I've not edited anywhere near as much as you have but I agree, | it's so tedious and by the end of an editing session you can | really start to resent the guest and all their verbal tics. I | find I get a good idea of what the waveforms look like for | some noises and can see them coming and preemptively split the | track at the start with a decent success rate. | | Using RiversideFM to get two local recordings is also a big | help.
| | I was sat next to an audio editor and producer at a wedding | recently and we got on to this topic and he said "your number | one job when editing an interview is to make the host sound | good and then just do the minimum on the guest, otherwise | you'll waste too much time". | | Doing that kind of editing 8 hours a day, I can see why he says | that. | nickjj wrote: | Yeah it's weird. I have these in-depth technical | conversations with every guest, which is great; I love this | part. The frequency of verbal tics and filler content really | takes an edit from "this isn't too bad" to "what the fuck am | I doing with my life?" all based on how many times you need | to remove filler content within the first 5 minutes of | editing a 90 minute show. | | I'm kind of surprised that wedding producer openly said that. | My philosophy has always been the opposite. One of my main | goals of the show is to make the guest walk away thinking | this was the best podcast experience they ever had from start | to finish as well as do everything I can to make them come | off as good as possible. | | I rarely cut content but most episodes have hundreds of | manual edits to remove filler content and create a more | concise flow by removing long pauses because my 2nd main goal | is to optimize for the listener. I keep the edits organic at | the same time by leaving in some filler content and subtle | things like a deep inhale or a sigh because there's a lot of | meaning around that when it comes to sentiment and tone. The | same can be said for sometimes leaving in an extra 500ms | pause to amplify the meaning behind something. At the same | time, sometimes filler content gets left in because it flowed | too quickly into the next word, so cutting it sounds | unnatural, as if it clipped. | | This is why I think it's a crazy hard problem to get a | machine to be able to make decisions like this.
| | I do use separate recordings (we each record our track | locally). It definitely helps with the few cases where | we talk over each other, and it lets me lower the volume of | a laugh so it doesn't overpower what the other person said | while still keeping it in, because it's a good part of a | conversation and a snort or laugh can easily be the | difference between a listener wondering if the guest was | offended or happily agreeing with something. | [deleted] | mijustin wrote: | Hey! Justin (from Transistor.fm) here. This looks really | interesting. Two questions: | | 1. Any plans for an API and bulk pricing? | | 2. Any plans to add loudness normalization, balancing, etc. to the | processing? | autoencoders wrote: | Hey Justin! Love your podcast. | | 1) API access will come at the end of Q1. | | 2) Not in the next 6 months. However, Auphonic would be a good | fit for you. | abdik wrote: | The logo is similar to ours: https://www.lovo.ai/ | bryans wrote: | While turning it into a heart may be clever branding, you've | only slightly modified a ubiquitous icon representing audio, | and countless startups have used it before you. | eganist wrote: | This is awesome. | | Can I suggest the ability to export as project files for popular | editors for your roadmap? It'd cut professional workflows down | substantially, which would be worth an (even higher) upcharge. | | (It wasn't immediately obvious to me if you already did this) | | Edit: https://cleanvoice.ai/integrations seems pretty close. I'd | honestly charge more for integrations and provide a base tier for | just exporting sound. I imagine most indie users would benefit | from finished exports enough to pay, while project files would | command a higher fee from editors looking to speed up their | workflow to take on more clients. That's where I'm coming from on | pricing tiers and upcharging for professional features.
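eganist's suggestion of exporting edits for editors, rather than only finished audio, can be illustrated with a small sketch. The thread doesn't say what Cleanvoice's export files contain, so this is an assumption: it writes proposed cuts as an Audacity label file (plain text, one tab-separated start time, end time and label per line, times in seconds), which Audacity can import as a label track via File > Import > Labels. `Cut` and `to_audacity_labels` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Cut:
    """A region of the original recording proposed for removal (hypothetical type)."""
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    reason: str   # e.g. "filler", "stutter", "silence"


def to_audacity_labels(cuts: list[Cut]) -> str:
    """Render cuts as Audacity label-file text: one 'start<TAB>end<TAB>label' line per region."""
    lines = []
    for cut in sorted(cuts, key=lambda c: c.start):
        if cut.end <= cut.start:
            raise ValueError(f"empty or inverted region: {cut}")
        # Six decimal places, tab-separated, matching Audacity's own label export.
        lines.append(f"{cut.start:.6f}\t{cut.end:.6f}\t{cut.reason}")
    return "".join(line + "\n" for line in lines)
```

For example, `to_audacity_labels([Cut(12.5, 13.1, "filler"), Cut(3.0, 4.25, "silence")])` puts the silence line first, because regions are sorted by start time. Exporting labels instead of applying the cuts leaves the final decision with the human editor, which is the workflow eganist describes.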
| autoencoders wrote: | ADL support will come around Q2, so you can import it into a lot of | audio and video editors. For now, we have the export files | you mentioned. | | Regarding pricing, that's a good point. I will definitely | consider it, thank you! | daenney wrote: | The Terms of Service seem worrisome. | | > By posting your Contributions to any part of the Site or making | Contributions accessible to the Site by linking your account from | the Site to any of your social networking accounts, you | automatically grant, and you represent and warrant that you have | the right to grant, to us an unrestricted, unlimited, | irrevocable, perpetual, non-exclusive, transferable, royalty-free, | fully-paid, worldwide right, and license to host, use, | copy, reproduce, disclose, sell, resell, publish, broadcast, | retitle, archive, store, cache, publicly perform, publicly | display, reformat, translate, transmit, excerpt (in whole or in | part), and distribute such Contributions (including, without | limitation, your image and voice) for any purpose, commercial, | advertising, or otherwise, and to prepare derivative works of, or | incorporate into other works, such Contributions, and grant and | authorize sublicenses of the foregoing. | | It sounds an awful lot like "we are allowed to do anything and | everything we want with the content you upload to us". Maybe I'm | misunderstanding something, but I'd be extremely hesitant to | upload any content I create to a service with those kinds of | terms. | [deleted] | [deleted] | stevenicr wrote: | also.. " Your Contributions are not obscene, lewd, lascivious, | filthy, violent, harassing, libelous, slanderous, or otherwise | objectionable (as determined by us). 7. Your Contributions do | not ridicule, mock, disparage, intimidate, or abuse anyone." | | Cancel culture coming.. main reason I would not invest time | into using Anchor.fm .. | | So.. is "Us" progressive or conservative?
| | Bill Maher breaks these every night on both sides pretty much, | so I try to think, if they won't protect Bill Maher or Larry | Flynt's words, they are not going to protect mine. | | "Contributions are not false, inaccurate, or misleading." - So | mainstream news can't use it either - that's a bonus. | | I'd add more, but I see you mentioned you will be changing it and | this is just boilerplate to save time. | 1-6 wrote: | Thanks for the heads up. I'm a little hesitant to upload | something now. On the flip side, I think devs just want total | protection while they navigate the landscape of machine | learning. I agree that they could have worded things better, but | someone who worked on writing this probably didn't understand | the nuances of machine learning or the countries that people | would be signing up from. Plus they'll need to constantly use | datasets for their internal purposes to train. | autoencoders wrote: | Yes, that's exactly the case. As I previously commented, I | used a terms generator until I can get a lawyer who can | write up specifically what I do with the data. | autoencoders wrote: | I agree. The terms will be changed. I used an automatic | terms generator for now (termly.io). | | I would like to rewrite it. | | What I do is just keep your files on the server for a week. In | case you have an issue, I will look into your file to fix your | issue. And if you want, you can give me consent to use it to further | improve the service. (Say you have an accent the AI handles | badly; I can use your audio file to understand why it failed.) | throwthere wrote: | With this statement you've now shown that your site doesn't | take contracts seriously and opened the door to people | arguing future contracts are also invalid. I'd delete this | response asap. | stavros wrote: | What? This person made something, we pointed out an | improvement and they said they'd change it.
You're | literally complaining that it wasn't perfect already, and | thus they somehow don't "respect stuff". | giansegato wrote: | Why? They can change the policy and ask for a confirmation, | as every service out there is already doing. | simtel20 wrote: | How and when have you seen it happen that a contract was | invalidated by one party indicating that they would prefer | a more appropriate contract? | pfortuny wrote: | Thank you. | | I would pay for a piece of software that does that job on my | computer with no Internet. | | This way? I may even end up in court for saying something | "improper"... | | Edit: OK, I've just read the developer's reply below. | | Honestly: you need to fix this because right now it is more | scary than not. | | Congratulations on the project, but please do fix this. | autoencoders wrote: | I agree. More and more AI applications are exploiting our | data in negative ways. | | I will get proper terms as soon as possible, especially | now that people have mentioned it. | axhl wrote: | Congratulations on launching. How are you finding using termly.io | for the legal side of things? | autoencoders wrote: | It's not ideal. See the comment talking about the terms. I have | a meeting with a lawyer soon. But I guess it's better than no | terms. | throwaway1777 wrote: | Overcast has features to do some of this on the listener side. I | prefer having the AI on the listener side so I can go back to the | raw version if the AI messes up for some reason. | fareesh wrote: | What's the high-level approach required to build something like | this yourself? | | Does it involve relying on speech to text with timestamps and | then a series of cuts based on that? | monroewalker wrote: | Sounds similar to Descript https://www.descript.com/ | spicybright wrote: | I'm going to sound like a Negative Nancy, but I wish | podcasters/youtubers would just practice their speaking skills | instead of relying on a series of really quick jump cuts.
Worst | offenders are those that can't get through a sentence without | splicing it 2+ times... | | Perhaps you could have a mode to detect how much one stutters | and flag parts worth redoing, without spending as much time combing the | whole thing. | pfortuny wrote: | Classically, professionals learnt their speeches by heart. | That stands out when you see it. | | I remember fondly a student of mine who seemed unable to | express himself properly. I told him to memorize his final | project dissertation because otherwise it would be a wreck (OK, | I did not say this last part, it was more of a suggestion). | | BOY: did he memorize it. He got honors and I did think "this | guy has really done it, and it sounds like music!" | | When you do it well, it shows. | intrasight wrote: | Some podcasts I listen to are over-edited. I'd always assumed | that a) it was done manually and b) it was done to keep the | length below some threshold. Now I'm curious if they are using | software to automate the editing. | | I find the cadence very unnatural when all the spaces between | phonemes are removed. | ghaff wrote: | >I find the cadence very unnatural when all the spaces | between phonemes are removed. | | Any editing can be overdone and, while I do a modicum of | editing out umms, you knows, and other verbal tics when I'm | putting together a podcast interview, I'm not fanatical about | it. | | You do occasionally get someone who just speaks quite slowly | and it is sort of annoying to listen to as audio. So I've | done some automated gap reduction in a couple of cases. | intrasight wrote: | What software do you use to automate? | ghaff wrote: | Audacity. | [deleted] | cube00 wrote: | Especially ones who won't set their background LED lights to a | stable color. The smooth flowing gradient becomes very | distracting when you jump cut the heck out of it. | intrasight wrote: | synesthesia?
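fareesh asked above whether a tool like this relies on speech to text with timestamps followed by a series of cuts. The author says further down that Cleanvoice works at the phonetic level rather than on a transcript, but the transcript-based approach is easy to sketch. This assumes a speech-to-text engine has already produced word-level timestamps in seconds; `Word`, `FILLERS`, `is_filler` and `keep_segments` are hypothetical names, and matching filler words by spelling is a deliberate simplification.

```python
from dataclasses import dataclass

# Hypothetical filler vocabulary; real filler detection needs far more than string matching.
FILLERS = {"um", "umm", "uh", "uhm", "ah", "ahh", "er", "erm"}


@dataclass
class Word:
    text: str
    start: float  # seconds, as reported by an (assumed) speech-to-text engine
    end: float


def is_filler(word: Word) -> bool:
    return word.text.lower().strip(",.?!") in FILLERS


def keep_segments(words: list[Word], max_pause: float = 0.5) -> list[tuple[float, float]]:
    """Return (start, end) spans of the original audio to keep, in order.

    - Filler words are dropped, leaving a hard cut where they were.
    - A silence between two neighbouring kept words is kept whole if it is at
      most max_pause long; otherwise it is shortened to max_pause, half on each side.
    """
    segments: list[list[float]] = []
    prev = None  # index into `words` of the last kept word
    for i, w in enumerate(words):
        if is_filler(w):
            continue
        if prev is not None and i == prev + 1:
            gap = w.start - words[prev].end
            if gap <= max_pause:
                segments[-1][1] = w.end  # keep the natural pause
            else:
                half = max_pause / 2
                segments[-1][1] = words[prev].end + half
                segments.append([w.start - half, w.end])
        else:
            # First kept word, or a filler was removed just before this word.
            segments.append([w.start, w.end])
        prev = i
    return [(s, e) for s, e in segments]
```

With the words `So` (0.0 to 0.3 s), `um` (0.5 to 0.9 s), `today` (1.0 to 1.4 s), `we` (1.5 to 1.6 s) and `talk` (3.0 to 3.4 s), the kept spans come out as roughly (0.0, 0.3), (1.0, 1.85) and (2.75, 3.4): the "um" is dropped and the 1.4 s pause before "talk" is capped at 0.5 s. The edited audio is the concatenation of those spans. Every removed filler leaves a cut exactly at a word boundary, which is the kind of abrupt transition nickjj and intrasight object to; a careful editor would add padding or crossfades, and a very small `max_pause` produces the "as if they don't breathe" effect mentioned later in the thread.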
| curiousgal wrote: | Once I started noticing jump cuts it ruined every single YouTube | video with a person talking into the camera. The worst offender | is Phillip DeFranco. | ghaff wrote: | I find talking into a camera really tough. If you're doing it | by yourself you almost need to imagine you're talking to a | person. I even know of people who put cutouts or pictures of | someone by the camera so they can talk to a person. | | I haven't had a lot of luck using teleprompters but maybe I | just haven't hit on the right setup. | | Something else someone told me recently was to try to work in | short segments that you redo until you get them right and then do | a cut to the next segment somewhere that it's natural. | unholiness wrote: | Interesting take. I saw Phillip DeFranco as more of a pioneer | of that style. He really leaned into the cuts. At the time it | was something no one else was doing so it was very | noticeable, and he had a very crisp cadence with them where | the jarring cuts were part of the presentation. It was clear | his process was: Write a script, mark cuts everywhere it | could make sense, go through the script repeating every | phrase until you're happy with the sound, and when editing, | always make the cuts where they're marked, even if they could | be skipped. | | The result feels something like pixel art: Clearly not the | closest possible imitation of conversational speaking, but | something else. A style in its own right with different | considerations. | | Now that it's par for the course to have jump cuts, I see | them used more sloppily everywhere, where it's clear the | narrator decided where to do the cuts after the fact. Cutting | off the beginning or end of a phoneme, missing or repeating | bits of a thought because they liked one phrasing in | recording but opted for another one in post, misordered cuts | where something which moved in the background moves back to | its old place, etc.
Phillip's style looked lazy but it can't | really be imitated with actual laziness. | | These days I look back and really cringe at the substance of | his show. But I still see the style as professional. | mikepechadotcom wrote: | Really cool project, I wish you great success! Could be useful | for my (German) podcast agency! | | Out of curiosity: Which AI technology did you use? OpenAI? Google | API? Or did you train the models yourself with Python (something like | TensorFlow)? | | Cheers, Mike | autoencoders wrote: | Hello Mike, nice to meet you! | | I trained my own models. No OpenAI/Google API. | | Best regards, Adrian | notafraudster wrote: | "Free 30 Minutes Trial" is not native English. "Free 30 Minute | Trial" would be better; but I think the sentence is a little | confusing. I presume you mean you can convert 30 minutes of audio | for free, not that the trial account is only valid for 30 minutes | from creation. I would do "Clean 30 minutes of audio for free. No | Credit Card needed." or similar. The sale page which says "Get 30 | minutes credit to try the service out." is better, and "30 | minutes" does sound correct on that page. | | In your FAQ, you say: "Currently we remove lip smacks, saliva | crackle, mouth clicks and harsh parts of breathing (not the whole | breath). If you want to remove a particular mouth sound (ex. | Chewing), write us in the chat as a feature request." I don't | think most English speakers would understand what "harsh parts of | breathing" are. Typically a parenthetical example in English | would be written "(e.g. chewing)" not "(ex. Chewing)". | | Your question "What filetype and sizes do you support?" doesn't | answer what filetypes you support, and I suspect the singular | "filetype" was a grammar error. You also write "We have an audio | file size limit of 1.5G per file or in case you are uploading | multi-track and a total file size of 2 GB. ".
The part that says | "or in case you are uploading multi-track and" doesn't make any | sense in English. I think you mean "We support file sizes up to | 1.5GB per file for single-track files, or 2GB if you are | uploading a multi-track file as separate files." but I'm not | sure. | | In general I don't understand why each selling point has a | separate FAQ page but the FAQs are often not related to the | selling point. I don't think people think the "Mouth Sound | Remover" page is the one that lists file size support, while the | "Stutter Remover" page is the one that lists the maximum number | of tracks per project. | | Your integrations page lowercases "cleanvoice" whereas other | pages write it as "Cleanvoice". | | Under integrations, you have a section called "Markers Export". | This should probably be "Export Markers" or "Marker Export". | | Under "How to Export Edits", you probably don't want to | capitalize "Results" or "Editor" unless these are supposed to be | title cased, in which case you probably want to title case all of | them. | | Under your pricing FAQ you have "Does my credit expire at end of | the month? Your credit will reset every billing month. Unused | credit will be lost." This is needlessly confusing. You use the | verbs "expire", "reset", and "be lost" to describe the same | thing, and you don't actually answer the question. Also you don't | want "at end of the month", you want "at month's end" or "at the | end of the month". I would rewrite as "Does my credit expire at | the end of each month? Yes. Credit resets every month and cannot | be carried over to future months. Unused credit will be lost." | This is a terrible business model, though, and so I suggest you | not do this. Either sell as a subscription or sell as a credit | model, not both, this is gross. | | In general I think you want to pay someone who is a professional | English copywriter to fix your website. Cheers. 
| | Edit: I just noticed your changelog is powered by a service | called Headway. I am not sure if you also made Headway, but | Headway's website is also in need of English copyediting. | [deleted] | [deleted] | autoencoders wrote: | Wow! Thank you so much! You are right, I need to get a | copywriter ASAP. | | I'm curious why the subscription + one-time credit model is bad. But I | agree it is confusing. | | My understanding is that not every customer wants or needs a | subscription, since they upload podcasts irregularly. | | This business model is seen in other AI products: | | https://www.remove.bg/pricing https://auphonic.com/pricing | | I am very grateful that you took the time to help out. Really | appreciate it! | sdoering wrote: | Maybe you can get away with a quick fix using something like | deepl.com. | | They are great. As a native German speaker I got a long way | using them when I needed valid translations. | arendtio wrote: | That logo is very similar to the Cisco logo: | | https://www.cisco.com | stavros wrote: | This is excellent, well done! I'd be curious to know how it's | done, as I don't know much about deep learning and this looks | like magic to me. | autoencoders wrote: | Hey HN! | | I like podcasting, but I hate editing episodes. I tend to stutter and | have a lot of filler words in my podcast. That's why I created | Cleanvoice: to spend less time editing. Cleanvoice | is an ML tool which removes filler words, mouth sounds, | stuttering and dead air from your podcast. To use it, just upload | your podcast, wait a few minutes, and download the cleaned audio. | | It's still not perfect, but it's at a stage where I can blindly | use it on every single episode of my podcast. | | I would love to hear your feedback! | wpietri wrote: | Neat! I love products that come out of a personal need. | | Is it possible for you to do a live, personal demo? No logins | or anything.
I'm thinking of something where you tell people to | start up their audio and then give them a quick prompt like | "Describe your breakfast yesterday." Record for 30 seconds, and | then let them play back the original and cleaned versions. You | could limit them to, say, 5 goes, with a different prompt each | time. | | I suggest it because a) a little personal investment makes it | more likely they'll give you their email address for signing | up, and b) many potential customers underestimate how much they | need something like this. | autoencoders wrote: | I like your idea; it makes sense. | | My biggest fear is that without a login, people will start | abusing it in ways that I don't expect. Definitely | considering it. Thank you! | wpietri wrote: | That's a good fear to have. That's the kind of thing I | would set up some monitoring for and then wait to see. You | might get a few jerks. But those same jerks might also be | the sort of people who would sign up with a bunch of fake | emails, so gating on an email address may not be much | better than gating on a freshly issued cookie. | | Thanks for listening, and good luck with your project! | telesilla wrote: | Have you compared this to other commercial options such as | Descript? Looks really great at a glance, thanks for sharing! | autoencoders wrote: | I tried to use Descript for my podcast, but it has some | issues. | | 1) It doesn't work well if you have a strong accent. As a | non-native speaker, I found the transcriptions quite bad, which made | the editing quite bad too. | | 2) Cleanvoice works with multiple languages; Descript | doesn't. | | 3) Cleanvoice can remove stutters (not always, but it tries) | and mouth sounds like lip smacking and teeth clicking. Descript | can't. This is not a big deal for most, but since I stutter | a lot this was essential. | | My approach is different from Descript's. They use a | transcription service, and then they edit the audio based on | the text. I work directly at the phonetic level.
This gives me | more control over the audio. | | Depending on the needs, either one is better. I guess you | should try it for yourself and compare. | ckdarby wrote: | I use Descript and it is absolutely lovely. There are a bunch | in this space that I would not be surprised to see merged or | acquired. Would love to see Descript & GetWelder merging | together. | | While Cleanvoice has some niche features that Descript | doesn't offer, I would not be surprised to find them rolling | these features out in the next major release they're doing. | IMO the founder of Cleanvoice should sell/join Descript. | qmmmur wrote: | Without giving away your secret sauce, what are your approaches | to the cleaning process? Is it a combination of different | passes of algos or is it something more generic and "sausage | machine-like" like a neural network? | jwuphysics wrote: | Based on the OP's username, surely one of the deep learning | algorithms is a denoising autoencoder, right? | autoencoders wrote: | The audio is edited in several phases. It uses different | algorithms, but most of them are deep-learning based. It is | surely overengineered, but as a Data Scientist, ML is the | most fun part for me. | nmstoker wrote: | How is the latency and, if it's sufficiently low, could | this realistically be applied to "nearly live" content? | | That scenario seems really appealing for conferences, even | if it just quietens down the verbal tics, but I'm guessing | if the lag is too great it would become like a bad lip sync | issue. | pokot0 wrote: | How does real time make sense in the first place for an | algorithm that gets 1 minute of audio and gives you back | 50s? You are gonna have to fill the gaps anyway with | something not meaningful. | staticautomatic wrote: | Silence is meaningful, but pretty awkward when not | deliberate! | laumars wrote: | Tools like this are designed to remove awkward silences.
| | What it sounds like the GP is after is something more | like hiss and pop removal (to use an old vinyl analogy) | and that's a different and also simpler problem to solve. | I'd wager there are already tools on the market for that. | pokot0 wrote: | Very insightful :). Now I need an AI to tell me when | silence is deliberate or not. :) | autoencoders wrote: | It would be a huge engineering endeavour, which I | wouldn't be capable of doing. That said, things like | background noise and some sounds can be removed. See | Krisp.ai. | qmmmur wrote: | Izotope plugins already do some of these things but not | all. In particular their de-clicking algorithm is pretty | good but definitely not automatic or low latency. | Fogest wrote: | Nvidia RTX Voice does something similar. It's pretty similar to | other technology, though, in that it focuses more on removing | background noise. It actually works very well. It would | definitely be interesting to see it also filter speech | itself. But I feel like this would be hard to do without | introducing extra latency. If someone is saying "umm" or | some other filler before a word, you kinda need to know | what that word will be to determine if it's filler or | not. So it almost can't be done without introducing | latency, as it would need some future speech to determine | if it's filler or not. | qmmmur wrote: | Do you do any audio segmentation to remove the filler words | and such? | undoware wrote: | I literally just bought your product. Thank you very much; I | needed this and wondered why no one had made it yet. | autoencoders wrote: | I appreciate it! If you have any issues or need help, feel | free to reach out. (You can use the chat in the app.) | gus_massa wrote: | Is the example in the page really made by the computer? In my | opinion the pauses in where the filler words were are slightly | too long. Is it possible to configure this? | | Is it possible to keep some filler words?
I make something | similar (but not professionally), and sometimes I like to keep a | few of them. | autoencoders wrote: | > Is the example on the page really made by the computer? | | Yes. | | > In my opinion the pauses where the filler words were are | slightly too long. Is it possible to configure this? | | I agree. However, if you use it in an interview, the edits sound better. | In an unnatural setting, you get unnatural results. | | Currently, there is no way to set it. But customization | is planned for Q2 next year. | | > Is it possible to keep some filler words? | | For now, no, but | keeping some filler sounds for authenticity is something | I plan to add. | gus_massa wrote: | I agree that choosing the correct length of the pause after the word | is removed is very tricky. Perhaps your configuration is | better than my imaginary magical edit. | | In another comment, eganist posted a link to | https://cleanvoice.ai/integrations It looks interesting | because I can choose which to keep and even use it to sync | with video [with some additional work]. I didn't see it the | first time on the page. | autoencoders wrote: | ADL support is also planned for around Q2, so you could just import it | into your audio/video editor without issue. Thank you for pointing | that out. I'll put Integrations on the homepage as well. | pwned1 wrote: | I suspected something like this was happening with podcasts. I've | noticed lately that some podcasters have unnaturally short pauses | between speakers (question and answer) or between sentences. It | really annoys me. It makes it almost unlistenable. | carols10cents wrote: | Yes, the worst is when so much silence is removed that it | sounds like someone is laughing over themselves. | autoencoders wrote: | I agree, as if they don't breathe! | | This is not the case with my app. I keep the edits on the longer side rather than | shorter, since I also find that unlistenable. | nateweiss wrote: | Looks cool!
Would this also work for "explainer" type videos, | showing how to use a software product or similar? | | If yes, you might consider a page or callout about that use case, | as it might attract some additional users. Just a thought. | tyingq wrote: | That seems like it would be tricky, as the video and audio | would get out of sync. You would have to remove, then "fill" to | keep the timing. Though this product does mention it works with | multiple speakers on different tracks...so they are already | somewhat in that space. | autoencoders wrote: | For video, it is quite tricky. One thing with video is that you | don't want to over-edit the audio, since it's then very hard | to keep the video synced. That said, for explainer videos it | should work OK, but for a video podcast it would be horrible. | I have an idea of how to deal with this, but it is not | available yet. | sdoering wrote: | Not sure where you are located, but if you are giving access to | people protected by the GDPR, your cookie notice does not fulfill | the requirements set by European regulations. | | Additionally, if you are located in a country that (like Germany | for example) has regulations on the necessity of an imprint, this | might also be missing. | autoencoders wrote: | It should be OK, since I use strictly essential cookies, which | don't require consent. (But users need to be informed.) | | Or do I misunderstand the law? | | [1] Strictly necessary cookies -- These cookies are essential | for you to browse the website and use its features, such as | accessing secure areas of the site. Cookies that allow web | shops to hold your items in your cart while you are shopping | online are an example of strictly necessary cookies. These | cookies will generally be first-party session cookies. While it | is not required to obtain consent for these cookies, what they | do and why they are necessary should be explained to the user.
| | [1] - https://gdpr.eu/cookies/ | geuis wrote: | Your demos don't play on iOS Safari. | autoencoders wrote: | Oops! Thank you for pointing that out. I'll check it. ___________________________________________________________________ (page generated 2021-11-20 23:00 UTC)