hngopher.com

       [HN Gopher] Show HN: Cleanvoice - Automated Podcast Editing
       ___________________________________________________________________
        
       Show HN: Cleanvoice - Automated Podcast Editing
        
       Author : autoencoders
       Score  : 160 points
       Date   : 2021-11-20 14:58 UTC (8 hours ago)
        
 (HTM) web link (cleanvoice.ai)
        
       | tokamak-teapot wrote:
       | "The algorithm can also work with accents from other countries,
       | such as Australian ones or Irish."
       | 
       | Other than which country, though? Presumably an English speaking
       | one - UK? New Zealand? Canada? US?
        
       | hs86 wrote:
       | Reminds me of https://auphonic.com/
       | 
       | Their pricing is also similar, but Auphonic allows both
       | subscription and prepaid "credits".
        
         | autoencoders wrote:
         | Yes, the idea is to also bring prepaid credits soon.
         | 
         | Auphonic and Cleanvoice go well together.
         | 
         | I guess the idea is to have your podcast edited by Cleanvoice
         | and then the audio post-processing with Auphonic.
        
           | ghaff wrote:
           | Auphonic's volume equalization is almost a must-have for
           | podcasts. I used to spend a lot of time getting volumes
           | right. With Auphonic it's quick and easy.
           | 
           | I definitely prefer pre-paid credits to a subscription given
           | my podcast production varies a lot.
        
       | xipho wrote:
       | Can anyone recommend similar for removing ums etc. in videos?
       | IIRC there is a workflow in some professional software, but being
       | able to train and throw the algorithm right at the video itself
       | (especially locally) would be useful.
        
         | autoencoders wrote:
         | For now, Descript would be the best option. You can still make
         | it work with the integrations, but it is a lot of effort.
         | 
         | That will change in Q2, when I add support for video.
        
         | mijustin wrote:
         | Yes; Descript.com does this.
        
         | nickjj wrote:
         | > Can anyone recommend similar for removing ums etc. in videos?
         | 
         | For single camera floating head style videos where you're
         | continuously talking about 1 topic it's going to be very
         | jarring if you start cutting out filler words. You'll end up
         | with a bunch of jump cuts where it looks like video frames are
         | dropped.
        
       | nickjj wrote:
       | As someone who has personally edited over a hundred 1-2 hour
       | podcasts with a new guest every time removing umms, ahhs, dead
       | air and filler words is soul crushing. It has gotten to the point
       | where after 2 years of running my podcast[0] I'm seriously
       | considering stopping the show because I'm getting burnt out from
       | editing and without sponsors it's not feasible to hire an editor,
       | but even with the show making no money I would happily pay triple
       | your asking price if I could click a button and have the problem
       | solved in a way that matched a human's ability to edit out filler
       | words.
       | 
       | It really is the difference between being able to edit a 1 hour
       | episode in 1 real life hour (editing at 2x speed) vs literally
       | spending 5 hours to edit 1 hour when there's a lot of filler
       | words or ums. That's due to having to stop every few seconds,
       | think about when to cut it and perform the cut. This is using a
       | heavily optimized keyboard shortcut focused workflow too.
       | 
       | I hope you don't mind constructive criticism but in my opinion
       | your "after" version doesn't sound natural. This isn't an attack
       | on your service specifically, because the outcome is the same
       | with all of the automated tools I've tried. I haven't tried them
       | all but I did play with a few of them.
       | 
       | For example in your case the pause between "Removing" and
       | "filler" doesn't match the pace of the rest of the sentence and
       | the transition from "very" to "time" has a very hard cut. This is
       | also a 10 word clip that's about 6 seconds. If you listened to a
       | 1 hour podcast episode that was edited like this it would be much
       | more noticeable.
       | 
       | There's so many intricate and subtle details around when and what
       | to cut to remove these things in a way where it's not noticeable.
       | Are there any paths moving forward in AI / ML that can lead to
       | this being indistinguishable from being humanly edited?
       | 
       | I debated deleting this comment before posting it because it's a
       | combination of feedback but also saying the service isn't
       | something I would buy in its current state but I'd like to think
       | it's more beneficial to post this to show there is a real demand
       | for this service if it can be executed flawlessly.
       | 
       | [0]: https://runninginproduction.com/
        
         | moritonal wrote:
         | Meta, but your comment was (IMHO) a great example of
         | constructive criticism. Show HN is about that, not just staying
         | silent and letting the user's work die.
        
         | dannyeei wrote:
         | Funnily enough I was about to start building this then found
         | descript[1]. It transcribes the text and allows you to edit the
         | transcription then export it as audio.
         | 
         | [1] https://www.descript.com/
        
         | [deleted]
        
         | autoencoders wrote:
         | The edit on the page is not the best, I agree! Mainly, if your
         | recording is unnatural (like that one), the edit is also
         | unnatural. However, the tool works better on an interview
         | podcast. I would strongly recommend just uploading a sample;
         | you would see a big difference.
         | 
         | As for whether ML edits will become indistinguishable from
         | human edits: hard to tell. I think it will be like self-driving
         | cars in the future: 98% good edits, 2% bad edits.
        
         | qmmmur wrote:
         | What post-processing do you do already to catch the low hanging
         | fruit? Izotope? I reckon putting in 100 hours of editing and
         | not being able to get an hour down to sub an hour means there
         | is something which could be optimised out quite quickly.
        
           | nickjj wrote:
           | > What post-processing do you do already to catch the low
           | hanging fruit?
           | 
           | None, everything is manual.
           | 
           | I use DaVinci Resolve to do the editing where both the guest
           | and myself have separate tracks. Then I line up the tracks
           | (only takes a few seconds) and start playing things from the
           | beginning at 2x speed. I stop to make cuts mostly to remove
           | filler content.
           | 
           | Throughout this process of editing I'm also creating show
           | notes as I go. An example of the end result is here
           | https://runninginproduction.com/podcast/103-great-
           | question-m.... Basically every few minutes I recap what was
           | said into a 1 sentence bullet point with a timestamp. Along
           | the way I list out techs used as tags and list out reference
           | links / libraries into a Markdown document. Then once I'm
           | done editing the show I write a few paragraphs which is a
           | TL;DR of the episode.
           | 
           | All in all if the guest uses minimal filler words or noises
           | it takes about 1 real life hour per 1 hour of recorded
           | content to do all of the above. For context, the episode I
           | linked has someone who I would bucket into a category of
           | speaking very fluently with minimal filler content. I was
           | able to blaze through that one.
           | 
           | I also have a 2560x1440 display and use the "always on top"
           | feature of most window managers to layer the Markdown
           | document and a preview of the page just above the waveform in
           | DaVinci Resolve so I can quickly make cuts and update the
           | notes with minimal mouse movement. Almost everything is
           | keyboard driven.
           | 
           | What tools can be used to speed up that process?
        
         | simonbarker87 wrote:
         | I've not edited anywhere near as much as you have but I agree,
         | it's so tedious and by the end of an editing session you can
         | really start to resent the guest and all their verbal tics. I
         | find I get a good idea for what the waveforms look like for
         | some noises and can see them coming and preemptively split the
         | track at the start with a decent success rate.
         | 
         | Using RiversideFM to get two local recordings is also a big
         | help.
         | 
         | I was sat next to an audio editor and producer at a wedding
         | recently and we got on to this topic and he said "your number
         | one job when editing an interview is to make the host sound
         | good and then just do the minimum on the guest, otherwise
         | you'll waste too much time".
         | 
         | Doing that kind of editing 8 hours a day I can see why he says
         | that.
        
           | nickjj wrote:
           | Yeah it's weird. I have these in depth technical
           | conversations with every guest where it's great, I love this
           | part. The frequency of verbal tics and filler content really
           | takes an edit from "this isn't too bad" to "what the fuck am
           | I doing with my life?" all based on how many times you need
           | to remove filler content within the first 5 minutes of
           | editing a 90 minute show.
           | 
           | I'm kind of surprised that wedding producer openly said that.
           | My philosophy has always been the opposite. One of my main
           | goals of the show is to make the guest walk away thinking
           | this was the best podcast experience they ever had from start
           | to finish as well as do everything I can to make them come
           | off as good as possible.
           | 
           | I rarely cut content but most episodes have hundreds of
           | manual edits to remove filler content and create a more
           | concise flow by removing long pauses because my 2nd main goal
           | is to optimize for the listener. I keep the edits organic at
           | the same time by leaving in some filler content and subtle
           | things like a deep inhale or a sigh because there's a lot of
           | meaning around that when it comes to sentiment and tone, the
           | same can be said for sometimes leaving in an extra 500ms
           | pause to amplify the meaning behind something. At the same
           | time, sometimes filler content gets left in because it flowed
           | too quickly into the next word so cutting it sounds too
           | unnatural as if it clipped.
           | 
           | This is why I think it's a crazy hard problem to get a
           | machine to be able to make decisions like this.
           | 
           | I do use separate recordings (we each record our track
           | locally), it definitely helps eliminate the few cases where
           | we talk over each other or being able to lower the volume of
           | a laugh so it doesn't overpower what the other person said
           | while still keeping it in because it's a good part of a
           | conversation and a snort or laugh can easily be the
           | difference between a listener wondering if the guest was
           | offended or happily agreeing with something.
        
       | [deleted]
        
       | mijustin wrote:
       | Hey! Justin (from Transistor.fm) here. This looks really
       | interesting. Two questions:
       | 
       | 1. Any plans for an API and bulk pricing?
       | 
       | 2. Any plans to add loudness normalization, balancing, etc to the
       | processing?
        
         | autoencoders wrote:
         | Hey Justin! Love your podcast.
         | 
         | 1) API Access will come end of Q1.
         | 
         | 2) In the next 6 months, No. However, Auphonic would be a good
         | fit for you.
        
       | abdik wrote:
       | The logo is similar to ours https://www.lovo.ai/
        
         | bryans wrote:
         | While turning it into a heart may be clever branding, you've
         | only slightly modified a ubiquitous icon representing audio,
         | and countless startups used that before you.
        
       | eganist wrote:
       | This is awesome.
       | 
       | Can I suggest the ability to export as project files for popular
       | editors for your roadmap? It'd cut professional workflows down
       | substantially, which would be worth an (even higher) upcharge.
       | 
       | (It wasn't immediately obvious to me if you already did this)
       | 
       | Edit: https://cleanvoice.ai/integrations seems pretty close. I'd
       | honestly charge more for integrations and provide a base tier for
       | just exporting sound. I imagine most indie users would benefit
       | from finished exports enough to pay, while project files would
       | command a higher fee from editors looking to speed up their
       | workflow to take more clients. That's where I'm coming from on
       | pricing tiers and upcharging for professional features.
        
         | autoencoders wrote:
         | ADL Support will come around Q2, so you can import it in a lot of
         | audio and video editors. For now, we have these export files
         | which you mentioned.
         | 
         | Regarding Pricing, that's a good point. I will definitely
         | consider it, thank you!
        
       | daenney wrote:
       | The Terms of service seem worrisome.
       | 
       | > By posting your Contributions to any part of the Site or making
       | Contributions accessible to the Site by linking your account from
       | the Site to any of your social networking accounts, you
       | automatically grant, and you represent and warrant that you have
       | the right to grant, to us an unrestricted, unlimited,
       | irrevocable, perpetual, non-exclusive, transferable, royalty-
       | free, fully-paid, worldwide right, and license to host, use,
       | copy, reproduce, disclose, sell, resell, publish, broadcast,
       | retitle, archive, store, cache, publicly perform, publicly
       | display, reformat, translate, transmit, excerpt (in whole or in
       | part), and distribute such Contributions (including, without
       | limitation, your image and voice) for any purpose, commercial,
       | advertising, or otherwise, and to prepare derivative works of, or
       | incorporate into other works, such Contributions, and grant and
       | authorize sublicenses of the foregoing.
       | 
       | It sounds an awful lot like "we are allowed to do anything and
       | everything we want with the content you upload to us". Maybe I'm
       | misunderstanding something, but I'd be extremely hesitant to
       | upload any content I create to a service with those kinds of
       | terms.
        
         | [deleted]
        
           | [deleted]
        
         | stevenicr wrote:
         | also.. " Your Contributions are not obscene, lewd, lascivious,
         | filthy, violent, harassing, libelous, slanderous, or otherwise
         | objectionable (as determined by us). 7. Your Contributions do
         | not ridicule, mock, disparage, intimidate, or abuse anyone."
         | 
         | Cancel culture coming.. main reason I would not invest time
         | into using Anchor.fm ..
         | 
         | So.. is "Us" progressive or conservative?
         | 
         | Bill Maher breaks these every night on both sides pretty much,
         | so I try to think, if they won't protect Bill Maher or Larry
         | Flynt's words, they are not going to protect mine.
         | 
         | "Contributions are not false, inaccurate, or misleading." - So
         | mainstream news can't use it either - that's a bonus.
         | 
         | I'd add more, but I see you mentioned you will be changing and
         | this is just a boilerplate to save time.
        
         | 1-6 wrote:
         | Thanks for the heads up. I'm a little hesitant to upload
         | something now. On the flip-side, I think devs just want total
         | protection while they navigate the landscape of machine
         | learning. I agree that they could have worded things better but
         | someone who worked on writing this probably didn't understand
         | the nuances of machine learning or the countries that people
         | would be signing up from. Plus they'll need to constantly use
         | datasets for their internal purposes to train.
        
           | autoencoders wrote:
           | Yes, that's exactly the case. As I previously commented, I
           | used a terms generator until I get a lawyer who can
           | write specifically what I do with the data.
        
         | autoencoders wrote:
         | I agree. The terms will be changed. I used an automated
         | terms generator for now (termly.io).
         | 
         | I would like to rewrite it.
         | 
         | What I do is just keep your files on the server for a week. In
         | case you have an issue, I will look into your file to fix your
         | issue. And if you want, you can give consent for me to further
         | improve the service. (Say you have an accent which the AI
         | handles badly; I can use your audio file to understand why it
         | failed.)
        
           | throwthere wrote:
           | With this statement you've now shown that your site doesn't
           | take contracts seriously and opened the door to people
           | arguing future contracts are also invalid. I'd delete this
           | response asap.
        
             | stavros wrote:
             | What? This person made something, we pointed out an
             | improvement and they said they'd change it. You're
             | literally complaining that it wasn't perfect already, and
             | thus they somehow don't "respect stuff".
        
             | giansegato wrote:
             | Why? They can change the policy and ask for a confirmation,
             | as every service out there is already doing.
        
             | simtel20 wrote:
             | How and when have you seen it happen that a contract was
             | invalidated by one party indicating that they would prefer
             | a more appropriate contract?
        
         | pfortuny wrote:
         | Thank you.
         | 
         | I would pay for a piece of software that does that job on my
         | computer with no Internet.
         | 
         | This way? I may even end up in court for saying something
         | "improper"...
         | 
         | Edit. OK: I've just read the developer's reply below.
         | 
         | Honestly: you need to fix this because right now it is more
         | scary than not.
         | 
         | Congratulations for the project but please do fix this.
        
           | autoencoders wrote:
           | I agree. More and more AI applications are exploiting our
           | data in negative ways.
           | 
           | I will get proper terms as soon as possible, especially
           | now that people have mentioned it.
        
       | axhl wrote:
       | Congratulations on launching. How are you finding using termly.io
       | for the legal side of things?
        
         | autoencoders wrote:
         | It's not ideal. See the comment talking about the terms. I have
         | a meeting with a lawyer soon. But I guess it is better than no
         | terms.
        
       | throwaway1777 wrote:
       | Overcast has features to do some of this on the listener side. I
       | prefer having the AI on the listener side so I can go back to the
       | raw version if the AI messes up for some reason.
        
       | fareesh wrote:
       | What's the high level approach required to build something like
       | this yourself?
       | 
       | Does it involve relying on speech to text with timestamps and
       | then a series of cuts based on that?
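
One way to picture the approach asked about above (speech-to-text with timestamps, then cuts) is the minimal sketch below. It is not how Cleanvoice works; the author only says he trained his own models. The `Word` type, the `FILLERS` list and the `keep_segments` function are hypothetical names for illustration, and the word timestamps are assumed to come from some speech-to-text step that is not shown.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the recording
    end: float


# Illustrative filler list; a real tool would need something far richer.
FILLERS = {"um", "umm", "uh", "ah", "ahh", "er"}


def keep_segments(words, total_duration, fillers=FILLERS):
    """Return (start, end) spans of audio to keep after dropping filler words."""
    segments = []
    cursor = 0.0  # start of the span currently being kept
    for w in words:
        if w.text.lower().strip(".,!?") in fillers:
            # Close the kept span just before the filler, then skip past it.
            if w.start > cursor:
                segments.append((cursor, w.start))
            cursor = max(cursor, w.end)
    if cursor < total_duration:
        segments.append((cursor, total_duration))
    return segments


words = [
    Word("So", 0.0, 0.3),
    Word("um", 0.4, 0.7),
    Word("hello", 0.8, 1.2),
    Word("uh", 1.3, 1.5),
    Word("world", 1.6, 2.0),
]
print(keep_segments(words, 2.0))
```

The returned spans would then be concatenated by an audio editor or library. As several comments in this thread point out, the hard part is choosing cut points and pause lengths that still sound natural, which this sketch ignores entirely.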
        
       | monroewalker wrote:
       | Sounds similar to Descript https://www.descript.com/
        
       | spicybright wrote:
       | I'm going to sound like a negative Nancy, but I wish
       | podcasters/youtubers would just practice their speaking skills
       | instead of relying on a series of really quick jump cuts. Worst
       | offenders are those that can't get through a sentence without
       | splicing it 2+ times...
       | 
       | Perhaps you could have a mode to detect how much one stutters,
       | and parts worth redoing without spending as much time combing the
       | whole thing.
        
         | pfortuny wrote:
         | Classically professionals learnt their discourses by heart.
         | That stands out when you see it.
         | 
         | I remember fondly a student of mine who seemed unable to
         | express himself properly. I told him to memorize his final
         | project dissertation because otherwise it would be a wreck (OK,
         | I did not say this last part, it was more of a suggestion).
         | 
         | BOY: did he memorize it. He got an honors and I did think "this
         | guy has really done it, and it sounds like music!"
         | 
         | When you do it well, it tells.
        
         | intrasight wrote:
         | Some podcasts I listen to are over-edited. I'd always assumed
         | that a) it was done manually and b) it was done to keep the
         | length below some threshold. Now I'm curious if they are using
         | software to automate the editing.
         | 
         | I find the cadence very unnatural when all the spaces between
         | phonemes are removed.
        
           | ghaff wrote:
           | >I find the cadence very unnatural when all the spaces
           | between phonemes are removed.
           | 
           | Any editing can be overdone and, while I do a modicum of
           | editing out umms, you knows, and other verbal tics when I'm
           | putting together a podcast interview, I'm not fanatical about
           | it.
           | 
           | You do occasionally get someone who just speaks quite slowly
           | and it is sort of annoying to listen to as audio. So I've
           | done some automated gap reduction in a couple of cases.
        
             | intrasight wrote:
             | What software do you use to automate?
        
               | ghaff wrote:
               | Audacity.
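
The automated gap reduction mentioned in this subthread can be illustrated with a minimal sketch. This is not Audacity's implementation, only the underlying idea: any run of quiet samples longer than a limit is shortened to that limit. The function name `truncate_silence` and the `threshold` and `max_silent` parameters are assumptions made for this example.

```python
def truncate_silence(samples, threshold=0.01, max_silent=3):
    """Shorten every run of quiet samples to at most max_silent samples."""
    out = []
    run = 0  # length of the current run of quiet samples
    for s in samples:
        if abs(s) < threshold:
            run += 1
            if run <= max_silent:
                out.append(s)  # keep only the first max_silent quiet samples
        else:
            run = 0
            out.append(s)
    return out


samples = [0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.0, 0.3]
print(truncate_silence(samples, max_silent=2))
```

On real audio the limit would be expressed in seconds rather than samples, and as the comments above note, removing every gap makes the cadence sound unnatural, so some pause is deliberately left in.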
        
         | [deleted]
        
         | cube00 wrote:
         | Especially ones who won't set their background LED lights to a
         | stable color. The smooth flowing gradient becomes very
         | distracting when you jump cut the heck out of it.
        
           | intrasight wrote:
           | synesthesia?
        
         | curiousgal wrote:
         | Once I started noticing jumpcuts it ruined every single YouTube
         | video with a person talking into the camera. The worst offender
         | being Phillip DeFranco.
        
           | ghaff wrote:
           | I find talking into a camera really tough. If you're doing it
           | by yourself you almost need to imagine you're talking to a
           | person. I even know of people who put cutouts or pictures of
           | someone by the camera so they can talk to a person.
           | 
           | I haven't had a lot of luck using teleprompters but maybe I
           | just haven't hit on the right setup.
           | 
           | Something else someone told me recently was to try to work in
           | short segments that you redo until you get right and then do
           | a cut to the next segment somewhere that it's natural.
        
           | unholiness wrote:
           | Interesting take. I saw Phillip DeFranco as more of a pioneer
           | of that style. He really leaned into the cuts. At the time it
           | was something no one else was doing so it was very
           | noticeable, and he had a very crisp cadence with them where
           | the jarring cuts were part of the presentation. It was clear
           | his process was: Write a script, mark cuts everywhere it
           | could make sense, go through the script repeating every
           | phrase until you're happy with the sound, and when editing,
           | always make the cuts where they're marked, even if it could
           | be skipped.
           | 
           | The result feels something like pixel art: Clearly not the
           | closest possible imitation of conversational speaking, but
           | something else. A style in its own right with different
           | considerations.
           | 
           | Now that it's par for the course to have jump cuts, I see
           | them used more sloppily everywhere, where it's clear the
           | narrator decided where to do the cuts after the fact. Cutting
           | off the beginning or end of a phoneme, missing or repeating
           | bits of a thought because they liked one phrasing in
           | recording but opted for another one in post, misordered cuts
           | where something which moved in the background moves back to
           | its old place, etc. Phillip's style looked lazy but it can't
           | really be imitated with actual laziness.
           | 
           | These days I look back and really cringe at the substance of
           | his show. But I still see the style as professional.
        
       | mikepechadotcom wrote:
       | Really cool project, I wish you great success! Could be useful
       | for my (German) podcast agency!
       | 
       | Out of curiosity: Which ai-technology did you use? OpenAI? Google
       | API? Or did you train the models yourself with Python (sth. like
       | Tensorflow)?
       | 
       | Cheers, Mike
        
         | autoencoders wrote:
         | Hello Mike, nice to meet you!
         | 
         | I trained my own models. No OpenAI/Google API.
         | 
         | Best regards, Adrian
        
       | notafraudster wrote:
       | "Free 30 Minutes Trial" is not native English. "Free 30 Minute
       | Trial" would be better; but I think the sentence is a little
       | confusing. I presume you mean you can convert 30 minutes of audio
       | for free, not that the trial account is only valid for 30 minutes
       | from creation. I would do "Clean 30 minutes of audio for free. No
       | Credit Card needed." or similar. The sale page which says "Get 30
       | minutes credit to try the service out." is better, and "30
       | minutes" does sound correct on that page.
       | 
       | In your FAQ, you say: "Currently we remove lip smacks, saliva
       | crackle, mouth clicks and harsh parts of breathing (not the whole
       | breath). If you want to remove a particular mouth sound (ex.
       | Chewing), write us in the chat as a feature request." I don't
       | think most English speakers would understand what "harsh parts of
       | breathing" are. Typically a parenthetical example in English
       | would be written "(e.g. chewing)" not "(ex. Chewing)".
       | 
       | Your question "What filetype and sizes do you support?" doesn't
       | answer what filetypes you support, and I suspect the singular
       | "filetype" was a grammar error. You also write "We have an audio
       | file size limit of 1.5G per file or in case you are uploading
       | multi-track and a total file size of 2 GB. ". The part that says
       | "or in case you are uploading multi-track and" doesn't make any
       | sense in English. I think you mean "We support file sizes up to
       | 1.5GB per file for single-track files, or 2GB if you are
       | uploading a multi-track file as separate files." but I'm not
       | sure.
       | 
       | In general I don't understand why each selling point has a
       | separate FAQ page but the FAQs are often not related to the
       | selling point. I don't think people think the "Mouth Sound
       | Remover" page is the one that lists file size support, while the
       | "Stutter Remover" page is the one that lists the maximum number
       | of tracks per project.
       | 
       | Your integrations page lowercases "cleanvoice" whereas other
       | pages write it as "Cleanvoice".
       | 
       | Under integrations, you have a section called "Markers Export".
       | This should probably be "Export Markers" or "Marker Export".
       | 
       | Under "How to Export Edits", you probably don't want to
       | capitalize "Results" or "Editor" unless these are supposed to be
       | title cased, in which case you probably want to title case all of
       | them.
       | 
       | Under your pricing FAQ you have "Does my credit expire at end of
       | the month? Your credit will reset every billing month. Unused
       | credit will be lost." This is needlessly confusing. You use the
       | verbs "expire", "reset", and "be lost" to describe the same
       | thing, and you don't actually answer the question. Also you don't
       | want "at end of the month", you want "at month's end" or "at the
       | end of the month". I would rewrite as "Does my credit expire at
       | the end of each month? Yes. Credit resets every month and cannot
       | be carried over to future months. Unused credit will be lost."
       | This is a terrible business model, though, and so I suggest you
       | not do this. Either sell as a subscription or sell as a credit
       | model, not both, this is gross.
       | 
       | In general I think you want to pay someone who is a professional
       | English copywriter to fix your website. Cheers.
       | 
       | Edit: I just noticed your changelog is powered by a service
       | called Headway. I am not sure if you also made Headway, but
       | Headway's website is also in need of English copyediting.
        
         | [deleted]
        
           | [deleted]
        
         | autoencoders wrote:
         | Wow! Thank you so much! You are right, I need to get a
         | copywriter ASAP.
         | 
         | I'm curious why the Subscription + Onetime Credit is bad. But I
         | agree it is confusing.
         | 
         | My understanding is that not every customer wants or needs a
         | subscription, since they upload podcasts irregularly.
         | 
         | This business model is seen in other AI products:
         | 
         | https://www.remove.bg/pricing https://auphonic.com/pricing
         | 
         | I am very grateful you took the time to help out. Really
         | appreciate it!
        
           | sdoering wrote:
           | Maybe you can get away for a quick fix with something like
           | deepl.com.
           | 
           | They are great. As a German native speaker I came a long way
           | with using them when I needed valid translations.
        
       | arendtio wrote:
       | That logo is very similar to the Cisco logo:
       | 
       | https://www.cisco.com
        
       | stavros wrote:
       | This is excellent, well done! I'd be curious to know how it's
       | done, as I don't know much about deep learning and this looks
       | like magic to me.
        
       | autoencoders wrote:
       | Hey HN!
       | 
       | I like podcasting, but I hate editing them. I tend to stutter and
       | have a lot of filler words in my podcast. That's why I created
       | Cleanvoice, in order to spend less time editing them. Cleanvoice
       | is an ML tool which removes filler words, mouth sounds,
       | stuttering and dead air from your podcast. To use it, just upload
       | your podcast - wait some minutes - download the cleaned audio.
       | 
       | It's still not perfect, but it's at a stage where I can blindly
       | use it on every single one of my podcasts.
       | 
       | I would love to hear your feedback!
        
         | wpietri wrote:
         | Neat! I love products that come out of a personal need.
         | 
         | Is it possible for you to do a live, personal demo? No logins
         | or anything. I'm thinking something where you tell people to
         | start up their audio and then give them a quick prompt like
         | "Describe your breakfast yesterday." Record for 30 seconds, and
         | then let them play back the original and cleaned versions. You
         | could limit them to, say, 5 goes, with a different prompt each
         | time.
         | 
         | I suggest it because a) a little personal investment makes it
         | more likely they'll give you their email address for signing
         | up, and b) many potential customers underestimate how much they
         | need something like this.
        
           | autoencoders wrote:
           | I like your idea, makes sense.
           | 
           | My biggest fear is that without login, people will start
           | abusing it in ways that I don't expect. Definitely
           | considering it. Thank you!
        
             | wpietri wrote:
             | That's a good fear to have. That's the kind of thing I
             | would set up some monitoring for and then wait to see. You
             | might get a few jerks. But those same jerks might also be
             | the sort of people who would sign up with a bunch of fake
             | emails, so gating on an email address may not be much
             | better than gating on a fresh-issued cookie.
             | 
             | Thanks for listening, and good luck with your project!
        
         | telesilla wrote:
         | Have you compared this to other commercial options such as
         | Descript? Looks really great at a glance, thanks for sharing!
        
           | autoencoders wrote:
           | I tried to use Descript for my podcast, but it has some
           | issues.
           | 
           | 1) It doesn't work well if you have a strong accent. As a
           | non-native speaker, I found the transcriptions were quite bad,
           | which made the editing quite bad.
           | 
           | 2) Cleanvoice works with multiple languages, Descript
           | doesn't.
           | 
           | 3) Cleanvoice can remove stutters (not always, but it tries)
           | and mouth sounds like lip smacking, teeth clicking. Descript
           | can't. This is not a big deal for most, but since I stutter
           | a lot, this was essential.
           | 
           | My approach is different from Descript. They use a
           | transcription service, and then they edit the audio based on
           | the text. I work directly at the phonetic level, which gives
           | me more control over the audio.
           | 
           | Depending on the needs, either one is better. I guess you
           | should try it for yourself and compare.
        
           | ckdarby wrote:
           | I use Descript and it is absolutely lovely. There are a bunch
           | in this space that I would not be surprised to see merged or
           | acquired. Would love to see Descript & GetWelder merging
           | together.
           | 
           | While Cleanvoice has some niche features that Descript
           | doesn't offer I would not be surprised to find them rolling
           | these features out in the next major release they're doing.
           | IMO the founder of Cleanvoice should sell/join Descript.
        
         | qmmmur wrote:
         | Without giving away your secret sauce, what are your approaches
         | to the cleaning process? Is it a combination of different
         | passes of algos or is it something more generic and "sausage
         | machine-like" like a neural network?
        
           | jwuphysics wrote:
           | Based on the OP's username, surely one of the deep learning
           | algorithms is a denoising autoencoder, right?
        
           | autoencoders wrote:
           | The audio is edited in several phases. It uses different
           | algorithms, but most of them are deep learning based. It is
           | surely overengineered, but as a Data Scientist, ML is the
           | most fun part for me.
        
             | nmstoker wrote:
             | How is the latency and, if it's sufficiently low, could
             | this realistically be applied to "nearly live" content?
             | 
             | That scenario seems really appealing for conferences, even
             | if it just quietens down the verbal tics, but I'm guessing
             | if the lag is too great it would get like a bad lip sync
             | issue.
        
               | pokot0 wrote:
               | How does real time make sense in the first place for an
               | algorithm that gets 1 minute of audio and gives you back
               | 50s? You are gonna have to fill the gaps anyway with
               | something not meaningful.
        
               | staticautomatic wrote:
               | Silence is meaningful, but pretty awkward when not
               | deliberate!
        
               | laumars wrote:
               | Tools like this are designed to remove awkward silences.
               | 
               | What it sounds like the GP is after is something more
               | like hiss and pop removal (to use an old vinyl analogy)
               | and that's a different and also simpler problem to solve.
               | I'd wager there are already tools on the market for that.
        
               | pokot0 wrote:
               | Very insightful :). Now I need an AI to tell me when
               | silence is deliberate or not. :)
        
               | autoencoders wrote:
               | It would be a huge engineering endeavour, which I
               | wouldn't be capable of doing. That said, things like
               | background noise and some sounds can be removed. See
               | Krisp.ai
        
               | qmmmur wrote:
               | Izotope plugins already do some of these things but not
               | all. In particular their de-clicking algorithm is pretty
               | good but definitely not automatic or low latency.
        
               | Fogest wrote:
               | Nvidia RTX Voice does something similar. It's pretty similar to
               | other technology though where it focuses more on removing
               | background noise. It actually works very well. It would
               | definitely be interesting to see it also filter speech
               | itself. But I feel like this would be hard to do without
               | introducing extra latency. If someone is saying "umm" or
               | some other filler before a word you kinda need to know
               | what that word will be to determine if it's filler or
               | not. So it almost can't be done without introducing
               | latency as it would need some future speech to determine
               | if filler or not.
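               | 
               | A minimal sketch of that lookahead argument, not how any
               | real product works. The names stream_filter, toy_is_filler
               | and added_latency_ms are hypothetical, and the "classifier"
               | is a toy rule on words rather than audio. The point is only
               | that a decision that needs N future frames cannot be
               | emitted until those N frames have arrived.

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List


def toy_is_filler(word: str, future: List[str]) -> bool:
    """Toy rule (an assumption for illustration only):
    "um" is always filler; "like" is filler only when the next word is "um"."""
    return word == "um" or (word == "like" and future[:1] == ["um"])


def stream_filter(
    frames: Iterable[str],
    is_filler: Callable[[str, List[str]], bool],
    lookahead: int,
) -> Iterator[str]:
    """Yield non-filler frames from a stream.

    A frame is only judged once `lookahead` later frames are buffered,
    so every output is delayed by `lookahead` frames.
    """
    buffer: deque = deque()
    for frame in frames:
        buffer.append(frame)
        if len(buffer) > lookahead:
            candidate = buffer.popleft()
            if not is_filler(candidate, list(buffer)):
                yield candidate
    # End of stream: flush what is left with whatever context remains.
    while buffer:
        candidate = buffer.popleft()
        if not is_filler(candidate, list(buffer)):
            yield candidate


def added_latency_ms(lookahead_frames: int, frame_ms: int) -> int:
    """Minimum extra delay introduced by waiting for the lookahead."""
    return lookahead_frames * frame_ms
```

               | With the words "I um like um like tea", a lookahead of 1
               | drops the first "like" (followed by "um") and keeps the
               | second. With a lookahead of 0 the rule never sees the next
               | word, so both are kept. At 20 ms frames, 25 frames of
               | lookahead would already cost half a second.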
        
             | qmmmur wrote:
             | Do you do any audio segmentation to remove the filler words
             | and such?
        
         | undoware wrote:
         | I literally just bought your product, thank you very much, I
         | needed this and wondered why no one had made it yet.
        
           | autoencoders wrote:
           | I appreciate it! If you have any issues or need help, feel
           | free to reach out. (You can use the chat in the app.)
        
       | gus_massa wrote:
       | Is the example in the page really made by the computer? In my
       | opinion the pauses where the filler words were are slightly
       | too long. Is it possible to configure this?
       | 
       | Is it possible to keep some filler words? I make something
       | similar (but not professionally), and sometimes I like to keep a
       | few of them.
        
         | autoencoders wrote:
         | > Is the example in the page really made by the computer?
         | 
         | Yes.
         | 
         | > In my opinion the pauses where the filler words were are
         | > slightly too long. Is it possible to configure this?
         | 
         | I agree. However, if you use it on an interview, the edits sound
         | better. In an unnatural setting, you get unnatural results.
         | 
         | Currently, there is no way to set it, but customization is
         | planned for Q2 next year.
         | 
         | > Is it possible to keep some filler words?
         | 
         | For now no, but keeping some filler sounds to keep it authentic
         | is something I plan.
        
           | gus_massa wrote:
           | I agree that the correct length of the pause after the word
           | is removed is very tricky. Perhaps your configuration is
           | better than my imaginary magical edition.
           | 
           | In another comment, eganist posted a link to
           | https://cleanvoice.ai/integrations It looks interesting
           | because I can choose which to keep and even use it to sync
           | with video [with some additional work]. I didn't see it the
           | first time in the page.
        
             | autoencoders wrote:
             | ADL Support is also planned for around Q2, so you could just
             | import it in your audio/video editor without issue. Thank you
             | for pointing that out. I'll put Integrations on the homepage
             | as well.
        
       | pwned1 wrote:
       | I suspected something like this was happening with podcasts. I've
       | noticed lately that some podcasters have unnaturally short pauses
       | between speakers (question and answer) or between sentences. It
       | really annoys me. It makes it almost unlistenable.
        
         | carols10cents wrote:
         | Yes, the worst is when so much silence is removed that it
         | sounds like someone is laughing over themselves.
        
         | autoencoders wrote:
         | I agree, as if they don't breathe!
         | 
         | This is not the case with my app. I keep the edits longer rather
         | than shorter, since I also find that unlistenable.
        
       | nateweiss wrote:
       | Looks cool! Would this also work for "explainer" type videos,
       | showing how to use a software product or similar?
       | 
       | If yes, you might consider a page or callout about that use-case,
       | as it might attract some additional users. Just a thought.
        
         | tyingq wrote:
         | That seems like it would be tricky, as the video and audio
         | would get out of sync. You would have to remove, then "fill" to
         | keep the timing. Though this product does mention it works with
         | multiple speakers on different tracks...so they are already
         | somewhat in that space.
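         | 
         | A rough sketch of the "remove" versus "remove, then fill"
         | difference, on a plain list of samples. The helpers cut_spans
         | and fill_spans are hypothetical names, and filling with
         | silence is just one assumed choice of filler. This says
         | nothing about how Cleanvoice itself handles it.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # half-open [start, end) range of sample indices


def cut_spans(samples: Sequence[float], spans: Sequence[Span]) -> List[float]:
    """Delete the given spans. The result is shorter than the input,
    so anything timed against the original (e.g. video) drifts."""
    out: List[float] = []
    cursor = 0
    for start, end in sorted(spans):
        out.extend(samples[cursor:start])
        cursor = max(cursor, end)
    out.extend(samples[cursor:])
    return out


def fill_spans(
    samples: Sequence[float], spans: Sequence[Span], fill: float = 0.0
) -> List[float]:
    """Overwrite the given spans with a constant (silence by default).
    The result has the same length as the input, so timing is preserved."""
    out = list(samples)
    for start, end in spans:
        for i in range(max(start, 0), min(end, len(out))):
            out[i] = fill
    return out
```

         | For a 60-sample input with two 5-sample spans marked,
         | cut_spans returns 50 samples while fill_spans still returns
         | 60, which is why cutting alone breaks sync with a video
         | track of the original length.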
        
           | autoencoders wrote:
           | Video is quite tricky. One thing with video is that you
           | don't want to over-edit the audio, since it's then very hard
           | to keep the video synced. That said, for an explainer video it
           | should work OK, but for a video podcast it would be horrible.
           | I have an idea how to deal with this, but it is not available
           | yet.
        
       | sdoering wrote:
       | Not sure where you are located, but if you are giving access to
       | people protected by the GDPR, your cookie notice does not fulfill
       | the requirements set by European regulations.
       | 
       | Additionally, if you are located in a country that (like Germany
       | for example) has regulations on the necessity of an imprint, this
       | might also be missing.
        
         | autoencoders wrote:
         | It should be ok, since I use strictly essential cookies, which
         | don't require consent. (But users need to be informed)
         | 
         | Or do I misunderstand the law?
         | 
         | [1] Strictly necessary cookies -- These cookies are essential
         | for you to browse the website and use its features, such as
         | accessing secure areas of the site. Cookies that allow web
         | shops to hold your items in your cart while you are shopping
         | online are an example of strictly necessary cookies. These
         | cookies will generally be first-party session cookies. While it
         | is not required to obtain consent for these cookies, what they
         | do and why they are necessary should be explained to the user.
         | 
         | [1] - https://gdpr.eu/cookies/
        
       | geuis wrote:
       | Your demos don't play on iOS safari.
        
         | autoencoders wrote:
         | Oops! Thank you for pointing that out. I'll check it.
        
       ___________________________________________________________________
       (page generated 2021-11-20 23:00 UTC)