[HN Gopher] Enhanced noise suppression in Jitsi Meet
       ___________________________________________________________________
        
       Enhanced noise suppression in Jitsi Meet
        
       Author : jlpcsl
       Score  : 254 points
       Date   : 2022-10-01 11:28 UTC (11 hours ago)
        
 (HTM) web link (jitsi.org)
 (TXT) w3m dump (jitsi.org)
        
       | quickthrower2 wrote:
       | Aside, Jitsi is pretty awesome for creating a video app idea
       | quickly. The API is very easy to use.
        
       | mcluck wrote:
       | It does seem to do a good job of eliminating noise but it seems
       | like it gets rid of a lot of the signal too. It's much easier to
       | understand the noisy sample than the processed one
        
       | josteink wrote:
       | I'm using RNNoise as a pipewire input filter on my Linux
       | machines, but that's very Linux-specific and a bit "hardcore" to
       | setup.
       | 
       | Nice to see it getting integrated into video meeting solutions,
       | so more people can take advantage of this awesome library.
        
         | Doman wrote:
         | Awesome! Could you please elaborate how to do it or post some
         | good/not outdated links?
        
           | josteink wrote:
           | Don't remember exactly which guide I followed, but I used the
           | build from this repo, and the instructions look plausible:
           | 
           | https://github.com/werman/noise-suppression-for-
           | voice#pipewi...
        
         | asicsp wrote:
         | > _but that's very Linux-specific and a bit "hardcore" to
         | setup_
         | 
         | Have you tried https://github.com/noisetorch/NoiseTorch/?
        
           | nicolaslem wrote:
           | Or https://github.com/wwmm/easyeffects for noise reduction
           | and other effects like compression and EQ for a real crooner
           | voice in any video call application.
        
             | kevincox wrote:
             | Definitely recommend easyeffects over noisetorch. No root,
             | high quality GUI and can work automatically in startup. I
             | only use the noise suppression 99% of the time but having
             | the other effects available can also be fun.
        
       | pen2l wrote:
       | Created a few years ago by Jean-Marc Valin of xiph/mozilla (who
       | by the way is also the author of Opus codec among other things):
       | https://gitlab.xiph.org/xiph/rnnoise/
       | 
       | Overview of RNNoise from the horse's mouth is here:
       | https://jmvalin.ca/demo/rnnoise/
       | 
       | Used as a Wasm module! In some ways the web is becoming more
       | opaque. Is this the future then, a hodgepodge of binaries doing
       | things behind the scenes? Though in this case it happens to be
       | OSS, and it may well be a moot point -- backend is already a
       | blackbox to the enduser, now parts of frontend are blackboxes.
       | The practical implication is probably just that some measure of
       | customizability is gone.
        
         | saghul wrote:
         | What a weird take.
         | 
         | How else would we have implemented this? WASM has facilitated
         | introducing these technologies into web applications, it
         | literally wasn't possible before.
         | 
         | Thanks to emscripten it wasn't even that hard to get rnnoise
         | working on WASM: https://github.com/jitsi/rnnoise-wasm
         | 
         | I concede WASM does open the possibility of adding opaque stuff
         | to web apps but IMHO the benefits outweigh the drawbacks at
         | this point.
        
           | pen2l wrote:
           | Oh no you're absolutely right, my general frustration was
           | ill-placed for this thread. Wasm is no doubt the right and
           | only way to have done this.
        
         | danuker wrote:
         | Short of reproducible builds, you can't even check that what
         | you're being served is, in fact, the OSS version.
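
With a reproducible build, the check described above comes down to comparing digests: the module you compile yourself and the module the server hands you should hash identically. A minimal sketch, assuming both files have already been read into memory (the function names are hypothetical, not part of any Jitsi tooling):

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def same_build(served: bytes, local_build: bytes) -> bool:
    """True only if the served module is byte-identical to the local build.

    This is meaningful only when the build is reproducible; otherwise two
    honest builds of the same source can legitimately differ.
    """
    return sha256_hex(served) == sha256_hex(local_build)
```

A mismatch does not prove tampering on its own. It only shows that the served bytes are not the ones you built.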
        
         | naillo wrote:
         | I feel like I have about as much chance reading disassembled
         | wasm as I would have reading unminified javascript so I don't
         | think it changes much.
         | 
         | (You could technically turn the wasm to JS and unminify that
         | too, which I doubt is much harder/easier to decipher as the
         | same thing written in JS and minified/unminified.)
        
         | sabjut wrote:
         | Is this a troll comment? Yes, wasm works based on a compiled
         | binary, just like any other program written in a compiled
         | language in the past 50 years. You try to suggest that everyday
         | users of the web are just going into the js sources of webpages
         | and understand whats going on. With the plethora of libraries,
         | frameworks and static optimization used in todays websites,
         | normal people can't really dissect the inner workings of a
         | website just by looking at the code. That's why we have tools
         | like request analyzers etc which all would still work with
         | compiled libraries.
         | 
         | Compiled code has existed for half a century and we know how to
         | work with it.
         | 
         | Suggesting that the web is doomed because people of the future
         | prefer rust instead of javascript is beyond any rationale.
        
           | salawat wrote:
           | ...I still dissect website code, thank you very much.
           | Basically have to do it just to figure out quirks I'm
           | constantly running into.
        
           | troyvit wrote:
           | They didn't suggest the web is doomed, just that more aspects
           | of it are opaque. I don't think they're talking about every
           | day users of the web either, but rather nascent developers.
           | 
           | The early web was a great equalizer. Anybody could study a
           | little html, download an ftp manager, jump through a few
           | procedural hoops and have a web page. After some studying and
           | trial and error they could even build an interactive site.[1]
           | 
           | It's easy to miss all the potential of wasm when that's what
           | you remember of the web. To me the amazing thing is that
           | browsers will still work with the methods described above[2]
           | but we're on the cusp of being able to do almost everything a
           | full application environment can do.
           | 
           | That said, even though there will be plenty of OSS wasm tech,
           | it'll still be more opaque to those of us who don't do
           | compiled languages. It'll be a lot tougher to just fork the
           | code and do something more creative with it.
           | 
           | [1] PHP used to stand for "Personal Home Page" and, as one of
           | its founders put it, was created so that "any idiot" could
           | make an interactive site.
           | 
           | [2] https://t.mkws.sh/58bytes/
        
             | fragmede wrote:
             | JavaScript minifiers to obfuscate the code have been around
             | pretty much since the language got popular, so that version
             | of the web's been gone since about when Myspace lost to
             | Facebook. Places like Glitch.com are trying to bring that
             | back though.
        
             | maven29 wrote:
             | Are modern-day "no code" tools like Webflow not an
             | acceptable equivalent?
             | 
             | We already lost any semblance of building from scratch in
             | the mid-2000s with the emergence of gargantuan HTML
             | templates and Wordpress/Drupal/PHPbb deployments with
             | plugins and themes.
             | 
             | This is a direct result of people being held to higher
             | standards and thus spending a lot more effort overriding
             | the compositional and behaviour defaults of the user agent.
             | 
             | The modern-day iteration just optimizes for scaling up to
             | tens of thousands of concurrent end-users on anemic
             | hardware.
             | 
             | We have to accept the fact that personal webpages gave way
             | to social network profile pages. This didn't happen
             | overnight and there is zero demand for a hand-crafted
             | presence on the web anymore.
        
               | kragen wrote:
               | No, an environment for writing _new_ code is not any kind
               | of equivalent for the ability to reverse-engineer
               | _existing_ code. Firebug and its clones are a much closer
               | equivalent than anything like Webflow.
        
               | rektide wrote:
               | Build from scratch is out of favor, but not necessarily
               | that far off. Folks like Github & Youtube have very
               | simple bottom-up webcomponent systems they use, rather
               | than top-down frameworks. Existing concerns about
               | bundling might be met by bundled http exchanges
               | (webpackage).
               | 
               | I don't think "no code" is an aid. If anything it's
               | pushing in the opposite direction: rather than a
               | transparent approachable web medium, it suggests we need
               | hyperadvanced tools that we really won't understand or
               | have control over to synthesize web code. It's a simpler
               | user experience, but a push away from notepad.exe webdev.
               | 
               | I wouldn't rush to make any conclusions about who or what
               | has won, as a settled fact & case for all time. We haven't
               | had good ways to run online systems ourselves, versus
               | hosted for us, and there's still lightyears to go but
               | we're doing good things & finally maturing well. We're
               | only a couple years into ActivityPub as an interchange
               | format & growing many of the capabilities & tools &
               | systems, around all kinds of use cases, that will make
               | throwing together a fair, interactable competitive
               | offering possible. Social media has had huge huge
               | investment poured into it, but we are in decent preteen
               | years of growing up & owning the libre equivalents. We
               | can assess demand only after there is a visualizable
               | state people can imagine; just having an isolated blog is
               | not the equivalent to the well connected social media
               | site, but these capabilities slowly arise. Follow the
               | alpha geeks; this currently long phase will not be
               | forever.
        
           | Uehreka wrote:
           | Sure "everyday users" aren't clicking "View Source", but
           | that's not really what the issue is about.
           | 
           | When I was a kid, every piece of software I used was pre-
           | compiled, and therefore opaque. This made it difficult for me
           | to figure out how people made certain things, and after a
           | while I lost interest in programming.
           | 
           | When I got back into it later, one thing that made a huge
           | difference was being able to see how various cool JS sites
           | were built. The ability to "View Source" like that was
           | revolutionary, and also allowed me to build some early fun
           | projects, like a Cookie Clicker "AI" that could play the game
           | automatically by calling the functions I could see in the
           | game's source.
           | 
           | I'm far from the only person with experiences like these.
           | Yes, there was programming before View Source and there will
           | be programming after. And for those of us with the right
           | tools or reverse engineering skills, View Source isn't
           | particularly relevant. What we're losing is a pipeline that
           | helped people become/stay interested in programming, which
           | makes it likely that future programmers who would've followed
           | a path like mine will do something else instead.
        
             | est31 wrote:
             | On the other hand, it's never been as easy to contribute to
             | OSS projects as it is now. Github has severely lowered the
             | requirements compared to earlier settings where you had to
             | get an e-mail client, configure it in just the right way,
             | etc. You have live coding youtubers, there are discord
             | communities for all types of technology, and knowledge
             | about programming and technology is extremely available
             | through Google, way more than it was 20 years ago. I think
             | young people still have tons of opportunities to start out.
        
             | SergeAx wrote:
             | Today's JavaScript "View source" is 90% useless because of
             | Webpack et al. The original program is effectively compiled
             | into obscure and obfuscated lowest-common-denominator JS.
        
         | Weatherweathe wrote:
         | Aren't wasm modules still sandboxed? Reverse engineering binaries
         | should have around the same complexity as reverse engineering
         | uglified js, not sure how they are more opaque
        
           | pen2l wrote:
           | You probably have a point but I'm thinking unuglified js code
           | (http://www.nice2predict.org/) is not as impenetrable as code
           | from reverse engineered wasm binaries? The element of
           | plausible deniability is more potent though for the nefarious
           | actor on the other side in the case of wasm binaries.
        
         | robalni wrote:
         | I don't think it makes much of a difference whether you can
         | read the code because even if you can read the javascript, it's
         | automatic so it can be different on the next request. If we
         | want to be able to trust the web, we have to get rid of the
         | automatic download and execution of arbitrary script code.
        
         | api wrote:
         | Most JavaScript these days is basically compiled binary. Rarely
         | is it very human readable.
        
         | elcomet wrote:
         | Wasm is about distribution of binaries, not about open source.
         | Those are two different subjects.
         | 
         | When I install a program on my debian machine with apt-get, I
         | also get binaries. But this doesn't mean that it is opaque
         | right?
        
         | KMnO4 wrote:
         | I think we've been lulled into some false sense of expectation
         | that the web exists as a place for "open source code" to be
         | run. As if the fact that you can view the source of any page is
         | any purveyor of that.
         | 
         | If that's your definition of transparency, then perhaps
         | learning to read assembly would give you the same comfort. In
         | fact, there's a lot more binaries distributed with symbols
         | intact than unminified JS.
         | 
         | Or, to put it another way, if you could right click -> view
         | disassembly of any binary on your computer, how would that be
         | any different than today's web?
        
       | geiser wrote:
       | Sorry, but at least on my smartphone, I can understand the
       | unprocessed audio showcased further down the web page better than
       | the noise-suppressed audio. How is that?
        
         | CharlesW wrote:
         | The original audio is significantly easier to understand. This
         | may be technically interesting, but the noise suppression is
         | aggressive to the point that it's eating critical signal with
         | the noise.
        
           | SergeAx wrote:
           | This is the default for online conferencing. Everyone is way
           | better off asking other party to repeat couple of words than
           | listening for all that noise during the whole call.
        
             | ComputerGuru wrote:
             | > Everyone is way better off asking other party to repeat
             | couple of words than listening for all that noise during
             | the whole call.
             | 
             | I didn't understand the first three words, for Alice it was
             | the next two, and for Bob it was the last four. How many
             | people are going to ask to repeat?
             | 
             | Evolution taught us to understand over the sound of waves,
             | crickets, rain, thunder, and more. It didn't teach us to
             | comprehend with half the signals masked.
        
             | leni536 wrote:
             | But this might be better served with a simplistic voice
             | activity detection, like in mumble.
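
A simplistic voice activity detector of the kind suggested above can be as small as a level threshold with a short hold time. The sketch below is an assumption-laden illustration, not Mumble's actual algorithm: frames are lists of samples in [-1.0, 1.0], and `threshold` and `hold_frames` are hypothetical tuning values.

```python
import math
from typing import List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def gate(frames: Sequence[Sequence[float]],
         threshold: float = 0.05,
         hold_frames: int = 5) -> List[bool]:
    """Decide per frame whether to transmit.

    A frame opens the gate when its RMS exceeds `threshold`. The gate then
    stays open for `hold_frames` further frames so word endings are not cut.
    """
    decisions: List[bool] = []
    hold = 0
    for frame in frames:
        if rms(frame) > threshold:
            hold = hold_frames
            decisions.append(True)
        elif hold > 0:
            hold -= 1
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions
```

Unlike a denoiser, a gate like this passes speech through untouched while it is open, but it also passes any noise that happens while someone is talking.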
        
       | atty wrote:
       | Somewhat tangential, but at my work we have found WebEx's
       | background noise removal to be absolutely amazing. So many times
       | we've had someone in a meeting say "sorry about X/Y/Z, it's so
       | noisy", and the rest of us won't hear a thing. This sorta tech
       | has gotten so good, and is a really nice quality of life
       | improvement for remote work. (Or for meetings with people in
       | noisy offices of course)
        
       | naillo wrote:
       | Rare to find creative real time small-weight uses of ML but I
       | love when it's done and this has an impressive and well written
       | explanation with it as well. Great stuff.
        
       | haunter wrote:
       | This is one of the filters OBS uses too (the other is Speex which
       | is obsolete to some extent)
        
       | eis wrote:
       | Bummer, reading the title I thought Jitsi had a new de-noiser
       | because they had RNNoise for some time. Unfortunately RNNoise has
       | not received much advancement for a couple years. It's by now
       | half a decade old tech. I've worked with the WASM version in the
       | past but it can be hit or miss. Sometimes it makes the audio you
       | want a bit weird. It also added something like 10% CPU usage and
       | in the end we disabled it again.
       | 
       | I'd love to see some more state of the art solution that works
       | with WASM. Maybe even something that one could train on their own
       | voice and filter everything else would be awesome. Because all
       | the noise cancellation tech does not help if you sit in an
       | environment with other people talking next to you and the AI
       | doesn't filter it because it's voices. Sometimes coworkers use
       | Krisp but even that proprietary paid solution is so-so.
        
         | saghul wrote:
         | While we've had rnnoise integration for a while it was for
         | "noisy environment" notifications, this is the first time we
         | use it to actually filter audio.
         | 
         | Also audio worklets weren't a thing when we first introduced
         | it.
         | 
         | I'm not aware of any other open source (and better) models, but
         | if any come up, we'll certainly check them out!
        
           | pen2l wrote:
           | If you have any involvement with Jigasi or might be in the
           | know -- are there plans to use whisper, for instance, instead
           | of Google's API for transcription? If I recall correctly
           | jigasi is using google's API, local transcription aligns well
           | with the rest of Jitsi's missions.
        
             | nikvaes wrote:
             | The problem for Jigasi's speech-to-text feature with
             | Whisper - or any recent SOTA speech-to-text neural
             | networks, is that they are transformer-based. One of the
             | key features of transformers is that they are very good at
             | processing a sequence with the attention mechanism. But
             | attention inherently needs to see the whole input sequence.
             | So it's difficult to adapt these architectures to perform
             | well in real-time scenarios like captioning meetings.
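
The point about attention can be seen in a toy example. The sketch below is a bare single-head self-attention with identity projections (a simplification for brevity, not Whisper's or any real model's architecture). Without a mask, the output at the first position changes when a later input changes, which is why the whole sequence has to be available. With a causal mask it does not.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq: List[List[float]], causal: bool = False) -> List[List[float]]:
    """Toy single-head self-attention with identity projections.

    Each output is a softmax-weighted average of the visible input vectors.
    With causal=True, position i may only look at positions 0..i.
    """
    dim = len(seq[0])
    out: List[List[float]] = []
    for i, query in enumerate(seq):
        visible = seq[: i + 1] if causal else seq
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
                  for key in visible]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, visible))
                    for d in range(dim)])
    return out
```

Comparing `self_attention(a)[0]` with `self_attention(b)[0]` for two sequences that differ only in their last element shows the dependence on future input. Passing `causal=True` removes it, at the cost of each position seeing less context.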
        
               | pen2l wrote:
               | Yes! But a part of the Jitsi ecosystem enables recordings
               | and whisper is a good candidate to use for these recorded
               | sessions.
               | 
               | On that topic -- they record sessions in an interesting
               | way, basically an instance of chrome is started and
               | captured... I think with OBS. That always made me raise
               | an eyebrow but I also can't think up a better way.
        
             | saghul wrote:
             | We do have VOSK support already. I haven't heard of
             | whisper, but it does sound like a good GSoC project for
             | next year!
        
               | pen2l wrote:
               | If I have time I'll try to help you guys out. I'm a big
               | fan of what you're doing. :)
        
           | eis wrote:
           | Thanks for the clarification. We also experimented with audio
           | worklets + rnnoise about 1.5 years or so ago but had very
           | mixed results. The potential upside with processing in
           | another thread is clear but some browser and OS combinations
           | just didn't work well and resulted in micro stutters in the
           | audio. I remember Chromium on Linux for example being
           | finicky. Some browsers worked better with smaller buffers,
           | some needed bigger ones. We spent too much time debugging and
           | tuning for different systems and the audio quality
           | improvement was not deemed good enough so we shelved the
           | effort. I guess audio worklets have improved since then and
           | are probably more usable by now. Do you guys have some kind
           | of performance monitoring for the noise cancellation or audio
           | in general?
           | 
           | At the time I also spent a few days looking for something
           | better but didn't really find anything. Unfortunately RNNoise
           | is the best we have :( The only other noise cancellation
           | software that actually impressed me was the one from Nvidia
           | but that's not something that one could integrate via WASM
           | and of course wouldn't work on most devices anyways.
           | 
           | Oh what a day it will be where we have energy efficient
           | hardware encoders for AV1 in every device plus some really
           | good noise cancellation. Oh and then we just need internet
           | connections without packetloss :P
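
Part of the buffer tuning described above comes from a size mismatch: a worklet callback delivers a small fixed block, while a frame-based denoiser wants a larger one. The sketch below shows the rebuffering step in isolation. The sizes are assumptions (128 samples per callback, 480 samples per denoiser frame, i.e. 10 ms at 48 kHz), `Rebuffer` is a hypothetical name, and `process_frame` stands in for the actual denoiser call.

```python
from typing import Callable, List

RENDER_QUANTUM = 128   # samples per audio callback (assumed)
DENOISE_FRAME = 480    # samples per denoiser call, 10 ms at 48 kHz (assumed)


class Rebuffer:
    """Collect fixed-size input blocks and process them in larger frames."""

    def __init__(self, process_frame: Callable[[List[float]], List[float]]):
        self._process = process_frame
        self._pending: List[float] = []   # input not yet denoised
        # Prime the output with one frame of silence so that push() always
        # has a full block to return. This is the added latency.
        self._ready: List[float] = [0.0] * DENOISE_FRAME

    def push(self, block: List[float]) -> List[float]:
        """Feed one block and return one block of the same length."""
        self._pending.extend(block)
        while len(self._pending) >= DENOISE_FRAME:
            frame = self._pending[:DENOISE_FRAME]
            del self._pending[:DENOISE_FRAME]
            self._ready.extend(self._process(frame))
        out = self._ready[:len(block)]
        del self._ready[:len(block)]
        return out
```

Under these assumptions the output is the processed input delayed by one denoiser frame. Skipping the priming step would save that delay but leave some callbacks with nothing to play, which would be heard as gaps.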
        
         | [deleted]
        
       | gnicholas wrote:
       | Anyone have tips for using Jitsi? I've been thinking about moving
       | off Zoom now that they're enforcing a 40 min limit even for one-
       | on-one calls.
       | 
       | Does it create friction for folks who haven't used it before? Any
       | suggested instructions to send with a meeting invite?
        
         | e12e wrote:
         | We've been using jitsi via zulip chat at work. It should be
         | drop-in for at least small groups (one-on-one, handful of
         | people - I have yet to investigate "conference" or "class room"
         | size).
         | 
         | We do unfortunately see semi-regular lock-up/freezes where one
         | end of the stream stops for ~30 seconds. Maybe this is worse in
         | safari vs chrome/Firefox - we have not yet experimented much
         | with different browsers. Or maybe there's a difference between
         | x86_64 and arm/m1/m2.
        
         | dividedbyzero wrote:
         | As someone invited to a Jitsi meeting a while ago, not having
         | any video background removal, a lot less audio processing and
         | what looked like no video processing at all meant everyone was
         | harder to understand, harder to see and any activity or clutter
         | in the background was fully visible of course. I guess buying
         | quality microphones and cameras for everyone involved would
         | help. Detailed instructions are a good idea as well, I
         | struggled a bit with the unfamiliar interface.
         | 
         | Personally, I'd stick with the big names, long remote meetings
         | are strenuous enough even with all the quality of life features
         | those offer.
        
       | _joel wrote:
       | I prefer the sample with the noise. Seems clearer to understand
        
         | SergeAx wrote:
         | Would you prefer to listen to this noise for half an hour? :)
        
           | mcluck wrote:
           | Or just have them mute and unmute at appropriate times. I do
           | this even in non-noisy environments
        
       | dsr_ wrote:
       | Assuming the demo samples aren't rigged, that's a very
       | substantial improvement.
        
       | hawski wrote:
       | Is there video conferencing software that does spatial audio for
       | conferences? What I have in mind is that it is often problematic
       | to understand each other while multiple people are talking. It is
       | much easier in person. I guess it all comes down to the ability to
       | focus on directional cues of an audio source. Currently everyone
       | is placed inside one's head so they interfere much more this
       | way.
        
         | gnicholas wrote:
         | Apparently FaceTime offers this. [1] Presumably Apple will
         | allow other companies to do it as well, since they let them
         | offer spatial audio in other contexts.
         | 
         | 1: https://support.apple.com/guide/iphone/change-the-audio-
         | sett...
        
           | d110af5ccf wrote:
           | Why would Apple need to allow it? It's simply a matter of a
           | given program postprocessing the various audio streams
           | appropriately prior to muxing them for output.
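
The postprocessing described above can be illustrated with the simplest possible case: give each participant a left/right position and pan their mono stream before summing. This is a minimal sketch using constant-power stereo panning, which is far cruder than real spatial audio (no HRTF, no distance cues). The function names and the position convention are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def pan_gains(position: float) -> Tuple[float, float]:
    """Constant-power (left, right) gains for position in [-1.0, 1.0].

    -1.0 is hard left, 0.0 is center, 1.0 is hard right.
    """
    angle = (position + 1.0) * math.pi / 4.0   # maps to 0 .. pi/2
    return math.cos(angle), math.sin(angle)


def mix_spatial(streams: Sequence[Sequence[float]],
                positions: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Mix equal-length mono streams into left and right channels."""
    length = len(streams[0])
    left = [0.0] * length
    right = [0.0] * length
    for stream, position in zip(streams, positions):
        gain_l, gain_r = pan_gains(position)
        for i, sample in enumerate(stream):
            left[i] += gain_l * sample
            right[i] += gain_r * sample
    return left, right
```

For example, `mix_spatial([alice, bob], [-0.5, 0.5])` would place two hypothetical speakers on opposite sides of the stereo field before the streams are muxed for output.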
        
         | rasz wrote:
         | You could give up on audio portion of your current Video
         | conferencing setup and just install Teamspeak with spatial
         | plugin
         | https://www.myteamspeak.com/addons/9ddfa0b2-25c2-4302-8a43-0...
        
       | tbalsam wrote:
       | Very very good, a little bit of stuttering during the honking I
       | think but I like it overall! :D :)
       | 
       | Jitsi Meet has been a great alternative to other meeting apps in
       | these crazy times.
        
         | Kwpolska wrote:
         | My experience with Jitsi Meet has been quite bad. My previous
         | employer was a cheapskate, and they self-hosted Jitsi Meet.
         | Random disconnections and instability were pretty much a daily
         | occurrence, some people were disconnected every few seconds.
         | While I suppose the self-hosting by Cheapskate Inc. was the
         | main culprit, Jitsi's screen sharing wasn't looking very good.
        
           | andrepd wrote:
           | So someone hosted $software on a shitty server, and you blame
           | $software for the shitty performance? To draw any conclusions
           | you should look at meet.jit.si (hosted by Jitsi), no?
        
           | saghul wrote:
           | We've made significant tweaks to screen-sharing in the past
           | 2-3 stable releases, in case you feel inclined to check us
           | out again :-)
        
           | spockz wrote:
           | Aside from consuming a ton of resources when screen sharing,
           | my experience with Jitsi meet has been very good. It consumes
           | two cores of my 5900X (1 for the Firefox process, and another
           | for some system process I don't recall exactly) but it works.
           | This was with sharing a 4K screen.
           | 
           | I have run jitsi on cheap VMs and it worked decently. But you
           | need quite some cores to serve all the traffic. Ultimately I
           | ended up having as many 2-4core VMs as I had concurrent
           | calls.
        
             | 2Gkashmiri wrote:
             | how is the meet.jit.si hosted? i assume with lots and lots
             | of random users, the bandwidth and processing costs to be
             | astronomical
        
           | troyvit wrote:
           | My last employers were cheapskates too (I love 'em for it)
           | and they just used meet.jit.si for calls. It was a lot more
           | stable than self-hosted jitsi. That said there were almost
           | always microphone or video issues using it, just because
           | people weren't used to it I guess. It made job interviews
           | fun. It was a nice live test to show how a potential employee
           | handled adversity.
        
           | TingPing wrote:
           | My company self-hosts an instance and it's excellent.
        
         | wrp wrote:
         | I've been using Jitsi Meet regularly for about a year. It's
         | usually fine, but on some days I experience disconnections
         | every several minutes.
        
         | shaan7 wrote:
         | Indeed! I recently used a locally hosted Jitsi to talk to my
         | family in the other room while in COVID isolation. It was a
         | life saver, and extremely easy to setup with docker-compose
         | with only a handful of steps that I could complete even with
         | fever+headache https://jitsi.github.io/handbook/docs/devops-
         | guide/devops-gu...
        
       ___________________________________________________________________
       (page generated 2022-10-01 23:00 UTC)