[HN Gopher] Enhanced noise suppression in Jitsi Meet
___________________________________________________________________
Enhanced noise suppression in Jitsi Meet
Author : jlpcsl
Score : 254 points
Date : 2022-10-01 11:28 UTC (11 hours ago)
(HTM) web link (jitsi.org)
(TXT) w3m dump (jitsi.org)
| quickthrower2 wrote:
| Aside, Jitsi is pretty awesome for creating a video app idea
| quickly. The API is very easy to use.
| mcluck wrote:
| It does seem to do a good job of eliminating noise but it seems
| like it gets rid of a lot of the signal too. It's much easier to
| understand the noisy sample than the processed one
| josteink wrote:
| I'm using RNNoise as a pipewire input filter on my Linux
| machines, but that's very Linux-specific and a bit "hardcore" to
| setup.
|
| Nice to see it getting integrated into video meeting solutions,
| so more people can take advantage of this awesome library.
| Doman wrote:
| Awesome! Could you please elaborate how to do it or post some
| good/not outdated links?
| josteink wrote:
| Don't remember exactly which guide I followed, but I used the
| build from this repo, and the instructions look plausible:
|
| https://github.com/werman/noise-suppression-for-voice#pipewi...
| asicsp wrote:
| > _but that's very Linux-specific and a bit "hardcore" to
| setup_
|
| Have you tried https://github.com/noisetorch/NoiseTorch/?
| nicolaslem wrote:
| Or https://github.com/wwmm/easyeffects for noise reduction
| and other effects like compression and EQ for a real crooner
| voice in any video call application.
| kevincox wrote:
| Definitely recommend easyeffects over noisetorch. No root,
| high quality GUI and can work automatically on startup. I
| only use the noise suppression 99% of the time but having
| the other effects available can also be fun.
| pen2l wrote:
| Created a few years ago by Jean-Marc Valin of xiph/mozilla (who
| by the way is also the author of Opus codec among other things):
| https://gitlab.xiph.org/xiph/rnnoise/
|
| Overview of RNNoise from the horse's mouth is here:
| https://jmvalin.ca/demo/rnnoise/
|
| Used as a Wasm module! In some ways the web is becoming more
| opaque. Is this the future then, a hodgepodge of binaries doing
| things behind the scenes? Though in this case it happens to be
| OSS, and it may well be a moot point -- backend is already a
| blackbox to the enduser, now parts of frontend are blackboxes.
| The practical implication is probably just that some measure of
| customizability is gone.
| saghul wrote:
| What a weird take.
|
| How else would we have implemented this? WASM has facilitated
| introducing these technologies into web applications, it
| literally wasn't possible before.
|
| Thanks to emscripten it wasn't even that hard to get rnnoise
| working on WASM: https://github.com/jitsi/rnnoise-wasm
|
| I concede WASM does open the possibility of adding opaque stuff
| to web apps but IMHO the benefits outweigh the drawbacks at
| this point.
| pen2l wrote:
| Oh no you're absolutely right, my general frustration was
| ill-placed for this thread. Wasm is no doubt the right and
| only way to have done this.
| danuker wrote:
| Short of reproducible builds, you can't even check that what
| you're being served is, in fact, the OSS version.
| naillo wrote:
| I feel like I have about as much chance reading disassembled
| wasm as I would have reading unminified javascript so I don't
| think it changes much.
|
| (You could technically turn the wasm to JS and unminify that
| too, which I doubt is much harder/easier to decipher as the
| same thing written in JS and minified/unminified.)
| sabjut wrote:
| Is this a troll comment? Yes, wasm works based on a compiled
| binary, just like any other program written in a compiled
| language in the past 50 years.
| You try to suggest that everyday
| users of the web are just going into the js sources of webpages
| and understanding what's going on. With the plethora of libraries,
| frameworks and static optimization used in today's websites,
| normal people can't really dissect the inner workings of a
| website just by looking at the code. That's why we have tools
| like request analyzers etc which all would still work with
| compiled libraries.
|
| Compiled code has existed for half a century and we know how to
| work with it.
|
| Suggesting that the web is doomed because people of the future
| prefer rust instead of javascript is beyond any rationale.
| salawat wrote:
| ...I still dissect website code, thank you very much.
| Basically have to do it just to figure out quirks I'm
| constantly running into.
| troyvit wrote:
| They didn't suggest the web is doomed, just that more aspects
| of it are opaque. I don't think they're talking about everyday
| users of the web either, but rather nascent developers.
|
| The early web was a great equalizer. Anybody could study a
| little html, download an ftp manager, jump through a few
| procedural hoops and have a web page. After some studying and
| trial and error they could even build an interactive site.[1]
|
| It's easy to miss all the potential of wasm when that's what
| you remember of the web. To me the amazing thing is that
| browsers will still work with the methods described above[2]
| but we're on the cusp of being able to do almost everything a
| full application environment can do.
|
| That said, even though there will be plenty of OSS wasm tech,
| it'll still be more opaque to those of us who don't do
| compiled languages. It'll be a lot tougher to just fork the
| code and do something more creative with it.
|
| [1] PHP used to stand for "Personal Home Page" and, as one of
| its founders put it, was created so that "any idiot" could
| make an interactive site.
|
| [2] https://t.mkws.sh/58bytes/
| fragmede wrote:
| JavaScript minifiers to obfuscate the code have been around
| pretty much since the language got popular, so that version
| of the web's been gone since about when Myspace lost to
| Facebook. Places like Glitch.com are trying to bring that
| back though.
| maven29 wrote:
| Are modern-day "no code" tools like Webflow not an
| acceptable equivalent?
|
| We already lost any semblance of building from scratch in
| the mid-2000s with the emergence of gargantuan HTML
| templates and WordPress/Drupal/phpBB deployments with
| plugins and themes.
|
| This is a direct result of people being held to higher
| standards and thus spending a lot more effort overriding
| the compositional and behaviour defaults of the user agent.
|
| The modern-day iteration just optimizes for scaling up to
| tens of thousands of concurrent end-users on anemic
| hardware.
|
| We have to accept the fact that personal webpages gave way
| to social network profile pages. This didn't happen
| overnight and there is zero demand for a hand-crafted
| presence on the web anymore.
| kragen wrote:
| No, an environment for writing _new_ code is not any kind
| of equivalent for the ability to reverse-engineer
| _existing_ code. Firebug and its clones are a much closer
| equivalent than anything like Webflow.
| rektide wrote:
| Build from scratch is out of favor, but not necessarily
| that far off. Folks like GitHub & YouTube have very
| simple bottom-up webcomponent systems they use, rather
| than top-down frameworks. Existing concerns about
| bundling might be met by bundled http exchanges
| (webpackage).
|
| I don't think "no code" is an aid. If anything it's
| pushing in the opposite direction: rather than a
| transparent approachable web medium, it suggests we need
| hyperadvanced tools that we really won't understand or
| have control over to synthesize web code. It's a simpler
| user experience, but a push away from notepad.exe webdev.
|
| I wouldn't rush to make any conclusions about who or what
| has won, as a settled fact & case for all time. We haven't
| had good ways to run online systems ourselves, versus
| hosted for us, and there's still lightyears to go but
| we're doing good things & finally maturing well. We're
| only a couple years into ActivityPub as an interchange
| format & growing many of the capabilities & tools &
| systems, around all kinds of use cases, that will make
| throwing together a fair, interactable competitive
| offering possible. Social media has had huge huge
| investment poured into it, but we are in decent preteen
| years of growing up & owning the libre equivalents. We
| can assess demand only after there is a visualizable
| state people can imagine; just having an isolated blog is
| not the equivalent to the well connected social media
| site, but these capabilities slowly arise. Follow the
| alpha geeks; this currently long phase will not be
| forever.
| Uehreka wrote:
| Sure "everyday users" aren't clicking "View Source", but
| that's not really what the issue is about.
|
| When I was a kid, every piece of software I used was
| pre-compiled, and therefore opaque. This made it difficult for me
| to figure out how people made certain things, and after a
| while I lost interest in programming.
|
| When I got back into it later, one thing that made a huge
| difference was being able to see how various cool JS sites
| were built. The ability to "View Source" like that was
| revolutionary, and also allowed me to build some early fun
| projects, like a Cookie Clicker "AI" that could play the game
| automatically by calling the functions I could see in the
| game's source.
|
| I'm far from the only person with experiences like these.
| Yes, there was programming before View Source and there will
| be programming after. And for those of us with the right
| tools or reverse engineering skills, View Source isn't
| particularly relevant.
| What we're losing is a pipeline that
| helped people become/stay interested in programming, which
| makes it likely that future programmers who would've followed
| a path like mine will do something else instead.
| est31 wrote:
| On the other hand, it's never been as easy to contribute to
| OSS projects as it is now. Github has severely lowered the
| requirements compared to earlier settings where you had to
| get an e-mail client, configure it in just the right way,
| etc. You have live coding youtubers, there are discord
| communities for all types of technology, and knowledge
| about programming and technology is extremely available
| through Google, way more than it was 20 years ago. I think
| young people still have tons of opportunities to start out.
| SergeAx wrote:
| Today's JavaScript "View source" is 90% useless because of
| Webpack et al. The original program is effectively compiled
| into obscure and obfuscated lowest-common-denominator JS.
| Weatherweathe wrote:
| Aren't wasm modules still sandboxed? Reverse engineering binaries
| should have around the same complexity as reverse engineering
| uglified js, not sure how they are more opaque
| pen2l wrote:
| You probably have a point but I'm thinking unuglified js code
| (http://www.nice2predict.org/) is not as impenetrable as code
| from reverse engineered wasm binaries? The element of
| plausible deniability is more potent though for the nefarious
| actor on the other side in the case of wasm binaries.
| robalni wrote:
| I don't think it makes much of a difference whether you can
| read the code because even if you can read the javascript, it's
| automatic so it can be different on the next request. If we
| want to be able to trust the web, we have to get rid of the
| automatic download and execution of arbitrary script code.
| api wrote:
| Most JavaScript these days is basically compiled binary. Rarely
| is it very human readable.
| elcomet wrote:
| Wasm is about distribution of binaries, not about open source.
| Those are two different subjects.
|
| When I install a program on my debian machine with apt-get, I
| also get binaries. But this doesn't mean that it is opaque
| right?
| KMnO4 wrote:
| I think we've been lulled into some false sense of expectation
| that the web exists as a place for "open source code" to be
| run. As if the fact that you can view the source of any page is
| any purveyor of that.
|
| If that's your definition of transparency, then perhaps
| learning to read assembly would give you the same comfort. In
| fact, there's a lot more binaries distributed with symbols
| intact than unminified JS.
|
| Or, to put it another way, if you could right click -> view
| disassembly of any binary on your computer, how would that be
| any different than today's web?
| geiser wrote:
| Sorry, but at least on my smartphone, I can understand the
| unprocessed audio showcased down in the Web page better than the
| noise-suppressed audio. How is that?
| CharlesW wrote:
| The original audio is significantly easier to understand. This
| may be technically interesting, but the noise suppression is
| aggressive to the point that it's eating critical signal with
| the noise.
| SergeAx wrote:
| This is the default for online conferencing. Everyone is way
| better off asking the other party to repeat a couple of words than
| listening to all that noise during the whole call.
| ComputerGuru wrote:
| > Everyone is way better off asking the other party to repeat
| a couple of words than listening to all that noise during
| the whole call.
|
| I didn't understand the first three words, for Alice it was
| the next two, and for Bob it was the last four. How many
| people are going to ask to repeat?
|
| Evolution taught us to understand over the sound of waves,
| crickets, rain, thunder, and more. It didn't teach us to
| comprehend with half the signals masked.
| leni536 wrote:
| But this might be better served with a simplistic voice
| activity detection, like in Mumble.
| atty wrote:
| Somewhat tangential, but at my work we have found WebEx's
| background noise removal to be absolutely amazing. So many times
| we've had someone in a meeting say "sorry about X/Y/Z, it's so
| noisy", and the rest of us won't hear a thing. This sorta tech
| has gotten so good, and is a really nice quality of life
| improvement for remote work. (Or for meetings with people in
| noisy offices of course)
| naillo wrote:
| Rare to find creative real time small-weight uses of ML but I
| love when it's done and this has an impressive and well written
| explanation with it as well. Great stuff.
| haunter wrote:
| This is one of the filters OBS uses too (the other is Speex which
| is obsolete to some extent)
| eis wrote:
| Bummer, reading the title I thought Jitsi had a new de-noiser
| because they had RNNoise for some time. Unfortunately RNNoise has
| not received much advancement for a couple years. It's by now
| half a decade old tech. I've worked with the WASM version in the
| past but it can be hit or miss. Sometimes it makes the audio you
| want a bit weird. It also added something like 10% CPU usage and
| in the end we disabled it again.
|
| I'd love to see some more state of the art solution that works
| with WASM. Maybe even something that one could train on their own
| voice and filter everything else would be awesome. Because all
| the noise cancellation tech does not help if you sit in an
| environment with other people talking next to you and the AI
| doesn't filter it because it's voices. Sometimes coworkers use
| Krisp but even that proprietary paid solution is so-so.
| saghul wrote:
| While we've had rnnoise integration for a while it was for
| "noisy environment" notifications, this is the first time we
| use it to actually filter audio.
|
| Also audio worklets weren't a thing when we first introduced
| it.
|
| I'm not aware of any other open source (and better) models, but
| if any come up, we'll certainly check them out!
| pen2l wrote:
| If you have any involvement with Jigasi or might be in the
| know -- are there plans to use whisper, for instance, instead
| of Google's API for transcription? If I recall correctly
| jigasi is using google's API, local transcription aligns well
| with the rest of Jitsi's missions.
| nikvaes wrote:
| The problem for Jigasi's speech-to-text feature with
| Whisper, or any recent SOTA speech-to-text neural
| network, is that they are transformer-based. One of the
| key features of transformers is that they are very good at
| processing a sequence with the attention mechanism. But
| attention inherently needs to see the whole input sequence.
| So it's difficult to adapt these architectures to perform
| well in real-time scenarios like captioning meetings.
| pen2l wrote:
| Yes! But a part of the Jitsi ecosystem enables recordings
| and whisper is a good candidate to use for these recorded
| sessions.
|
| On that topic -- they record sessions in an interesting
| way, basically an instance of Chrome is started and
| captured... I think with OBS. That always made me raise
| an eyebrow but I also can't think up a better way.
| saghul wrote:
| We do have VOSK support already. I haven't heard of
| whisper, but it does sound like a good GSoC project for
| next year!
| pen2l wrote:
| If I have time I'll try to help you guys out. I'm a big
| fan of what you're doing. :)
| eis wrote:
| Thanks for the clarification. We also experimented with audio
| worklets + rnnoise about 1.5 years or so ago but had very
| mixed results. The potential upside with processing in
| another thread is clear but some browser and OS combinations
| just didn't work well and resulted in micro stutters in the
| audio. I remember Chromium on Linux for example being
| finicky. Some browsers worked better with smaller buffers,
| some needed bigger ones.
| We spent too much time debugging and
| tuning for different systems and the audio quality
| improvement was not deemed good enough so we shelved the
| effort. I guess audio worklets have improved since then and
| are probably more usable by now. Do you guys have some kind
| of performance monitoring for the noise cancellation or audio
| in general?
|
| At the time I also spent a few days looking for something
| better but didn't really find anything. Unfortunately RNNoise
| is the best we have :( The only other noise cancellation
| software that actually impressed me was the one from Nvidia
| but that's not something that one could integrate via WASM
| and of course wouldn't work on most devices anyways.
|
| Oh what a day it will be where we have energy efficient
| hardware encoders for AV1 in every device plus some really
| good noise cancellation. Oh and then we just need internet
| connections without packet loss :P
| [deleted]
| gnicholas wrote:
| Anyone have tips for using Jitsi? I've been thinking about moving
| off Zoom now that they're enforcing a 40 min limit even for
| one-on-one calls.
|
| Does it create friction for folks who haven't used it before? Any
| suggested instructions to send with a meeting invite?
| e12e wrote:
| We've been using jitsi via zulip chat at work. It should be
| drop-in for at least small groups (one-on-one, handful of
| people - I have yet to investigate "conference" or "class room"
| size).
|
| We do unfortunately see semi-regular lock-up/freezes where one
| end of the stream stops for ~30 seconds. Maybe this is worse in
| safari vs chrome/Firefox - we have not yet experimented much
| with different browsers. Or maybe there's a difference between
| x86_64 and arm/m1/m2.
| dividedbyzero wrote:
| As someone invited to a Jitsi meeting a while ago, not having
| any video background removal, a lot less audio processing and
| what looked like no video processing at all meant everyone was
| harder to understand, harder to see and any activity or clutter
| in the background was fully visible of course. I guess buying
| quality microphones and cameras for everyone involved would
| help. Detailed instructions are a good idea as well, I
| struggled a bit with the unfamiliar interface.
|
| Personally, I'd stick with the big names, long remote meetings
| are strenuous enough even with all the quality of life features
| those offer.
| _joel wrote:
| I prefer the sample with the noise. Seems clearer to understand
| SergeAx wrote:
| Would you prefer to listen to this noise for half an hour? :)
| mcluck wrote:
| Or just have them mute and unmute at appropriate times. I do
| this even in non-noisy environments
| dsr_ wrote:
| Assuming the demo samples aren't rigged, that's a very
| substantial improvement.
| hawski wrote:
| Is there video conferencing software that does spatial audio for
| conferences? What I have in mind is that it is often problematic
| to understand each other while multiple people are talking. It is
| much easier in person. I guess it all comes down to the ability to
| focus on directional cues of an audio source. Currently everyone
| is placed inside one's head so they interfere much more this
| way.
| gnicholas wrote:
| Apparently FaceTime offers this. [1] Presumably Apple will
| allow other companies to do it as well, since they let them
| offer spatial audio in other contexts.
|
| 1: https://support.apple.com/guide/iphone/change-the-audio-sett...
| d110af5ccf wrote:
| Why would Apple need to allow it? It's simply a matter of a
| given program postprocessing the various audio streams
| appropriately prior to muxing them for output.
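A minimal sketch of the "postprocess each stream, then mix" idea from the comment above. The thread does not say which technique a conferencing client would use, so this assumes the simplest one: constant-power stereo panning, where each participant gets a fixed left/right position before the streams are summed. Real spatial audio typically uses HRTF filtering instead. The names `pan_gains` and `mix_spatial` are hypothetical and come from no library.

```python
import math


def pan_gains(azimuth):
    """Constant-power gains for a position from -1.0 (left) to 1.0 (right).

    Assumption: plain stereo panning stands in for real spatialisation.
    """
    azimuth = max(-1.0, min(1.0, azimuth))      # clamp out-of-range input
    angle = (azimuth + 1.0) * math.pi / 4.0     # map [-1, 1] to [0, pi/2]
    return math.cos(angle), math.sin(angle)     # left gain, right gain


def mix_spatial(streams, azimuths):
    """Pan each mono stream to its own position, then sum into stereo.

    streams  -- list of mono sample lists (floats), one per participant
    azimuths -- one position per stream, in the range -1.0 .. 1.0
    """
    if len(streams) != len(azimuths):
        raise ValueError("need exactly one azimuth per stream")
    length = max((len(s) for s in streams), default=0)
    left = [0.0] * length
    right = [0.0] * length
    for samples, azimuth in zip(streams, azimuths):
        gain_l, gain_r = pan_gains(azimuth)
        for i, sample in enumerate(samples):    # shorter streams end early
            left[i] += gain_l * sample
            right[i] += gain_r * sample
    return left, right
```

Calling `mix_spatial([alice, bob], [-0.6, 0.6])` would place two speakers on opposite sides of the stereo field. The summed output is not normalised, so a real mixer would still need limiting or gain control.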
| rasz wrote:
| You could give up on the audio portion of your current video
| conferencing setup and just install Teamspeak with a spatial
| plugin
| https://www.myteamspeak.com/addons/9ddfa0b2-25c2-4302-8a43-0...
| tbalsam wrote:
| Very very good, a little bit of stuttering during the honking I
| think but I like it overall! :D :)
|
| Jitsi Meet has been a great alternative to other meeting apps in
| these crazy times.
| Kwpolska wrote:
| My experience with Jitsi Meet has been quite bad. My previous
| employer was a cheapskate, and they self-hosted Jitsi Meet.
| Random disconnections and instability were pretty much a daily
| occurrence, some people were disconnected every few seconds.
| While I suppose the self-hosting by Cheapskate Inc. was the
| main culprit, Jitsi's screen sharing wasn't looking very good.
| andrepd wrote:
| So someone hosted $software on a shitty server, and you blame
| $software for the shitty performance? To draw any conclusions
| you should look at meet.jit.si (hosted by Jitsi), no?
| saghul wrote:
| We've made significant tweaks to screen-sharing in the past
| 2-3 stable releases, in case you feel inclined to check us
| out again :-)
| spockz wrote:
| Aside from consuming a ton of resources when screen sharing,
| my experience with Jitsi meet has been very good. It consumes
| two cores of my 5900X (1 for the Firefox process, and another
| for some system process I don't recall exactly) but it works.
| This was with sharing a 4K screen.
|
| I have run jitsi on cheap VMs and it worked decently. But you
| need quite some cores to serve all the traffic. Ultimately I
| ended up having as many 2-4 core VMs as I had concurrent
| calls.
| 2Gkashmiri wrote:
| how is the meet.jit.si hosted? i assume with lots and lots
| of random users, the bandwidth and processing costs to be
| astronomical
| troyvit wrote:
| My last employers were cheapskates too (I love 'em for it)
| and they just used meet.jit.si for calls.
| It was a lot more
| stable than self-hosted jitsi. That said there were almost
| always microphone or video issues using it, just because
| people weren't used to it I guess. It made job interviews
| fun. It was a nice live test to show how a potential employee
| handled adversity.
| TingPing wrote:
| My company self-hosts an instance and it's excellent.
| wrp wrote:
| I've been using Jitsi Meet regularly for about a year. It's
| usually fine, but on some days I experience disconnections
| every several minutes.
| shaan7 wrote:
| Indeed! I recently used a locally hosted Jitsi to talk to my
| family in the other room while in COVID isolation. It was a
| life saver, and extremely easy to setup with docker-compose
| with only a handful of steps that I could complete even with
| fever+headache
| https://jitsi.github.io/handbook/docs/devops-guide/devops-gu...
___________________________________________________________________
(page generated 2022-10-01 23:00 UTC)