[HN Gopher] How to build large-scale end-to-end encrypted group ...
___________________________________________________________________

How to build large-scale end-to-end encrypted group video calls

Author : jiripospisil
Score  : 147 points
Date   : 2021-12-15 20:06 UTC (2 hours ago)

(HTM) web link (signal.org)
(TXT) w3m dump (signal.org)

| johnisgood wrote:
| Great, now they should just stop using telephone numbers as identifiers.
|
| maxwell wrote:
| What do you suggest?
|
| sam_lowry_ wrote:
| A login+password, like in IRC.
|
| tptacek wrote:
| IRC tracks metadata serverside!
|
| johnisgood wrote:
| I do not think that OP was referring to implementing it the same, or even similar way, but to use a username/password pair. OP is free to correct me if I am wrong though.
|
| zamadatix wrote:
| Signal has had standard usernames on the roadmap for years.
|
| johnisgood wrote:
| Usernames work. You could even use UUIDs these days as QR is an increasingly common way of sharing data. But yeah, usernames would be a great improvement.
|
| tptacek wrote:
| Usernames _do not just work_. The Signal team is not unaware of usernames and Signal is not a weird scheme to get all your phone numbers. The difference between Signal and systems that use usernames (or email addresses) is that Signal deliberately doesn't operate a serverside directory or buddy list service. By contrast, other relatively popular messengers essentially keep a plaintext database of who talks to who on their service.
|
| What phone numbers allow Signal to do is to piggyback off the contact lists people already have on their devices.
|
| kitkat_new wrote:
| > is that Signal deliberately doesn't operate a serverside directory or buddy list service.
|
| how do people again discover each other on Signal most of the time?
|
| Anyways, nothing prevents Signal from creating its own contact list within the app, perhaps bootstrapped from the existing one
|
| tptacek wrote:
| They can do that, but then when you switch devices, you lose your contact list. That's not what happens with the built-in contact list.
|
| This issue has been rehashed dozens of times on HN before (use the search bar below) and has basically nothing to do with the article.
|
| kitkat_new wrote:
| actually, the contact list could include the signal identifiers
|
| stormbrew wrote:
| I mean, phone numbers also don't really "work." Do you know how many old phone numbers I have in my phone's contact list that aren't actually owned by the person they're listed on anymore? Using signal I get "Person You Knew 10 Years Ago Is On Signal!" notifications every now and then and.. yeah I can assure you that's not them.
|
| For example, I have literally 6 phone numbers in my phone for my sister because every time she job hops she ends up with a new number. I'm not even sure which one is actually her.
|
| Phone numbers are not permanent identities, any more than usernames or email addresses are. There's no single perfect answer to identity online and if there is, I'm sorry, it's not a number that can be changed, stolen, lost, etc.
|
| [deleted]
|
| remus wrote:
| I don't know what their threat model is but it's interesting that they don't seem too bothered about reducing metadata collection potential on the server. I bet you could put together some pretty interesting graphs of who is talking to who, how much they talk and when.
|
| tptacek wrote:
| Their messaging substrate is Signal itself, for whatever that's worth, so at least the signaling component of the system should inherit the guarantees Signal already makes. But it's a good question.
|
| Naac wrote:
| >> There is no off the shelf software that would allow us to support calls of that size while ensuring that all communication is end-to-end encrypted, so we built our own open source Signal Calling Service to do the job
|
| But wasn't there Jitsi? [0]
|
| I think it's great we have competition among Free Software projects so that both can improve. But sometimes I feel like maybe duplicated efforts create two 5/10 solutions. Instead what we really want is one 8/10 solution, or better.
|
| [0] https://meet.jit.si/
|
| estaseuropano wrote:
| While I love jitsi, I don't think it is E2E?
|
| dest wrote:
| AFAIK it's E2E for 1:1 video chats, but not when more are there.
|
| bilal4hmed wrote:
| Jitsi does support e2ee for groups as well https://jitsi.org/e2ee-in-jitsi/
|
| [deleted]
|
| Naac wrote:
| AFAIK this _was_ a work in progress[0]. I am not sure what the status of this is now.
|
| [0] https://jitsi.org/blog/e2ee/
|
| jkepler wrote:
| I think Jitsi group calls can be end to end encrypted, provided all participants use Chromium 83, per https://jitsi.org/security/.
|
| Vinnl wrote:
| It's the first of the links where they say "When building support for group calls, we evaluated many open source SFUs", so I suppose it's either not one of the two with "adequate congestion control", or is the one that did not reliably scale past 8 participants?
|
| landstrom wrote:
| Daily.co has a developer friendly offering that accomplishes this as well. Many offerings available and many reasons to not take on this added complexity.
|
| jcelerier wrote:
| As much as I like Jitsi conceptually, it has consistently performed much more poorly than Zoom starting from 5/6 ppl
|
| skybrian wrote:
| There is some duplication of effort but sometimes progress happens via rewrites and that might actually be a faster way to an 8/10 system than direct collaboration?
|
| Also I think it's interesting to see how this builds on Google's work (the googcc algorithm). Which of course builds on previous open source work. The underlying technical collaboration happens even with quite different organizational goals and different codebases.
|
| [deleted]
|
| johnisgood wrote:
| There is also https://jami.net/. I have no clue how group video calls are implemented though. It seems like it is not an easy thing to do.
|
| https://wire.com/en/ seems to support it, too, although not exactly "large-scale". Audio calls allow for up to 100 participants, for one.
|
| 1vuio0pswjnm7 wrote:
| "Full mesh: Each call participant sends its media (audio and video) directly to each other call participant. This works for very small calls, but does not scale to many participants. Most people just don't have an Internet connection fast enough to send 40 copies of their video at the same time.
|
| Server mixing: Each call participant sends its media to a server. The server "mixes" the media together and sends it to each participant. This works with many participants, but is not compatible with end-to-end encryption because it requires that the server be able to view and alter the media.
|
| Selective Forwarding: Each participant sends its media to a server. The server "forwards" the media to other participants without viewing or altering it. This works with many participants, and is compatible with end-to-end-encryption."
|
| Imagine an end user who is interested in "very small calls" with friends and family. She is not interested in communicating to an infinitely large audience ("broadcasting"). She never has group calls on Signal with 40 people. We have to use our imagination because this user does not actually exist.
|
| The imaginary user reads this blog post and she thinks to herself "Full mesh sounds like the best design.
| There is less/no reliance on a third party, traffic does not need to be sent to a third party server." With full mesh, there is no need to mention the caveat "without viewing or altering it" (or selectively choosing not to forward it to certain recipients). Full mesh seems to give the user the most control and require the least dependence on third party servers (not necessarily none, but the least).
|
| Then she reads this line: "Because Signal must have end-to-end encryption _and scale to many participants_, we use selective forwarding."
|
| The make-believe user wonders "Why must Signal scale to many participants?" For this user, "scal[ing] to many participants" appears to be an artificial constraint. She has no such need. "Perhaps Signal is not designed for users like me. Maybe Signal is trying to compete with Facebook, TikTok, Zoom, etc. Signal is supposedly non-commercial and should be free from such pressures to compete. Does this mean that if I make a call to two people, the traffic has to be sent to third party servers so they can "forward" the audio/video to the appropriate recipients?"
|
| "Why can't I be the one to choose at run-time whether full mesh or selective forwarding is used?"
|
| Finally she comes to her senses. "This blog post was not written for me. It seems to be a form of show and tell by the people working at Signal, not a bidirectional dialogue with Signal users."
|
| prophesi wrote:
| Just an FYI full mesh would still require communicating with a third-party server, at the very least for initial networking when joining/leaving a group call.
|
| The whole point of E2E encryption is so that passing data through a third party shouldn't matter in the first place.
|
| And lastly, even when you have just a 1:1 video chat, sending and receiving full resolution/quality multimedia can still be way too much for some people's internet connections.
| UX is extremely important for Signal, as unreliable video chat is a surefire way for those less caring about privacy to hop back over to a privacy-violating alternative.
|
| I feel sorry for those working on bringing security/privacy to everyone, as they have to appease power users and privacy absolutists, along with one's grandmother and the TikTok generation.
|
| sneak wrote:
| They have the bandwidth for relaying video streams to 40 people but won't let me send full res jpegs in 1:1 messages?
|
| And no, I can't just rebuild my client, because I'm on iOS and non-official builds won't receive push notifications from the official developers.
|
| Vinnl wrote:
| That's not really related to this article, but I can select photo quality if I send a photo on Android. Appears to have been added in May.
|
| sneak wrote:
| The article specifically mentions that they operate the infrastructure for relaying encrypted video streams for up to 40 participants.
|
| I can also select media quality on iOS. My options are "compressed way too much" and "compressed too much". I assume you have the same options.
|
| I would like to be able to attach images as files and have them come through unmodified. It is a general purpose communications tool, it should not be editorializing over my attachments.
|
| I use Signal to communicate privately with my attorney. Why does anyone think tampering with evidence in transit is okay?
|
| Apple also doesn't support open source in the App Store, so I can't fix the problem myself.
|
| wyager wrote:
| How does signal get money to cover costs of running compute-intensive services?
|
| sandstrom wrote:
| They recently added support for in-app donations: https://www.theverge.com/2021/12/2/22814934/signal-launches-...
|
| I hope they'll take it a step further and require payment for certain functionality (maybe video calls?, or desktop client support?).
|
| keewee7 wrote:
| One of the WhatsApp founders, Brian Acton, donated $100 million to them as an unsecured loan due to be repaid in 2068:
|
| https://en.wikipedia.org/wiki/Brian_Acton#Signal
|
| https://en.wikipedia.org/wiki/Signal_(software)#Developers_a...
|
| sorenjan wrote:
| How long does that last? Telegram uses a few hundred million dollars each year, although they are significantly larger.
|
| > As Telegram approaches 500 million active users, many of you are asking the question - who is going to pay to support this growth? After all, more users mean more expenses for traffic and servers. A project of our size needs at least a few hundred million dollars per year to keep going.
|
| https://t.me/durov/142
|
| new_stranger wrote:
| > needs at least a few hundred million dollars per year to keep going
|
| I'm pretty sure that is not server cost. This is probably the standard approach of companies hiring tons of personnel and spending tens of thousands or hundreds of thousands on ads every single day.
|
| benlivengood wrote:
| To scale to thousands (is this even useful?) of e2e users build a tree of participants who can remix each other's video.
|
| Pick a handy mixing ratio like 4:1 or 9:1 (a square helps, since they compose nicely if downscaled to a grid vs. active talker stays fullscreen) and nodes with the highest bandwidth and lowest latency take M-1 streams and add it to their own to make an M:1 mix which can be forwarded to a node closer to the root which produces another M:1 stream, and the root sends a single mixed stream down the tree until every participant has the mix. Max bandwidth at each node is M down and M up. Minimal spanning tree with max M edges per node recomputed as participants leave and join. Build 3 or 4 distinct trees and leave the connections open for more rapid switching if intermediate nodes stop participating.
|
| JoeAltmaier wrote:
| Oh this all brings back memories, of Sococo in the 2000's.
| We faced all these problems and had similar solutions to them all.
|
| We even had a rapidly adapting network make-and-break recovery layer. You unplug your laptop from a wired connection, switch to wireless - we recovered in milliseconds. You heard barely a click.
|
| The encryption issue is fun - we had a rotate-key message in-band. The receiver loaded new keys and tried them in sequence to ease the turnover time - out-of-order packets etc could make it ambiguous for a short while which key to use. A cache and aging keys out made it work pretty well.
|
| Remixing on user stations proved to be problematic (mentioned elsewhere on this thread). You'd think if 6 people at one site were conferencing with a dozen elsewhere, you could elect one at each site to mix-and-forward. But corporate networks made it hard to determine who was 'adjacent' - they were often layered and without UPnP (is that what the router protocol is called?) you couldn't tell if somebody at the next desk was even in your company.
|
| We had up to 100 people in a conference, and our enter-the-conference time was on the order of 100ms. Click into an all-hands, and be able to hear everybody before your finger left the mouse button. It was wonderful.
|
| Sococo today is a sad shadow of that. They went open-source and lost all our IP instantly. Just another WebRTC client last I knew.
|
| narush wrote:
| > They went open-source and lost all our IP instantly.
|
| Can you explain what this means? Like - other people copied your work?
|
| Genuinely wondering, OSS noob here...
|
| JoeAltmaier wrote:
| There was little or nothing in WebRTC to match what we'd spent 5 years creating. So they were back to 1-5 people in a conference, with 1-3 second connect times, and no resilience to network changes.
|
| The excuse they gave was "We can't rely on 6 people in Iowa for our core IP".
| So they switched to some open source mix node that was the pet project of 2 guys in Italy. Two academics, who gave it hardly any attention. And it had zero IP; just a collection of APIs stitched together to give you the impression of having a mix node.
|
| We said all that at the time. But such was the power of the magic words "Open Source" that it all bounced off their mental shields.
|
| BitPirate wrote:
| Are there any plans to add VP9 support?
|
| kitkat_new wrote:
| Next step: decentralizing encrypted group calls [0]
|
| [0]: https://2021.commcon.xyz/talks/extending-matrix-s-e2ee-calls...
___________________________________________________________________
(page generated 2021-12-15 23:00 UTC)