WebRTC
WebRTC is a peer-to-peer, secure communication protocol built on top of UDP. It allows video, voice, and generic data to be sent between peers without requiring any plugins or third-party software.
One of the significant milestones in the video conferencing area was the release of WebRTC. This open-source project provides web browsers and mobile applications with real-time communication (RTC) capabilities.
WebRTC powers many well-known video conferencing apps, such as Google Duo, Facebook Messenger, and Microsoft Teams.
WebRTC’s protocol stack includes UDP, ICE, STUN, TURN, DTLS, SRTP and SCTP.
ICE, STUN, and TURN are needed to establish and maintain a peer-to-peer connection over UDP, including through NATs and firewalls. Encryption is mandatory in WebRTC: DTLS secures the data channels and is used to negotiate the keys that SRTP uses to encrypt media. Finally, SRTP and SCTP are the application protocols: SRTP carries the audio and video, and SCTP (running over DTLS) carries the data channels. Together they multiplex the different streams, provide congestion and flow control, and provide partially reliable delivery and other services on top of UDP.
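Because STUN (for ICE), DTLS, and SRTP traffic typically share a single UDP port, an endpoint has to tell the protocols apart as datagrams arrive. The sketch below classifies a datagram by its first byte, assuming the value ranges defined in RFC 7983; the function name `classify_packet` is hypothetical and this is an illustration, not how any particular WebRTC stack implements it.

```python
def classify_packet(datagram: bytes) -> str:
    """Classify a UDP datagram on a shared WebRTC port by its first byte.

    Ranges assumed from RFC 7983; anything else (e.g. ZRTP) is reported as unknown.
    """
    if not datagram:
        return "empty"
    first = datagram[0]
    if 0 <= first <= 3:
        return "STUN"          # ICE connectivity checks
    if 20 <= first <= 63:
        return "DTLS"          # DTLS handshake and SCTP data channels
    if 64 <= first <= 79:
        return "TURN-channel"  # TURN ChannelData messages
    if 128 <= first <= 191:
        return "RTP/RTCP"      # SRTP media and SRTCP feedback (RTP version 2)
    return "unknown"


if __name__ == "__main__":
    print(classify_packet(bytes([0x00, 0x01])))  # STUN Binding Request header
    print(classify_packet(bytes([22, 0xFE])))    # DTLS handshake record
    print(classify_packet(bytes([0x80, 0x60])))  # RTP packet
```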
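WebRTC is a peer-to-peer protocol, so in a multi-party conference every participant maintains a connection with every other participant, forming a mesh topology. Each participant must send its media to every other participant, so per-participant upload bandwidth and CPU grow linearly with the size of the call while the total number of connections grows quadratically. In practice this architecture doesn't scale well beyond 10 or 12 participants, which is why many WebRTC-powered apps limit the number of participants.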
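The growth described above can be made concrete with a little arithmetic. This sketch assumes a full mesh where each participant sends one stream to every peer at a fixed, hypothetical bitrate (`stream_mbps`, 1.5 Mbps by default); real bitrates vary with resolution and codec.

```python
def mesh_cost(n: int, stream_mbps: float = 1.5) -> dict:
    """Connection and upload cost of an n-participant full-mesh call.

    stream_mbps is a hypothetical per-stream video bitrate used for illustration.
    """
    if n < 2:
        raise ValueError("a call needs at least two participants")
    return {
        "connections_total": n * (n - 1) // 2,             # one connection per pair
        "streams_sent_per_participant": n - 1,             # one outgoing stream per peer
        "upload_mbps_per_participant": (n - 1) * stream_mbps,
    }


if __name__ == "__main__":
    for size in (4, 8, 12):
        print(size, mesh_cost(size))
```

At 12 participants each client must upload 11 streams (16.5 Mbps under the assumed bitrate), which is more than many home or mobile uplinks can sustain.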
To scale WebRTC beyond a mesh, and ultimately to thousands of participants or more, a media server is introduced: either an SFU (Selective Forwarding Unit) or an MCU (Multipoint Control Unit). Both form a star topology, in which each participant connects only to the server rather than to every other participant.
In an SFU-based architecture, every participant sends its audio/video to the SFU, and the SFU forwards it to the other participants in the call. The forwarding is selective: the SFU chooses which stream, and at what quality, each participant receives based on that participant's bandwidth and display.
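One common way an SFU makes this choice is simulcast, where the sender publishes several encodings of the same video and the SFU picks one per receiver. Simulcast is an assumption here, not something stated above, and the layer names, widths, and bitrates in `LAYERS` are hypothetical. The sketch picks the smallest layer that covers the receiver's display and fits its bandwidth.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    width: int   # pixels
    kbps: int    # approximate bitrate of this encoding


# Hypothetical simulcast layers published by one sender (illustrative values).
LAYERS = [Layer("low", 320, 150), Layer("mid", 640, 500), Layer("high", 1280, 1500)]


def pick_layer(available_kbps: int, display_width: int, layers=LAYERS) -> Layer:
    """Choose which simulcast layer to forward to one receiver."""
    affordable = [l for l in layers if l.kbps <= available_kbps]
    if not affordable:
        # Nothing fits: fall back to the cheapest layer rather than dropping video.
        return min(layers, key=lambda l: l.kbps)
    # Prefer the smallest affordable layer that is at least as wide as the display,
    # so a thumbnail doesn't receive full HD.
    covering = [l for l in affordable if l.width >= display_width]
    if covering:
        return min(covering, key=lambda l: l.width)
    # Otherwise send the largest layer the bandwidth allows.
    return max(affordable, key=lambda l: l.width)
```

For example, a receiver with 600 kbps showing the video at 1280 px gets `mid`, while a receiver with plenty of bandwidth showing a 320 px thumbnail gets `low`.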
An MCU is a more sophisticated media server than an SFU. It decodes and mixes the audio/video sent by each participant and transmits the result to each participant as a single stream. Because each client receives only one stream, an MCU works well for low-end mobile devices, but decoding, mixing, and re-encoding require much more processing power on the server, so an MCU is usually costlier than an SFU. This is why SFUs power most real-world WebRTC deployments.
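The cost difference comes from where the work happens. The rough model below counts codec operations and streams per conference, assuming every participant sends video and, for the MCU, that it produces one personalized mix per participant (some MCUs instead send a single shared composite). It is a back-of-the-envelope sketch, not a benchmark of any real server.

```python
def server_load(n: int) -> dict:
    """Rough per-conference work for a media server with n video-sending participants."""
    return {
        "sfu": {
            "decodes": 0,                       # packets are forwarded, not decoded
            "encodes": 0,
            "streams_in": n,
            "streams_out": n * (n - 1),         # each stream goes to everyone else
            "streams_received_per_client": n - 1,
        },
        "mcu": {
            "decodes": n,                       # every input is decoded to be mixed
            "encodes": n,                       # assumed: one personalized mix per participant
            "streams_in": n,
            "streams_out": n,
            "streams_received_per_client": 1,   # easy on low-end clients
        },
    }
```

The SFU trades server bandwidth (quadratic `streams_out`) for near-zero codec work, while the MCU trades server CPU (linear decodes and encodes) for minimal client load.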
Sooner or later, a single SFU runs into the capacity limits of the machine it runs on; the bottleneck may be CPU or network bandwidth (Mbps).
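Bandwidth is easy to estimate in the worst case, where every participant receives every other participant's stream. This sketch assumes a fixed hypothetical per-stream bitrate and ignores the savings from forwarding lower layers or only active speakers, so it gives a pessimistic bound.

```python
def sfu_egress_mbps(n: int, stream_mbps: float) -> float:
    """Worst-case SFU egress: each of n participants receives the other n - 1 streams."""
    return n * (n - 1) * stream_mbps


def max_participants(link_mbps: float, stream_mbps: float) -> int:
    """Largest conference whose worst-case egress fits within link_mbps."""
    n = 1
    while sfu_egress_mbps(n + 1, stream_mbps) <= link_mbps:
        n += 1
    return n


if __name__ == "__main__":
    # Hypothetical numbers: a 1 Gbps uplink and 1 Mbps per forwarded stream.
    print(max_participants(1000, 1.0))
```

Under these assumptions a 1 Gbps link saturates at 32 participants, because egress grows with the square of the conference size.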
Geography is a second problem. When participants from multiple regions connect to an SFU hosted in a single remote region, the result is high latency and expensive inter-regional connections.
Cascading SFUs addresses both problems. A cascade is two or more SFUs interconnected so that one conference can span multiple SFUs.
Participants can join any one of the SFUs in the cascade and seamlessly interact with all the other participants in the conference, regardless of whether they are on the same SFU or not.
A cascade can be used to create a conference that grows dynamically to virtually any size as participants join.
To address capacity limits, you can distribute participants across the available SFUs.
If the available SFUs get overloaded as well, then you can, on the fly, cascade-in another SFU that has excess capacity, and have the new participants join this SFU.
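The two steps above (spread participants across existing SFUs, then cascade in a new one when all are full) can be sketched as a small in-memory model. The classes `Sfu` and `Cascade` and the per-SFU `capacity` are hypothetical; a real system would also set up the SFU-to-SFU relay links and measure load rather than count participants.

```python
from dataclasses import dataclass, field


@dataclass
class Sfu:
    name: str
    capacity: int  # hypothetical max participants per SFU
    participants: list[str] = field(default_factory=list)

    def has_room(self) -> bool:
        return len(self.participants) < self.capacity


@dataclass
class Cascade:
    """One conference spread over several SFUs (hypothetical model)."""
    sfu_capacity: int
    sfus: list[Sfu] = field(default_factory=list)

    def join(self, participant: str) -> Sfu:
        with_room = [s for s in self.sfus if s.has_room()]
        if with_room:
            # Distribute load: pick the least-loaded SFU that still has room.
            target = min(with_room, key=lambda s: len(s.participants))
        else:
            # Every SFU is full: cascade in a fresh SFU on the fly.
            target = Sfu(f"sfu-{len(self.sfus) + 1}", self.sfu_capacity)
            self.sfus.append(target)
        target.participants.append(participant)
        return target


if __name__ == "__main__":
    conf = Cascade(sfu_capacity=2)
    for p in "ABCDE":
        conf.join(p)
    print([(s.name, s.participants) for s in conf.sfus])
```

With a capacity of 2, participants A to E end up on three SFUs: `sfu-1` (A, B), `sfu-2` (C, D), and `sfu-3` (E).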
To address geography, use a location-based routing algorithm to connect users to the SFU closest to them.
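A simple form of location-based routing picks the SFU region with the smallest great-circle distance to the user. The region names and coordinates in `SFU_REGIONS` are hypothetical, and straight-line distance is only a proxy: production systems often rely on measured latency, GeoDNS, or anycast instead.

```python
import math

# Hypothetical SFU regions with approximate (latitude, longitude).
SFU_REGIONS = {
    "us-east": (39.0, -77.5),
    "eu-west": (53.3, -6.3),
    "ap-south": (19.1, 72.9),
}


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on Earth, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_sfu(lat: float, lon: float, regions=SFU_REGIONS) -> str:
    """Return the name of the region closest to the user's location."""
    return min(regions, key=lambda name: haversine_km(lat, lon, *regions[name]))


if __name__ == "__main__":
    print(nearest_sfu(51.5, -0.1))   # a user near London
```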
When requests to join the same conference arrive at two separate SFU clusters, you can cascade the clusters on the fly and create one conference that spans both geographies.
The local SFU cluster forwards only one copy of each video stream to the remote SFU cluster, which then forwards it to all of its locally connected participants.
Error-correction mechanisms, such as retransmissions, stay localized to each cluster instead of crossing the long-haul link.
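The saving on the inter-regional link can be counted directly. This sketch looks at one direction only (senders in region A, receivers in region B) and compares a single SFU in region A, where every remote participant pulls every stream across the link, with a cascade, where one copy per sender crosses and is fanned out by the remote cluster. The function name is hypothetical.

```python
def cross_region_streams(senders_a: int, receivers_b: int, cascaded: bool) -> int:
    """Video streams crossing the A -> B inter-region link, for one direction."""
    if cascaded:
        # One copy of each sender's stream goes to the remote cluster, which fans it out.
        return senders_a
    # Single SFU in region A: each remote receiver pulls each stream over the link.
    return senders_a * receivers_b


if __name__ == "__main__":
    print(cross_region_streams(10, 10, cascaded=False))
    print(cross_region_streams(10, 10, cascaded=True))
```

With 10 participants on each side, cascading cuts the streams crossing the link in that direction from 100 to 10.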
The following are some of the most popular open-source WebRTC gateways and media servers.
Janus: Written in C. Supports recording and streaming. Can manage around 250 participants in a conference room.
Jitsi: Written in Java. It uses XMPP for signalling.
Mediasoup: A Node.js module with a C++ media worker. Uses parts of the libwebrtc C++ library under the hood.
Pion: WebRTC implementation in Golang. The Pion project also has an SFU called ion that supports SFU-SFU relay and multi-datacenter deployment.