Recent advances in computer vision have led to a multitude of applications in facial, fingerprint, and voice recognition. Deep neural networks built with open-source libraries like PyTorch and TensorFlow can be used to develop highly accurate facial recognition systems. These can add an extra layer of authentication when customers join a meeting room.
Advances in natural language processing have made real-time language translation a reality. State-of-the-art open-source Neural Machine Translation (NMT) systems are able to understand human language and translate it into multiple languages. Open-source solutions are available. Advances have also been made in real-time translation that can help people with hearing challenges.
Efficient communication requires virtual meetings to happen without distractions. Distractions can be audio or visual, such as background noise or the participant's visual background. Visual backgrounds can be removed in real time using models deployed on edge devices with TensorFlow.js. Noise is a form of audio distraction, and machine learning algorithms can help remove stationary and non-stationary audio noise in real time.
AI can be used to incorporate a whole host of features that delight end users. These include, but are not limited to:
Automatically capturing minutes & highlights of the meeting in the form of audio, video and text for records and sharing.
Chatbot support to provide minutes of the meeting for absentee participants.
WebRTC is a secure, peer-to-peer communication protocol built on top of UDP. It allows video, voice, and generic data to be sent between peers without requiring any plugins or third-party software.
One of the significant milestones in the video conferencing area was the release of WebRTC. This open-source project provides web browsers and mobile applications with real-time communication (RTC) capabilities.
WebRTC powers many well-known video conferencing applications such as Google Duo, Facebook Messenger, and Microsoft Teams.
WebRTC’s protocol stack includes UDP, ICE, STUN, TURN, DTLS, SRTP and SCTP.
ICE, STUN and TURN are necessary to establish and maintain a peer-to-peer connection over UDP. DTLS is used to secure all data transfers between peers, as encryption is a mandatory feature of WebRTC. Finally, SCTP and SRTP are the application protocols used to multiplex the different streams, provide congestion and flow control, and provide partially reliable delivery and other additional services on top of UDP.
WebRTC is a peer-to-peer protocol. This means that every participant in a conference maintains connection with every other participant creating a mesh topology. This kind of architecture doesn’t scale well beyond 10 or 12 participants. This is the reason why many of the WebRTC powered apps have a limitation on the number of participants.
By default, WebRTC connections form a Mesh topology since it is a peer-to-peer protocol. To scale it better, we use either an SFU or an MCU based architecture. Both of these form a star topology.
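The scaling difference between mesh and star topologies comes down to simple counting. The sketch below is illustrative only (the function names are hypothetical, not part of WebRTC): in a full mesh every pair of participants needs its own connection, while in a star each participant needs only one connection to the media server.

```python
def mesh_links(n: int) -> int:
    """Total peer-to-peer connections in a full mesh of n participants."""
    return n * (n - 1) // 2


def mesh_streams_per_peer(n: int) -> int:
    """Outgoing media streams each participant must encode and upload in a mesh."""
    return max(n - 1, 0)


def star_links(n: int) -> int:
    """Connections in a star topology: one per participant, to the media server."""
    return n


if __name__ == "__main__":
    for n in (2, 4, 12, 50):
        print(f"{n} participants: mesh={mesh_links(n)} links, "
              f"{mesh_streams_per_peer(n)} uploads per peer, star={star_links(n)} links")
```

The per-peer upload count is what usually hurts first: each participant's uplink and CPU must handle one stream per other participant, which is why mesh calls degrade as rooms grow.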
To scale WebRTC to thousands or even more participants in a conference, an SFU (Selective Forwarding Unit) or an MCU (Multipoint Control Unit) is used.
In an SFU-based architecture, every participant in the conference sends their video/audio to the SFU, and the SFU forwards it to the other participants in the call. The SFU selectively forwards the right stream to each participant depending on their bandwidth or display.
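One way to picture selective forwarding is as a per-receiver choice among several encodings of the same video. The following is a minimal sketch under stated assumptions: the three-layer ladder, its bitrates, and the `select_layer` helper are hypothetical and do not reflect any particular SFU's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Layer:
    name: str
    height: int        # vertical resolution in pixels
    bitrate_kbps: int  # approximate bandwidth needed to receive this layer


# Hypothetical ladder of encodings, ordered from lowest to highest quality.
LAYERS: List[Layer] = [
    Layer("low", 180, 150),
    Layer("mid", 360, 500),
    Layer("high", 720, 1500),
]


def select_layer(available_kbps: int, display_height: int,
                 layers: List[Layer] = LAYERS) -> Layer:
    """Pick the highest layer that fits the receiver's bandwidth and display size.

    Falls back to the lowest layer when nothing fits.
    """
    best = layers[0]
    for layer in layers:  # ascending quality, so the last match wins
        if layer.bitrate_kbps <= available_kbps and layer.height <= display_height:
            best = layer
    return best


if __name__ == "__main__":
    print(select_layer(600, 1080).name)   # bandwidth-limited receiver
    print(select_layer(5000, 200).name)   # small video tile on screen
```

In this sketch the SFU never transcodes; it only chooses which of the already-encoded streams to forward, which keeps server CPU cost low.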
MCU is a more sophisticated media server than SFU. It mixes the audio/video sent by each participant and transmits it to the other participants over a single stream. MCU works well for low-end mobile devices, but it requires more processing power on the server and is usually costlier than an SFU. This is the reason why SFU powers most real-world WebRTC deployments.
Sooner or later, you’re going to run into the capacity limits of the machine that’s running the SFU. It could be the CPU or bandwidth (Mbps).
Calls connecting from multiple geographies to an SFU hosted in some remote zone.
High latency. Expensive inter-regional connections.
Two or more SFUs that are interconnected in such a way that one conference can span multiple SFUs.
Participants can join any one of the SFUs in the cascade and seamlessly interact with all the other participants in the conference, regardless of whether they are on the same SFU or not.
Cascade can be used to create a conference that dynamically grows to virtually any size as participants join.
Use some location-based routing algorithm to connect users to the SFU closest to them.
When a request is made to join the same conference on two separate SFU clusters, you can, on-the-fly, cascade them and create one conference that spans both geographies.
The local SFU cluster will forward only one copy of the video to the remote SFU cluster which then forwards it to all its locally connected participants.
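The location-based routing step above can be sketched as a nearest-neighbour lookup. This is only one possible approach, assuming the user's approximate coordinates are known; the region names and coordinates below are hypothetical, and a real deployment might route on measured latency rather than distance.

```python
import math
from typing import Dict, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude) in degrees

# Hypothetical SFU cluster locations.
SFU_REGIONS: Dict[str, Coordinates] = {
    "mumbai": (19.08, 72.88),
    "singapore": (1.35, 103.82),
    "frankfurt": (50.11, 8.68),
}


def haversine_km(a: Coordinates, b: Coordinates) -> float:
    """Great-circle distance between two points on Earth, in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_sfu(user: Coordinates, regions: Dict[str, Coordinates] = SFU_REGIONS) -> str:
    """Return the name of the SFU cluster geographically closest to the user."""
    return min(regions, key=lambda name: haversine_km(user, regions[name]))


if __name__ == "__main__":
    print(nearest_sfu((12.97, 77.59)))  # a user in Bangalore
    print(nearest_sfu((48.86, 2.35)))   # a user in Paris
```

Once two users of the same conference land on different clusters, the clusters are cascaded so that each stream crosses the inter-regional link only once.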
Following are some of the most popular open-source WebRTC gateways.
Janus: Written in C. Supports recording and streaming. Can manage around 250 participants in a conference room.
Jitsi: Written in Java. It uses XMPP for signalling.
Mediasoup: Written in Node.js. Uses the libwebrtc C++ library under the hood.
These are pillars of our business, which will create a base for everything that we do and keep us motivated throughout the journey.
To make remote communication simpler (User Experience), secure (Technology) and unified (Innovative). Simpler - Keep improving the experience. Secure - Keep building more secure technologies. Unified - Keep innovating different ways to unify the communication.
To build the world's most secure, diverse and delightful communication systems. Secure - There should not be any loophole in the security. (Technology) Diverse - Product and technology should cater to a diverse set of users and scenarios. (Innovation) Delightful - We will always delight our users in all their actions throughout their journeys. (Experience)
Become a market leader in remote communication. Tap existing and untapped segments of the market.
Humans at centre of everything - Humans should be at the centre of everything; humanise things, as we are humans. Customer first approach - Listen to the customer and put customers above everything else. Ethical in all ways - Follow ethics in everything. Transparent in nature - Be transparent with everyone in everything. Data driven - Data is the reflection of whatever we do. Analyse, Empathise, Improvise.
If the available SFUs get overloaded as well, then you can, on the fly, cascade-in another SFU that has excess capacity, and have the new participants join this SFU.
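A rough sketch of this overflow behaviour, assuming each SFU has a fixed participant capacity and a pool of spare SFUs is available. The function name, the capacity model, and the SFU names are hypothetical; a production system would more likely track CPU and bandwidth rather than a head count.

```python
from typing import Dict, List


def assign_participant(conference_sfus: Dict[str, int],
                       spare_sfus: List[str],
                       capacity: int) -> str:
    """Choose an SFU for a new participant, cascading in a spare SFU when all are full.

    conference_sfus maps SFU name -> participants of this conference on that SFU.
    Both arguments are updated in place.
    """
    candidates = [name for name, load in conference_sfus.items() if load < capacity]
    if candidates:
        # Prefer the least-loaded SFU already serving the conference.
        chosen = min(candidates, key=lambda name: conference_sfus[name])
    elif spare_sfus:
        # Every SFU is full: cascade in one with excess capacity.
        chosen = spare_sfus.pop(0)
        conference_sfus[chosen] = 0
    else:
        raise RuntimeError("no SFU capacity available")
    conference_sfus[chosen] += 1
    return chosen


if __name__ == "__main__":
    sfus = {"sfu-a": 2, "sfu-b": 2}
    spares = ["sfu-c"]
    print(assign_participant(sfus, spares, capacity=2), sfus)
```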
Golang: Server-side programming language. Go is widely used on the server side to build highly concurrent and scalable applications. Our team has a track record of developing scalable products using Go.
WebRTC: For real-time video and audio communication. WebRTC is a free and open-source framework that enables secure real-time communication (RTC) in web browsers and mobile applications. It is developed by Google. Google Duo, Facebook Messenger, and numerous other video conferencing apps are built using WebRTC.
WebSocket: We’re using WebSocket to enable the initial handshake before a video call starts and also to send and receive chat messages between all the participants in the conference.
Consul: Consul is used for service discovery and load balancing between our backend server clusters.
Postgres: To persist user and room data
Redis: Redis is used to cache information about currently active rooms and users.
HAProxy: For load balancing requests to the Gateway server
gRPC: gRPC is used for inter-service communication.
GStreamer/FFmpeg: For audio/video mixing, processing, and transcoding
HLS: For Live streaming
React: React is our choice of framework for building the web client
Flutter/Electron: For building the desktop client.
Java (Android)
Swift (iOS)
AI/ML frameworks: TensorFlow, TensorFlow.js, PyTorch
Cyber Security & Cryptology Expert - Lecturer @ Nanyang Technological University, Visiting Lecturer -Indian Statistical Institute
I am a Researcher and Teacher, positioned precariously at the confluence of Computer Science, Mathematics, and Engineering. While my research interest revolves around the "science of security" (cryptology, cybersecurity), I love to teach the "art of analytics" (data science, machine learning).
I am passionate about expressing the hardest of topics in the simplest possible ways, to students and practitioners. My current research, teaching and consultancy interests dwell in the exciting domain of Cryptocurrencies, Blockchain Technology, as well as in applications of Machine Learning in Security.
Lead Product Engineer @ GOJEK, ex-Directi, Yatra, Infosys
Full stack developer with experience in building intelligent & scalable applications using Java/Spring, Golang, Javascript/Node.js, React, and Python/Django.
I love distributed systems and the challenges associated with them. I like writing, teaching and mentoring other engineers.
Here is the passionate team who is making things happen.
Made with ❤️in India 🇮🇳
Facecomm is a Made in India 🇮🇳 video calling and conferencing tool that puts user privacy and security first. It is a product and platform providing an end-to-end social and enterprise communication solution. Facecomm is built for the Indian user base, keeping its diversity and challenges in mind.
We are a bunch of passionate, experienced folks from different domains who came together to build a Made in India social and enterprise communication platform for everyone. Know more about our team here
Adaptivity of communication will be built into our product through design and technology. The product will be able to adjust its behaviour based on the user's bandwidth and connectivity. Suppose a user is running the application on an extremely slow internet connection; our product intelligence will recognise it and adapt the interface and modes to provide a seamless communication experience.
Facecomm comprises exciting features ranging from basic communication to complex enterprise communication and collaboration. A few of the product features are listed below:
🎥 Video Call 🔊 Voice Call 💬 Chat 📃 File Sharing 🖥️ Screen Sharing 🎞️ Audio/Video Call Recording 🈁 Closed Captioning (Multilingual) 🔄 Speech to Text Transcript 📋 Meeting Notes 📄 Automated MOM 🚪 Open/Closed Rooms ⏰ Schedule Meeting 📆 Calendar Integration 📋 There is a long list of features
Data Scientist @ Myntra | PGDBA IIM Calcutta | Electrical Engineering IIT Bombay
I am a Machine learning enthusiast who loves to build AI products for real life use-cases. My areas of research lie in Deep learning for Natural Language Processing and Computer Vision.
Optimized for low-bandwidth
End-to-end encryption by default
Based on phone numbers allowing users to call someone from their contact list
Automatically switches between Wi-Fi and cellular networks
A "Knock Knock" feature lets users see a live preview of the caller before answering.
It lets users worldwide make audio-only calls.
Further optimization comes from monitoring network quality and degrading video quality when needed
Users can leave messages up to 30 seconds long for contacts who are unavailable. These messages can then be viewed by the other party, with the option of calling afterward
Maximum 12 participants allowed currently.
WebRTC
QUIC over UDP
WaveNetEQ (For packet loss concealment), a generative model based on DeepMind/Google AI’s WaveRNN
Fun masks, effects, and filters on top of the video
User limit of 50
Facebook Live with Co-broadcasting
VR Chat – video calls in Oculus
WebRTC
MQTT for Signalling
Microsoft Teams, both in regular browsers and in the Teams app, relies on WebRTC for audio and video communications.
Senior Web UI Engineer @ Revolut, London, ex-Swiggy, Flipkart, Tapzo
Hello, I'm Param Singh, a JavaScript developer based out of Bangalore, India. I like creating awesome user interfaces, making the most of modern web trends. Being a JavaScript zealot, I like exploring new frameworks and libraries in web development. Besides this, I love listening to EDM music, trying out contemporary cuisines and cuddling dogs.
Product Engineer @ GOJEK, ex-Yatra, Infosys
Backend developer and functional programming enthusiast, experienced in building scalable software that solves real-world problems. Experienced in Golang, Scala, Haskell, Java/Spring, Javascript/Node.js, and React. I am a haemophiliac and take great pleasure in learning and creating software that has an impact on individuals.
Product Manager @ Automation Anywhere, ex - Apttus, LocalOye, Allscripts
Tech-savvy and versatile business analyst and solution architect offering 12+ years of success leading all phases of diverse technology projects; degrees in Computer Application; and years of computer programming experience in the health care, RPA and middle office industries.
Business strategist; plan and manage projects aligning business goals with technology solutions to drive process improvements, competitive advantage and bottom-line gains.
Software Development Management: experience enough to manage the group of talent in Mobile, Business Intelligence, RPA, Web, CRM
Many sectors and use cases, covered and uncovered, exist where our product can bridge the communication gap. A few of the potential industry and business use cases are described below:
The education sector is one of the biggest sectors impacted by the COVID-19 pandemic. Our product can provide a sustainable, long-term solution for the education sector, from schools to colleges to higher education institutions, through remote connectivity.
Remotely attending classes, lectures, sharing study materials, scheduling time tables, creating notes, and cohesive communication & collaboration can get benefits through our product.
In the longer run, it could provide a solution for conducting remote examination and publishing results online.
Our solution can provide seamless communication and interchange of information between patients and doctors, doctors to doctors and connect small health organisations to bigger health institutions.
A patient could directly connect with the doctor by sitting at home, and doctors can see the symptoms and prescribe the medicines remotely.
Doctors and experts can exchange information without any barriers and save millions of lives. In tough situations, medical experts can examine critical patients remotely and help other medical practitioners perform minor or major operations remotely.
The entertainment industry has been badly hit by the COVID-19 pandemic: performers, artists and creative professionals cannot physically perform in front of an audience or in theatres.
Our platform could help these performers connect with millions of the audience remotely through the broadcast feature. Individuals can connect with the artist on our platform. Performers can come together and perform remotely either to create the revenue or to generate donations.
Banking is one of the most affected sectors by the pandemic. In banking, many of the activities and verification procedures happen on the ground. Many financial institutions are already using technology to reduce physical interaction and friction.
Our product could help banks connect with users more efficiently without going on the ground. Banks can verify users remotely for their financial services and provide loans and financial help to those in need. Our product could also help financial institutions create better customer support and experience.
Security is built into the core of our product, so governments can use it for public security and safety. For example, our system could help monitor and identify public activity in high-alert areas for surveillance. Similarly, our product could be used in collaboration with governments to monitor suspected areas and track criminal activities to avoid worst-case scenarios.
Agriculture is another big sector that could leverage our platform to connect with others remotely. Farmers can directly approach the local authorities for help and stay informed about new government policies and announcements.
Our Prime Minister and maybe ministers could be remotely connected to individual farmers through broadcasting.
In villages, our product could organise remote panchayats, where local authorities and government can remotely connect with panchayats and villages.
The judiciary is another sector impacted by the pandemic: standard judicial procedure has been disrupted because public gathering and movement are prohibited. Our product could be used in the judicial system to fast-track the judicial process under remote conditions as well.
To manage global operations efficiently, manufacturing companies need real-time access to their worldwide resource and supplier base. Our video conferencing systems can enable real-time virtual product reviews and product development status meetings with offshore plants over video, for dramatic time savings over shipping products for in-person feedback. From procurement to engineering to final product review, our system can improve manufacturing efficiency every step of the way.
With remote working becoming the new normal, companies want to hire global talent from other parts of the world. A video conferencing solution having integration with recruitment products like Lever, Recruiterbox, etc can go a long way in shaping the way companies hire talent.
In a remote-first world, the gap between work and educational or professional services will thin out. We are prepared for this and will add features for services outside a technology-first work environment. Services like language classes, musical instrument lessons, CA services, yoga, Zumba, and home exercise can all be hosted within our application. We will add voting, attendance, reviews/ratings and other ancillary features to support this.
As you scale a remote friendly organisation, all social events can easily reach to a global audience. You can take advantage of Webinars and Live streaming features of our product to broadcast communications and special events. You can promote new launches, news, and activities. You can provide executive updates and do remote contractor and vendor meetings over video conferencing.
Our product can enhance the way sales professionals conduct remote customer meetings, partner conferences, business reviews, sales pitches, etc. Our Minutes of the Meeting feature can be used to document customer meetings and customer interviews. We can integrate with sales CRM products like Salesforce, FreshWorks, and others to provide a seamless experience for sales professionals. We also plan to deeply enhance the calendar within our tool for sales professionals who travel, so they are always at the top of their game.
Remote governance is the future. State and Local government and various government bodies like Panchayat, Municipality, City council, Wards, and different ministries can work together remotely using video conferencing. The conversations can be recorded for future purposes. Our Minutes of the Meeting feature and Closed Captioning feature can bring together representatives from different languages and states to solve common problems.
One of our aims is to be human-centric, which means that we want to showcase the fun that happens in various social settings. We want to provide a level of social engagement by leveraging Artificial Intelligence and Augmented Reality, similar to Snapchat and other fun social engagement apps.
Social being is at the center of every experience we share, so we aim to take online experience just like it is offline. We want people to host big and small events right from their chairs, with us leading forward. You can organise small and large scale remote events, social gatherings, watch parties, discussions, and much more to engage people, and build communities.
We aim to bring in our automatic voice and video responses to help users address their customers. Voice/video will feel more connected & if we can with our AI/ML efforts make it seem more human, we think we can make a real difference. Chat and video product could be integrated with customer support systems to provide better customer support experience.
We aim to go above and beyond just being a conferencing tool. With our expertise in real time video processing, we will give timely updates to all users regarding security of their homes. We want to upgrade and re-imagine, the clunky camera setup and the delay of stream. Our product could be integrated with Home security systems to perform monitoring and surveillance from any remote place.
Senior UX Designer @ Myntra, ex - Directi, Tapzo, Localoye
“Focus on the user, and all else will follow."
I'm an Interaction designer, who loves to solve problems and make products meaningful. My product and design philosophy is to make product Usable, Feasible and Scalable. I have a proven track record of working on complex and user-centric mobile and web products.
I work closely with team of product managers, developers and end users to make user-centric and data-driven decisions.

The Value Proposition Canvas is a tool which can help ensure that a product or service is positioned around what the customer values and needs.
The Value Proposition Canvas is formed around two building blocks – customer profile and a company’s value proposition.
Customer Profile
Gains – the benefits which the customer expects and needs, what would delight customers and the things which may increase likelihood of adopting a value proposition.
Pains – the negative experiences, emotions and risks that the customer experiences in the process of getting the job done.
Customer jobs – the functional, social and emotional tasks customers are trying to perform, problems they are trying to solve and needs they wish to satisfy.
Value Map
Gain creators – how the product or service creates customer gains and how it offers added value to the customer.
Pain relievers – a description of exactly how the product or service alleviates customer pains.
Products and services – the products and services which create gain and relieve pain, and which underpin the creation of value for the customer.
The Product/Market Fit Canvas is a strategic innovation tool. It allows you to define, validate and reach your customers. It also allows you to define and iterate your product to achieve market validation with your product.

Product positioning is a form of marketing that presents the benefits of your product to a particular target audience. Through market research and focus groups, marketers can determine which audience to target based on favourable responses to the product.
We initially position ourselves among SMEs and large organisations, expecting medium market adoption. It's a fairly large market. There is enough space for new players to come in and capture some market share. This could be a sweet spot to start with in the wide canvas of communication.
Vertically, we will try to go deep into larger organisations and aim for high adoption with a large market share. Here our focus will be on solving complex large-organisation problems: efficient communication and collaboration, and increased productivity. We will try to become the market leader in this segment.
Horizontally, we will look into the social space, where our products will have higher adoption and virality. This is a highly crowded market where a lot of social and engagement products already exist. This could be a good horizontal market to look into in the longer term.
Product Consultant, ex - LOCO, Directi, ITC
Led the Live and Interactive games unit at Loco as Product Manager. It was the fastest downloaded app ever on the Google Play Store. Within one year we grew from 100K users to 19Mn+, a significant percentage of whom play Live games! Have also collaborated with brands like Myntra and OnePlus to run sponsored games that have been very successful for their marketing purposes.
Some brainstorming done to understand the landscape of communication and the scope of opportunity.
Communication is simply the act of transferring information from one place, person or group to another.
Every communication involves (at least) one sender, a message and a recipient. This may sound simple, but communication is actually a very complex subject. The transmission of the message from sender to recipient can be affected by a huge range of things. These include our emotions, the cultural situation, the medium used to communicate, and even our location.
Human communication has evolved over time. Initial interaction happened through facial expressions and gestures; then humans started talking to each other, later they started drawing, and finally they developed languages. Human development has not stopped since. Communication is one of the primary aspects that helped us build relationships and create great civilisations.
Now with the advancement of technology, humans have learned to use different modes of communication, to accomplish day to day tasks in efficient manners and keep increasing the efficiency day by day.
There are several different ways we share information with one another. For example, you might use verbal communication when sharing a presentation with a group.
Verbal communication is the use of language to transfer information through speaking or sign language. It is one of the most common types, often used during presentations, video conferences and phone calls, meetings and one-on-one conversations. Verbal communication is important because it is efficient. It can be helpful to support verbal communication with both nonverbal and written communication.
Nonverbal communication is the transfer of information through the use of body language including eye contact, facial expressions, gestures and more.
Nonverbal communication is important because it gives us valuable information about a situation including how a person might be feeling, how someone receives information and how to approach a person or group of people.
There are several types of nonverbal communications including;
Body language
Gestures
Facial expressions
Touch
Written communication is the act of writing, typing or printing symbols like letters and numbers to convey information. It is helpful because it provides a record of information for reference. Writing is commonly used to share information through books, pamphlets, blogs, letters, memos and more. Emails and chats are a common form of written communication in the workplace.
Visual communication is the act of using photographs, art, drawings, sketches, charts and graphs to convey information. Visuals are often used as an aid during presentations to provide helpful context alongside written and/or verbal communication. Because people have different learning styles, visual communication might be more helpful for some to consume ideas and information.
Direct communication is the natural physical communication between individuals and groups. From the origin of human race, direct communication is the key to build communities and relationships.
Remote communication allows people from the different locations to communicate and collaborate together. They use many tools and mediums like email, chat, and online collaboration tools to facilitate the communication. Remote communication is an area of the science that deals with the data transferring between the devices not located at the same place.
Visual mode - Video calling
Verbal mode - Voice calling
Written mode - Messaging
Personal remote communication could be easily managed through many available free or paid applications. Again any mode of communication for personal use could be used like, simple voice calls, Whatsapp or Facebook messages or by simple video calls as well.
In professional communication, multiple factors need to be considered even in direct communication. For remote communication in any kind of organisation, how we communicate changes drastically based on demography, the nature of the organisation, geographical conditions, and the scale and impact of the organisation, and the size of the challenge is even greater.
Irrespective of the kind of organisation, different kinds of communication exist within it. I have divided these into categories and described each in detail.
Social 👩👩👧👧
In organisations, some communication is very social in nature: people from different teams come together and communicate with each other. Often this social communication extends outside the organisation and across multiple organisations under a bigger umbrella. These communications can take different forms, such as informative, explanatory, collaborative, open-ended, or restrictive.
Private 👬
Private communication is restricted communication, or any communication made under circumstances creating a reasonable expectation of privacy. This form of communication reaches its intended receiver in a private space such as a group, room, channel, or 1-1 conversation.
Confidential 🤝
Information exchanged between two people who have a relationship in which private communications are protected by law, and intend that the information be kept in confidence. The law recognises certain parties whose communications will be considered confidential and protected, including spouses, doctor and patient, attorney and client, and priest and confessor. The intention that the communication be confidential is critical.
Following are the Risks involved and a short summary of how we mitigate those risks with our solution.
We have designed our solution with security in mind. We use W3C technical standards and protocols that are highly secure and provide end-to-end encryption. We have taken extra care to make sure that all communication is secure and that user data is stored securely in our databases. We also have a cyber security advisor, Sourav Sengupta (https://www.souravsengupta.com/), who is helping us make sure that our infrastructure is secure. Apart from network security, we have added several application-level security features, such as password-protected rooms, role-based access, and host controls like muting a participant, pausing a participant's video, removing a participant from a conference, and putting participants in a waiting room. Moreover, unlike other video conferencing solutions, we default to the highest security setting for any meeting or webinar. Hosts can turn off some of the application-level security features if they want to. We offer an on-premise hosting option that allows you to host all the backend infrastructure and the database in a government-owned data center. This gives full control over all the data and prevents unwanted access.
We have designed our backend infrastructure to scale to millions of concurrent connections. Our infrastructure is capable of supporting more than 500 participants per meeting room which can be scaled even further by adding more SFU servers. We use a concept called SFU cascades wherein a conference can span across multiple SFU servers. This is why we’re not limited by the CPU and bandwidth of a single SFU server. All our backend servers are horizontally scalable. To handle more load, we can add more servers and the infrastructure will be able to support additional load.
We also support multi-datacenter deployments by default. The users are connected to the datacenter closest to them to improve latency.
India, being a developing nation, doesn’t have high-speed internet everywhere. We believe that technology should empower everyone, whether they live in urban cities or in remote villages with slow network connections. Our solution detects network conditions and automatically scales video resolutions and bitrates up or down to give users the best possible experience. Our media server (SFU), which handles video/audio communication, supports error correction, handles packet loss, and forwards the video/audio streams to the receivers according to their bandwidth and display.
Some of the technical hurdles we would face will be in the field of AI/ML.
We are building closed captioning as part of our solution. Building this requires large training datasets in multiple Indian languages which is difficult to get since the area of AI based speech-to-text is very new. We plan to collaborate with Indian academic institutes for our data needs.
In order to develop our AI solutions, we would require powerful compute resources in the form of GPUs to perform parallelised large scale mathematical operations on our datasets.
Once we have developed the closed captioning solution, we need to make sure that it can perform speech-to-text tasks with reasonable latency. In this regard, we are evaluating multiple state of the art frameworks and open source solutions to make sure that all the performance requirements are met for a smooth user experience.
Apart from this, there are challenges involved in the cyber security field to make sure that we leave no stone unturned when it comes to security. We have taken extra measures to ensure that all the communication is end-to-end encrypted and have employed several application level security measures. We are also in touch with experts in the cyber security domain who are willing to guide us.
Real-time communications on the internet have been on the rise in the recent past. Many companies are offering video conferencing solutions for social as well as enterprise users. Real-time communications are powered by VoIP (Voice over IP) protocols. VoIP protocols include:
Session Initiation Protocol (SIP), connection management protocol developed by the IETF
Real-time Transport Protocol (RTP), transport protocol for real-time audio and video data
Real-time Transport Control Protocol (RTCP), sister protocol for RTP providing stream statistics and status information
Secure Real-time Transport Protocol (SRTP), encrypted version of RTP
Session Description Protocol (SDP), a syntax for session initiation and announcement for multimedia communications and WebSocket transports.

The product roadmap is split into two pillars.
Vertical: We will build for general use-cases that apply regardless of the sector or who uses our product. We will grow these features on the basis of our vision: to make remote communication better.
Horizontal: We plan to eventually be a marketplace where we give developers access, where required, to build on top of our E2E encrypted remote video conferencing product. These efforts, however, have to start in-house, which means that we need to build a few features and products ourselves to begin this innovation within the product.
We have a reasonably diverse team covering various areas, from artificial intelligence to cybersecurity. We plan to keep the team lean until we generate revenue and achieve product-market fit. We plan to build our distribution strategy by starting with our friends and moving to progressively bigger circles, making sure our customers have a smooth journey while adopting our product. (More details in the diagram below). Once we have reached a certain scale and are open to exploring our horizontal pillars, we plan to expand our team close to where our sales prospects take us. Then we plan to add sales, marketing and customer success efforts.
We would like to collaborate with the education ministry or other sectors once we scale horizontally. This is still an exploratory space; once we have a fit for our product, we can expand more on this.
So in the roadmap, we will address both of these pillars. We will focus on vertical pillars in the short term to achieve product-market fit. We will, in the long term, move to unbundle segments and add more horizontal pillars.
These are the general use-cases that apply regardless of the sector or who uses our product. We will grow these features on the basis of our vision: to make remote communication better.
For signups, we plan to keep 1-1 calls open, with no signup required. However, to schedule calls in the future, we will offer social (Facebook, Google) signups or build our own login system. This is mainly so that our users can log in and see their contacts and schedules at their convenience.
We want our remote conferencing service to support various kinds of conversation, not just meetings. For this purpose, we want to introduce various rooms.
Closed Room - A password-protected room that is invite-only.
Open Room - A water-cooler space, which is open all the time for any on & off discussions within the remote work environment.
Broadcast Rooms - A room that will be used to deliver an overall message, in the case of offices, it could be a broad CEO’s message to the employees.
As an application we are fully encrypted and private; however, we also aim to provide user-centered design, which means giving users explicit controls. That is what this section of features does.
Audio - Ability to mute/unmute audio.
Video - Ability to switch video on/off.
Pin - preview
Share File - Being able to send documents/audio/video across on the chat.
We believe that even in a remote world, the need to be ready remains: prepared, professional and easy to adapt. This is why we are introducing these features.
Background change - Ability to change the background to a few templatized images.
Image - Ability to change your profile picture.
Background blur - Focus on the speaker is vital, so we give an ability to blur the background.
It is vital to keep communication beyond conferencing & calls. For these reasons, our next set of features focuses on making information more accessible.
Transcript - A fully automated transcript via speech recognition.
Closed Captions - Providing subtitles to every conference
Minutes of the Meeting - Automated minutes of all the meetings, by analysing transcripts with Natural Language Understanding (NLU)
We plan to eventually be a marketplace where we give developers access, where required, to build on top of our E2E encrypted remote video conferencing product. These efforts, however, have to start in-house, which means that we need to build a few features and products ourselves to begin this innovation within the product.
As we build for various organisations, it is important to give them access controls and
Org Admin Roles and controls - Giving access to the admins of each organisation and letting them decide which users have which controls. Super Admin, Admin, and User levels are well defined.
Account switching (Personal/Professional/multi-account) - We aim to be organisational as well as personal; hence it should be effortless for a user to use the tool for personal purposes.
Chat history/Room history - Ability to control how much information about confidential conferences can be given to different users.
Contact list - Open/closed access to the contact information within the organisation.
Schedules, Calendar - Open/closed access to the schedules/calendar information within the organisation.
Personal meeting ID - Password Protected meeting IDs in order to protect data & privacy.
For an organisation in any sector to work well, its communication tools have to be integrated with its other channels. We aim to provide these integrations so that organisations can work efficiently.
Chat integrations - Integrating chat apps to remind you about the conference calls.
Browser integration - Introducing browser extensions to access & notify about the conference calls from the web.
Calendar integration - Integrating Calendar apps to remind about the conference calls & allow to add the conference link to calendar meetings as well.
We plan to start with a freemium model, but we want to provide a tiered pricing strategy eventually. Paid small-org, paid mid-org, and paid large-org plans will be on different tiers, and the features we open up will depend on size and data usage.
We aim to provide a social presence within our application to drive more engagement among users.
Channels - Different channel conversations to deal with the interest of the users. Eg: #technology, #dance etc.
Membership - People could take free/paid membership of any channel.
Donation - Allowing people to donate to bigger causes around them, and create social impact.
Organising catchups, webinars & interviews will be at the centre of remote collaboration.
Create an event - Ability for users to create events & share it with interested people easily.
Webinar - Ability for users to easily create events & share it with interested people.
Anonymous Voting feature / Poll - Ability to vote/ poll in interesting, community shared topics.
Communication is incomplete without powerful collaborative tools, so we will expand on this.
Interactive Board - Ability to collaborate and discuss on top of video calls.
Presentation mode - Ability to present your ideas and discuss on top of video calls.
Screen highlight - As we explore education as a big sector, we think in teaching & collaboration, it will be powerful to highlight texts on screen.
While we explore the fun aspects that remote communication would bring, we feel it is important to provide certain hooks through enhancements to keep people involved.
Filters - Beautification of participants by providing filters to enhance appearance.
Masks - Adding fun elements on top of videos.
Reactions/gifs - Adding integrations with gif websites to react or comment on the video/audio call.
Host/Organiser - Every meeting will have an organiser/host who has explicit authority.
Mute others - Available if the user is the host of the meeting.
Manually admit participants - Allow/ reject admission by the host of the meeting.
Raise hand - Every communication is not one way, hence a feature for everyone to be given an opportunity to speak.
Screen Share - Ability to share the user’s screen with others, either in full screen or minimized view.
Meeting recording - Ability to record the conference call with audio & video.
Invitation - through email/ through link/ calendar invite
Preview calls - Ability to preview the video/ the person calling before the call is accepted.
Gateway for teams (Meeting room setup) - Setting up rooms for various teams that will always be used by them. Ease of use for teams to set up conference calls is assured.
Attendance - With remote teams, it is essential to set up an easy way to check-in with the team in a reliable way.
Commenting - Allowing the users to comment on video/audio calls that are recorded.
Replying - Allowing the users to reply to comment on video/audio calls that are recorded.
Liking - Allowing the users to like comments on video/audio calls that are recorded.
Sharing - Allowing the users to share the video/audio calls that are recorded.
Press to talk - Like a walkie-talkie, allowing users to talk by pressing a certain button, so that only one person talks at a time.
Emoji faces - Allowing users to communicate beyond words & use emotions to strengthen the conversation.
BRB experience - Allow users to indicate status and indicate if they are away from the keyboard, on vacation etc.
Voice message - If the recipient is not able to take the video/audio call, the sender can record a voice message.
Video messages - If the recipient is not able to take the video/audio call, the sender can record a video message.
Auto-reply template - Ability to auto-respond in terms of audio/video etc. if one user is unable to make it to a group meeting.




We're using WebRTC for real-time audio and video communication and WebSocket for sending chat messages and initiating a video/audio call. We also support live streaming using WebRTC, which is broadcast to other users using HLS.
We have added process flow diagrams for all the flows like Creating/Joining a meeting room, Initiating a Video/audio call, Sending/receiving chat messages, and Live streaming in the backend architecture section.
The following diagram describes the high-level process flow for video conferencing.
Users interact with one of our mobile applications or web clients to start/join a video conference. The client first establishes a WebSocket connection with the Gateway server. If the user is already authenticated, the client sends the authentication token in the request header, which is validated by the Gateway server.
Once a WebSocket connection is established successfully, the client makes a request to create/join a room. The Gateway server calls the Room server over a gRPC connection to create a room and generate a password for the room, or to validate the roomId and password if the user is trying to join an existing room.
Once the user joins the room, the client creates a WebRTC SDP offer to establish a WebRTC connection with the media servers for exchanging video/audio streams. The Gateway server receives the SDP offer and calls the Room server to find the media server to which the user’s offer should be sent.
Room servers constantly monitor media servers through a service discovery mechanism and find the best-suited media server for a user. Once the media server is identified, the Gateway server sends the user’s offer to the media server. The media server responds with a WebRTC SDP answer, which is sent back to the client.
Once the offer/answer flow is complete, the client establishes a persistent WebRTC connection with the media server and starts sending video/audio streams to the media server.
The media server is the main component that receives video/audio streams from all participants in a room and sends those streams to the other participants in the room.
We allow guest users as well as authenticated users in our application. The Gateway server is responsible for authenticating users and validating every incoming request.
To authenticate users, we use both email/password-based login and OAuth2-based login using Google and Facebook. All passwords are hashed using bcrypt and stored in our database. Once a user successfully logs in through either email/password or Google/Facebook, we create a JWE token and send it back to the client. The client sends this token in an Authorization header along with every request to the server.
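As a rough illustration of salted password hashing with constant-time verification, here is a minimal Python sketch. It is not our production code: to stay within the standard library it uses scrypt as a stand-in for bcrypt, and the function names and cost parameters are assumptions chosen for the example.

```python
import hashlib
import hmac
import os

# scrypt cost parameters (assumed values for illustration only).
_N, _R, _P = 2**14, 8, 1
_SALT_LEN = 16


def hash_password(password: str) -> bytes:
    """Return salt + digest; a fresh random salt is used for every password."""
    salt = os.urandom(_SALT_LEN)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=_N, r=_R, p=_P)
    return salt + digest


def verify_password(password: str, stored: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    salt, digest = stored[:_SALT_LEN], stored[_SALT_LEN:]
    candidate = hashlib.scrypt(password.encode(), salt=salt, n=_N, r=_R, p=_P)
    return hmac.compare_digest(candidate, digest)
```

Only the salted digest is stored, so the plain-text password never needs to be kept in the database.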
When a user wants to create a new meeting room or join an existing room, the client first connects to the Gateway server using a secure WebSocket connection. The WebSocket connection is maintained throughout the life cycle of the meeting. It is used for sending chat messages and other events like user-join, user-leave, etc.
Once the WebSocket connection is established, the client sends a request to the Gateway service to join or create a room. The Gateway server calls the Room server to create a room, or to validate the RoomId and password if the user is trying to join an existing room. When a user successfully joins a room, a ‘user-join’ event is sent to all the other participants in the room.
After joining the room, the client initiates video/audio communication by creating an SDP (Session Description Protocol) offer and sending it to the Gateway server. The Gateway server calls the Room server to find the media server to which this user should be sent. Room servers constantly monitor media servers through a service discovery mechanism. They try to assign all users in the same data centre to the same media server (up to a certain limit) to improve efficiency. After getting the response from the Room server, the Gateway server sends the SDP offer to the media server. The media server responds with an SDP answer, which is sent back to the client.
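The create/join step can be sketched as follows. This is a minimal Python illustration under stated assumptions, not our actual Room server: `RoomRegistry`, `create_room`, and `join_room` are hypothetical names, and an in-memory dictionary stands in for the real storage.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Room:
    room_id: str
    password: str
    participants: set = field(default_factory=set)


class RoomRegistry:
    """Hypothetical in-memory stand-in for the Room server."""

    def __init__(self):
        self._rooms = {}

    def create_room(self, host_id):
        # Generate an unguessable room id and password for the new room.
        room = Room(room_id=secrets.token_urlsafe(8),
                    password=secrets.token_urlsafe(12))
        room.participants.add(host_id)
        self._rooms[room.room_id] = room
        return room

    def join_room(self, room_id, password, user_id):
        """Validate the room id and password; return who must get 'user-join'."""
        room = self._rooms.get(room_id)
        # compare_digest avoids leaking the password through timing differences.
        if room is None or not secrets.compare_digest(
                room.password.encode(), password.encode()):
            raise PermissionError("invalid room id or password")
        notify = sorted(room.participants)  # participants already in the room
        room.participants.add(user_id)
        return notify
```

The list returned by `join_room` is the set of participants that would receive the ‘user-join’ event.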
After the offer/answer flow, the client establishes a secure WebRTC connection with the media server and starts sending Audio/Video streams to the server. The media server routes the audio/video streams of every user to every other user in the same room.
The media server, also called an SFU (Selective Forwarding Unit), is the core component responsible for transmitting audio/video streams between the participants in a room.
Following are the characteristics of the SFU:
Low delay: The SFU only forwards the video. So it adds minimal additional delay to the video/audio streams.
Selective Forwarding: Our SFU is selective. It picks, from the incoming video streams, a subset to forward to each receiver, and that subset can be different for every receiver. Depending on the receiver’s available bandwidth or display, it picks a subset of the video streams to forward to that user. If the receiver is watching a participant in a big window, it chooses a high resolution for that video stream. If a participant is hidden in the receiver’s display, the SFU won’t send the video stream of that participant to the receiver.
The SFU sends only the minimally required amount of media to optimize the experience for every individual receiver.
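The selection logic described above can be sketched in a few lines of Python. This is an illustration under assumptions, not our SFU's implementation: the layer table, the `Subscription` type, and the greedy largest-tile-first policy are all hypothetical.

```python
from dataclasses import dataclass

# Available quality layers as (name, height in pixels, bitrate in kbps).
# These values are assumed for illustration.
LAYERS = [("low", 180, 150), ("medium", 360, 500), ("high", 720, 1500)]


@dataclass
class Subscription:
    sender: str
    visible: bool      # is the sender's tile currently on the receiver's screen?
    tile_height: int   # height of that tile in pixels


def select_layers(subscriptions, bandwidth_kbps):
    """Return {sender: layer name} for the streams worth forwarding."""
    # Hidden participants are never forwarded.
    visible = [s for s in subscriptions if s.visible]
    # Serve the largest tiles first so the main speaker gets the best quality.
    visible.sort(key=lambda s: s.tile_height, reverse=True)
    chosen, remaining = {}, bandwidth_kbps
    for sub in visible:
        # Walk from the best layer down; take the first that fits tile and budget.
        for name, height, kbps in reversed(LAYERS):
            fits_tile = height <= sub.tile_height or name == "low"
            if fits_tile and kbps <= remaining:
                chosen[sub.sender] = name
                remaining -= kbps
                break
    return chosen
```

With a 2000 kbps budget, a 720-pixel tile gets the high layer and a 180-pixel thumbnail gets the low layer, while a hidden participant is not forwarded at all.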
Apart from the optimisations at the SFU server, our clients are also intelligent enough to adapt the bit-rate and choose the correct video resolution for the current network conditions. Our clients also give users the ability to select the video resolution that they want to use.
To send a message to other participants in a room, a user sends the message to the Gateway server that they are connected to. The Gateway server forwards this message to the Room server. The Room server first stores the message in the database. It then looks up the Redis cache to find all the Gateway servers that have users in that room, and sends the message to those Gateway servers. The message is finally delivered to the individual participants by the Gateway servers over the WebSocket connection.
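A client-side adaptation loop of this kind might look like the sketch below. It is loosely modelled on a loss-based bitrate controller; the thresholds, bitrate bounds, resolution cut-offs, and function names are assumptions for illustration rather than the values our clients use.

```python
def next_bitrate(current_kbps, loss_fraction, min_kbps=100, max_kbps=2500):
    """Adjust the send bitrate from the packet loss seen in the last interval."""
    if loss_fraction > 0.10:
        # Heavy loss: back off in proportion to the loss.
        target = current_kbps * (1 - 0.5 * loss_fraction)
    elif loss_fraction < 0.02:
        # Clean network: probe upwards gently.
        target = current_kbps * 1.05
    else:
        # Moderate loss: hold the current rate.
        target = current_kbps
    return max(min_kbps, min(max_kbps, target))


def resolution_for(bitrate_kbps, user_choice=None):
    """Pick a resolution for the bitrate; an explicit user choice wins."""
    if user_choice is not None:
        return user_choice
    if bitrate_kbps >= 1200:
        return "720p"
    if bitrate_kbps >= 400:
        return "360p"
    return "180p"
```

Calling `next_bitrate` once per reporting interval and feeding the result into `resolution_for` gives the automatic behaviour, while `user_choice` models the manual override.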
The Room server acts as a message router, and the Gateway servers manage WebSocket connection with individual users.
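The routing step can be illustrated with a short Python sketch. The class and method names are hypothetical, a list stands in for the database, and a dictionary stands in for the Redis cache that maps rooms to Gateway servers.

```python
class RoomRouter:
    """Hypothetical stand-in for the Room server's chat routing."""

    def __init__(self):
        self.history = []            # stands in for the message database
        self.gateways_by_room = {}   # stands in for the Redis cache

    def register(self, room_id, gateway_id):
        # Record that this Gateway server has at least one user in the room.
        self.gateways_by_room.setdefault(room_id, set()).add(gateway_id)

    def route(self, room_id, sender, text):
        """Store the message, then return the per-gateway deliveries."""
        message = {"room": room_id, "from": sender, "text": text}
        self.history.append(message)  # persist before fanning out
        gateways = self.gateways_by_room.get(room_id, ())
        return {gateway: message for gateway in sorted(gateways)}
```

Each Gateway server in the returned mapping would then push the message to its own connected participants over WebSocket.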
We have designed our architecture to scale to virtually any number of participants per room. Our architecture also supports multi-datacenter deployments without any tweaks to the backend services.
There are three main components in our application - Gateway server, Room server, and Media server. All of these servers are horizontally scalable. The media server is responsible for receiving and forwarding audio/video streams. Let’s understand how we scale the media server.
Each media server periodically reports its health and load, and this information is curated and placed into our service discovery system. Whenever a user joins a new room, the Room server consults the service discovery system and assigns the least utilized media server to that user. Any other user joining the same room is sent to the same media server, as long as the user is connected to the same data centre and the media server is capable of taking more load. If the media server can’t take more load, or the user connects to a different data centre, then the user is sent to a media server that is geographically closer to the user and is able to take on more load. Once we do this, we have one conference room spanning multiple media servers. In this case, we set up a server-to-server relay between the existing media server and the new media server. This makes sure that any audio/video stream sent to the first media server is relayed to the second media server and vice versa.
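The assignment policy above can be approximated by the following Python sketch. It is a simplification under assumptions: "geographically closer" is reduced to "same data centre", and `MediaServer`, `assign`, and the capacity model are hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class MediaServer:
    name: str
    datacenter: str
    capacity: int
    load: int = 0
    rooms: set = field(default_factory=set)

    def has_room_for_more(self):
        return self.load < self.capacity


def assign(servers, room_id, user_datacenter):
    """Pick a media server for a user; return (server, relay_peers)."""
    local = [s for s in servers
             if s.datacenter == user_datacenter and s.has_room_for_more()]
    if not local:
        raise RuntimeError("no media server available in " + user_datacenter)
    # Prefer a local server that already hosts the room, else the least loaded.
    hosting = [s for s in local if room_id in s.rooms]
    chosen = hosting[0] if hosting else min(local, key=lambda s: s.load)
    # Every other server already hosting the room needs a relay to the chosen one.
    relay_peers = [s.name for s in servers
                   if room_id in s.rooms and s is not chosen]
    chosen.rooms.add(room_id)
    chosen.load += 1
    return chosen, relay_peers
```

A non-empty `relay_peers` list corresponds to the case where one conference spans several media servers and server-to-server relays have to be set up.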
We have taken extra care to make sure that there is no single point of failure in our system. There are multiple instances of each server running in the backend. Every server instance reports its health to the service discovery system (Consul). If an instance crashes, it is removed from Consul, and the backend adapts accordingly.
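The health-reporting idea can be shown with a small heartbeat registry. This Python sketch is only an analogy for what the service discovery system does for us; it does not use the Consul API, and the class name, TTL value, and injectable clock are assumptions made so the example is self-contained.

```python
import time


class HealthRegistry:
    """Toy registry: an instance is healthy while its heartbeats keep arriving."""

    def __init__(self, ttl_seconds=10.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the behaviour can be tested
        self._last_seen = {}

    def heartbeat(self, instance_id):
        # Called periodically by every running server instance.
        self._last_seen[instance_id] = self._clock()

    def healthy(self):
        """Drop instances whose last heartbeat is older than the TTL."""
        now = self._clock()
        self._last_seen = {instance: seen
                           for instance, seen in self._last_seen.items()
                           if now - seen <= self._ttl}
        return sorted(self._last_seen)
```

An instance that crashes simply stops sending heartbeats and disappears from `healthy()` after the TTL, so the other services stop routing to it.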
We have also set up alerts using Prometheus and Grafana to get notified whenever any component in our architecture experiences issues such as high CPU or memory usage, or general application errors such as an increase in 5xx errors.
We support live broadcasting from your device to the outside world. The Audio/video stream is captured at the local device and sent to the HLS gateway over a WebRTC connection. We transcode the video at the HLS gateway and send it to CDN servers from where the video stream is distributed to other users via HLS.
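For readers unfamiliar with HLS, the sketch below shows roughly what the playlist for a live stream looks like: a sliding window over the most recent transcoded segments. The function name, segment file names, and window size are assumptions for illustration, not the output of our HLS gateway.

```python
import math


def live_playlist(segments, first_sequence=0, window=3):
    """Build a live HLS media playlist.

    segments: list of (uri, duration_seconds) tuples, oldest first.
    Only the newest `window` segments are advertised.
    """
    if not segments:
        raise ValueError("at least one segment is required")
    recent = segments[-window:]
    # Sequence number of the first segment still listed in the playlist.
    sequence = first_sequence + len(segments) - len(recent)
    # The target duration must be an integer no smaller than any listed segment.
    target = max(math.ceil(duration) for _, duration in recent)
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target}",
        f"#EXT-X-MEDIA-SEQUENCE:{sequence}",
    ]
    for uri, duration in recent:
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(uri)
    # No #EXT-X-ENDLIST tag: players keep polling for new segments.
    return "\n".join(lines) + "\n"
```

Players fetch this playlist repeatedly from the CDN and download each newly listed segment, which is why HLS scales to large audiences.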
We have taken utmost care to make the application as secure as possible. We can divide security features into:
Transport level security
Application level security
WebRTC encrypts our video/audio streams by default. The media is carried over SRTP (Secure Real-Time Transport Protocol or Secure RTP), with the encryption keys negotiated through a DTLS (Datagram Transport Layer Security) handshake.
HTTPS/WSS - All connections to the Gateway service happen over Secure Websocket and HTTPS connections.
Databases - All the data in our database are encrypted.
We provide many application-level security features to secure any meeting from unwanted access. We have -
Password protected rooms
Role-based access
Host controls - Host can mute a participant, pause their video, remove a participant from the meeting, allow/disallow screen sharing etc.
We support settings like ask_before_join, wherein we ask the host for permission before any person joins a meeting room. The host can allow or disallow the person, or put them in a waiting room.
Unlike other video conferencing solutions, we default to the highest security setting for any meeting/webinar. Hosts can turn off some of the application-level security features if they want to.
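The ask_before_join behaviour can be modelled as a small state machine. The Python sketch below is illustrative only: apart from the `ask_before_join` setting named above, the class, method names, and return values are hypothetical.

```python
class MeetingDoor:
    """Toy model of admission control for a single meeting."""

    def __init__(self, host_id, ask_before_join=True):
        self.host_id = host_id
        self.ask_before_join = ask_before_join  # secure default: ask the host
        self.waiting = []                       # the waiting room, in arrival order
        self.admitted = {host_id}

    def request_join(self, user_id):
        if not self.ask_before_join:
            self.admitted.add(user_id)
            return "admitted"
        if user_id not in self.waiting:
            self.waiting.append(user_id)
        return "waiting"

    def decide(self, actor_id, user_id, allow):
        """Only the host may admit or reject someone from the waiting room."""
        if actor_id != self.host_id:
            raise PermissionError("only the host can admit or reject")
        if user_id not in self.waiting:
            raise KeyError(user_id)
        self.waiting.remove(user_id)
        if allow:
            self.admitted.add(user_id)
        return "admitted" if allow else "rejected"
```

Because `ask_before_join` defaults to true, a host has to opt out explicitly to let people in without approval.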
We’re exploring various approaches and open source ML frameworks to perform closed captioning. There are four steps to achieve closed captioning with multilingual support:
Obtain a transcript of the content.
Translate the transcript into the target language.
Create the corresponding subtitles from the transcript or translation and create a subtitles file.
Combine all of those pieces into a finished product.
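The third step, turning a timed transcript into a subtitles file, can be sketched as below. This Python example assumes the transcript (or its translation) is already available as `(start_seconds, end_seconds, text)` tuples and writes the widely used SRT format; the function names are hypothetical, and the speech recognition and translation steps are out of scope here.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments):
    """Render (start, end, text) segments as the contents of an .srt file."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each cue is: running number, time range, caption text.
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

For example, `to_srt([(0.0, 2.5, "Hello"), (2.5, 4.0, "World")])` yields two numbered cues that most video players can display alongside the recording.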
We’re currently trying out a POC based on DeepSpeech, which is a TensorFlow implementation of Baidu’s DeepSpeech architecture, Mozilla voice, and other language translation machine learning models.
Works on Mobile apps, web browsers, and Desktop apps.
Supports common audio/video codecs such as VP8, VP9, H264, Opus, G722, PCMU, PCMA, etc.
Supports adaptive bit-rate: automatically chooses the right video resolution for the network conditions, and also allows users to choose the video resolution themselves.
Works in low-bandwidth network conditions.
We have chosen technical standards that are mainstream and widely supported in web browsers and mobile applications. WebRTC is supported in most modern web browsers and mobile applications. It is used by some well-known video conferencing products such as Google Duo, Facebook Messenger, and Microsoft Teams.
Similarly, WebSocket is the de facto standard for bidirectional communication between clients and servers.
Other technologies like HLS (HTTP Live Streaming) are chosen to provide support for all kinds of devices. Though MPEG-DASH is a newer technology than HLS for live streaming, we chose HLS because Apple doesn’t support MPEG-DASH yet.
Our backend solution works on general purpose commodity hardware. We don’t require any specialised hardware.
We recommend general purpose Linux based VMs with at least 4 CPU cores, 16GB RAM, 2.5 GHz Intel Scalable Processor, and network performance of 5Gbps.
Our solution works across web browsers, desktop, and mobile applications. For deploying the backend infrastructure, we recommend Linux based VMs.
Error correction: Many transmission errors can be corrected at an SFU without impacting anybody on the call. The SFU localizes the error correction only between itself and the endpoint that’s experiencing these errors.
We support settings wherein - After joining a meeting, a participant can only change their name/profile pics after the host approves.











Here is the Product Roadmap from inception to long term feature building
Existing Technologies Backend Research Competitive Technical Analysis
Exploring Communication as domain Existing products study Market Study Benchmarking
Backend - Done, Frontend - Done, UX/UI Design - Done, DS/AI/ML - Done, Cyber Security - Done, Product Manager - Done, App Developer - Done
Product Market Fit Challenges Feature Listing UX/UI Design Flows
Home Page: Start a call, Join a call
Video Call Interface: Grid Layout, Audio Controller, Video Controller, Invite meeting link/Meeting ID, Leave Call
Home Page: Signup, Login
App Interface: User Profile, Join a Call, Start a Call
Video Call Interface: List of participants, Chat, Screen Share, File Sharing
Home Page: Login/Signup with Google
App Interface: Import Contacts from Google, Add Contacts, Schedule a call, List of scheduled calls, Invite with email ID
App Interface: Schedule call settings, Public Call/Private Call, Open Room/Close Room, Password Protected, Locked/Unlocked Call
Org Setup: Admin Panel, User Management, Admin Controls, Dial In, Important Integrations like calendar, bridge calling, existing tools etc
Deployment on Gov Cloud, Scalability Testing/Load Testing, Performance Testing, Debugging, UI Polishing
Product & Tech: Work on product functionality, scalability, a market-ready product, and polished UI/UX.
Marketing Strategy: We will work on different go-to-market strategies and marketing mediums to minimise marketing cost while still marketing the product impactfully.
Sales Strategy: Work on different sales channels and sales pitches to make them as impactful as possible.
Human Resource: We will try to hire passionate young talent in different domains who can work with us at the same energy and pace and also learn from others’ expertise. We will keep hiring to a minimum to reduce expenditure while still fulfilling the product requirements.
Video Call Interface: Closed Captioning (Multilingual), Meeting Audio/Video Recording, Full Screen View, Speaker Layout, Trail Layout, Participants Mic Status, Meeting Timer
Permissions: Participants Mic Permission, Participants Screen Share Permission




These are product functionality buckets that we can use to prioritise and build features. Every product feature will revolve around Communication and Security, which form the core of the product's functionality.
Basic functionality is what is required for basic communication; without it, users can't use the app.
Core functionalities are those that are part of our core philosophy (Communication & Security).
Enhancements are additional features that enhance the basic and core functionalities.
Delight features are the ones that wow users and make them excited while using the product.
The MVP is the smallest solution that delivers customer value. We should build an MVP before committing to any big resource investment.









