How do you make a robot social?
At BLUESKEYE AI we create the world’s best face and voice analysis technology for you to use in your robots. It lets your robot read a user’s social, emotional, and medically relevant behaviour.
If you made it to this blog post, I don't have to tell you that humanoid robots are not only possible, they're already here. I basically shook hands with one the other week. I also don't have to tell you how much value they will deliver to humanity. It is not a question of whether, but when and how many will walk among us. Most importantly, it is a question of how they will interact with us.
I also don’t have to tell you that it will not be just one type of robot, but a multitude, each specialising for a different purpose. They will be made by different manufacturers, with different form factors, operating in increasingly complex and sophisticated ecosystems.
Whilst some will undeniably be condemned to work in dark factories, what excites me is the large number of robots whose primary purpose is to serve and build personal connections with us humans.
Wouldn’t it be great if you could communicate as smoothly with a robot as with your best friend? Where half a word - or no word at all - is enough to let the robot know what you need?
This is not just a fancy idea, or a nice-to-have that you can bolt onto your robot later on. For some humanoid robot creators it is their vision. Take the robotics pioneer Scott LaValley, founder and CEO of Cartwheel Robotics and famed for working at both Boston Dynamics and Disney. Having worked with both the most advanced and the most appealing robots in the world, Scott says that he wants his Yogi robot to be “a small, friendly humanoid robot designed to bring joy, warmth, and a bit of everyday magic into the spaces we live in. It’s expressive, emotionally intelligent, and full of personality—not just a piece of technology but a presence you can feel.” That you can feel!
Another great example of robots that already use our technology is the Mirokai, created by Enchanted Tools. Enchanted build ‘People facing robots, powered by AI'. As their visionary CEO Jérôme Monceaux says, "At Enchanted Tools, we strive to be more than just useful. Our aim is to enchant the world through impactful technological innovation, always prioritising ethical and responsible practices. The essence of [Enchanted’s] vision lies in seamlessly blending usefulness with enchantment."
Benefits
Here are five immediate benefits that you can expect from making your robot socially and emotionally aware:
Create a more engaging, empathetic and effective experience for your end-users that resonates with them and will make them want to come back for more.
Better understand what your user intends to achieve from the interaction.
Know when the user is confused about what the robot says, when they are engaging with the topic or when they are bored.
Solve the cocktail party problem: who’s speaking, and who are they addressing?
Change the tone of the robot’s speech to match that of the user, or to steer the user towards a desired emotional quadrant.
Building robots is hard
And understanding humans is even harder. Despite all the progress with LLMs and computer vision for navigation and object detection, robots to date are basically blind and deaf to the multitude of signals that humans constantly produce and interpret in order to have meaningful and informative interactions. Human behaviour is subtle, culturally specific, and often deliberately vague, both to remain polite and to deal with partial information. What’s more, to make that kind of interaction work, latency must be very low. And if you’re an ethical solution provider, you really don’t want to stream highly private data from someone’s home to the cloud all the time. All of that makes it very hard to use the standard cloud-based architectures that LLMs rely on.
How are you, the creators of the next generation of service robots (serfbots?), going to add this additional layer of technical complexity to your already complex development roadmap? Isn't it enough that you have to deal with navigation, object manipulation, and scheduling? And isn't there a whole load of privacy and security concerns that come with processing people's social and emotional signals?
B-Social SDK
Luckily for you, the founders of BLUESKEYE AI have worked for over 20 years to build state-of-the-art technology to read people's facial expressions and to understand tone of voice. In fact, at Blueskeye our stated mission is to objectively measure all social, emotional, and medically relevant face and voice behaviour. And we make that available as an SDK that runs on edge devices.
Every robot platform is different, including yours. Different price points determine what hardware components you can choose from and what computational budget you have available. Working with BLUESKEYE AI will give you the trust and certainty that you can deliver on this exciting but complex journey. Over the years we have gained a mountain of experience as partners and customers integrated our system into social robots and virtual assistants. We know what problems you are likely to face, and we're here to help you navigate them to build a solution that will delight your users.
Apparent emotion recognition
The most important feature is BLUESKEYE AI’s measure of apparent emotion, which captures very subtle changes in emotion by estimating two dimensions: valence (positive vs negative feelings) and arousal (high energy vs low energy). This is far more meaningful than relying on six basic emotion categories. Read this BLUESKEYE knowledge base article to learn all about BLUESKEYE AI’s approach to measuring emotion.
By measuring apparent emotion continuously, in both time and value, changes in a user’s expressed emotion can be associated with events or changes in their local environment.
This allows the creation of a more engaging and responsive social robot, one that can interact empathetically with the world around it.
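To make that concrete, here is a minimal sketch of the kind of application-side logic a continuous valence/arousal stream enables. It assumes you already receive per-frame estimates from the SDK; the EmotionFrame structure, field names, window size, and threshold below are illustrative placeholders, not the B-Social API.

```python
from dataclasses import dataclass
from collections import deque
from statistics import mean

# Illustrative only: the real B-Social SDK types and callbacks may differ.
@dataclass
class EmotionFrame:
    t: float        # timestamp in seconds
    valence: float  # roughly -1 (negative) to +1 (positive)
    arousal: float  # roughly -1 (low energy) to +1 (high energy)

def detect_emotion_shifts(frames, window_s=3.0, threshold=0.25):
    """Flag moments where apparent emotion moves away from its recent baseline.

    Returns (timestamp, delta_valence, delta_arousal) tuples that an
    application can correlate with events in the robot's environment,
    e.g. the robot just finished speaking or another person walked in.
    """
    history = deque()
    shifts = []
    for f in frames:
        # Drop frames older than `window_s` seconds from the baseline window.
        while history and f.t - history[0].t > window_s:
            history.popleft()
        if len(history) >= 5:
            base_v = mean(h.valence for h in history)
            base_a = mean(h.arousal for h in history)
            dv, da = f.valence - base_v, f.arousal - base_a
            if abs(dv) > threshold or abs(da) > threshold:
                shifts.append((f.t, dv, da))
        history.append(f)
    return shifts
```

The point of working against a rolling baseline rather than absolute values is that it picks up changes in expressed emotion, which are what you want to tie to things happening around the user.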
Other features in B-Social include the most accurate face tracker in the world, social gaze tracking, face re-identification, and confusion, interest, and engagement detection.
Health features such as fatigue estimation, heart-rate estimation or depression recognition are available upon request.
Solving the cocktail party problem
One of the biggest problems that people-facing robots face is that there are frequently multiple users in front of the robot. Different users speak at different times, they may switch between addressing the robot and addressing each other, and they may move in and out of view of the robot’s cameras. The problem of knowing who is speaking to whom, and of attaching the interaction data of the same user to the same internal buffer when they come back into view, is known as the cocktail party problem.
Blueskeye’s B-Social SDK has a great solution for this, combining fast audio-based voice activity detection with highly robust visual voice activity detection to disambiguate who’s speaking. This is combined with person re-identification to make sure you can continue delivering a great experience to the same user even if they temporarily leave the scene.
See this 4-minute AV-VoxAD video for a detailed explanation of our audio-visual voice activity detection feature.
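As an illustration of how an application might combine these signals, the sketch below fuses an audio voice-activity decision with per-person visual voice-activity scores and routes the result into per-user buffers keyed by a persistent identity. The field names, scores, and the simple fusion rule are assumptions for the sake of the example; the actual B-Social interface and its internal fusion are not shown here.

```python
from collections import defaultdict

def attribute_speech(frame_detections, audio_vad_active, min_score=0.5):
    """Decide who, if anyone, is speaking at the current moment.

    `frame_detections` is a list of dicts with a persistent re-identification
    id and a visual voice-activity score in [0, 1]; `audio_vad_active` is the
    audio-based voice activity decision for the same moment. The field names
    and the simple fusion rule are placeholders, not the B-Social interface.
    """
    if not audio_vad_active or not frame_detections:
        return None
    best = max(frame_detections, key=lambda d: d["visual_vad_score"])
    return best["person_id"] if best["visual_vad_score"] >= min_score else None

# Interaction state is keyed by the persistent identity, so a conversation can
# resume when a user briefly leaves the camera view and comes back.
interaction_buffers = defaultdict(list)

def on_frame(t, frame_detections, audio_vad_active):
    speaker = attribute_speech(frame_detections, audio_vad_active)
    if speaker is not None:
        interaction_buffers[speaker].append(t)
    return speaker
```

Keying the buffers on the re-identification id rather than a transient track index is what lets the conversation pick up where it left off when a user walks back into view.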
Performance
BLUESKEYE’s performance is state of the art. In particular, we now have super-human apparent emotion recognition. This means that our apparent emotion recognition is better than the average human at determining what a user appears to feel based on their facial expressions.
Figure: In the two-dimensional valence/arousal emotion space, B-Social has lower error than the average human in all green areas.
Not only is Blueskeye’s B-Social SDK more accurate than any other system, it is also highly robust to head pose. We see essentially no degradation in accuracy between frontal poses and poses as extreme as 60 degrees of yaw.
Figure: Valence and arousal estimation is robust to variations in head pose.
Our face tracker is three times more accurate than Google’s, and the error rate of our face detector is only half that of Google’s MediaPipe. That’s important because it all starts with capturing the location of the face. And all of this runs on the edge, in real time, with minimal compute requirements.
Figure: Comparison of the BLUESKEYE AI face detector and tracker to Google MediaPipe, showing Blueskeye's significant advantage in accuracy.
To make it easier for you to get started, B-Social also has built-in function calls to calculate the mood of a user over a period of time. This can give you information about how well an entire interaction went, or a single turn, or any other unit of time that is important for you.
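Conceptually, such a mood summary boils down to aggregating the continuous valence/arousal stream over whatever window you care about. The sketch below shows the idea using the illustrative EmotionFrame records from earlier; B-Social's own built-in calls handle this for you, so treat this purely as an explanation of what the summary means, not as the SDK's API.

```python
from statistics import mean

def summarise_mood(frames, t_start, t_end):
    """Conceptual sketch of a mood summary over a chosen time window.

    Aggregates the continuous valence/arousal estimates over whatever unit of
    time matters to you (a single turn, a whole interaction, a day). B-Social
    ships built-in calls for this; this function only illustrates the idea.
    """
    window = [f for f in frames if t_start <= f.t <= t_end]
    if not window:
        return None
    return {
        "valence": mean(f.valence for f in window),
        "arousal": mean(f.arousal for f in window),
        "n_frames": len(window),
    }
```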
Perhaps one of the simplest but most important features of B-Social is that it allows you to connect a user’s opinions and attitudes to your robot, or to other people and events in the same space. When the user scowls, you need to know whether that was directed at your robot or at someone else!
With our re-identification module, you will also be able to assign attitudes and opinions to different users.
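Here is a minimal sketch of what that bookkeeping could look like on the application side: expressions are recorded against whatever the social gaze estimate says the user was looking at, keyed by their persistent re-identification id. The identifiers and target labels are placeholders for this illustration, not SDK vocabulary.

```python
from collections import defaultdict

# attitudes[user_id][gaze_target] accumulates valence samples, so the robot can
# tell whether a user's scowl was aimed at it or at something else in the room.
attitudes = defaultdict(lambda: defaultdict(list))

def record_expression(user_id, valence, gaze_target):
    """Attach an expression to whatever the user was looking at.

    `gaze_target` might be "robot", another person's re-identification id, or
    an object label; how targets are named depends on your scene model, so
    treat every identifier here as a placeholder.
    """
    attitudes[user_id][gaze_target].append(valence)

def average_attitude(user_id, gaze_target):
    """Average signed valence a user has directed at a given target so far."""
    samples = attitudes[user_id][gaze_target]
    return sum(samples) / len(samples) if samples else None
```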
Compliance
BLUESKEYE AI has historically operated in the health and automotive industries, and our technology is ready to use in the UK, EU, US, and most other parts of the world. Complex compliance is nothing new to us, and the current requirements of the EU AI Act have simply been incorporated into our company’s existing compliance framework.
Start integrating today
Request a quote for our standard 3-month evaluation licence today. It comes with dedicated integration and technical support, as well as monthly check-ins with our world-leading experts on social and emotion AI, to help your product and interaction development teams understand how to make the best possible social robot!