Paolo Pirjanian is an Armenian who was born in Iran and fled to Denmark as a teen. From the time he was young, he was fascinated by computers and started coding in his bedroom. After getting his PhD in robotics, Paolo became an early leader in the field of consumer robotics, with 16+ years of experience developing and commercializing cutting-edge home robots. He worked at NASA JPL and led world-class teams and companies at iRobot®, Evolution Robotics®, and others. In 2016, Paolo founded Embodied, Inc. with the vision to build socially and emotionally intelligent digital companions that improve care and wellness and support people in living better lives every day.
What attracted you initially to AI and robotics?
My fascination with AI and robotics dates back to my childhood. I was displaced from country to country several times until our family moved to Denmark. By accident, I discovered a computer. I became so fascinated by it that I locked myself in my room and started coding all day and night for months. My parents thought I was depressed or on drugs, but it was none of that. I was just so completely fascinated by the computer!
During that same time, I saw a documentary on TV by Pixar. Pixar was presenting their first animated short, Luxo Jr., a two-minute short about two table lamps running around and playing with a ball. I was so fascinated by that and amazed that a computer that I was just learning to code could generate such endearing characters on TV that evoke so much emotion in me. So from there on, I decided to go to school to study robotics, eventually getting my PhD.
I then moved to the US to work on Mars rovers at NASA, which was a childhood dream job. Eventually, I got into entrepreneurship to develop SLAM navigation technology that now enables iRobot’s products.
But looking back, I realized that my inspiration for this whole journey was actually the Pixar short animation of bringing life to inanimate objects. So, that’s why we created Embodied – to bring life to robots that can interact with people, focusing on helping children with social-emotional development.
When did you first come across the concept for launching Evolution Robotics?
Evolution Robotics was originally started by Bill Gross of Idealab in 2001 to become the Microsoft of Robotics, a bold vision that turned out to be far too early and eventually failed. I was the CTO and GM at Evolution Robotics, and after its failure I negotiated with Idealab to spin out some of the core technologies that my team and I had developed and start a new company. In 2008 the new entity, also known as Evolution Robotics, started out to develop products using our core navigation technologies, including NorthStar and vSLAM. These were groundbreaking approaches to spatial mapping and autonomous navigation, similar to what we are seeing in self-driving cars, but targeted at low-cost consumer electronics products.
We developed a line of products for automatic sweeping and mopping of hard floors called Mint which we launched in 2010. By 2011 we rapidly grew to $25m in sales and got acquired by iRobot in 2012 for our product revenues and our navigation technology vSLAM which now powers Roomba and Braava product lines at iRobot.
At that point you became the CTO at iRobot. Could you discuss your experience at iRobot and what you learned from your experience?
As the CTO of iRobot, I was able to quickly integrate vSLAM into the Roomba product line to launch a new model that was able to systematically cover the entire floor plan without missing a spot. That helped the company stay ahead of competition like Dyson which was coming out with systematic cleaning solutions. vSLAM is now an integral part of iRobot’s flagship product lines Roomba and Braava.
I enjoyed working closely with Colin Angle, CEO of iRobot, to help set a strategic direction to make Roomba central to the connected home ecosystem, where Roomba’s spatial awareness gives it a unique position in understanding the floor plan and becoming the connective tissue between all connected devices. That strategy seems to have had a strong footing since my departure in 2015.
In addition, we decided on doubling down on the Consumer Robotics business to help iRobot maintain its global leadership position. This led to the divestiture of the defense business and exiting other peripheral businesses to bring focus and intensity to the consumer business.
Furthermore, we had to re-architect the organization to be able to support a software-heavy strategy with connected products. That required a transformation of company culture to embrace a more agile, iterative approach.
The list of things I learned at iRobot is long. One thing that sticks out is the power of team culture. Staying agile and committed to mission is probably the most important competitive advantage any company can have above any patent portfolio and above trade secrets. If you have a high-performing team, who feels empowered and inspired towards a clear goal, they will be hard to stop.
You’re currently the Founder & CEO of Embodied. Can you discuss what the inspiration was behind launching this company?
I really enjoyed my time at iRobot as the CTO, and we were working on a lot of exciting projects and pushing the boundaries of robotics. It was exciting to launch commercially successful robots into the marketplace that performed helpful physical tasks, such as vacuuming the floor.
However, in the back of my mind, I knew I still had a lifelong dream to fulfill – to build socially and emotionally intelligent robotic companions that improve care and wellness and enhance our daily lives. I knew we were at a tipping point in the way we will interact with technology. So with that, I decided to resign from iRobot and start Embodied.
When we started Embodied, from the beginning, we were rethinking and reinventing how human-machine interaction is done beyond simple verbal commands, to enable the next generation of computing, and to power a new class of machines capable of fluid social interaction. Specifically, the first product was to focus on building an animate companion to help children build social and emotional skills through play-based learning. This companion would come to be known as Moxie. Moxie is a new type of robot that has the ability to understand and express emotions with emotive speech, believable facial expressions and body language, tapping into human psychology and neurology to create deeper bonds. To do this, we brought together a cross-functional team of passionate leaders in engineering, technology, entertainment, game design, and child development. For the past four years, Embodied has been working tirelessly to bring all of the latest technology together to bring Moxie to life, and the team is excited to finally deliver it to families in need of a co-pilot for supporting healthy child development.
What are some of the unique entrepreneurial challenges behind a robotics startup?
It’s fun to do the impossible, but it can also be a little scary. We knew that if we wanted to revolutionize how humans interact with machines, we were going to have to solve problems that hadn’t been solved before. Some problems included:
- Most devices rely on flat screens, but we wanted to bring a device to life. So how do we create a face that’s more life-like, rounded, and not two-dimensional?
- Current conversation engines only allow for very limited conversation, so how do we create a solution that allows for more natural conversation?
- We don’t want the voice to sound robotic, so how do we make the voice sound natural, with contextually-appropriate tonality and inflections?
- We knew eye contact was very important, so we had to figure out how to use computer vision to ensure reliable eye tracking capabilities.
All of these questions about Moxie’s features led to many state-of-the-art technological innovations.
First, the projected and rounded face. The statistics are starting to pile up to show us that too much screen time can have devastating effects on developing minds. Even worse, most kids’ tech devices feature digital screen displays. That’s why we decided to put in the extra investment to make Moxie’s face fully projected, which allowed us to create a face screen that is rounded with naturally-curved edges, instead of a flat display. This makes interacting with Moxie feel more life-like, realistic, and believable. In fact, only through this 3D appearance of the face is it possible for Moxie to have actual eye contact with the child. So not only is Moxie’s face protecting children from excessive screen time, but it also makes the interaction experience feel all the more real.
Second, the conversation engine. Thus far, smart speakers and voice assistants have required the repetitive use of wake words to initiate commands. Moxie’s conversational engine is different. It follows a natural conversation and responds to typical flow of communication without the use of wake words (like “Hey Siri” or “Ok Google”). Advanced natural language processing allows Moxie to recognize, understand, and generate language seamlessly, making the interaction feel more personal and natural.
Third, speech synthesis. Moxie’s voice doesn’t have the same robotic speech and monotone sound found in most robots and voice assistants. Instead, Moxie uses natural and emotive vocal inflections, which help communicate a broader range of emotions. This enhances the scope of social-emotional lessons Moxie can engage in, while also bringing an added life-likeness and believability to the interaction.
Fourth, the eyes. One of the most important features is Moxie’s large, animated eyes. Innovative eye tracking technology allows Moxie to keep eye-contact with the child even as the child moves about the room. This eye tracking capability not only creates an incredibly life-like interaction, but it also helps the child practice eye contact. Additionally, the large, animated eyes help exaggerate emotional communication, so the child can more easily recognize certain emotions. Practicing eye contact and understanding emotions are two key developmental goals in social-emotional curriculum.
Lastly, all of these technological features allow interactions with Moxie to feel realistic and natural. Moxie’s multimodal sensory fusion makes Moxie aware of the environment and its users. Moxie’s computer vision and eye tracking technology helps maintain eye contact as the child moves. Machine learning helps Moxie to learn user preferences and needs, and recognize people, places, and things. Specially located mics enable Moxie to hear the direction a voice came from and easily turn to the source. Touch sensors allow Moxie to recognize hugs and handshakes. All of these pieces come together to make the experience very realistic.
Could you tell us some of the things that makes Moxie perfect for children?
With Moxie, children can engage in meaningful play, every day, with content informed by the best practices in child development and early childhood education. Every week is a different theme such as kindness, friendship, empathy or respect, and children are tasked to help Moxie with missions that explore human experiences, ideas, and life skills. These missions are activities that include creative unstructured play like drawing, mindfulness practice through breathing exercises and meditation, reading with Moxie, and exploring ways to be kind to others. Moxie encourages curiosity so children discover the world and people around them. All these activities help children learn and safely practice essential life skills such as turn taking, eye contact, active listening, emotion regulation, empathy, relationship management, and problem solving.
Embodied has also partnered with Encyclopaedia Britannica and Merriam-Webster to integrate Merriam-Webster’s Dictionary for Children, enabling Moxie to provide age-appropriate definitions and related information to help children learn and understand the meanings of new words and concepts. This is the first of many integrations with Moxie that deliver on Britannica and Merriam-Webster’s shared mission to inspire curiosity and the joy of learning.
Embodied has also developed a full ecosystem that assists parents in supporting their child’s journey with Moxie and allows children to expand their use of Moxie in a safe and parent-approved way:
- The Embodied Moxie Parent App provides a dashboard to help parents understand their child’s development progress with Moxie. The app will provide key insights to a child’s social, emotional, and cognitive development through their activities with Moxie. The app further provides valuable suggestions and tips to parents to enhance their child’s experience and progress with Moxie.
- An online child portal site (referred to as the Global Robotics Laboratory, or G.R.L.) provides additional activities, games and stories that will enhance the experience with Moxie.
- Monthly Moxie Mission Packs are mailings meant to engage children in new activities with Moxie and also provide fun items like trading cards and stickers.
Over time, Moxie learns more about the child to better personalize its content to help with each child’s individual developmental goals. Embodied has taken careful steps to ensure that information provided by children and families is handled with high standards of privacy and security. We intend that Moxie will be fully COPPA (Children’s Online Privacy Protection Act) Safe Harbor certified so parents can feel safe knowing that Moxie employs leading data integrity and security procedures and that its systems are regularly audited to ensure full compliance. Further, personally identifiable data and sensitive information is encrypted with the highest level of security and can only be decrypted by a unique key that only the parent has access to.
What are some of the natural language processing challenges that are faced by Moxie?
At Embodied, we strive to redefine how humans interact with machines, especially in conversation through natural language processing. So, we decided to create SocialX™, which is a platform that enables children to engage with Moxie through natural interaction (i.e., facial expressions, conversation, body language, etc.), evoking trust, empathy and motivation as well as deeper engagement to promote developmental skills. With SocialX™, Embodied is introducing a whole new category of robots: animate companions. “Animate” means to bring to life, and SocialX™ allows Moxie to embody the very best of humanity in a new and advanced form of technology that can fuel new ways of learning.
Natural language processing is at the core of our natural conversation engine, and there are many unique features to the conversation engine that we worked tirelessly to create.
The key feature we worked on was Moxie’s ability to focus conversation on a single user and separate out background conversations and sounds, so Moxie responds only to that user. This allows for a more focused and personable interaction. It is a solution to what many call the “cocktail party problem”: at a cocktail party, many people are talking all around you while you try to stay in conversation with one person. For humans, this isn’t terribly difficult. For a computer, it is incredibly hard. How do we make sure that Moxie only responds to what the single user says, and doesn’t get thrown off by background noise, conversations, TV, etc.? There are several ways we approach this problem.
- We use our vision system to identify who is looking at and facing Moxie.
- We have a number of microphones in the front of Moxie that tell us where that sound is coming from.
- We can then use machine learning to match the sound to who is speaking in front of Moxie. This allows us to filter out the other conversations and stay focused on a single user.
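The three steps above can be sketched as a simple fusion rule: compare the direction a sound arrives from (estimated by the microphone array) against the bearings of the faces the vision system has detected, and accept the utterance only when it lines up with a visible face. This is an illustrative sketch under assumed interfaces, not Embodied's actual implementation; the function name, inputs, and tolerance threshold are all hypothetical.

```python
def pick_active_speaker(face_bearings, doa_deg, tolerance_deg=15.0):
    """Match a voice's direction of arrival (from the mic array) to the
    detected face whose bearing (from the vision system) is closest.

    face_bearings: dict mapping person id -> face bearing in degrees
    doa_deg: estimated direction of arrival of the voice, in degrees
    Returns the matching person id, or None when no face is close enough
    (e.g. the sound came from a TV or a background conversation).
    """
    def angular_diff(a, b):
        # Smallest absolute difference between two angles, in degrees.
        return abs((a - b + 180.0) % 360.0 - 180.0)

    best_id, best_diff = None, float("inf")
    for person_id, bearing in face_bearings.items():
        diff = angular_diff(bearing, doa_deg)
        if diff < best_diff:
            best_id, best_diff = person_id, diff
    return best_id if best_diff <= tolerance_deg else None

# A child faces the robot at about 5 degrees; a TV plays off to the side.
faces = {"child": 5.0, "parent": -40.0}
print(pick_active_speaker(faces, doa_deg=8.0))   # voice aligns with the child's face
print(pick_active_speaker(faces, doa_deg=90.0))  # background sound: no matching face
```

In a real system the learned component would do far more than angle matching (lip movement, voice identity, and so on), but the core idea is the same: only speech that corresponds to a visible, attending user is passed on to the conversation engine.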
Generally, conversation agents on the market have avoided the “cocktail party problem” by using wake words, such as “Hey Siri,” followed by a question. The agent listens for the wake word and responds only when it is said. However, since Moxie can focus on a single user, Moxie doesn’t need wake words to activate a response.
We wanted to make sure that Moxie’s conversation engine is so sophisticated that it is contextually aware of conversational responses. This allows for more nuanced conversation. For example, Moxie can understand the different meanings behind “I don’t know” and “no”.
Is there anything else that you would like to share about Moxie or Embodied?
We have been working on this project for four years with a dedicated team that has worked tirelessly to make the amazing inventions that are required to bring Moxie to life. Now we are excited to finally bring Moxie to families to help their children with social emotional development. So, we are looking forward to the journey!
Thank you for the interview, I loved hearing how you were initially inspired by a short Pixar film, and how you’ve since pursued your life passion. Readers who wish to learn more or who want to order a Moxie should visit Embodied, Inc.