Ranked as one of India’s top 10 data scientists by Analytics India Magazine, Joy Mustafi has led data science research at tech giants including Salesforce, Microsoft, and IBM, earning 50 patents and authoring over 25 publications on AI.
He was associated with IBM for a decade as a Data Scientist, working on a variety of business intelligence solutions, including IBM Watson. He then worked as a Principal Applied Scientist at Microsoft, responsible for AI research. Most recently, Mustafi was the Principal Researcher for Salesforce’s Einstein platform.
Mustafi is also the Founder and President of MUST Research, a non-profit organization promoting excellence in the fields of data science, cognitive computing, artificial intelligence, machine learning, and advanced analytics for the benefit of society.
Mustafi recently joined Redwood City-based Aviso, Inc. as Chief Data Scientist, where he will leverage his decades of experience to help Aviso customers accelerate deal-closing and expand revenue opportunities.
What initially attracted you to AI?
I love mathematics, and the same goes for programming. I did my graduate degree in statistics and post-graduate work in computer applications. When I started my AI research journey back in 2002 at the Indian Statistical Institute in Kolkata, I used the C programming language to develop an artificial neural network system for handwritten numeral recognition. That was 2,500+ lines of code, all written from scratch without any built-in libraries apart from standard input/output. It consisted of data cleansing and pre-processing, feature engineering, and a backpropagation algorithm with a multilayer perceptron. The entire process was a combination of all the subjects that I studied. At that time AI was not so popular in the corporate world, and few academic organisations were doing advanced research in the field. And, by the way, AI wasn’t new at the time! The field of AI research dates all the way back to 1956, when Prof. John McCarthy and others inaugurated the field at a now-legendary workshop at Dartmouth College.
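A from-scratch system like the one Mustafi describes is easier to appreciate with a sketch. The following is a minimal modern illustration of the same idea – a multilayer perceptron trained with backpropagation – in Python with NumPy, with random toy data standing in for the handwritten-digit features (none of this is the original C code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for extracted digit features: 4-dim inputs, 3 classes.
X = rng.normal(size=(30, 4))
y = rng.integers(0, 3, size=30)
Y = np.eye(3)[y]                      # one-hot targets

# One hidden layer, as in a classic multilayer perceptron.
W1 = rng.normal(scale=0.5, size=(4, 8))
W2 = rng.normal(scale=0.5, size=(8, 3))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

losses = []
for _ in range(200):
    # Forward pass.
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)
    losses.append(float(np.mean((out - Y) ** 2)))

    # Backpropagation: propagate the output error back through each layer.
    d_out = (out - Y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out
    W1 -= 0.5 * X.T @ d_h

print(losses[0], losses[-1])  # mean squared error falls as training proceeds
```

A real system would add the data cleansing and feature-engineering stages Mustafi mentions before this training loop.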
You have worked with some of the most advanced companies in AI such as IBM Watson & Microsoft. What has been the most interesting project that you have worked on?
I want to mention the first patent I was awarded while working at IBM: a method for solving word problems in natural language, which was an open problem for IBM Watson. The system I developed can understand an arithmetic or algebraic problem stated in natural language and provide a solution in real time as a natural-language answer. To do that, the system had to handle the following key steps:
- Get the input problem statements and the question to be answered.
- Convert the input sentences into a sequence of sentences that are well-formed from a mathematical perspective.
- Convert the well-formed sentences into mathematical equations.
- Solve the set of equations.
- Narrate the mathematical result in natural language.
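The shape of that pipeline can be sketched in miniature. This toy Python function handles only one trivial problem form, with a single hand-coded cue word standing in for real natural language understanding; it is illustrative only and not the patented Watson method:

```python
import re

def solve_word_problem(text):
    """Toy pipeline: extract quantities, map a verbal cue to an
    operation, solve, and narrate the answer in natural language."""
    # Extract the quantities mentioned in the problem statement.
    numbers = [int(n) for n in re.findall(r"\d+", text)]
    # Map a cue word to a mathematical operation (a real system would
    # build well-formed sentences and equations instead).
    if re.search(r"\b(gives|gets|more|adds)\b", text):
        result, op = numbers[0] + numbers[1], "plus"
    else:
        result, op = numbers[0] - numbers[1], "minus"
    # Narrate the result as a natural-language answer.
    return f"{numbers[0]} {op} {numbers[1]} is {result}."

print(solve_word_problem("John has 3 apples and Mary gives him 2 more. "
                         "How many apples does John have?"))
# → 3 plus 2 is 5.
```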
There’s also my best project for Microsoft — Softie! I invented and built a physical robot equipped with various types of interchangeable input devices and sensors to allow it to receive information from humans. A standardized method of communication with the computer allowed the user to make practical adjustments, enabling richer interactions depending on the context. We were able to implement a robust system with features including a keyboard, pointing device, touchscreen, computer vision, speech recognition, and so forth. We formed a team from various business units, and encouraged them to explore research applications on artificial intelligence and related fields.
You’re also the Founder and President of MUST Research, a non-profit organization registered under Society and Trust Act of India. Could you tell us about this non-profit?
MUST Research is dedicated to promoting excellence and competence in the fields of data science, cognitive computing, artificial intelligence, machine learning, and advanced analytics for the benefit of society. MUST aims to build an ecosystem to enable interaction between academia and enterprise, helping them to resolve problems and making them aware of the latest developments in the cognitive era to provide solutions, offer guidance or training, organize lectures, seminars and workshops, and collaborate on scientific programs and societal missions. The most exciting feature of MUST is its fundamental research on cutting-edge technologies like artificial intelligence, machine learning, natural language processing, text analytics, image processing, computer vision, audio signal processing, speech technology, embedded systems, robotics, etc.
What was it that inspired you to launch MUST Research?
My love of sci-fi movies and mathematics means I’m often thinking about how technology can change the world, and I’d been thinking about forming a group of like-minded experts on advanced technologies since 1993, when I was in 9th grade. Once I got my first job, it took 10 years to call for a meeting — and another 10 years to identify a group of suitable experts and form a non-profit society. Now, though, we have around 500 data scientists in MUST across India who are passionately contributing to research on emerging technologies.
Over the past several years the industry has seen significant advances in deep learning, reinforcement learning, natural language processing, and more. Which area of machine learning do you currently view as the most exciting?
All machine-learning algorithms are exciting once they are implemented as a product or service that can be used by businesses or individuals in the real world. The Deep Learning era has pros and cons, though — sometimes it helps in automatic feature engineering, but at the same time it can work like a black box, and end up with a garbage-in-garbage-out scenario if proper datasets or algorithms aren’t used. Some of the latest technologies are also resource-hungry and require huge amounts of processing power, time, and data. The key thing to remember is that Deep Learning is a subset of Machine Learning (ML), which in turn is a subset of Artificial Intelligence (AI), and AI is a subset of Data Science — so it’s all connected. And it’s not about Python, R or Scala — I started my AI journey in C, and one can even write AI programs in assembly language code. Building successful AI systems depends first and foremost on understanding the business or research environment, and then connecting the dots between actions and data to build a system which genuinely helps various people in different domains. Whether you’re working with Natural Language Processing, Computer Vision, Video Analytics, Speech Technology, or Robotics, the best way forwards is to start with the simplest possible approach, and then adopt more complex methods iteratively as you experiment with and refine your system.
You are a frequent guest speaker at leading universities in India. What is one question that you often hear from students, and how do you best answer it?
The single question I hear most often is: “How can I become a data scientist?” I always tell young people that it’s definitely possible, and try to guide them towards using their love of mathematics, statistics, or computer science to try to solve real-world business problems. People also ask how they can join MUST, and again, the answer is simple: “Build your profile with multiple projects and focus on thinking outside of the box.” If you want to become a data scientist, you have to also prove that you can innovate. Without innovation, we can’t call ourselves scientists. Of course, being awarded patents or publishing your research in reputed journals and conferences also helps!
You recently joined Redwood City-based Aviso as chief scientist, in order to use your AI/ML expertise. Could you tell us a bit about Aviso and your role with this company?
Aviso uses AI and machine learning to guide sales executives and take the guesswork out of the deal-making process. That’s a fascinating challenge, and my primary responsibility is to help the organization grow in a positive direction, using deep research to set the stage for the customers’ success. I’m using my knowledge and experience in artificial intelligence and innovation to help make our core products and research projects more:
Adaptive: They must learn as information changes, and as goals and requirements evolve. They must resolve ambiguity and tolerate unpredictability. They must be engineered to feed on dynamic data in real time.
Interactive: They must interact easily with users so that those users can define their needs comfortably. They must interact with other processors, devices, services, as well as with people.
Iterative and Stateful: They must aid in defining a problem by asking questions or finding additional source input if a problem statement is ambiguous or incomplete. They must remember previous interactions in a process and return information that is suitable for the specific application at that point in time.
Contextual: They must understand, identify, and extract contextual elements such as meaning, syntax, time, location, appropriate domain, regulation, user profile, process, task and goal. They must draw on multiple sources of information, including both structured and unstructured digital information.
What was it that attracted you to this position with Aviso?
Aviso is working to replace bloated legacy CRM systems with frictionless, AI-enabled tools that can deliver actionable insights and unlock sales teams’ full potential. Our product is a smart system which understands the pain points of salespeople, does away with time-consuming data entry, and gives executives the suggestions and guidance they need to close deals effectively. I was attracted to the strong leadership team and customer base, but also to Aviso’s commitment to using sophisticated AI tools to solve real-world challenges. Selling is a vital part of any business, and Aviso helps with that by leveraging the power of artificial intelligence. Bulls-eye! What more could you want?
Lastly, is there anything else that you would like to share about AI?
Artificial intelligence makes a new class of problems computable. To respond to the fluid nature of users’ understanding of their problems, a cognitive computing system offers a synthesis not just of information sources but of influences, contexts, and insights. These systems differ from current computing applications in that they move beyond tabulating and calculating based on pre-configured rules and programs. They can infer and even reason based on broad objectives. In this sense, cognitive computing is a new type of computing with the goal of developing more accurate models of how the human brain or mind senses, reasons, and responds to stimuli. The field is interdisciplinary: artificial intelligence is a place where a number of sciences and professions converge, including computer science, electronics, mathematics, statistics, psychology, linguistics, philosophy, neuroscience, and biology. That’s what makes it so exciting!
Anastassia Loukina, Senior Research Scientist (NLP/Speech) at ETS – Interview Series
Her research interests span a wide range of topics. She has worked, among other things, on Modern Greek dialects, speech rhythm, and automated prosody analysis.
Her current work focuses on combining tools and methods from speech technologies and machine learning with insights from studies on speech perception/production in order to build automated scoring models for evaluating non-native speech.
You clearly have a love of languages, what introduced you to this passion?
I grew up speaking Russian in St. Petersburg, Russia, and I remember being fascinated when I was first introduced to the English language: for some words, there was a pattern that made it possible to “convert” a Russian word to an English word. And then I would come across a word where “my” pattern failed and try to come up with a better, more general rule. At that time of course, I knew nothing about linguistic typology or the difference between cognates and loan words, but this fueled my curiosity and desire to learn more languages. This passion for identifying patterns in how people speak and testing them on the data is what led me to phonetics, machine learning, and the work I am doing now.
Prior to your current work in Natural Language Processing (NLP), you worked as an English-Russian and Modern Greek-Russian translator. Do you believe that your work as a translator has given you additional insights into some of the nuances and problems associated with NLP?
My primary identity has always been that of a researcher. It’s true that I started my academic career as a scholar of Modern Greek, or more specifically, Modern Greek phonetics. For my doctoral work, I explored phonetic differences between several Modern Greek dialects and how the differences between these dialects could be linked to the history of the area. I argued that some of the differences between the dialects could have emerged as a result of the language contact between each dialect and other languages spoken in the area. While I no longer work on Modern Greek, the changes that happen when two languages come in contact with each other is still at the heart of my work: only this time I focus on what happens when an individual is learning a new language and how technology can help do this most efficiently.
When it comes to the English language, there are a myriad of accents. How do you design an NLP system with the capability to understand all of the different dialects? Is it a simple matter of feeding a deep learning algorithm additional big data from each type of accent?
There are several approaches that have been used in the past to address this. In addition to building one large model that covers all accents, you could first identify the accent and then use a custom model for that accent, or you can try multiple models at once and pick the one which works best. Ultimately, to achieve good performance on a wide range of accents, you need training and evaluation data representative of the many accents that a system may encounter.
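The try-several-models-and-pick-the-best strategy can be sketched as follows. The recognizers here are hypothetical stand-ins (simple callables returning a transcript and a confidence score) for real accent-specific ASR models:

```python
def best_of_n(audio, recognizers):
    """Run the utterance through every accent-specific model and keep
    the hypothesis from the model that is most confident about it."""
    results = [(name, fn(audio)) for name, fn in recognizers.items()]
    name, (text, conf) = max(results, key=lambda r: r[1][1])
    return name, text, conf

# Hypothetical stand-ins for accent-specific recognizers; a real system
# would call trained models here.
recognizers = {
    "en-US": lambda audio: ("tomato", 0.91),
    "en-GB": lambda audio: ("tomahto", 0.84),
    "en-IN": lambda audio: ("tomato", 0.78),
}
print(best_of_n(b"<raw audio bytes>", recognizers))
# → ('en-US', 'tomato', 0.91)
```

The alternative approaches – one large multi-accent model, or an accent classifier routing to a custom model – trade this N-fold decoding cost for either more training data or an extra classification step.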
At ETS we conduct comprehensive evaluations to make sure that the scores produced by our automated systems reflect differences in the actual skills we want to measure and are not influenced by the demographic characteristics of the learner such as their gender, race, or country of origin.
Children and/or language learners often have difficulty with perfect pronunciation. How do you overcome the pronunciation problem?
There is no such thing as perfect pronunciation: the way we speak is closely linked to our identity and as developers and researchers our goal is to make sure that our systems are fair to all users.
Both language learners and children present particular challenges for speech-based systems. For example, children’s voices not only have a very different acoustic quality, but children also speak differently from adults, and there is a lot of variability between children. As a result, developing an automated speech recognition system for children is usually a separate task that requires a large amount of child speech data.
Similarly, even though there are many similarities between language learners from the same background, learners can vary widely in their use of phonetic, grammatical and lexical patterns making speech recognition a particularly challenging task. When building our systems for scoring English language proficiency, we use the data from language learners with a wide range of proficiencies and native languages.
In January 2018, you published ‘Using exemplar responses for training and evaluating automated speech scoring systems’. What are some of the main takeaways that should be understood from this paper?
In this paper, we looked at how quality of training and testing data affects the performance of automated scoring systems.
Automated scoring systems, like many other automated systems, are trained on data that has been labeled by humans. In this case, these are scores assigned by human raters. Human raters do not always agree on the scores they assign. There are several different strategies used in assessment to ensure that the final score reported to the test-taker remains highly reliable despite variation in human agreement at the level of the individual question. However, since automated scoring engines are usually trained using response-level scores, any inconsistencies in those scores may negatively affect the system.
We had access to a large amount of data with different levels of agreement between human raters, which let us compare system performance under different conditions. What we found is that training the system on perfect data doesn’t actually improve its performance over a system trained on data with noisier labels. Perfect labels only give you an advantage when the total size of the training set is very low. On the other hand, the quality of human labels had a huge effect on system evaluation: your performance estimates can be up to 30% higher if you evaluate on clean labels.
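The evaluation-side effect can be illustrated with a toy simulation (synthetic data, not ETS data or models): a system that has actually learned the task perfectly still looks only about 80% accurate when scored against labels from raters who agree with the true score 80% of the time.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated responses: the "true" score is the sign of a single feature.
x = rng.normal(size=2000)
clean = (x > 0).astype(int)

# Human labels: raters agree with the true score only 80% of the time.
noisy = np.where(rng.random(2000) < 0.8, clean, 1 - clean)

# A system that has learned the task perfectly.
predictions = (x > 0).astype(int)

acc_clean = (predictions == clean).mean()   # evaluated on clean labels
acc_noisy = (predictions == noisy).mean()   # evaluated on noisy labels
print(acc_clean, acc_noisy)  # the noisy estimate understates true quality
```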
The takeaway message is that if you have a lot of data and resources to clean your gold-standard labels, it might be smarter to clean the labels in the evaluation set rather than the labels in the training set. And this finding applies not just to automated scoring but to many other areas too.
Could you describe some of your work at ETS?
I work on speech scoring systems that process spoken language in an educational context. One such system is SpeechRater®, which uses advanced speech recognition and analysis technology to assess and provide detailed feedback on English language speaking proficiency. SpeechRater is a very mature application that has been around for more than 10 years. I build scoring models for different applications and work with colleagues across ETS to ensure that our scores are reliable, fair, and valid for all test takers. We also work with other groups at ETS to continuously monitor system performance.
In addition to maintaining and improving our operational systems, we prototype new systems. One of the projects I am very excited about is RelayReader™: an application designed to help developing readers gain fluency and confidence. When reading with RelayReader, a user takes turns listening to and reading aloud a book. Their reading is then sent to our servers to provide feedback. In terms of speech processing, the main challenge of this application is how to measure learning and provide actionable and reliable feedback unobtrusively, without interfering with the reader’s engagement with the book.
What’s your favorite part of working with ETS?
What initially attracted me to ETS is that it is a non-profit organization with a mission to advance the quality of education for all people around the world. While of course it is great when research leads to a product, I appreciate having an opportunity to work on projects that are more foundational in nature but will help with product development in the future. I also cherish the fact that ETS takes issues such as data privacy and fairness very seriously and all our systems undergo very stringent assessment before being deployed operationally.
But what truly makes ETS a great place to work is its people. We have an amazing community of scientists, engineers and developers from many different backgrounds which allows for a lot of interesting collaborations.
Do you believe that an AI will ever be able to pass the Turing Test?
Since the 1950s, there have been many interpretations of how the Turing test should be conducted in practice. There is probably general agreement that the Turing test hasn’t been passed in the philosophical sense: there is no AI system that thinks like a human. However, this has also become a very niche subject. Most people don’t build their systems to pass the Turing test – we want them to achieve specific goals.
For some of these tasks, for example, speech recognition or natural language understanding, human performance may be rightly considered the gold standard. But there are also many other tasks where we would expect an automated system to do much better than humans, or where an automated system and human expert need to work together to achieve the best result. For example, in an educational context we don’t want an AI system to replace a teacher: we want it to help teachers, whether through identifying patterns in student learning trajectories, helping with grading, or finding the best teaching materials.
Is there anything else that you would like to share about ETS or NLP?
Many people know ETS for its assessments and automated scoring systems. But we do much more than that. We have many capabilities, from voice biometrics to spoken dialogue applications, and we are always looking for new ways to integrate technology into learning. Now that many students are learning from home, we have opened several of our research capabilities to the general public.
Thank you for the interview and for offering this insight on the latest advances in NLP and speech recognition. Anyone who wishes to learn more can visit Educational Testing Service.
Charles J. Simon, Author, Will Computers Revolt? – Interview Series
Charles J. Simon, BSEE, MSCS, is a nationally recognized entrepreneur, software developer, and manager. With broad management and technical expertise and degrees in both Electrical Engineering and Computer Science, Mr. Simon has many years of computer industry experience, including pioneering work in AI and CAD (two generations of CAD).
He is also the author of ‘Will Computers Revolt?’, which offers an in-depth view of the future possibility of Artificial General Intelligence (AGI).
What was it that originally attracted you to AI, and specifically to AGI?
I’ve been fascinated by the question, “Can machines think?” ever since I first read Alan Turing’s seminal 1950 paper which begins with that question. So far, the answer is clearly, “No,” but there is no scientific reason why not. I joined the AI community with the initial neural network boom in the late 1980s and since then AI has made great strides. But the intervening thirty years haven’t brought understanding to our machines, an ability which would catapult numerous apps to new levels of usefulness.
You stated that you share the opinion of MIT AI expert Rodney Brooks, who says that without interaction with an environment – without a robotic body, if you will – machines will never exhibit AGI. This is basically stating that without sufficient inputs from a robotic body, an AI will never develop AGI capabilities. Outside of computer vision, what types of inputs are needed to develop AGI?
Today’s AI needs to be augmented with basic concepts like the physical existence of objects in a reality, the passage of time, and cause and effect – concepts clear to any three-year-old. A toddler uses multiple senses to learn these concepts by touching and manipulating toys, moving through the home, learning language, etc. While it is possible to create an AGI with more limited senses – just as there are deaf people and blind people who are perfectly intelligent – more senses and more ways to interact make solving the AGI problem easier.
For completeness my simulator can provide senses of smell and taste. It remains to be seen if these will also prove important to AGI.
You stated that ‘A key requirement for intelligence is an environment which is external to the intelligence’. The example you gave is that ‘it is unreasonable to expect IBM’s Watson to “understand” anything if it has no underlying idea of what a “thing” is’. This clearly plays into the current limitations of narrow AI, especially natural language processing. How can AI developers best overcome this current limitation of AI?
A key factor is storing knowledge which is not specifically verbal, visual, or tactile but as abstract “Things” which can have verbal, visual, and tactile attributes. Consider something as simple as the phrase, “a red ball”. You know what these words mean because of your visual and tactile experiences. You also know the meaning of related actions like throwing, bouncing, kicking, etc. which all come to mind to some extent when you hear the phrase. Any AI system which is specifically word-based or specifically image-based will miss out on the other levels of understanding.
I have implemented a Universal Knowledge Store which stores any kind of information in a brain-like structure where Things are analogous to neurons and have many attribute references to other Things—references are analogous to synapses. Thus, red and ball are individual Things and a red ball is a Thing which has attribute references to the red Thing and the ball Thing. Both red and ball have references to the corresponding Things for the words “red” and “ball”, each of which, in turn, have references to other Things which define how the words are heard, spoken, read, or spelled as well as possible action Things.
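The structure described here might be sketched as a small graph of Things, where references play the role of synapses. This is a toy Python model with illustrative names, not Simon’s actual Universal Knowledge Store:

```python
class Thing:
    """A node in a toy knowledge store: analogous to a neuron, with
    references (analogous to synapses) to its attribute Things."""
    def __init__(self, name):
        self.name = name
        self.references = []

    def add_reference(self, other):
        self.references.append(other)

store = {}

def thing(name):
    """Fetch or create the Thing with this name."""
    if name not in store:
        store[name] = Thing(name)
    return store[name]

# "red ball" is a Thing whose attributes are the red Thing and the ball
# Thing; each of those, in turn, references the Thing for its word form.
thing("red ball").add_reference(thing("red"))
thing("red ball").add_reference(thing("ball"))
thing("red").add_reference(thing('word:"red"'))
thing("ball").add_reference(thing('word:"ball"'))

print([t.name for t in store["red ball"].references])
# → ['red', 'ball']
```

A full store would add further references from each word Thing to Things describing how the word is heard, spoken, read, or spelled, plus action Things like throwing and bouncing.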
You’ve reached the conclusion that brain simulation of general intelligence is a long way off while AGI may be (relatively) just around the corner. Based on this statement, should we move on from attempting to emulate or create a simulation of the human brain, and just focus on AGI?
Today’s deep learning and related technologies are great for appropriate applications but will not spontaneously lead to understanding. To take the next steps, we need to add techniques specifically targeted at solving the problems which are within the capacity of any three-year-old.
Taking advantage of the intrinsic abilities of our computers can be orders of magnitude more efficient than the biological equivalent or any simulation of it. For example, your brain can store information in the chemistry of biological synapses over several iterations requiring 10-100 milliseconds. A computer can simply store the new synapse value in a single memory cycle, a billion times faster.
In developing AGI software, I have done both biological neural simulation and more efficient algorithms. Carrying forward with the Universal Knowledge Store: when simulated with biological neurons, each Thing requires a minimum of 10 neurons and usually many more. This puts the capacity of the human brain somewhere between ten and a hundred million Things. But perhaps an AGI would appear intelligent if it comprehended only one million Things – well within the scope of today’s high-end desktop computers.
A key unknown is how much of the robot’s time should be allocated to processing and reacting to the world versus time spent imagining and planning. Can you briefly explain the importance of imagination to an AGI?
We can imagine many things and then only act on the ones we like, those which further our internal goals, if you will. The real power of imagination is being able to predict the future—a three-year-old can figure out which sequences of motion will lead her to a goal in another room and an adult can speculate on which words will have the greatest impact on others.
An AGI similarly will benefit from going beyond being purely reactive to speculating on various complex actions and choosing the best.
You believe that Asimov’s three laws of robotics are too simple and ambiguous. In your book you shared some ideas for recommended laws to be programmed in robots. Which laws do you feel are most important for a robot to follow?
New “laws of robotics” will evolve over years as AGI emerges. I propose a few starters:
- Maximize internal knowledge and understanding of the environment.
- Share that knowledge accurately with others (both AGI and human).
- Maximize the well-being of both AGIs and humans as a whole—not just as an individual.
You have some issues with the Turing Test and the concept behind it. Can you explain how you believe the Turing Test is flawed?
The Turing Test has served us well for fifty years as an ad-hoc definition of general intelligence, but as AGI nears, we need a clearer definition. The Turing Test is actually a test of how human one is, not how intelligent one is. The longer a computer can maintain the deception, the better it performs on the test. Obviously, asking the question, “Are you a computer?” and related proxy questions such as, “What is your favorite food?” are dead giveaways unless the AGI is programmed to deceive – a dubious objective at best.
Further, the Turing Test has motivated AI development into areas of limited value with (for example) chatbots with vast flexibility in responses but no underlying comprehension.
What would you do differently in your version of the Turing Test?
Better questions could probe specifically into the understanding of time, space, cause-and-effect, forethought, etc. rather than random questions without any particular basis in psychology, neuroscience, or AI. Here are some examples:
- What do you see right now? If you stepped back three feet, what differences would you see?
- If I [action], what would your reaction be?
- If you [action], what will my likely reactions be?
- Can you name three things which are like [object]?
Then, rather than evaluating responses as to whether they are indistinguishable from human responses, they should be evaluated in terms of whether or not they are reasonable responses (intelligent) based on the experience of the entity being tested.
You’ve stated that when faced with demands to perform some short-term destructive activity, properly programmed AGIs will simply refuse. How can we ensure that the AGI is properly programmed to begin with?
Decision-making is goal-based. In combination with an imagination, you (or an AGI) consider the outcome of different possible actions and choose the one which best achieves the goals. In humans, our goals are set by evolved instincts and our experience; an AGI’s goals are entirely up to the developers. We need to ensure that the goals of an AGI align with the goals of humanity as opposed to the personal goals of an individual. [Three possible goals as listed above.]
You’ve stated that it’s inevitable that humans will create an AGI. What’s your best estimate for a timeline?
Facets of AGI will begin to emerge within the coming decade but we won’t all agree that AGI has arrived. Eventually, we will agree that AGI has arrived when they exceed most human abilities by a substantial margin. This will take two or three decades longer.
For all the talk of AGI, will it have real consciousness as we know it?
Consciousness manifests in a set of behaviors (which we can observe) which are based on an internal sensation (which we can’t observe). AGIs will manifest the behaviors; they need to in order to make intelligent decisions. But I contend that our internal sensation is largely dependent on our sensory hardware and instincts and so I can guarantee that whatever internal sensations an AGI might have, they will be different from a human’s.
The same can be said for emotions and our sense of free will. One’s belief in free will permeates every decision one makes. If you don’t believe you have a choice, you simply react. For an AGI to make thoughtful decisions, it will likewise need to be aware of its own ability to make decisions.
Last question, do you believe that an AGI has more potential for good or bad?
I am optimistic that AGIs will help us to move forward as a species and bring us answers to many questions about the universe. The key will be for us to prepare and decide what our relationship will be with AGIs as we define their goals. If we decide to use the first AGIs as tools of conquest and enrichment, we shouldn’t be surprised if, down the road, they become their own tools of conquest and enrichment against us. If we choose that AGIs are tools of knowledge, exploration, and peace, then that’s what we’re likely to get in return. The choice is up to us.
Thank you for a fantastic interview exploring the future potential of building an AGI. Readers who wish to learn more may read ‘Will Computers Revolt?’ or visit Charles’ website futureai.guru.
Noah Schwartz, Co-Founder & CEO of Quorum – Interview Series
Noah is an AI systems architect. Prior to founding Quorum, Noah spent 12 years in academic research, first at the University of Southern California and most recently at Northwestern as the Assistant Chair of Neurobiology. His work focused on information processing in the brain and he has translated his research into products in augmented reality, brain-computer interfaces, computer vision, and embedded robotics control systems.
Your interest in AI and robotics started as a little boy. How were you first introduced to these technologies?
The initial spark came from science fiction movies and a love for electronics. I remember watching the movie Tron as an 8-year-old, followed by Electric Dreams, Short Circuit, DARYL, War Games, and others over the next few years. Although it was presented through fiction, the very idea of artificial intelligence blew me away. And even though I was only 8 years old, I felt this immediate connection and an intense pull toward AI that has never diminished in the time since.
How did your passions for both evolve?
My interest in AI and robotics developed in parallel with a passion for the brain. My dad was a biology teacher and would teach me about the body, how everything worked, and how it was all connected. Looking at AI and looking at the brain felt like the same problem to me – or at least, they had the same ultimate question, which was, How is that working? I was interested in both, but I didn’t get much exposure to AI or robotics in school. For that reason, I initially pursued AI on my own time and studied biology and psychology in school.
When I got to college, I discovered the Parallel Distributed Processing (PDP) books, which were huge for me. They were my first introduction to actual AI, which then led me back to the classics such as Hebb, Rosenblatt, and even McCulloch and Pitts. I started building neural networks based on neuroanatomy and what I learned from biology and psychology classes in school. After graduating, I worked as a computer network engineer, building complex wide-area networks and writing software to automate and manage traffic flow on those networks – kind of like building large brains. The work reignited my passion for AI and motivated me to head to grad school to study AI and neuroscience, and the rest is history.
Prior to founding Quorum, you spent 12 years in academic research, first at the University of Southern California and most recently at Northwestern as the Assistant Chair of Neurobiology. At the time your work focused on information processing in the brain. Could you walk us through some of this research?
In a broad sense, my research was trying to understand the question: How does the brain do what it does using only what it has available? For starters, I don’t subscribe to the idea that the brain is a type of computer (in the von Neumann sense). I see it as a massive network that mostly performs stimulus-response and signal-encoding operations. Within that massive network there are clear patterns of connectivity between functionally specialized areas. As we zoom in, we see that neurons don’t care what signal they’re carrying or what part of the brain they’re in – they operate based on very predictable rules. So if we want to understand the function of these specialized areas, we need to ask a few questions: (1) As an input travels through the network, how does that input converge with other inputs to produce a decision? (2) How does the structure of those specialized areas form as a result of experience? And (3) how do they continue to change as we use our brains and learn over time? My research tried to address these questions using experimental research combined with information theory, modeling, and simulation – something that could enable us to build artificial decision systems and AI. In neurobiology terms, I studied neuroplasticity and the microanatomy of specialized areas like the visual cortex.
You then translated your work into augmented reality, and brain-computer interfaces. What were some of the products you worked on?
Around 2008, I was working on a project that we would now call augmented reality, but back then, it was just a system for tracking and predicting eye movements, and then using those predictions to update something on the screen. To make the system work in realtime, I built a biologically-inspired model that predicted where the viewer would look based on their microsaccades – tiny eye movements that occur just before you move your eye. Using this model, I could predict where the viewer would look, then update the frame buffer in the graphics card while their eyes were still in motion. By the time their eyes reached that new location on the screen, the image was already updated. This ran on an ordinary desktop computer in 2008, without any lag. The tech was pretty amazing, but the project didn’t get through to the next round of funding, so it died.
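The prediction step can be pictured with a toy linear extrapolation. The actual system used a biologically-inspired model of microsaccades; this sketch, with made-up sample data, only illustrates the core idea of updating the display before the eye arrives:

```python
def predict_landing(samples, lead_ms=50):
    """Toy stand-in for the gaze predictor: linearly extrapolate the last
    two gaze samples (t in ms; x, y in pixels) ahead by lead_ms, so the
    frame buffer can be updated before the eye reaches that location."""
    (t0, x0, y0), (t1, x1, y1) = samples[-2], samples[-1]
    dt = t1 - t0
    vx, vy = (x1 - x0) / dt, (y1 - y0) / dt
    return (x1 + vx * lead_ms, y1 + vy * lead_ms)

# Two samples taken 5 ms apart at the onset of a saccade (made-up data).
print(predict_landing([(0, 100, 100), (5, 110, 100)]))  # -> (210.0, 100.0)
```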
In 2011, I made a more focused effort at product development and built a neural network that could perform feature discovery on streaming EEG data that we measured from the scalp. This is the core function of most brain-computer interface systems. The project was also an experiment in how small a footprint we could get this running on. We had a headset that read a few channels of EEG data at 400Hz that were sent via Bluetooth to an Android phone for feature discovery and classification, then sent to an Arduino-powered controller that we retrofitted into an off-the-shelf RC car. When in use, an individual who was wearing the EEG headset could drive and steer the car by changing their thoughts from doing mental math to singing a song. The algorithm ran on the phone and created a personalized brain “fingerprint” for each user, enabling them to switch between a variety of robotic devices without having to retrain on each device. The tagline we came up with was “Brain Control Meets Plug-and-Play.”
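A toy version of that pipeline — simulated EEG, crude features, nearest-centroid classification — might look like the following. The signal model, features, and state names here are invented for illustration and are not Quorum’s actual feature-discovery algorithm:

```python
import math
import random

random.seed(0)

def simulate_eeg(state, n=400):
    """Toy single-channel EEG at 400 Hz: the 'math' state carries more
    high-frequency power than the 'song' state. Purely illustrative."""
    freq = 25.0 if state == "math" else 6.0
    return [math.sin(2 * math.pi * freq * t / 400) + random.gauss(0, 0.3)
            for t in range(n)]

def features(signal):
    """Two crude features: variance and zero-crossing rate."""
    mean = sum(signal) / len(signal)
    var = sum((v - mean) ** 2 for v in signal) / len(signal)
    zc = sum(1 for a, b in zip(signal, signal[1:]) if a * b < 0) / len(signal)
    return (var, zc)

# "Train" a nearest-centroid classifier: one feature centroid per mental
# state -- a crude analogue of a per-user brain "fingerprint".
centroids = {s: features(simulate_eeg(s)) for s in ("math", "song")}

def classify(signal):
    f = features(signal)
    return min(centroids, key=lambda s: sum((a - b) ** 2
                                            for a, b in zip(f, centroids[s])))

# The predicted state would then be mapped to a drive/steer command.
command = classify(simulate_eeg("song"))
print(command)
```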
In 2012, we extended the system so it operated in a much more distributed manner on smaller hardware. We used it to control a multi-segment, multi-joint robotic arm in which each segment was controlled by an independent processor that ran an embedded version of the AI. Instead of using a centralized controller to manipulate the arm, we allowed the segments to self-organize and reach their target in a swarm-like, distributed manner. In other words, like ants forming an ant bridge, the arm segments would cooperate to reach some target in space.
We continued moving in this same direction when we first launched Quorum AI – originally known as Quorum Robotics – back in 2013. We quickly realized that the system was awesome because of the algorithm and architecture, not the hardware, so in late 2014, we pivoted completely into software. Now, 8 years later, Quorum AI is coming full-circle, back to those robotics roots by applying our framework to the NASA Space Robotics Challenge.
Quitting your job as a professor to launch a start-up had to have been a difficult decision. What inspired you to do this?
It was a massive leap for me in a lot of ways, but once the opportunity came up and the path became clear, it was an easy decision. When you’re a professor, you think in multi-year timeframes and you work on very long-range research goals. Launching a start-up is the exact opposite of that. However, one thing that academic life and start-up life have in common is that both require you to learn and solve problems constantly. In a start-up, that could mean trying to re-engineer a solution to reduce product development risk or maybe studying a new vertical that could benefit from our tech. Working in AI is the closest thing to a “calling” as I’ve ever felt, so despite all the challenges and the ups and downs, I feel immensely lucky to be doing the work that I do.
You’ve since developed Quorum AI, which delivers realtime, distributed artificial intelligence for all devices and platforms. Could you elaborate on what exactly this AI platform does?
The platform is called the Environment for Virtual Agents (EVA), and it enables users to build, train, and deploy models using our Engram AI Engine. Engram is a flexible and portable wrapper that we built around our unsupervised learning algorithms. The algorithms are so efficient that they can learn in realtime, as the model is generating predictions. Because the algorithms are task-agnostic, there is no explicit input or output to the model, so predictions can be made in a Bayesian manner for any dimension without retraining and without suffering from catastrophic forgetting. The models are also transparent and decomposable, meaning they can be examined and broken apart into individual dimensions without losing what has been learned.
Once built, the models can be deployed through EVA to any type of platform, ranging from custom embedded hardware up to the cloud. EVA (and the embeddable host software) also contain several tools to extend the functionality of each model. A few quick examples: Models can be shared between systems through a publication/subscription system, enabling distributed systems to achieve federated learning over both time and space. Models can also be deployed as autonomous agents to perform arbitrary tasks, and because the model is task-agnostic, the task can be changed during runtime without retraining. Each individual agent can be extended with a private “virtual” EVA, enabling the agent to simulate models of other agents in a scale-free manner. Finally, we’ve created some wrappers for deep learning and reinforcement learning (Keras-based) systems to enable these models to operate on the platform, in concert with more flexible Engram-based systems.
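The publish/subscribe sharing idea can be pictured with a minimal sketch. The `ModelBus` and `Agent` classes and the parameter-averaging merge below are hypothetical stand-ins, not the EVA API:

```python
from collections import defaultdict

class ModelBus:
    """Toy publish/subscribe bus: a publisher posts a model snapshot on a
    topic and every subscriber merges it into its own model."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, agent):
        self.subscribers[topic].append(agent)

    def publish(self, topic, params):
        for agent in self.subscribers[topic]:
            agent.merge(params)

class Agent:
    def __init__(self, params):
        self.params = params

    def merge(self, incoming):
        # Federated-style update: average incoming parameters with our own.
        self.params = {k: (self.params[k] + v) / 2
                       for k, v in incoming.items()}

bus = ModelBus()
a, b = Agent({"w": 0.0}), Agent({"w": 1.0})
bus.subscribe("vision", a)
bus.subscribe("vision", b)
bus.publish("vision", {"w": 0.5})
print(a.params["w"], b.params["w"])  # -> 0.25 0.75
```

Because no central server sits between the agents, the same pattern works whether the participants are cloud services or disconnected embedded devices.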
You’ve previously described the Quorum AI algorithms as “mathematical poetry”. What did you mean by this?
When you’re building a model, whether you’re modeling the brain or you’re modeling sales data for your enterprise, you start by taking an inventory of your data, then you try out known classes of models to approximate the system. In essence, you are creating rough sketches of the system to see what looks best. You don’t expect things to fit the data very well, and there’s some trial and error as you test different hypotheses about how the system works, but with some finesse, you can capture the data pretty well.
As I was modeling neuroplasticity in the brain, I started with the usual approach of mapping out all the molecular pathways, transition states, and dynamics that I thought would matter. But I found that when I reduced the system to its most basic components and arranged those components in a particular way, the model got more and more accurate until it fit the data almost perfectly. It was like every operator and variable in the equations were exactly what they needed to be, there was nothing extra, and everything was essential to fitting the data.
When I plugged the model into larger and larger simulations, like visual system development or face recognition, for instance, it was able to form extremely complicated connectivity patterns that matched what we see in the brain. Because the model was mathematical, those brain patterns could be understood through mathematical analysis, giving new insight into what the brain is learning. Since then, we’ve solved and simplified the differential equations that make up the model, improving computational efficiency by multiple orders of magnitude. It may not be actual poetry, but it sure felt like it!
Quorum AI’s platform toolkit enables devices to connect to one another to learn and share data without needing to communicate through cloud-based servers. What are the advantages of doing it this way versus using the cloud?
We give users the option of putting their AI anywhere they want, without compromising the functionality of the AI. The status quo in AI development is that companies are usually forced to compromise security, privacy, or functionality because their only option is to use cloud-based AI services. If companies do try to build their own AI in-house, it often requires a lot of money and time, and the ROI is rarely worth the risk. If companies want to deploy AI to individual devices that are not cloud-connected, the project quickly becomes impossible. As a result, AI adoption becomes a fantasy.
Our platform makes AI accessible and affordable, giving companies a way to explore AI development and adoption without the technical or financial overhead. And moreover, our platform enables users to go from development to deployment in one seamless step.
Our platform also integrates with and extends the shelf-life of other “legacy” models like deep learning or reinforcement learning, helping companies repurpose and integrate existing systems into newer applications. Similarly, because our algorithms and architectures are unique, our models are not black boxes, so anything that the system learns can be explored and interpreted by humans, and then extended to other areas of business.
It’s believed by some that Distributed Artificial Intelligence (DAI) could lead the way to Artificial General Intelligence (AGI). Do you subscribe to this theory?
I do, and not just because that’s the path we’ve set out for ourselves! When you look at the brain, it’s not a monolithic system. It’s made up of separate, distributed systems that each specialize in a narrow range of brain functions. We may not know what a particular system is doing, but we know that its decisions depend significantly on the type of information it’s receiving and how that information changes over time. (This is why neuroscience topics like the connectome are so popular.)
In my opinion, if we want to build AI that is flexible and that behaves and performs like the brain, then it makes sense to consider distributed architectures like those that we see in the brain. One could argue that deep learning architectures like multi-layer networks or CNNs can be found in the brain, and that’s true, but those architectures are based on what we knew about the brain 50 years ago.
The alternative to DAI is to continue iterating on monolithic, inflexible architectures that are tightly coupled to a single decision space, like those that we see in deep learning or reinforcement learning (or any supervised learning method, for that matter). I would suggest that these limitations are not just a matter of parameter tweaking or adding layers or data conditioning – these issues are fundamental to deep learning and reinforcement learning, at least as we define them today, so new approaches are required if we’re going to continue innovating and building the AI of tomorrow.
Do you believe that achieving AGI using DAI is more likely than reinforcement learning and/or deep learning methods that are currently being pursued by companies such as OpenAI and DeepMind?
Yes, although from what they’re blogging about, I suspect OpenAI and DeepMind are using more distributed architectures than they let on. We’re starting to hear more about multi-system challenges like transfer learning or federated/distributed learning, and coincidentally, about how deep learning and reinforcement learning approaches aren’t going to work for these challenges. We’re also starting to hear from pioneers like Yoshua Bengio about how biologically-inspired architectures could bridge the gap! I’ve been working on biologically-inspired AI for almost 20 years, so I feel very good about what we’ve learned at Quorum AI and how we’re using it to build what we believe is the next generation of AI that will overcome these limitations.
Is there anything else that you would like to share about Quorum AI?
We will be previewing our new platform for distributed and agent-based AI at the Federated and Distributed Machine Learning Conference in June 2020. During the talk, I plan to present some recent data on several topics, including sentiment analysis as a bridge to achieving empathic AI.
I would like to give a special thank you to Noah for these amazing answers, and I recommend that you visit Quorum AI to learn more.