Researchers from Cornell University have created a computer algorithm inspired by the mammalian olfactory system. Scientists have long sought to explain how mammals learn and identify smells. The new algorithm provides insight into the workings of the brain, and applying it to a computer chip allows it to learn patterns more quickly and reliably than current machine learning models.
Thomas Cleland is a professor of psychology and senior author of the study titled “Rapid Learning and Robust Recall in a Neuromorphic Olfactory Circuit,” published in Nature Machine Intelligence on March 16.
“This is a result of over a decade of studying olfactory bulb circuitry in rodents and trying to figure out essentially how it works, with an eye towards things we know animals can do that our machines can’t,” Cleland said.
“We now know enough to make this work. We’ve built this computational model based on this circuitry, guided heavily by things we know about the biological systems’ connectivity and dynamics,” he continued. “Then we say, if this were so, this would work. And the interesting part is that it does work.”
Intel Computer Chip
Cleland was joined by co-author Nabil Imam, a researcher at Intel, and together they applied the algorithm to an Intel computer chip. The chip is called Loihi, and it is neuromorphic, which means it is inspired by the functions of the brain. The chip has digital circuits that mimic the way in which neurons learn and communicate.
The Loihi chip relies on parallel cores that communicate via discrete spikes, and each one of these spikes has an effect that can change depending on local activity. This requires different strategies for algorithm design than what is used in existing computer chips.
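To make this spike-based style of computation concrete, here is a minimal leaky integrate-and-fire neuron in plain Python. This is an illustrative sketch of event-driven spiking in general, not Intel's Loihi API: the neuron accumulates input, leaks charge each step, and emits a discrete spike whenever its potential crosses a threshold.

```python
def simulate_lif(input_current, threshold=1.0, leak=0.9):
    """Leaky integrate-and-fire neuron: the membrane potential decays
    by a leak factor each step, accumulates input, and emits a
    discrete spike (then resets) when it crosses the threshold --
    the same event-driven style of communication used between
    neuromorphic cores."""
    potential = 0.0
    spikes = []
    for current in input_current:
        potential = potential * leak + current
        if potential >= threshold:
            spikes.append(1)   # discrete spike event
            potential = 0.0    # reset after firing
        else:
            spikes.append(0)
    return spikes

# A steady sub-threshold input produces periodic spikes.
print(simulate_lif([0.4] * 10))  # [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
```

Because each spike's effect depends on the neuron's recent local activity (the decaying potential), identical inputs can produce different outputs at different times, which is why algorithm design for such hardware differs from conventional chips.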
Through the use of neuromorphic computer chips, machines could work a thousand times faster than a computer’s central or graphics processing units at identifying patterns and carrying out certain tasks.
The Loihi research chip can also run certain algorithms using roughly a thousand times less power than traditional methods. This makes it well suited for the algorithm, which can accept input patterns from a variety of sensors, learn patterns quickly and sequentially, and identify each meaningful pattern even under strong sensory interference. The algorithm can successfully identify odors even when the input pattern is as much as 80% different from the pattern the computer originally learned.
“The pattern of the signal has been substantially destroyed,” Cleland said, “and yet the system is able to recover it.”
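This kind of recovery of a degraded signal resembles attractor-based pattern completion. The sketch below uses a classic Hopfield network, not Cleland and Imam's actual circuit, to show the principle: a stored binary "odor" pattern is corrupted and then recovered by letting the network settle. (A plain Hopfield network cannot tolerate the 80% corruption reported for the olfactory model; here 25% of the bits are flipped.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Store one binary (+1/-1) "odor" pattern in a Hopfield network.
pattern = rng.choice([-1, 1], size=64)
weights = np.outer(pattern, pattern).astype(float)
np.fill_diagonal(weights, 0)  # no self-connections

# Corrupt 25% of the entries, then let the network settle.
noisy = pattern.copy()
flipped = rng.choice(64, size=16, replace=False)
noisy[flipped] *= -1

state = noisy
for _ in range(5):  # synchronous updates toward the attractor
    state = np.sign(weights @ state).astype(int)

print(np.array_equal(state, pattern))  # True: pattern recovered
```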
The Mammalian Brain
The brain of a mammal can identify and remember smells extremely well, with thousands of olfactory receptors and complex neural networks working to analyze the patterns associated with odors. One thing mammals can do better than artificial intelligence systems is retain what they have learned even after acquiring new knowledge. In deep learning approaches, the network must be presented with everything at once, since new information can affect or even destroy what the system previously learned.
“When you learn something, it permanently differentiates neurons,” Cleland said. “When you learn one odor, the interneurons are trained to respond to particular configurations, so you get that segregation at the level of interneurons. So on the machine side, we just enhance that and draw a firm line.”
Cleland spoke about how the team came up with new experimental approaches.
“When you start studying a biological process that becomes more intricate and complex than you can just simply intuit, you have to discipline your mind with a computer model,” he said. “You can’t fuzz your way through it. And that led us to a number of new experimental approaches and ideas that we wouldn’t have come up with just by eyeballing it.”
Ingo Mierswa, Founder & President at RapidMiner, Inc – Interview Series
Ingo Mierswa is the Founder & President at RapidMiner, Inc. RapidMiner brings artificial intelligence to the enterprise through an open and extensible data science platform. Built for analytics teams, RapidMiner unifies the entire data science lifecycle from data prep to machine learning to predictive model deployment. More than 625,000 analytics professionals use RapidMiner products to drive revenue, reduce costs, and avoid risks.
What was your inspiration behind launching RapidMiner?
I had worked in the data science consultancy business for many years, and I saw a need for a platform that was more intuitive and approachable for people without a formal education in data science. Many of the existing solutions at the time relied on coding and scripting, and they simply were not user-friendly. Furthermore, it was difficult to manage data and to maintain the solutions developed within those platforms. Basically, I realized that these projects didn’t need to be so difficult, so we started to create the RapidMiner platform to allow anyone to be a great data scientist.
Can you discuss the full transparency governance that is currently being utilized by RapidMiner?
When you can’t explain a model, it’s quite hard to tune, trust and translate. A lot of data science work is the communication of the results to others so that stakeholders can understand how to improve processes. This requires trust and deep understanding. Also, issues with trust and translation can make it very hard to overcome the corporate requirements to get a model into production. We are fighting this battle in a few different ways:
As a visual data science platform, RapidMiner inherently maps out an explanation for all data pipelines and models in a highly consumable format that can be understood by data scientists and non-data scientists alike. It makes models transparent and helps users understand model behavior, evaluate its strengths and weaknesses, and detect potential biases.
In addition, all models created in the platform come with extensive visualizations for the user – typically the user creating the model – to gain model insights, understand model behavior and evaluate model biases.
RapidMiner also provides model explanations – even when in production: For each prediction created by a model, RapidMiner generates and adds the influence factors that have led to or influenced the decisions made by that model in production.
Finally – and this is very important to me personally as I was driving this with our engineering teams a couple of years ago – RapidMiner also provides an extremely powerful model simulator capability, which allows users to simulate and observe the model behavior based on input data provided by the user. Input data can be set and changed very easily, allowing the user to understand the predictive behavior of the models on various hypothetical or real-world cases. The simulator also displays factors that influence the model’s decision. The user – in this case even a business user or domain expert – can understand model behavior, validate the model’s decision against real outcomes or domain knowledge and identify issues. The simulator allows you to simulate the real world and have a look into the future – into your future, in fact.
How does RapidMiner use deep learning?
RapidMiner’s use of deep learning is something we are very proud of. Deep learning can be very difficult to apply, and non-data-scientists often struggle with setting up those networks without expert support. RapidMiner makes this process as simple as possible for users of all types. Deep learning is, for example, part of our automated machine learning (AutoML) product, RapidMiner Go. Here the user does not need to know anything about deep learning to make use of those types of sophisticated models. In addition, power users can go deeper and use popular deep learning libraries like TensorFlow, Keras, or Deeplearning4j right from the visual workflows they are building with RapidMiner. This is like playing with building blocks, and it simplifies the experience for users with fewer data science skills. Through this approach, our users can build flexible network architectures with different activation functions, user-defined numbers of layers and nodes per layer, and a choice of training techniques.
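As an illustration of what "user-defined numbers of layers and nodes with different activation functions" means in practice, here is a small NumPy sketch of a configurable multilayer perceptron. The function names and structure are invented for this example and are not RapidMiner code.

```python
import numpy as np

def build_mlp(layer_sizes, activations, seed=0):
    """Build weights and activation functions for a configurable MLP.
    layer_sizes=[4, 8, 3] means 4 inputs, one hidden layer of 8 nodes,
    and 3 outputs; activations names one function per layer."""
    rng = np.random.default_rng(seed)
    weights = [rng.standard_normal((m, n)) * 0.1
               for m, n in zip(layer_sizes, layer_sizes[1:])]
    funcs = {"relu": lambda x: np.maximum(x, 0.0),
             "tanh": np.tanh,
             "linear": lambda x: x}
    return weights, [funcs[name] for name in activations]

def forward(x, weights, acts):
    """Run one input vector through the network."""
    for w, act in zip(weights, acts):
        x = act(x @ w)
    return x

# A 4-8-3 network with ReLU hidden units and tanh outputs.
weights, acts = build_mlp([4, 8, 3], ["relu", "tanh"])
out = forward(np.ones(4), weights, acts)
print(out.shape)  # (3,)
```

A visual workflow tool essentially lets the user pick `layer_sizes` and `activations` from a palette instead of writing this code.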
What other type of machine learning is used?
All of them! We offer hundreds of different learning algorithms as part of the RapidMiner platform – everything you can apply in the widely-used data science programming languages Python and R. Among others, RapidMiner offers methods for Naive Bayes, regression such as Generalized Linear Models, clustering such as k-Means, FP-Growth, Decision Trees, Random Forests, Parallelized Deep Learning, and Gradient Boosted Trees. These and many more are all a part of the modeling library of RapidMiner and can be used with a single click.
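Of the algorithms named above, k-Means is simple enough to sketch in a few lines. The minimal NumPy implementation below (illustrative only, not RapidMiner's) alternates the two steps that define the method: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points.

```python
import numpy as np

def kmeans(points, k, iters=10):
    """Minimal k-Means with a deterministic spread-out initialization:
    alternate nearest-centroid assignment and centroid update."""
    centroids = points[np.linspace(0, len(points) - 1, k).astype(int)]
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(points[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        centroids = np.array([points[labels == j].mean(axis=0)
                              for j in range(k)])
    return labels, centroids

# Two well-separated blobs are split cleanly into two clusters.
pts = np.vstack([np.zeros((10, 2)), np.full((10, 2), 5.0)])
labels, centers = kmeans(pts, 2)
print(labels)  # first ten points in one cluster, last ten in the other
```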
Can you discuss how the Auto Model knows the optimal values to be used?
RapidMiner Auto Model uses intelligent automation to accelerate everything users do and ensure accurate, sound models are built. This includes instance selection and automatic outlier removal, feature engineering for complex data types such as dates or texts, and full multi-objective automated feature engineering to select the optimal features and construct new ones. Auto Model also includes other data cleaning methods to fix common issues in data such as missing values, data profiling by assessing the quality and value of data columns, data normalization and various other transformations.
Auto Model also extracts data quality meta data – for example, how much a column behaves like an ID or whether there are lots of missing values. This meta data is used in addition to the basic meta data in automating and assisting users in ‘using the optimal values’ and dealing with data quality issues.
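The kind of quality metadata and cleaning described above can be illustrated with a small sketch. The function and key names below are invented for this example and are not RapidMiner's API: one helper profiles a column (missing-value fraction, whether it behaves like an ID), the other imputes missing entries with the median and z-normalizes.

```python
import statistics

def profile_column(values):
    """Simple quality metadata for one column (illustrative names)."""
    present = [v for v in values if v is not None]
    return {
        "missing_frac": values.count(None) / len(values),
        "id_like": len(set(present)) == len(present),  # all values unique
    }

def clean_column(values):
    """Impute missing entries with the median, then z-normalize."""
    present = [v for v in values if v is not None]
    median = statistics.median(present)
    filled = [median if v is None else v for v in values]
    mean = statistics.mean(filled)
    std = statistics.pstdev(filled) or 1.0  # guard constant columns
    return [(v - mean) / std for v in filled]

col = [1.0, 2.0, None, 100.0, 3.0]
print(profile_column(col))  # {'missing_frac': 0.2, 'id_like': True}
```

An automated pipeline would run checks like these over every column and use the resulting metadata to choose which transformations to apply.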
For more detail, we’ve mapped it all out in our Auto Model Blueprint.
There are four basic phases where the automation is applied:
– Data prep: automatic analysis of the data to identify common quality problems such as correlations, missing values, and stability.
– Model selection and optimization: automated comparison of machine learning techniques, including full validation and performance comparison, to suggest the best techniques for the given data and determine the optimal parameters.
– Model simulation: helps determine the specific (prescriptive) actions to take in order to achieve the desired outcome predicted by the model.
– Model deployment and operations: users are automatically shown factors like drift, bias, and business impact, with no extra work required.
Computer bias is an issue with any type of AI, are there any controls in place to prevent bias from creeping up in results?
Yes, this is indeed extremely important for ethical data science. The governance features mentioned before ensure that users can always see exactly what data has been used for model building, how it was transformed, and whether there is bias in the data selection. In addition, our features for drift detection are another powerful tool to detect bias. If a model in production demonstrates a lot of drift in the input data, this can be a sign that the world has changed dramatically. However, it can also be an indicator that there was severe bias in the training data. In the future, we are considering going one step further and building machine learning models that can be used to detect bias in other models.
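As a rough illustration of drift detection on a single numeric feature, the sketch below (not RapidMiner's implementation) scores how far the mean of the live inputs has moved from the training mean, measured in training standard deviations. The 0.5 threshold is an arbitrary choice for the example.

```python
import statistics

def drift_score(train, live):
    """Standardized shift of the live-data mean away from the training
    mean; large values suggest the model no longer sees data like the
    data it was trained on -- or that the training sample was biased."""
    mu = statistics.mean(train)
    sigma = statistics.pstdev(train) or 1.0
    return abs(statistics.mean(live) - mu) / sigma

train = [10.0, 11.0, 9.0, 10.5, 9.5]
print(drift_score(train, [10.2, 9.8, 10.1]))   # small score: no drift
print(drift_score(train, [15.0, 16.0, 14.5]))  # large score: drift (or bias)
```

Production systems track scores like this (and richer distribution distances) per feature over time, alerting when they exceed a threshold.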
Can you discuss the RapidMiner AI Cloud and how it differentiates itself from competing products?
The requirements for a data science project can be large, complex and compute intensive, which is what has made the use of cloud technology such an attractive strategy for data scientists. Unfortunately, the various native cloud-based data science platforms tie you to cloud services and data storage offerings of that particular cloud vendor.
The RapidMiner AI Cloud is simply our cloud service delivery of the RapidMiner platform. The offering can be tailored to any customer’s environment, regardless of their cloud strategy. This is important these days as most businesses’ approach to cloud data management is evolving very quickly in the current climate. Flexibility is really what sets RapidMiner AI Cloud apart. It can run in any cloud service, private cloud stack or in a hybrid setup. We are cloud portable, cloud agnostic, multi-cloud – whatever you prefer to call it.
RapidMiner AI Cloud is also very low hassle, as we of course offer the ability to manage all or part of the deployment for clients so they can focus on running their business with AI, not the other way around. There’s even an on-demand option, which allows you to spin up an environment as needed for short projects.
RapidMiner Radoop eliminates some of the complexity behind data science, can you tell us how Radoop benefits developers?
Radoop is mainly for non-developers who want to harness the potential of big data. RapidMiner Radoop executes RapidMiner workflows directly inside Hadoop in a code-free manner. We can also embed the RapidMiner execution engine in Spark so it’s easy to push complete workflows into Spark without the complexity that comes from code-centric approaches.
Would a government entity be able to use RapidMiner to analyze data to predict potential pandemics, similar to how BlueDot operates?
As a general data science and machine learning platform, RapidMiner is meant to streamline and enhance the model creation and management process, no matter what subject matter or domain is at the center of the data science/machine learning problem. While our focus is not on predicting pandemics, with the right data a subject matter expert (like a virologist or epidemiologist, in this case) could use the platform to create a model that could accurately predict pandemics. In fact, many researchers do use RapidMiner – and our platform is free for academic purposes.
Is there anything else that you would like to share about RapidMiner?
Give it a try! You may be surprised how easy data science can be and how much a good platform can improve you and your team’s productivity.
Thank you for the great interview. Readers who wish to learn more should visit RapidMiner.
Researchers Develop AI Capable of Detecting and Classifying Galaxies
Researchers at UC Santa Cruz have developed Morpheus, a computer program that is capable of analyzing the pixels in astronomical image data. It can then identify and classify all of the galaxies and stars that exist in large data sets that come from astronomy surveys.
What is Morpheus
Morpheus is a deep-learning framework that incorporates several artificial intelligence (AI) technologies of the kind used in applications such as image and speech recognition.
Brant Robertson is a professor of astronomy and astrophysics. He is in charge of the Computational Astrophysics Research Group at UC Santa Cruz. According to Robertson, certain tasks that were traditionally done by astronomers need to be automated. This is because the sizes of astronomy data sets are constantly increasing.
“There are some things we simply cannot do as humans, so we have to find ways to use computers to deal with the huge amount of data that will be coming in over the next few years from large astronomical survey projects,” he said.
Ryan Hausen is a computer science graduate student at UCSC’s Baskin School of Engineering. He collaborated on Morpheus with Robertson over the past two years.
Their results were published on May 12 in the Astrophysical Journal Supplement Series. The Morpheus code will also be released to the public and there will be online demonstrations.
Morphologies of Galaxies
Astronomers are able to learn how galaxies form and evolve through time by observing the morphologies of galaxies.
There are some large-scale surveys that are set to take place which will generate massive amounts of image data that can be used. One of those surveys is the Legacy Survey of Space and Time (LSST), and it will be conducted at the Vera Rubin Observatory in Chile.
Robertson has been actively working on ways to use the data to better understand the formation and evolution of galaxies.
When the LSST is conducted, it will take more than 800 panoramic images per night with a 3.2-billion-pixel camera, recording the entire visible sky twice each week.
“Imagine if you went to astronomers and asked them to classify billions of objects — how could they possibly do that? Now we’ll be able to automatically classify those objects and use that information to learn about galaxy evolution,” Robertson said.
Deep-Learning Technology for Galaxies
Deep-learning technology has been used by some astronomers to classify galaxies, but it usually requires existing image recognition algorithms to be adapted. The algorithms are traditionally fed curated images of galaxies.
Morpheus was developed specifically for astronomical image data. It uses the original image data, which is in the standard digital format used by astronomers.
According to Robertson, one of the main points of Morpheus is pixel-level classification.
“With other models, you have to know something is there and feed the model an image, and it classifies the entire galaxy at once,” he said. “Morpheus discovers the galaxies for you, and does it pixel by pixel, so it can handle very complicated images, where you might have a spheroidal right next to a disk. For a disk with a central bulge, it classifies the bulge separately. So it’s very powerful.”
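A toy example can clarify the difference between whole-image and pixel-level classification. The thresholding classifier below is far simpler than Morpheus, which uses a deep network, but it shows the output format: a class label for every pixel rather than one label per image. The class scheme is invented for the example.

```python
import numpy as np

def classify_pixels(image, thresholds):
    """Assign a class to every pixel by brightness: 0 = background,
    then one class per threshold (an invented scheme, for
    illustration only -- Morpheus learns its classes from data)."""
    labels = np.zeros(image.shape, dtype=int)
    for cls, thresh in enumerate(thresholds, start=1):
        labels[image >= thresh] = cls
    return labels

img = np.array([[0.0, 0.3, 0.9],
                [0.1, 0.6, 0.2],
                [0.0, 0.0, 0.8]])
print(classify_pixels(img, thresholds=[0.25, 0.75]))
```

Because every pixel gets its own label, adjacent objects of different types (a spheroidal next to a disk, or a bulge within a disk) can be classified separately within a single image.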
The researchers utilized information from a 2015 study in order to train the deep-learning algorithm. The study collected data and classified around 10,000 galaxies in Hubble Space Telescope images from the CANDELS survey. Morpheus was then applied to image data from the Hubble Legacy Fields.
After processing an image of a part of the sky, Morpheus generates a new set of images of that same area, color-coding all objects based on their morphology. Astronomical objects are separated from the background, and stars and different types of galaxies are identified. The program runs on UCSC’s lux supercomputer, where a pixel-by-pixel analysis of the entire data set is generated quickly.
“Morpheus provides detection and morphological classification of astronomical objects at a level of granularity that doesn’t currently exist,” Hausen said.
The work completed by the researchers was supported by NASA and the National Science Foundation.
AI Could Help Researchers Determine Which Papers Can Be Replicated, Aims To Address Reproduction Crisis
More and more attention has been paid in recent years to what scholars and researchers dub the replication (or reproducibility) crisis. Many studies simply fail to produce the same significant results when replication is attempted, and as a result the scientific community is concerned that findings are often overemphasized. The problem affects fields as diverse as psychology and artificial intelligence. In the AI field, many non-peer-reviewed papers are published purporting to show impressive results that other researchers cannot reproduce. In order to tackle the problem and reduce the number of non-reproducible studies, researchers have designed an AI model that aims to determine which papers can be replicated.
As reported by Fortune, a new paper published by a team of researchers from the Kellogg School of Management and the Institute of Complex Systems at Northwestern University presents a deep learning model that can potentially determine which studies are likely to be reproducible and which aren’t. If the AI system can reliably discriminate between reproducible and non-reproducible studies, it could help universities, research institutes, companies, and other entities filter through thousands of research papers to determine which are most likely to be useful and reliable.
The AI system developed by the Northwestern team doesn’t utilize the type of empirical or statistical evidence that researchers typically use to ascertain the validity of studies. Instead, the model employs natural language processing techniques to try to quantify the reliability of a paper. The system extracts patterns in the language used by the authors of a paper, finding that some word patterns indicate greater reliability than others.
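A drastically simplified sketch can illustrate the idea of scoring a paper's language. The word lists below are hypothetical; the actual features learned by the Northwestern model are not public, and the real system is a trained deep learning model rather than a hand-built lexicon.

```python
# Hypothetical word lists -- the features the real model learned
# are not public; these are illustrative only.
CONFIDENT = {"demonstrate", "confirm", "robust", "clearly"}
HEDGING = {"might", "may", "possibly", "suggest", "perhaps"}

def confidence_score(text):
    """Score text by the balance of confident vs. hedging vocabulary,
    normalized by length. Positive = assertive, negative = hedged."""
    words = [w.strip(".,;:").lower() for w in text.split()]
    strong = sum(w in CONFIDENT for w in words)
    weak = sum(w in HEDGING for w in words)
    return (strong - weak) / max(len(words), 1)

a = "Our results clearly demonstrate a robust effect."
b = "These findings suggest the effect may possibly exist."
print(confidence_score(a) > confidence_score(b))  # True
```

A learned model replaces the fixed word lists with features discovered from millions of abstracts, but the underlying intuition, that authors signal confidence through word choice, is the same.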
The research team drew upon psychological research dating back to the 1960s, which found that people often communicate the level of confidence they have in their ideas through the words they use. Running with this idea, the researchers reasoned that paper authors might unknowingly signal their confidence in their research findings when writing their papers. The researchers conducted two rounds of training on different datasets. Initially, the model was trained on approximately two million abstracts from scientific papers; the second time, it was trained on full papers taken from the Reproducibility Project: Psychology, a project intended to determine which psychology papers can be reproduced.
After testing, the researchers deployed the model on a collection of hundreds of other papers, taken from various fields like psychology and economics. The researchers found that their model gave a more reliable prediction regarding a paper’s reproducibility than the statistical techniques typically used to ascertain whether or not a paper’s results can be replicated.
Researcher and Kellogg School of Management professor Brian Uzzi explained to Fortune that while he is hopeful the AI model could someday be used to help researchers ascertain how likely results are to be reproduced, the research team is unsure which patterns and details their model learned. The fact that machine learning models are often black boxes is a common problem within AI research, and it could make other scientists hesitant to utilize the model.
Uzzi explained that the research team hopes that the model could potentially be used to tackle the coronavirus crisis, helping scientists more quickly understand the virus and determine which study results are promising. As Uzzi said to Fortune:
“We want to begin to apply this to the COVID issue—an issue right now where a lot of things are becoming lax, and we need to build on a very strong foundation of prior work. It’s unclear what prior work is going to be replicated or not and we don’t have time for replications.”
Uzzi and the other researchers are hoping to improve the model by making use of further natural language processing techniques, including techniques that the team created to analyze call transcripts regarding corporate earnings. The research team has already built a database of approximately 30,000 call transcripts that they will analyze for clues. If the team can build a successful model, they might be able to convince analysts and investors to use the tool, which could pave the way for other innovative uses for the model and its techniques.