Epistemic AI employs state-of-the-art Natural Language Processing (NLP), machine learning and deep learning algorithms to map relations among a growing body of biomedical knowledge drawn from multiple public and private sources, including text documents and databases. Through a process of Knowledge Mapping, users work interactively with the platform to map and understand subsets of biomedical knowledge, revealing concepts and relationships that are otherwise missed with traditional search.
We interviewed both Co-Founders of Epistemic AI to discuss these latest advances.
Stefano Pacifico comes from 10+ years in applied AI and NLP development. He spent 7 years at Bloomberg and was at Elemental Cognition before starting Epistemic.
David Heeger is a Silver Professor of data science and neuroscience at NYU, and has spent his career bridging computer science, AI and bioscience. He is a member of the National Academy of Sciences. As founders they bring together the expertise of building applied large-scale AI and NLP systems for understanding large collections of knowledge, with expertise in computational biology and biomedical science from years of research in the area.
What is it that introduced and attracted you to AI and Natural Language Processing (NLP)?
Stefano Pacifico: When I was in college in Rome, and AI was not popular at all (in fact it was very fringe), I asked my then advisor which of the available specializations I should choose. He said: “If you want to make money, Software Engineering and Databases, but if you want to be weird but very advanced, then choose Artificial Intelligence”. I was sold at “weird”. I then started working on knowledge representation and reasoning to study how autonomous agents could play soccer or rescue people. Then two realizations made me fall in love with NLP: first, autonomous agents might have to communicate among themselves in natural language! Second, building formal knowledge bases by hand is hard, while natural language (in text) already provides the largest knowledge base of all. I know these might seem obvious observations today, but they were not as mainstream back then.
What was the inspiration behind launching Epistemic AI?
Stefano Pacifico: I am going to make a bold claim. Nobody today has adequate tooling to understand and connect the knowledge present in large, ever-growing collections of documents and data. I had previously worked on that problem in the world of finance. Think of news, financial statements, pricing data, corporate actions, filings etc. I found that problem intoxicating. And of course, it’s a difficult problem, and an important one! When I met my co-founder, Dr. David Heeger, we spent quite a bit of time evaluating startup opportunities in the biomedical industry. When we realized the sheer volume of information generated in this field, it was as if everything fell into place. Biomedical researchers struggle with information overload while attempting to grapple with the vast and rapidly expanding base of biomedical knowledge, including documents (e.g., papers, patents, clinical trials) and databases (e.g., genes, proteins, pathways, drugs, diseases, medical terms). This is a major pain point for researchers and, with no appropriate solution available, they are forced to use basic search tools (PubMed and Google Scholar) and explore manually curated databases. These tools are suitable for finding documents matching keywords (e.g., a single gene or a published journal paper), but not for acquiring comprehensive knowledge about a topic area or subdomain (e.g., COVID-19), or for interpreting the results of high-throughput biology experiments, such as gene sequencing, protein expression, or screening chemical compounds. We started Epistemic AI with the idea of addressing this problem with a platform that allows researchers to iteratively:
- Shorten the time to gather information and build comprehensive knowledge maps;
- Surface cross-disciplinary information that can otherwise be difficult to find (real discoveries often come from looking into the white space between disciplines);
- Identify causal hypotheses by finding paths and missing links in their knowledge map.
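As an editorial illustration of the last point, and not a description of Epistemic AI's actual implementation, finding paths and missing links can be sketched on a toy graph. Everything below is an assumption made for the example: the entity names are placeholders rather than real biomedical facts, the hypothetical `shortest_path` helper uses plain breadth-first search, and the hypothetical `candidate_missing_links` helper uses a simple shared-neighbor heuristic.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy knowledge map: placeholder entities, not real biomedical facts.
EDGES = [
    ("gene_A", "protein_B"),
    ("protein_B", "pathway_C"),
    ("pathway_C", "disease_D"),
    ("gene_A", "drug_E"),
    ("drug_E", "pathway_C"),
]


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Sorted iteration keeps the result deterministic.
        for nxt in sorted(graph[node]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def candidate_missing_links(graph, min_shared=2):
    """Unconnected pairs that share at least `min_shared` neighbors."""
    candidates = []
    for a, b in combinations(sorted(graph), 2):
        if b in graph[a]:
            continue  # already linked
        shared = graph[a] & graph[b]
        if len(shared) >= min_shared:
            candidates.append((a, b, sorted(shared)))
    return candidates


graph = build_graph(EDGES)
print(shortest_path(graph, "gene_A", "disease_D"))
print(candidate_missing_links(graph))
```

A path between two entities suggests a chain of relations worth examining, while an unconnected pair with several shared neighbors is only a candidate for a link that may be missing; either would still need to be checked by a researcher.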
What are some of both the public and private sources that are used to map these relations?
Stefano Pacifico: At this time, we are ingesting all the publicly available sources that we can get our hands on, including PubMed and ClinicalTrials.gov. We ingest databases of genes, drugs, diseases and their interactions. We also include private data sources for select clients, but we are not at liberty to disclose any details yet.
What type of machine learning technologies are used for the knowledge mapping?
Stefano Pacifico: One of the deeply held beliefs at Epistemic AI is that zealotry is not helpful for building products. Building an architecture that integrates several machine learning techniques was a decision made early on. Those techniques range from Knowledge Representation to Transformer models, through graph embeddings, but also include simpler models like regressions and random forests. Each component is as simple as it needs to be, but no simpler. While we believe we have already built NLP components that are state-of-the-art for certain tasks, we don’t shy away from simpler baseline models when possible.
Can you name some of the companies, non-profits, or academic institutions that are using the Epistemic platform?
Stefano Pacifico: While I’d love to, we have not agreed with our users to do so. I can say that we had people signing up from very high-profile institutions in all three segments (companies, non-profits, and academic institutions). Additionally, we intend to keep the platform free for academic/non-profit purposes.
How does Epistemic assist researchers in identifying central nervous system (CNS) and other disease-specific biomarkers?
Dr. David Heeger: Neuroscience is a highly interdisciplinary field including molecular and cellular biology and genomics, but also psychology, chemistry, and principles of physics, engineering, and mathematics. It’s so broad that nobody can be an expert in all of it. Researchers at academic institutions and pharma/biotech companies are forced to specialize. But we know that the important insights are interdisciplinary, combining knowledge from the sub-specialties. The AI-powered software platform that we’re building enables everyone to be much more interdisciplinary, to see the connections between their individual subarea of expertise and other topics, and to identify new hypotheses. This is especially important in neuroscience because it is such a highly interdisciplinary field to begin with. The function and dysfunction of the human brain is the most difficult problem that science has ever faced. We are on a mission to change the way that biomedical scientists work and even how they think.
Epistemic also enables the discovery of genetic mechanisms of CNS disorders. Can you walk us through how this works?
Dr. David Heeger: Most neurological diseases, psychiatric illnesses, and developmental disorders do not have a simple explanation in terms of genetic differences. There are a handful of syndromic disorders for which a specific mutation is known to cause the disorder. But that’s not typically the case. There are hundreds of genetic differences, for example, that have been associated with autism spectrum disorders (ASD). There is some understanding for some of these genes about the functions they serve in terms of basic biology. For example, some of the genes associated with ASD hold synapses together in the brain (note, however, that the same genes typically perform different functions in other organ systems in the body). But there’s very little understanding about how these genetic differences can explain the complex suite of behavioral differences exhibited by individuals with ASD. To make matters worse, two individuals with the same genetic difference may have completely different outcomes, one diagnosed with ASD and the other, not. And two individuals with completely different genetic profiles may have the same outcome with very similar behavioral deficits. To understand all this requires making the connection from genomics and molecular biology to cellular neuroscience (how do the genetic differences cause individual neurons to function differently) and then to systems neuroscience (how do those differences in cellular function cause networks of large numbers of interconnected neurons to function differently) and then to psychology (how do those differences in neural network function cause differences in cognition, emotion, and behavior). And all of this needs to be understood from a developmental perspective. A genetic difference may cause a deficit in a particular aspect of neural function. But the brain doesn’t just sit there and take it. Brains are highly adaptive. 
If there’s a missing or broken mechanism then the brain will develop differently to compensate as much as possible. This compensation might be molecular, for example, upregulating another synaptic receptor to replace the function of a broken synaptic receptor. Or the compensation might be behavioral. The end result depends not only on the initial genetic difference but also on the various attempts to compensate relying on other molecular, cellular, circuit, systems, and behavioral mechanisms.
No individual has the knowledge to understand all this. We all need help. The AI-powered software platform that we’re building enables everyone to collect and link all the relevant biomedical knowledge, to see the connections and to identify new hypotheses.
How are biopharma and academic institutions using Epistemic to tackle the COVID-19 challenge?
Stefano Pacifico: We have released a public version of our platform that includes COVID-specific datasets and is freely accessible to anyone doing research on COVID-19. It is available at https://covid.epistemic.ai
What are some of the other diseases or genetic issues that Epistemic has been used for?
Stefano Pacifico: We have collaborated with autism researchers and, most recently, have been putting together a new research effort for cystic fibrosis. But we are happy to collaborate with any other researchers or institutions that might need help with their research.
Is there anything else that you would like to share about Epistemic?
Stefano Pacifico: We are building a movement of people that want to change the way biomedical researchers work and think. We sincerely hope that many of your readers will want to join us!
Thank you both for taking the time to answer our questions. Readers who wish to learn more should visit Epistemic AI.