

Mo Abdolell, CEO & Founder, Densitas Inc – Interview Series





Mo Abdolell is CEO/Founder of Densitas Inc., a company focused on the mammography enterprise that delivers machine learning solutions for personalized breast health, with technologies focused on breast density, clinical image quality, and tailored risk.

Mo is also Associate Professor, Diagnostic Radiology, Dalhousie University, and as a consulting biostatistician has 25 years of experience in study design, statistical analysis and machine learning in biomedical/clinical research.

Can you walk us through your journey behind launching Densitas, Inc?

From as early as I can remember I wanted to help improve the health of underserved populations globally.  I wanted to work for the World Health Organization in some capacity that made good use of my mathematical inclinations.  Well that never quite worked out.  Instead, I studied biostatistics in graduate school at the Dalla Lana School of Public Health, University of Toronto, completing my thesis in machine learning before moving on to work in hospitals and health research institutions.  I have taught and supervised graduate students in diverse areas including biostatistics, epidemiology, biomedical engineering and health/medical informatics with a focus on diagnostic imaging.  This has afforded me a front row seat at the intersection of diverse scientific disciplines and has given me a broad perspective on how artificial intelligence can be used to deliver better patient outcomes.

Someone who had a big impact on the decision to launch Densitas was Dr. Judy Caines, the founding medical director of the Nova Scotia Breast Screening Program in Halifax, Canada. Dr. Caines was a true luminary and innovator in her clinical practice with an unwavering focus on achieving practical and sustainable solutions for improving quality of care and patient outcomes in women’s breast health. Her absolute commitment and drive for excellence was inspiring. She supported and encouraged my early investigations in digital mammography, one of which led to the development of an algorithm to measure breast density as it appears in a mammogram. The very first thing that Dr. Caines said when the algorithm was completed was that she wanted to use it in clinical practice. This was the nudge that led to the launch of the company, ultimately closing the loop on my long-standing desire to contribute in some way to improving global health by establishing a partnership with RAD-AID International to deliver better breast health in low-income countries and medically underserved regions of the world.

It may come as a surprise to many that breast density is a factor in the risk of breast cancer. Could you share your views on this?

Breast density refers to the amount of epithelial and stromal components of the breast tissue, and breast cancers most commonly appear in epithelial cells.  So, the more epithelial tissue in the breast the greater chance that cancer may appear in the epithelial cells.

Additionally, dense breast tissue and underlying breast cancers both appear white on a mammogram, meaning that dense tissue can mask the presence of cancer and increase the risk of missing the cancer.  In fact, half of all women have dense breasts, and half of breast cancers in dense breasts are missed.

These facts are consistent with research that shows women with extremely dense breasts have a 4-6x increased risk of having breast cancer compared to women with fatty breasts, and risk models that incorporate breast density predict breast cancer better than those that do not.

What are the different machine learning technologies that are used when it comes to analyzing breast density?

At a high level there are essentially three data modelling strategies for building algorithms that compute breast density.  These include statistical learning, machine learning and deep learning, with each successive approach demanding increasingly larger labelled training data sets.  Statistical learning and machine learning algorithms both require hand-crafted image features to be developed as inputs to predict breast density.  Deep learning does not require hand-crafted image features to be developed, but rather discovers these features itself driven by the available training data.
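To make the idea of a hand-crafted image feature concrete, here is a minimal Python sketch. It is an illustration only and not the Densitas algorithm: the function names, the 0.6 brightness threshold, and the toy pixel grid are all assumptions chosen for the example. Each function is a human-designed summary of the image, which a statistical or machine learning model would take as input; a deep learning model would instead receive the raw pixels and learn its own features.

```python
# Illustrative only: hypothetical hand-crafted features on a toy image.
# Pixel values are assumed to be normalized to the range [0, 1].

def dense_fraction(image, threshold=0.6):
    """Share of pixels brighter than an assumed threshold."""
    pixels = [p for row in image for p in row]
    return sum(p > threshold for p in pixels) / len(pixels)

def mean_intensity(image):
    """Average brightness over all pixels."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)

# A made-up 2x3 "image"; real mammograms are far larger.
toy = [
    [0.1, 0.2, 0.9],
    [0.8, 0.7, 0.3],
]

# The feature vector a classical model would be trained on.
features = [dense_fraction(toy), mean_intensity(toy)]
print(features)
```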

How can AI Help Reduce Burnout for radiologists?

Burnout is recognized by the World Health Organization as an ‘occupational phenomenon’ characterized by feelings of exhaustion, isolation, cynicism, and reduced professional engagement.

Bureaucratic tasks like charting, reporting, administrative paperwork and mandated accreditation responsibilities are the leading reported causes of burnout for the majority of radiologists.

AI automation integrated with digitalization can deliver better reporting and workflow efficiencies as well as better image quality and improved process management that liberate radiologists from tedious and repetitive reporting and administrative burdens that lead to burnout.  Such solutions allow radiologists to dedicate more time to interpretive tasks and focus on patient care.

Densitas also helps imaging departments with workflow efficiencies, and compliance with national guidelines. Could you share with us how Densitas enables this?

Nearly 40 states across the U.S. have passed breast density notification legislation that mandates women must be informed of their breast density. This covers over 85% of the screen eligible population across the country. Breast density is visually reported according to the American College of Radiology BI-RADS density scale. Visual assessment has been shown to be unreliable and not reproducible, and it is time-consuming, adds a reporting step, and distracts from the primary task of cancer detection.

A key aspect of the FDA Mammography Quality Standards Act (MQSA) Enhancing Quality Using the Inspection Program (EQUIP) is that the Lead Interpreting Physician (LIP) shall ensure proper maintenance and updates of records concerning quality control (including corrective actions) and patient positioning, and corrective actions in the event of inadequate clinical image quality. Additionally, the MQSA EQUIP requires establishment of a QA program and maintenance of associated records. The MQSA EQUIP sets standards but is not prescriptive in its requirements. Clinical image quality is subjectively assessed and not standardized. Administration of a quality system focussed on the responsibilities of the LIP and lead QC technologist is not typically digitalized, is very time consuming and not reimbursable. Yet, mammography facility accreditation hinges on a facility’s ability to demonstrate the effective implementation of such a system.

The densitasai™ platform delivers AI automation of breast density, clinical image quality and breast cancer risk assessment at the mammogram level at point-of-care as well as at the clinic and health system level through an advanced web-based analytics interface.

Integrations with industry leaders including Nuance, ikonopedia, Three Palm Software, and major PACS vendors ensure that breast density, image quality and risk scores are automatically embedded in systems that are already well-established in radiologists’ reporting and workflows, eliminating transcription errors, improving reporting speed, and prioritizing studies for review.

The advanced web-based embedded analytics system provides automated reporting and auditing, digitalized workflows, health system wide QC and benchmarking of performance and resource utilization, and automation of administrative tasks.

Can you discuss Densitas partnership with RAD-AID International and how it helps underserved regions?

The aim of the partnership is to provide low-resource institutions with education, clinical support, and hands-on training so that they may adopt sustainable mammography practices that leverage practical AI powered applications. For breast imaging departments in participating institutions, the program is expected to help advance the quality of patient management decision-making and improve early disease detection.

Kenya is a country with nearly 5 million women who are eligible for screening mammography, but there are only 3 fellowship-trained breast imagers in the entire country. Tanzania is a country of 58 million people, but has just 450 technologists for the entire country.  These are just two examples of many such underserved countries across the world. Screening so many women with only a handful of specialist-trained breast radiologists and radiological technologists is just impossible.

Large scale population-wide screening programs like breast cancer screening are characterized by high patient volumes that drive the need for efficient clinical workflows, standardization of processes and care, optimization of repetitive reporting and administrative tasks, cost-effective patient and process management, and adherence to national accreditation standards.

Densitas’ artificial intelligence solutions address these challenges with automated breast density, breast cancer risk and clinical image quality assessments.

Ultimately the goal is to help improve breast cancer screening through earlier detection and better treatment to save lives.

Densitas has also launched a program to help with radiology personnel during the COVID-19 pandemic. What is the program being offered?

In these unprecedented times, the primary focus of care is necessarily devoted to critically ill patients.

An unfortunate consequence of COVID-19 is that breast cancer screening has largely been put on hold.

Yet, even in the hardest hit regions we are seeing mammography facilities already planning to ramp up, starting with diagnostic and followed by screening mammography exam bookings.  As that happens, the sheer number of patients awaiting breast screening in the coming weeks and months (rescheduled and net new) will present a monumental challenge to mammography facilities and health systems, with radiology personnel stretched thin.

It will be important to ensure that clinical image quality is maintained at the highest standard despite the stress of overloaded imaging throughput. AI automation of clinical image quality assessment, integrated in a comprehensive analytics and reporting platform, will boost MQSA inspection preparedness and will free up crucial resources for patient care. AI automation of breast density and breast cancer risk scoring and reporting will increase workflow efficiency and support tailored follow-up screening protocols.

We want to help.  For a limited time we are offering our densitasai™ platform as a no risk, no charge trial to select qualifying health systems, hospitals, and imaging centers.

The densitasai™ platform will (1) help battle the impending backlog with AI automation of tasks, (2) provide workflow efficiencies that lessen the burden of tedious, time-consuming and repetitive tasks and help mitigate staff burnout, (3) alleviate significant resource and administrative demands of FDA Mammography Quality Standards Act inspections, and (4) digitalize processes to maximize social-distancing principles in clinical care on an ongoing basis.

The densitasai™ platform can be deployed remotely.

Densitas will engage and support 20 sites. The deadline for registration and qualification is June 30, 2020 or when we are fully subscribed, whichever comes first.

Click Here to read more on how AI Can Help Reduce Burnout in Your Mammography Practice.

Thank you for the interview. There is a lot of important information here on breast health. Anyone who wishes to learn more should visit Densitas.



Antoine Tardif is a futurist who is passionate about the future of AI and robotics. He is the CEO of, and has invested in over 50 AI & blockchain projects. He is also the Co-Founder of a news website focusing on digital securities, and is a founding partner of


Akilesh Bapu, Founder & CEO of DeepScribe – Interview Series





Akilesh Bapu is the Founder & CEO of DeepScribe, which uses natural language processing (NLP) and advanced deep learning to generate accurate, compliant, and secure notes of doctor-patient conversations.

What was it that introduced and attracted you to AI and natural language processing?

If I remember correctly, Jarvis from “Iron Man” was the first thing that really attracted me to the world of natural language processing and AI. Particularly, I found it fascinating how much faster a human was able to not only go through tasks but also go into an incredible level of depth into certain tasks and unveil certain information that they wouldn’t have even known about if it weren’t for this AI.

It was this concept of “AI by itself won’t be as good as humans at most tasks but put a human and AI together and that combination will dominate.”  Natural language processing is the most efficient way for this human/AI combination to happen.

From then on, I was obsessed with Siri, Google Now, Alexa, and the others. While they didn’t work as seamlessly as Jarvis, I so badly wanted to make them work as Jarvis did. Particularly, what became apparent was, commands such as “Alexa do this,” “Alexa do that,” were pretty easy and accurate to do with the current state of technology. But when it comes to something like Jarvis, where it can actually learn and understand, filter, and pick up on important topics during another conversational exchange—that hadn’t really been done before. This actually directly relates to one of my core motivations in founding DeepScribe. While we are solving the issue of documentation for physicians, we’re attempting a whole new wave of intelligence while doing it: ambient intelligence. AI that can dig through your day-to-day utterances, find useful information, and use that information to help you out.


You previously did some research using deep learning and NLP at UC Berkeley College of Engineering. What was your research on?

Back at the Berkeley AI Research Lab, I was working on a gene ontology annotator project where we were summarizing PubMed articles with specific output parameters.

The high-level overview: Take a task like the CNN news article summarization. In that task you’re taking news articles and summarizing them into roughly a few sentences. In your favor you have data and the ability to train these models on over a million articles. However, the problem space is enormous since you have limited structure to the summaries. In addition, there is hardly any structure to the actual articles. While there have been quite a few improvements since 2.5 years ago when I was working on this project, this is still an unsolved problem.

In our research project, however, we were developing structured summaries of articles. A structured summary in this case is similar to a typical summary except we know the exact structure of the output summary. This is helpful since it dramatically reduces the output options for our machine learning model. The challenge was that there was not enough annotated training data to run a data-hungry deep learning model and get usable results.

The core of the work I did on this project was to leverage the knowledge we have around the input data and develop an ensemble of shallow ML models to support it, a technique we invented called the 2-step annotator. The 2-step annotator benchmarked at roughly 15x the accuracy of the previous best (54 percent vs 3.6 percent).

While side by side, this project and DeepScribe may sound entirely different, they were highly similar in how they used the 2-step annotation method to vastly improve results on a limited dataset.


What was the inspiration behind launching DeepScribe?

It all started with my father, who was a medical oncologist. Before electronic health record systems took over health care, physicians would jot down things on paper and spend very little time on notes. However, once EHRs started becoming popular as part of the HITECH Act of 2009, I started noticing that my dad spent more and more time at the computer. He’d start coming home later. On the weekends, he’d be sitting on the couch dictating notes. Simple things like him picking me up from school or basketball practice became a thing of the past as he’d be spending most of his evening hours catching up on documentation.

As a nerdy kid growing up, I would try to find solutions for him by searching the web and having him try them out. Sadly, nothing worked well enough to save him from the long hours of documentation.

Fast forward several years to the summer of 2017—I’m a researcher working at the Berkeley AI Research Lab, working on projects in document summarization. One summer when I’m back at home, I notice that my dad is still spending copious amounts of time documenting. I ask, “What’s new in the world of documentation? Alexa is everywhere, Google Assistant is so good now. Tell me, what’s the latest in the medical space?” And his answer was, “Nothing has changed.” I thought that it was just him but when I went and surveyed several of his colleagues, it was the same issue: not what the latest is in cancer treatment or the novel problems their patients were having—it was documentation. “How can I get rid of documentation? How can I save time on documentation? It’s taking so much of my time.”

I also noticed several companies that had emerged to try to solve documentation. However, either they were too expensive (thousands of dollars per month) or they were too minimal in terms of technology. The physicians at that time had very few options. That was when the opportunity opened up that if we could create an artificially intelligent medical scribe, a technology that could follow physicians’ patient visits and summarize them, and offer it at a cost that could make it accessible for everyone, it could truly bring the joy of care back to medicine.


You were only 22 years old when you launched DeepScribe. Can you describe your journey as an entrepreneur?

My first exposure to entrepreneurship was back in high school. It started when a friend and I who happened to know some JavaScript basics met with the director of a center for children with learning disabilities. They told us how the simplest of tools could go a long way with dyslexic children. We ended up hacking together a dyslexia reader Chrome extension. It was really bare bones—it simply adjusted the font to meet the scientific guidelines for ease of reading by dyslexic people. While the concept was simple, we ended up getting over 5000 active users in a handful of months. I was blown away by how basic tech can have such a profound impact on people.

At Berkeley, I continued to delve into the world of entrepreneurship as much as possible, primarily with their wide array of classes. My favorites were:

  1. The Newton Lecture Series—people like Jessica Mah from InDinero or Diane Greene from VMWare who were Cal alums gave highly relatable talks about their time at Berkeley and how they started their own companies
  2. Challenge Lab—I actually met my co-founder Matt Ko through this class. We were placed in groups and went through a semester-long journey of creating a product and being mentored on what it takes during the early stages to get an idea going.
  3. Lean Launchpad—By far my favorite of the three; this was a grueling and rigorous process where we were guided by Steve Blank (acclaimed billionaire and the man behind the lean startup movement) to take an idea, validate it through 100 customer interviews, build a financial model, and more. This was the type of class where we pitched our “startup” only to get stopped on slide 1 or 2 and get grilled. If that wasn’t hard enough, we were also expected to interview 10 customers a week. Our idea at the time was to create a patent search that would give similar results to an expensive prior art search, which meant we were pitching to 10 enterprise customers a week. It was great because it taught us to think fast on our feet and be extra resourceful.

DeepScribe started when an investor group called The House Fund was writing checks for students who would turn down their summer internships and spend their summer building their company. We had just shut down Delphi (the patent search engine) and Matt and I had been constantly talking about medical documentation and everything fell in place since it was the perfect time to give it a shot.

With DeepScribe, we were lucky to have just come fresh out of Lean Launchpad since one of the most important factors in building a product for physicians was to iterate and refine the product around customer feedback. A historical issue with the medical industry has been that software has rarely had physicians in the design loop, therefore resulting in software that wasn’t optimized for the end user.

Since DeepScribe was happening at the same time as my final year at Berkeley, it was a heavy balancing act. I’d show up to class in a suit so I could be on time for a customer demo right after. I’d use all the EE facilities and professors not for anything to do with class but 100 percent for DeepScribe. My meetings with my research mentor even turned into DeepScribe brainstorming sessions.

Looking back, if I had to change one thing about my journey, it would’ve been to put college on hold so I could spend 150 percent of my time on DeepScribe.


Can you describe for a medical professional what the advantages of using DeepScribe are versus the more traditional method of voice dictation or even taking notes?

Using DeepScribe is meant to be very similar to using an actual human scribe. As you talk naturally to your patient, DeepScribe will listen in, pick up on the medically relevant speech that usually goes in your notes, and put it in there for you, using the same medical language that you yourself use. We like to think of it as a new AI-powered member of your medical staff that you can train as you’d like to help with documentation in your electronic health record system. It’s very different from using a voice dictation service as it eliminates the entire step of having to go back and document. While typical dictation services turn 10 minutes of documentation into 7-8 minutes, DeepScribe turns it into a few seconds. Our physicians report anywhere from 1.5 to 3 hours of time saved per day depending on how many patients they see.

DeepScribe is device-agnostic, operable from an iPhone, Apple Watch, browser (for telemedicine), or hardware device.


What are some of the speech recognition or NLP challenges that DeepScribe may encounter due to complex medical terminology?

Contrary to popular opinion, complex medical terminology is actually the easiest part for DeepScribe to pick up. The trickiest part for DeepScribe is to pick up on unique contextual statements a patient may give a physician. The more they stray from a typical conversation, the more we see the AI stumble. But as we collect more conversational data, we see it improve on this dramatically every day.


What are the other machine learning technologies that are used with DeepScribe?

The large umbrellas of speech recognition and NLP tend to cover most of the machine learning we’re doing at DeepScribe.


Can you name some of the hospitals, nonprofits, or academic institutions that are using DeepScribe?

DeepScribe started out through a pilot program with the UC Berkeley Health Center. Hartford Healthcare, Texas Medical Center, and Cedar Valley Medical Specialists are a handful of the larger systems DeepScribe is working with.

However, the larger percentage of DeepScribe users are 50 private practices from Alaska to Florida. Our most popular specialties are primary care, orthopedics, gastroenterology, cardiology, psychiatry, and oncology, but we do support a handful of other specialties.


DeepScribe has recently launched a program to assist with COVID-19. Could you walk us through this program?

COVID-19 has hit our doctors hard. Practices are only seeing 30-40 percent of their patient load, scribe staffing is being cut, and providers are being forced to rapidly switch all their patients on to telemedicine. All this ends up leading to more clerical work for providers—we at DeepScribe firmly believe that in order for this pandemic to come to a halt, physicians must devote 100 percent of their attention and time to taking care of their patients.

To help aid this cause, we are proud to launch a free telemedicine solution to health care professionals fighting this pandemic. Our telemedicine solution is fully integrated with our AI-powered medical scribe solution, eliminating the need for clinical documentation for encounters made on our platform.

We’re also offering our scribe service for free during the pandemic. This means that any physician can get access to a scribe for free to handle their documentation. Our hopes are that by doing this, physicians will be able to focus more of their attention on their patients and spend less time thinking about documentation, leading to a faster halting of the COVID-19 outbreak.

Thank you for the great interview, I really enjoyed learning about DeepScribe and your entrepreneurial journey. Anyone who wishes to learn more should visit DeepScribe.



AI Models Trained On Sex Biased Data Perform Worse At Diagnosing Disease





Recently, a study published in the journal PNAS and conducted by researchers from Argentina found that sex-skewed training data leads to worse model performance when diagnosing diseases and other medical issues. As reported by STAT News, the team of researchers experimented with training models where female patients were notably underrepresented or excluded altogether, and found that the algorithm performed substantially worse when diagnosing them. The same also held true for cases where male patients were excluded or underrepresented.

Over the past half-decade, as AI models and machine learning have become more ubiquitous, more attention has been paid to the problems of biased datasets and the biased machine learning models that result from them. Data bias in machine learning can lead to awkward, socially damaging, and exclusive AI applications, but when it comes to medical applications lives can be on the line. However, despite knowledge of the problem, few studies have attempted to quantify just how damaging biased datasets can be. The study carried out by the research team found that data bias could have more extreme effects than many experts previously estimated.

One of the most popular uses for AI in medical contexts over the past few years has been diagnosing patients based on medical images. The research team analyzed models used to detect the presence of various medical conditions like pneumonia, cardiomegaly, or hernias from X-rays. The team studied three open-source model architectures: Inception-v3, ResNet, and DenseNet-121. The models were trained on chest X-rays pulled from two open-source datasets originating from Stanford University and the National Institutes of Health. Although the datasets themselves are fairly balanced when it comes to sex representation, the researchers artificially skewed the data by breaking them into subsets where there was a sex imbalance.

The research team created five different training datasets, each composed of different ratios of male/female patient scans. The five training sets were broken down as follows:

  • All images were of male patients
  • All images were of female patients
  • 25% male patients and 75% female patients
  • 75% male patients and 25% female patients
  • Half male patients and half female patients
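The construction of such splits can be sketched in a few lines of Python. This is a minimal illustration and not the researchers' code: the record layout, the function name `make_skewed_subset`, the subset size of 80, and the fixed random seed are all assumptions, and the male shares used (100%, 0%, 25%, 75%, 50%) are an assumed reading of the five training sets.

```python
import random

def make_skewed_subset(records, male_fraction, size, seed=0):
    """Draw `size` records with the requested share of male patients.

    `records` is assumed to be a list of dicts with a "sex" key
    holding "M" or "F" (a hypothetical layout for this sketch).
    """
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    males = [r for r in records if r["sex"] == "M"]
    females = [r for r in records if r["sex"] == "F"]
    n_male = round(size * male_fraction)
    n_female = size - n_male
    subset = rng.sample(males, n_male) + rng.sample(females, n_female)
    rng.shuffle(subset)  # avoid all males appearing first
    return subset

# Toy, perfectly balanced pool standing in for the X-ray metadata.
records = [{"id": i, "sex": "M" if i % 2 == 0 else "F"} for i in range(200)]

# Assumed male shares for the five training sets.
RATIOS = [1.0, 0.0, 0.25, 0.75, 0.5]
subsets = {ratio: make_skewed_subset(records, ratio, size=80) for ratio in RATIOS}

for ratio, subset in subsets.items():
    n_male = sum(r["sex"] == "M" for r in subset)
    print(f"male share {ratio:.2f}: {n_male} male, {len(subset) - n_male} female")
```

Holding the subset size constant across ratios keeps the comparison about sex balance rather than about the amount of training data.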

After a model was trained on one of the subsets, it was tested on a collection of scans from both male and female patients. A notable trend held across the various medical conditions: the accuracy of the models was much worse when the training data was significantly sex-skewed. Interestingly, if one sex was overrepresented in the training data, that sex didn’t seem to benefit from the overrepresentation. Regardless of which sex the training data was skewed toward, the model didn’t perform better on that sex compared to when it was trained on an inclusive dataset.

The senior author of the study, Enzo Ferrante, was quoted by STAT News as explaining that the study underlines how important it is for training data to be diverse and representative of all the populations you intend to test the model in.

It isn’t entirely clear why models trained on one sex tend to perform worse when implemented on another sex. Some of the discrepancies might be due to physiological differences, but various social and cultural factors could also account for some of the difference. For instance, women may tend to receive X-rays at a different stage of progression in their disease when compared to men. If this were true, it could impact the features (and therefore the patterns learned by the model) found within training images. If this is the case, it makes it much more difficult for researchers to de-bias their datasets, as the bias would be baked into the dataset through the mechanisms of data collection.

Even researchers who pay close attention to data diversity sometimes have no choice but to work with data that is skewed or biased. Situations where a disparity exists between how medical conditions are diagnosed will often lead to imbalanced data. For example, data on breast cancer patients is almost entirely collected from women. Similarly, autism manifests differently between women and men, and as a result, the condition is diagnosed at a much higher rate in boys than girls.

Nonetheless, it’s extremely important for researchers to control for skewed data and data bias in any way that they can. To that end, future studies will help researchers quantify the impact of biased data.



Stefano Pacifico, and David Heeger, Co-Founders of Epistemic AI – Interview Series





Epistemic AI employs state-of-the-art Natural Language Processing (NLP), machine learning and deep learning algorithms to map relations among a growing body of biomedical knowledge, from multiple public and private sources, including text documents and databases. Through a process of Knowledge Mapping, users work interactively with the platform to map and understand subsets of biomedical knowledge, revealing concepts and relationships that are otherwise missed with traditional search.

We interviewed both Co-Founders of Epistemic AI to discuss these latest advances.

Stefano Pacifico comes from 10+ years in applied AI and NLP development. He spent 7 years at Bloomberg and was at Elemental Cognition before starting Epistemic.

David Heeger is a Silver Professor of data science and neuroscience at NYU, and has spent his career bridging computer science, AI and bioscience. He is a member of the National Academy of Sciences. As founders they bring together the expertise of building applied large-scale AI and NLP systems for understanding large collections of knowledge, with expertise in computational biology and biomedical science from years of research in the area.

What is it that introduced and attracted you to AI and Natural Language Processing (NLP)?

Stefano Pacifico: When I was in college in Rome, and AI was not popular at all (in fact it was very fringe), I asked my then advisor what specialization I should have taken among those available. He said: “If you want to make money, Software Engineering and Databases, but if you want to be weird but very advanced, then choose Artificial Intelligence”. I was sold at “weird”. I then started working on knowledge representation and reasoning to study how autonomous agents could play soccer or rescue people. Then two realizations made me fall in love with NLP: first, autonomous agents might have to communicate with natural language among themselves! Second, building formal knowledge bases by hand is hard, while natural language (in text) already provides the largest knowledge base of all. I know today these might seem obvious observations, but they were not as mainstream before.

What was the inspiration behind launching Epistemic AI?

Stefano Pacifico: I am going to make a bold claim. Nobody today has adequate tooling to understand and connect the knowledge present in large, ever-growing collections of documents and data. I had previously worked on that problem in the world of finance. Think of news, financial statements, pricing data, corporate actions, filings etc. I found that problem intoxicating. And of course, it’s a difficult problem, and an important one! When I met my co-founder, Dr. David Heeger, we spent quite a bit of time evaluating startup opportunities in the biomedical industry. When we realized the sheer volume of information generated in this field, it was as if everything fell into place. Biomedical researchers struggle with information overload while attempting to grapple with the vast and rapidly expanding base of biomedical knowledge, including documents (e.g., papers, patents, clinical trials) and databases (e.g., genes, proteins, pathways, drugs, diseases, medical terms). This is a major pain point for researchers and, with no appropriate solution available, they are forced to use basic search tools (PubMed and Google Scholar) and explore manually curated databases. These tools are suitable for finding documents matching keywords (e.g., a single gene or a published journal paper), but not for acquiring comprehensive knowledge about a topic area or subdomain (e.g., COVID-19), or for interpreting the results of high-throughput biology experiments, such as gene sequencing, protein expression, or screening chemical compounds. We started Epistemic AI with the idea of addressing this problem with a platform that allows them to iteratively:

  1. Shorten the time to gather information and build comprehensive knowledge maps;
  2. Surface cross-disciplinary information that can otherwise be difficult to find (real discoveries often come from looking into the white space between disciplines);
  3. Identify causal hypotheses by finding paths and missing links in your knowledge map.

What are some of both the public and private sources that are used to map these relations?

Stefano Pacifico: At this time, we are ingesting all the publicly available sources that we can get our hands on, including PubMed. We ingest databases of genes, drugs, diseases and their interactions. We also include private data sources for select clients, but we are not at liberty to disclose any details yet.

What type of machine learning technologies are used for the knowledge mapping?

Stefano Pacifico: One of the deeply held beliefs at Epistemic AI is that zealotry is not helpful for building products. Building an architecture that integrates several machine learning techniques was a decision made early on. Those techniques range from Knowledge Representation to Transformer models, through graph embeddings, but also include simpler models like regressions and random forests. Each component is as simple as it needs to be, but no simpler. While we believe we have already built NLP components that are state-of-the-art for certain tasks, we don’t shy away from simpler baseline models when possible.

Can you name some of the companies, non-profits, or academic institutions that are using the Epistemic platform?

Stefano Pacifico: While I’d love to, we have not agreed with our users to do so. I can say that we had people signing up from very high-profile institutions in all three segments (companies, non-profits, and academic institutions). Additionally, we intend to keep the platform free for academic/non-profit purposes.

How does Epistemic assist researchers in identifying central nervous system (CNS) and other disease-specific biomarkers?

Dr. David Heeger: Neuroscience is a very highly interdisciplinary field including molecular and cellular biology and genomics, but also psychology, chemistry, and principles of physics, engineering, and mathematics. It’s so broad that nobody can be an expert at all of it. Researchers at academic institutions and pharma/biotech companies are forced to specialize. But we know that the important insights are interdisciplinary, combining knowledge from the sub-specialties. The AI-powered software platform that we’re building enables everyone to be much more interdisciplinary, to see the connections between their individual subarea of expertise and other topics, and to identify new hypotheses. This is especially important in neuroscience because it is such a highly interdisciplinary field to begin with. The function and dysfunction of the human brain is the most difficult problem that science has ever faced. We are on a mission to change the way that biomedical scientists work and even how they think.

Epistemic also enables the discovery of genetic mechanisms of CNS disorders. Can you walk us through how this works?

Dr. David Heeger: Most neurological diseases, psychiatric illnesses, and developmental disorders do not have a simple explanation in terms of genetic differences. There are a handful of syndromic disorders for which a specific mutation is known to cause the disorder. But that’s not typically the case. There are hundreds of genetic differences, for example, that have been associated with autism spectrum disorders (ASD). There is some understanding for some of these genes about the functions they serve in terms of basic biology. For example, some of the genes associated with ASD hold synapses together in the brain (note, however, that the same genes typically perform different functions in other organ systems in the body). But there’s very little understanding about how these genetic differences can explain the complex suite of behavioral differences exhibited by individuals with ASD. To make matters worse, two individuals with the same genetic difference may have completely different outcomes, one diagnosed with ASD and the other not. And two individuals with completely different genetic profiles may have the same outcome with very similar behavioral deficits.

To understand all this requires making the connection from genomics and molecular biology to cellular neuroscience (how do the genetic differences cause individual neurons to function differently) and then to systems neuroscience (how do those differences in cellular function cause networks of large numbers of interconnected neurons to function differently) and then to psychology (how do those differences in neural network function cause differences in cognition, emotion, and behavior).

And all of this needs to be understood from a developmental perspective. A genetic difference may cause a deficit in a particular aspect of neural function. But the brain doesn’t just sit there and take it. Brains are highly adaptive. If there’s a missing or broken mechanism then the brain will develop differently to compensate as much as possible. This compensation might be molecular, for example, upregulating another synaptic receptor to replace the function of a broken synaptic receptor. Or the compensation might be behavioral. The end result depends not only on the initial genetic difference but also on the various attempts to compensate relying on other molecular, cellular, circuit, systems, and behavioral mechanisms.

No individual has the knowledge to understand all this. We all need help. The AI-powered software platform that we’re building enables everyone to collect and link all the relevant biomedical knowledge, to see the connections and to identify new hypotheses.

How are biopharma and academic institutions using Epistemic to tackle the COVID-19 challenge?

Stefano Pacifico: We have released a public version of our platform that includes COVID specific datasets and is freely accessible to anyone doing research on COVID-19. It is available at

What are some of the other diseases or genetic issues that Epistemic has been used for?

Stefano Pacifico: We have collaborated with autism researchers and are most recently putting together a new research effort for Cystic Fibrosis. But we are happy to collaborate with any other researchers or institutions that might need help with their research.

Is there anything else that you would like to share about Epistemic?

Stefano Pacifico: We are building a movement of people that want to change the way biomedical researchers work and think. We sincerely hope that many of your readers will want to join us!

Thank you both for taking the time to answer our questions. Readers who wish to learn more should visit Epistemic AI.
