Recently, a study published in the journal PNAS and conducted by researchers from Argentina suggested that sex-skewed training data leads to worse model performance when diagnosing diseases and other medical issues. As reported by STAT News, the team of researchers experimented with training models where female patients were notably underrepresented or excluded altogether, and found that the algorithm performed substantially worse when diagnosing them. The same held true when male patients were excluded or underrepresented.
Over the past half-decade, as AI models and machine learning have become more ubiquitous, more attention has been paid to the problems of biased datasets and the biased machine learning models that result from them. Data bias in machine learning can lead to awkward, socially damaging, and exclusive AI applications, but when it comes to medical applications lives can be on the line. However, despite knowledge of the problem, few studies have attempted to quantify just how damaging biased datasets can be. The study carried out by the research team found that data bias could have more extreme effects than many experts previously estimated.
In medical contexts, one of the most popular uses for AI in the past few years has been diagnosing patients based on medical images. The research team analyzed models used to detect the presence of various medical conditions like pneumonia, cardiomegaly, or hernias from X-rays. The team studied three open-source model architectures: Inception-v3, ResNet, and DenseNet-121. The models were trained on chest X-rays pulled from two open-source datasets originating from Stanford University and the National Institutes of Health. Although the datasets themselves are fairly balanced when it comes to sex representation, the researchers artificially skewed the data by breaking them into subsets where there was a sex imbalance.
The research team created five different training datasets, each composed of different ratios of male/female patient scans. The five training sets were broken down as follows:
- All images were of male patients
- All images were of female patients
- 25% male patients and 75% female patients
- 75% male patients and 25% female patients
- Half male patients and half female patients
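The kind of resampling described above can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' code: the record layout (a dict with a `"sex"` key), the function name `make_skewed_subset`, the subset size, and the toy pool of records are all assumptions made for the example.

```python
import random

def make_skewed_subset(records, male_fraction, size, seed=0):
    """Sample `size` records so that `male_fraction` of them are male.

    `records` is a list of dicts with a hypothetical "sex" key ("M" or "F").
    """
    rng = random.Random(seed)
    males = [r for r in records if r["sex"] == "M"]
    females = [r for r in records if r["sex"] == "F"]
    n_male = round(size * male_fraction)
    n_female = size - n_male
    if n_male > len(males) or n_female > len(females):
        raise ValueError("not enough records of one sex for the requested ratio")
    subset = rng.sample(males, n_male) + rng.sample(females, n_female)
    rng.shuffle(subset)
    return subset

# Toy stand-in for a roughly balanced pool of scans (illustrative only).
pool = [{"id": i, "sex": "M" if i % 2 == 0 else "F"} for i in range(2000)]

# Illustrative male fractions: all male, all female, two skewed mixes, balanced.
MALE_FRACTIONS = [1.0, 0.0, 0.25, 0.75, 0.5]
subsets = {f: make_skewed_subset(pool, f, size=400) for f in MALE_FRACTIONS}
```

Keeping every subset the same size means that any difference in performance can be attributed to the sex ratio rather than to the amount of training data.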
After the model was trained on one of the subsets, it was tested on a collection of scans from both male and female patients. A notable trend was present across the various medical conditions: the accuracy of the models was much worse when the training data was significantly sex-skewed. An interesting thing to note is that if one sex was overrepresented in the training data, that sex didn’t seem to benefit from the overrepresentation. Regardless of whether the model was trained on data skewed toward one sex or the other, it didn’t perform better on that sex compared to when it was trained on an inclusive dataset.
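Reporting performance separately for each sex is what makes this kind of gap visible. The sketch below computes a per-sex score from a list of predictions. It uses the area under the ROC curve (AUC) as one common choice of metric, which is an assumption, since the article speaks only of accuracy. The function names and the example predictions are made up for illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auc_by_sex(examples):
    """examples: list of (sex, label, score) tuples; returns {sex: AUC}."""
    groups = {}
    for sex, label, score in examples:
        labels, scores = groups.setdefault(sex, ([], []))
        labels.append(label)
        scores.append(score)
    return {sex: auc(labels, scores) for sex, (labels, scores) in groups.items()}

# Made-up predictions for illustration only: (sex, true label, model score).
examples = [
    ("M", 1, 0.9), ("M", 0, 0.2), ("M", 1, 0.7), ("M", 0, 0.4),
    ("F", 1, 0.6), ("F", 0, 0.7), ("F", 1, 0.8), ("F", 0, 0.3),
]
results = auc_by_sex(examples)
```

A single pooled score over all test scans would hide the difference between the two groups, which is why the comparison is made per sex.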
The senior author of the study, Enzo Ferrante, was quoted by STAT News as explaining that the study underlines how important it is for training data to be diverse and representative of all the populations in which the model is intended to be used.
It isn’t entirely clear why models trained on one sex tend to perform worse when applied to the other sex. Some of the discrepancies might be due to physiological differences, but various social and cultural factors could also account for some of the difference. For instance, women may tend to receive X-rays at a different stage of progression in their disease when compared to men. If so, it could affect the features found within training images, and therefore the patterns learned by the model. It would also make it much more difficult for researchers to de-bias their datasets, as the bias would be baked into the dataset through the mechanisms of data collection.
Even researchers who pay close attention to data diversity sometimes have no choice but to work with data that is skewed or biased. Situations where a disparity exists between how medical conditions are diagnosed will often lead to imbalanced data. For example, data on breast cancer patients is almost entirely collected from women. Similarly, autism manifests differently between women and men, and as a result, the condition is diagnosed at a much higher rate in boys than girls.
Nonetheless, it’s extremely important for researchers to control for skewed data and data bias in any way that they can. To that end, future studies will help researchers quantify the impact of biased data.
New Advancements in AI for Clinical Use
Researchers from Radboudumc helped advance artificial intelligence (AI) in the clinical setting after demonstrating how AI can diagnose problems much as a doctor does, while also showing how it reaches the diagnosis. AI already plays a role in this environment, being used to quickly detect abnormalities that experts could label as a disease.
AI in the Clinical Setting
Artificial intelligence has been increasingly used in diagnosis from medical images. What was traditionally done by a doctor studying an X-ray or biopsy to identify abnormalities can now be done with AI. Through the use of deep learning, these systems can diagnose by themselves, oftentimes being as accurate as, or even more accurate than, human doctors.
The systems are not perfect, however. One of the issues is that the AI does not demonstrate how it is analyzing the images and reaching a diagnosis. Another problem is that these systems stop once they reach a specific diagnosis rather than continuing to examine the image. This could lead to the system missing some abnormalities even when the diagnosis is correct.
In this scenario, the human doctor is better at observing the patient, X-ray, or other images overall.
Advancements in the AI
These problems for AI in the clinical setting are now being addressed by researchers. Christina González Gonzalo is a Ph.D. candidate at the A-eye Research and Diagnostic Image Analysis Group of Radboudumc.
González Gonzalo developed a new method for the diagnostic AI using eye scans that showed abnormalities of the retina. These specific abnormalities can be easily found by both human doctors and AI, and they often occur in groups.
In the case of the AI system, it would diagnose one or a few of the abnormalities and stop, demonstrating one of the downsides of using such a system. In order to address this, González Gonzalo developed a process where the AI goes over the picture multiple times. When it does this, it learns to ignore the places that it had already covered, which allows it to discover new ones. On top of that, the AI also highlights suspicious areas, making the whole diagnostic process more transparent for humans to observe.
This new method is different from the traditional AI systems used in these settings, which base their diagnosis on one assessment of the eye scan. Now, researchers can see how the new AI system reached its diagnosis.
In order to ignore the already detected abnormalities, the AI system digitally fills them with healthy tissue from around the abnormalities. The diagnosis is then made based on all of the assessment rounds being added together.
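The loop described above (detect, fill in the detected spot from its surroundings, look again, then add up the rounds) can be sketched on a toy grid. This is only an illustration under strong assumptions: the real system uses a deep learning model on retinal scans, whereas here `most_salient` is a hypothetical stand-in that simply picks the brightest pixel above a threshold, and "healthy tissue" is approximated by the mean of the neighbouring pixels.

```python
def most_salient(image, threshold):
    """Hypothetical stand-in for a model's saliency: brightest pixel above threshold."""
    best = None
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if value > threshold and (best is None or value > best[2]):
                best = (r, c, value)
    return best

def fill_from_surroundings(image, r, c):
    """Replace pixel (r, c) with the mean of its in-bounds neighbours."""
    neighbours = [
        image[rr][cc]
        for rr in range(r - 1, r + 2)
        for cc in range(c - 1, c + 2)
        if (rr, cc) != (r, c) and 0 <= rr < len(image) and 0 <= cc < len(image[0])
    ]
    image[r][c] = sum(neighbours) / len(neighbours)

def iterative_detection(image, threshold=0.5, max_rounds=10):
    """Repeat detection, hiding each finding so later rounds can see new ones."""
    work = [row[:] for row in image]  # leave the caller's image untouched
    found = []
    total_score = 0.0
    for _ in range(max_rounds):
        hit = most_salient(work, threshold)
        if hit is None:
            break
        r, c, value = hit
        found.append((r, c))       # highlighted area shown to the clinician
        total_score += value       # rounds are added together
        fill_from_surroundings(work, r, c)
    return found, total_score

# Toy "scan" with two isolated bright spots standing in for abnormalities.
scan = [
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.9, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.7],
    [0.1, 0.1, 0.1, 0.1],
]
found, total_score = iterative_detection(scan)
```

A single-pass version would stop after the first, brightest spot. The repeated passes are what let the weaker second spot contribute to the final assessment, and the list of highlighted locations is what makes the result inspectable.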
The study found that this new system improved the sensitivity of the detection of diabetic retinopathy and age-related macular degeneration by 11.2 ± 2.0%.
This new system could really change how AI is used when diagnosing diseases based on abnormalities, and the biggest advancement is the new transparency that it can demonstrate when undergoing this process. This transparency is what will allow even more future corrections and advancements, with the end-goal being an AI system that could diagnose problems much more accurately and faster than the best human experts within the field. All of this could also lead to a more trustworthy system, possibly resulting in the widespread adoption of it within the larger field.
Naheed Kurji, Co-Founder, President and CEO of Cyclica – Interview Series
Naheed Kurji is the President and CEO of Cyclica, a Toronto-based biotechnology company that leverages artificial intelligence and computational biophysics to reshape the drug discovery process. Cyclica provides the pharmaceutical industry with an integrated, holistic, and end-to-end enabling platform that enhances how scientists design, screen, and personalize medicines for patients, and has recently been named by Deep Knowledge Analytics as one of the top 20 AI in Pharma companies globally.
Cyclica leverages artificial intelligence and computational biophysics to reshape the drug discovery process. Can you discuss in what way AI is used in this process?
Technology has played a critical role in drug discovery dating back to the ’80s. However, the drug discovery and development process is still very inefficient, time-consuming and expensive, costing more than 2 billion dollars over 12 years. The poor efficiency often results in high rates of attrition and failure to meet drug safety and efficacy milestones. Researchers are aware of this and they are actively seeking tools to holistically understand the qualities that define the best drugs in order to develop safer and more effective medicines.
Recent advances in cloud computing, AI and biophysics have created an opportunity to gain deep insight from the vast amounts of biochemical, biological, healthcare and patient data that are now available in order to better understand disease. These advances have also enabled medicinal chemists to enhance the design of novel therapies and use AI to drive greater predictive insights earlier in the drug development process. At Cyclica we have developed proprietary deep-learning engines, MatchMaker and POEM to support the drug design process. MatchMaker predicts how chemical compounds and drugs interact with multiple proteins, known as polypharmacology. We found the combination of both a knowledge-based and structure-based approach yielded the greatest predictive accuracy and performance. POEM (Pareto-Optimal Embedded Modeling), is a parameter-free supervised learning approach for building drug property prediction models and addresses several limitations of other ML approaches, resulting in less overfitting and increased interpretability.
At Cyclica, we are using AI to provide scientists with a robust and validated platform to accelerate decision-making and hypothesis generation in order to increase the overall efficiency of the drug discovery process and to reduce the number of downstream failures.
Cyclica has designed the Ligand Design and Ligand Express platform, what is this precisely?
We are the first company to approach computational polypharmacology (an appreciation that drugs interact with multiple targets) with an integrated drug discovery platform that interrogates molecular interactions on a proteome-wide scale. Our platform is composed of two key pieces: Ligand Express, our first-generation off-target profiling and target deconvolution platform, and Ligand Design, our next-generation single and multi-targeted in silico drug design technology. Ligand Express and Ligand Design are powered by two internally built, validated, and patented machine learning and deep learning engines: MatchMaker and POEM. Rooted deeply in protein biophysics, MatchMaker is a deep learning drug-target interaction engine that generalizes across both data-rich and data-poor targets. POEM, a machine learning technology implemented for Absorption, Distribution, Metabolism, and Excretion (ADME) property prediction, is a novel, parameter-free approach to model building.
All taken together, Ligand Design and Ligand Express offer a powerful end to end AI-augmented drug discovery platform for the design of advanced, chemically novel lead-like molecules that simultaneously prioritizes compounds based on their polypharmacological profile, effectively minimizing undesirable off-target effects. Our differentiated platform opens new opportunities for drug discovery, including multi-targeted and multi-objective drug design, lead optimization, ADMET-property prediction, target deconvolution, and drug repurposing. Driven by a diverse and highly-talented team with deep expertise across machine learning, computational biophysics/chemistry/biology, biochemistry, and medicinal chemistry, we are continuing to innovate through our robust R&D pipeline.
How important is decentralizing the discovery of medicine to the Cyclica business model?
Our vision is to decentralize the discovery of better medicines by combining our deep roots in Artificial Intelligence (AI) and protein biophysics with an innovative business model. And at the very core of Cyclica’s ethos is the steadfast desire to help patients by advancing the discovery and development of better medicines by taking a holistic yet personalized approach.
To this end, we believe that the future of drug discovery is in the hands of innovative research institutions and emerging biotech companies (we wrote about this in Forbes). Supporting our philosophy, in 2019 IQVIA reported that emerging biopharma companies account for over 70% of the total R&D pipeline (up from 50% in 2003), and that these companies patented over 2/3 of new drugs in 2018 (up from 50% in 2010). While emerging biotech companies will lead innovation in drug discovery, big pharma will continue to invest in advancing late-stage clinical trials and market penetration through their sales channels.
With our Series B funding, we will accelerate commercial plans to advance a growing pipeline of pre-clinical and clinical assets through an innovative decentralized partnership model. Our goal is to create and own hundreds of drug discovery programs across multiple therapeutic areas. These programs are created via spin outs and joint ventures (JVs) with top tier research institutions, facilitated largely through the Cyclica Academic Partnership Program (“CAPP”).
Propelled by a rapidly growing portfolio of more than 30 active and advancing drug discovery programs, we will continue to spark innovation through a combination of venture creation and partnerships with early-stage and emerging biotech companies. Recent partnerships include EntheogeniX Biosciences, NineteenGale Therapeutics, Rosetta Therapeutics, the Rare Diseases Medicine Accelerator, and two stealth JVs encompassing over 50 programs across multiple therapeutic areas. By executing on our decentralized business model, creating new companies through spin-outs and joint ventures and helping them scale, we are in effect creating the biotech pipeline of the future.
Many of your technologies are cloud-based, why is this so important?
Access to the cloud allows us to computationally scale the workflows that we are conducting, as well as benefit from regulated security infrastructure. Also, as an early stage company, the ability to get up and running with the cloud without the overhead of investing in our own hardware was critical for the financial viability in our early days. Looking forward, while much of our R&D work is done on the cloud, over the past couple of years we have become less cloud-dependent with the ability to run projects on single machines. We are also aiming to support private cloud installations since that’s something we feel our partners may desire. Technological advancements have made it possible to do on a personal laptop what used to take many machines on the cloud, but by continuing to utilize the cloud we are able to greatly expand the scope of the problems we are solving.
Cyclica often takes equity positions in companies that they partner with. Can you discuss the business reasoning behind this?
Smaller biotechnology companies and academic groups are generally overlooked by the market in terms of partnership opportunities. While they may not have the resources, infrastructure or facilities in comparison to mature big pharma counterparts, small biotechs are increasingly entering the spotlight with a combination of deep subject-matter expertise in specific indications and the benefits of a lean organization conducive to rapid innovation.
This led us to think about how we can engage with these smaller companies with an avant-garde strategy. We partner with scientists in research organizations who are interested in spinning out a company, or with early-stage biotech companies, and enable them with our AI-augmented drug discovery platform through in-kind contributions. In return, we take equity in the companies and/or share in the ownership of the compounds and assets that are created and pursued. By sparking a surge of innovation through a combination of venture creation and partnerships, we can capture greater value and develop long-term relationships with our partners to address a spectrum of unmet medical needs to better the lives of patients.
Entheogenix Biosciences is a joint venture between Cyclica and ATAI Life Science. What exactly is Entheogenix Biosciences?
There is a unique opportunity for innovation in the neuropsychiatric landscape to better serve patients suffering from complex mental ailments. Current medicines and therapies that rely on single-targeted drug interventions often fall short, requiring patients to take multiple medications that may present potential safety issues as well as reduce medication adherence. We have partnered with ATAI Life Science to leverage their deep experience in mental health and psychedelics, while empowering them with our AI-augmented drug discovery platform to create not only new medicines, but the right ones to tackle mental ailments. Entheogenix Biosciences is one of the many joint ventures we have formed and is a testament to our belief in changing the paradigm in which mental health disorders are treated by bringing our disease-agnostic, robust and scientifically validated computational platform into the hands of subject-matter experts and world-class scientists.
Is there anything else that you would like to share about Cyclica?
While we are very excited to share the announcement of our Series B round of financing, we are just as eager to share the launch of the Cyclica Academic Partnership Program (CAPP) and new partnerships over the next few months.
Thank you for the interview. I look forward to following the future progress of Cyclica.
Groundbreaking Research Shows How Sensors Can Be 3D Printed on Contracting Organs
Major research has come out of the University of Minnesota that could have huge implications in healthcare. Mechanical engineers and computer scientists have developed a new 3D printing technique that allows electronic sensors to be directly printed on organs that are expanding and contracting.
The new technique uses motion capture technology similar to that used in filmmaking, and besides having implications within the general field of healthcare, it could be specifically applied to diagnose and monitor the lungs of individuals with COVID-19.
The research was published in Science Advances, a scientific journal published by the American Association for the Advancement of Science (AAAS).
3D Printing Technique
The research is based on a 3D printing technique that was discovered two years ago. The technique was first used on a hand that rotated and moved left to right, with electronics directly printed on the skin of the hand. It has now been developed even further to work on organs such as the lungs or heart, which change shape and distort as they expand and contract.
Michael McAlpine is a University of Minnesota mechanical engineering professor and senior researcher on the study.
“We are pushing the boundaries of 3D printing in new ways we never even imagined years ago,” said McAlpine. “3D printing on a moving object is difficult enough, but it was quite a challenge to find a way to print on a surface that was deforming as it expanded and contracted.”
Development and Future Applications
The researchers first used a balloon-like surface and a specialized 3D printer. They utilized motion capture tracking markers, like the ones used to create special effects in movies, in order to help the 3D printer adapt to the expansion and contraction movements on the surface.
After using the balloon-like surface, the researchers tested it on an animal lung that was artificially inflated. It proved to be a success, and a soft hydrogel-based sensor was printed directly on the surface.
According to McAlpine, this technology could be used in the future to print directly on a pumping heart.
“The broader idea behind this research is that this is a big step forward to the goal of combining 3D printing technology with surgical robots,” said McAlpine. “In the future, 3D printing will not be just about printing but instead be part of a larger autonomous robotic system. This could be important for diseases like COVID-19 where health care providers are at risk when treating patients.”
The research team also included lead author Zhijie Zhu, a mechanical engineering Ph.D. candidate at the University of Minnesota, as well as Hyun Soo Park, assistant professor in the University of Minnesota Department of Computer Science and Engineering.
The work was supported by Medtronic and the National Institute of Biomedical Imaging and Bioengineering of the National Institutes of Health.