Google DeepMind Unveils AlphaGenome to Decode Human Genome Function

Google DeepMind released AlphaGenome on January 28, an AI model that predicts how DNA sequences translate into biological functions, processing up to one million base-pairs at once and outperforming existing models in 25 of 26 variant effect prediction benchmarks.

The model, published in Nature and detailed on the DeepMind blog, represents a significant advance in computational genomics. Where previous models required separate systems for different prediction tasks, AlphaGenome handles everything from gene expression to chromatin accessibility in a single unified architecture.

“AlphaGenome can look across a long stretch of DNA and predict where the critical regulatory elements are and their downstream effects on gene expression,” the DeepMind team wrote in their announcement. The model’s one-million-base-pair context window lets it capture long-range interactions between distant DNA regions that influence how genes are turned on and off.

How It Works

AlphaGenome combines two neural network architectures: a Borzoi-style 1D convolutional network for processing raw DNA sequences and a U-Net architecture adapted from image segmentation. This hybrid approach lets the model handle both the sequential nature of DNA and the complex spatial relationships between regulatory elements.
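To make the "1D convolutional" idea concrete, here is a toy sketch in plain Python. It is not AlphaGenome's code, layers, or weights. The helper names (`one_hot`, `conv1d`) and the TATA filter are illustrative assumptions, chosen only to show how a filter slides along one-hot-encoded DNA and responds where a motif appears.

```python
# Toy illustration only: not AlphaGenome's actual layers or weights.

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (A, C, G, T)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def conv1d(encoded, kernel):
    """Slide a (width x 4) kernel along the sequence with no padding."""
    width = len(kernel)
    out = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for offset in range(width):
            for channel in range(4):
                total += encoded[start + offset][channel] * kernel[offset][channel]
        out.append(total)
    return out

# Hypothetical filter that responds most strongly to the motif "TATA".
tata_kernel = one_hot("TATA")

signal = conv1d(one_hot("GGTATAGG"), tata_kernel)
print(signal)  # the peak marks where the motif starts
```

A trained network learns many such filters from data rather than having them written by hand, and stacks them so that later layers can combine motif hits across longer distances.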

The training data spans approximately 7,000 genomic tracks from the ENCODE and FANTOM consortia—massive collaborative efforts that have cataloged functional elements across the human genome. The model learns to predict signals from experimental assays measuring gene expression, DNA accessibility, protein binding, and chromatin modifications.

For researchers, the practical value lies in variant effect prediction. When a patient’s genome contains a mutation, clinicians need to know whether that variant matters. AlphaGenome can predict how a single nucleotide change affects the entire regulatory landscape, potentially flagging disease-causing variants that current methods miss.
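A common way to frame variant effect prediction is to run a model on the reference sequence and on the same sequence carrying the variant, then compare the two outputs. The sketch below assumes that framing. `toy_predict` is a hypothetical stand-in (a sliding GC-content track), not the AlphaGenome API, and the function names are invented for illustration.

```python
# Sketch only: `toy_predict` is a hypothetical stand-in, not the AlphaGenome API.

def apply_snv(sequence, position, alt_base):
    """Return a copy of `sequence` with the 0-based `position` set to `alt_base`."""
    if not 0 <= position < len(sequence):
        raise ValueError("position outside sequence")
    return sequence[:position] + alt_base + sequence[position + 1:]

def toy_predict(sequence):
    """Stand-in 'model': one track value per window, here the GC fraction."""
    window = 4
    track = []
    for i in range(len(sequence) - window + 1):
        chunk = sequence[i:i + window]
        track.append(sum(base in "GC" for base in chunk) / window)
    return track

def variant_effect(sequence, position, alt_base, predict=toy_predict):
    """Per-position difference between alternate and reference predictions."""
    ref_track = predict(sequence)
    alt_track = predict(apply_snv(sequence, position, alt_base))
    return [alt - ref for ref, alt in zip(ref_track, alt_track)]

reference = "ATATATATAT"
delta = variant_effect(reference, position=5, alt_base="G")

# One simple way to collapse the difference track into a single score.
summary_score = max(abs(d) for d in delta)
print(delta, summary_score)
```

Swapping `toy_predict` for a real sequence-to-track model keeps the same structure: the score reflects how much the predicted signal shifts when one base changes.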

The model outperformed existing models on 25 of 26 benchmarks testing its ability to predict how genetic variants affect gene expression and regulatory element activity. On expression quantitative trait loci (eQTLs), variants known to affect gene expression levels, AlphaGenome matched or exceeded specialized models trained specifically for those tasks.

Open Source Availability

DeepMind released AlphaGenome’s source code on GitHub for non-commercial use, continuing the lab’s pattern of making foundational biology tools publicly available. The repository includes model weights, inference code, and documentation for running predictions on custom sequences.

The open release follows the precedent set by AlphaFold, DeepMind’s protein structure prediction tool that has been used by over 3 million researchers since its 2021 release. AlphaGenome addresses a complementary problem: while AlphaFold predicts what proteins look like, AlphaGenome predicts when and where genes produce those proteins.

Google DeepMind CEO Demis Hassabis has positioned biology as a primary application domain for the lab’s AI capabilities. The genomics work extends DeepMind’s ambitions beyond the conversational AI and language models that power products like Gemini, applying similar architectural innovations to scientific problems.

Why This Matters

The human genome contains roughly 3 billion base-pairs, but only about 1.5% directly code for proteins. The remaining 98.5%, long dismissed as “junk DNA,” contains regulatory elements that control when, where, and how much genes are expressed. Mutations in these non-coding regions can cause disease, but identifying which variants matter has been extraordinarily difficult.

Traditional methods require expensive, time-consuming experiments to test individual variants. Machine learning models like AlphaGenome can screen thousands of variants computationally, prioritizing which ones deserve experimental follow-up. For rare disease diagnosis, where patients often carry novel variants with unknown effects, this capability could accelerate the path from sequencing to diagnosis.

The model’s ability to process million base-pair contexts is particularly significant. Gene regulatory elements can sit hundreds of thousands of base-pairs away from the genes they control, communicating through complex 3D folding of DNA. Previous models with shorter context windows couldn’t capture these long-range dependencies.
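The effect of context length can be shown with simple coordinate arithmetic. In the sketch below, the gene and enhancer coordinates and the 100,000 base-pair comparison window are hypothetical values picked for illustration. Only the one-million-base-pair figure comes from the article.

```python
def window_bounds(center, window_size):
    """Half-open [start, end) interval of `window_size` bases centred on `center`."""
    start = max(0, center - window_size // 2)
    return start, start + window_size

def in_window(position, center, window_size):
    """True if `position` falls inside the window centred on `center`."""
    start, end = window_bounds(center, window_size)
    return start <= position < end

gene_start = 5_000_000            # hypothetical coordinate
enhancer = gene_start + 300_000   # hypothetical enhancer 300 kb away

covered_long = in_window(enhancer, gene_start, 1_000_000)  # one million base-pairs
covered_short = in_window(enhancer, gene_start, 100_000)   # hypothetical shorter window
print(covered_long, covered_short)
```

With the window centred on the gene, an element 300,000 base-pairs away sits inside a one-million-base-pair input but outside a 100,000 base-pair one, so a model with the shorter input never sees it.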

AlphaGenome joins a growing ecosystem of AI tools transforming biology research. Protein structure prediction, drug discovery, and now gene regulation are increasingly tractable problems for machine learning. For the genetics research community, the open availability of these models democratizes access to computational capabilities that were previously limited to well-funded labs.

The model’s limitations are also clear from DeepMind’s presentation. While AlphaGenome excels at predicting experimental measurements, translating those predictions to clinical outcomes requires additional validation. The gap between predicting chromatin accessibility and predicting disease risk remains substantial.

For now, AlphaGenome serves as a research tool—one that could accelerate understanding of how the genome works, even if clinical applications remain years away. The 3,000 scientists across 160 countries already using the model suggest the research community sees immediate value in what DeepMind has built.

Alex McFarland is an AI journalist and writer exploring the latest developments in artificial intelligence. He has collaborated with numerous AI startups and publications worldwide.