Ginkgo Datapoints Unveils VCPI: A Bold Plan to Fix AI Drug Discovery’s Data Problem

For years, AI in drug discovery has been held back by a deceptively simple problem: the data isn’t good enough. Mountains of sequencing data, pooled perturbation studies, and mixed-cell experiments gave the impression of progress, but the predictive leap drug developers expected never materialized. Instead of clarity, the field produced noise. Instead of reproducibility, it produced drift. And instead of the precise, pharmacology-specific measurements required to train reliable virtual cell models, it produced datasets optimized more for scale than scientific integrity.

This is the environment into which Ginkgo Datapoints is launching the Virtual Cell Pharmacology Initiative (VCPI)—a project that doesn’t just promise more data but aims to deliver better data, purpose-built for AI models trying to predict how real drug-like molecules perturb real biological systems. The company’s official announcement underscores that VCPI will generate over 12 billion data points and profile 100,000 compounds, establishing the first standardized pharmacology dataset for virtual cell modeling.

Why “More Data” Failed

In the blog post introducing VCPI, Ginkgo uses an analogy that perfectly captures the field’s misguided trajectory. Imagine tossing a handful of pills into a cage of mice—then trying to figure out which mouse ate what. Now scale it to a million mice in one giant cage. That’s the core flaw behind pooled single-cell pharmacology experiments. They generate impressive quantities of data, but the underlying design prevents clean attribution between compound and phenotype.

The problem isn’t technology; it’s experimental architecture. The assumption that bigger datasets inherently teach better models has proven false. The blog bluntly calls this mindset a “data addiction,” arguing that without well-structured, high-signal inputs, even the most advanced AI will learn the wrong patterns.

VCPI represents a sharp departure from this logic. Instead of glorifying size, it doubles down on biological traceability, experimental rigor, and the controlled structure needed for AI to actually learn pharmacology.

How VCPI Rebuilds the Data Pipeline

Rather than relying on pooled single-cell assays, VCPI uses DRUG-seq, a high-throughput bulk RNA-sequencing method in which each compound is applied to cells in its own isolated, barcoded well. Because every readout traces back to a single known treatment, Ginkgo can measure treatment-specific responses with a far better signal-to-noise ratio than pooled designs offer. According to the press release, the company’s automation infrastructure can run over 100 full 384-well plates per week, generating millions of high-fidelity RNA measurements at industrial scale.
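
To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. The plate, compound, and data-point counts come from the announcement as quoted in this article; the number of genes measured per well is not stated anywhere in the source, so `ASSUMED_GENES_PER_WELL` is a purely illustrative placeholder, and the function names are hypothetical.

```python
# Figures quoted in the article (the press release says "over" 100 plates,
# so the weekly numbers below are lower bounds).
PLATES_PER_WEEK = 100
WELLS_PER_PLATE = 384
TOTAL_DATA_POINTS = 12_000_000_000
TOTAL_COMPOUNDS = 100_000

# ASSUMPTION: genes quantified per well. This value is not given in the
# article and is chosen only to illustrate the order of magnitude.
ASSUMED_GENES_PER_WELL = 10_000


def weekly_wells(plates: int = PLATES_PER_WEEK, wells: int = WELLS_PER_PLATE) -> int:
    """Isolated, barcoded wells processed per week."""
    return plates * wells


def weekly_measurements(genes_per_well: int = ASSUMED_GENES_PER_WELL) -> int:
    """Gene-level readouts per week under the assumed genes-per-well value."""
    return weekly_wells() * genes_per_well


def data_points_per_compound() -> int:
    """Average data points per compound implied by the headline totals."""
    return TOTAL_DATA_POINTS // TOTAL_COMPOUNDS


if __name__ == "__main__":
    print(f"Wells per week: {weekly_wells():,}")
    print(f"Readouts per week (assumed): {weekly_measurements():,}")
    print(f"Data points per compound: {data_points_per_compound():,}")
```

Under these assumptions, 100 plates of 384 wells come to 38,400 separately treated wells per week, and the headline totals imply an average of about 120,000 data points per compound. How those data points are actually split across doses, replicates, and genes is not described in the article.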

Just as important is the introduction of V-Ref293, a newly engineered, standardized reference cell line. Instead of each lab running its own mutated, drifted version of the same cell line, VCPI creates a universal biological baseline—an “organic twin” to the emerging class of virtual cells. This eliminates one of the long-standing sources of irreproducibility in pharmacogenomics and provides the stable ground truth AI models desperately need.

Under this initiative, Ginkgo is opening the doors to a community-driven dataset with several defining components:

  • Open participation for researchers, pharma teams, and AI developers
  • Free high-throughput RNA profiling for submitted compounds
  • Optional embargo or permanent proprietary access for contributors
  • Monthly data releases shaped by community voting
  • Opportunities for model sharing, compound prioritization, and early-access “super-user” status

A Community-Built Model, Not a Data Dump

One of the most unusual aspects of VCPI is the decision to launch before the dataset exists. Instead of uploading a finished resource, Ginkgo is asking the scientific community to help determine which compounds matter most and to collaborate in real time as the dataset grows.

This approach also de-risks participation. Early-stage biotechs can submit compounds and receive real pharmacology data without burning precious budget on high-throughput screening. AI teams can ensure the dataset reflects the perturbations they actually need for model training. And academic labs can contribute while still retaining the possibility of a 90-day exclusive window.

The structure transforms data generation into a participatory scientific process—not a static product.

What This Means for the Future of Bio-AI

The broader implications of VCPI reach beyond Ginkgo or any single virtual cell initiative. For virtual cell models to become scientifically credible, they must be trained on data that are reproducible, treatment-specific, and anchored to a stable biological reference. Without this foundation, AI will continue to hallucinate, mispredict, or overfit to artifacts.

Initiatives like VCPI signal a shift in how the field thinks about data itself. Experimental design is becoming as important as model architecture. Reproducibility is returning as a central requirement rather than an optional ideal. And community-driven, open-infrastructure projects are starting to outpace closed proprietary datasets in their ability to accelerate innovation.

If virtual cells eventually become reliable predictive engines—tools that help rank compounds, flag toxicities, or illuminate pathways before a human ever touches a pipette—it will be because projects like VCPI created the structured, trustworthy data environment they needed to grow.

By prioritizing better data over simply more data, Ginkgo is reframing the foundations of AI-enabled biology. VCPI doesn’t just react to the data crisis in drug discovery; it sets the stage for a new era where biological experiments and AI training pipelines evolve together, openly, and with purpose.

Antoine is a visionary leader and founding partner of Unite.AI, driven by an unwavering passion for shaping and promoting the future of AI and robotics. A serial entrepreneur, he believes that AI will be as disruptive to society as electricity, and is often caught raving about the potential of disruptive technologies and AGI.

As a futurist, he is dedicated to exploring how these innovations will shape our world. In addition, he is the founder of Securities.io, a platform focused on investing in cutting-edge technologies that are redefining the future and reshaping entire sectors.