Protege Raises $30M Series A Extension Led by Andreessen Horowitz to Expand Access to Real-World Data for AI

Protege, an AI data platform focused on unlocking trusted, real-world datasets for AI development, has raised a $30 million Series A extension led by Andreessen Horowitz. The new financing extends the company’s $25 million Series A announced in August 2025 and brings total funding to $65 million since its founding in 2024.
Returning investors include Footwork, CRV, Bloomberg Beta, Flex Capital, and Shaper Capital, reflecting growing investor confidence in Protege’s approach to one of the most persistent challenges in artificial intelligence: access to high-quality, non-public data.
Tackling AI’s Growing Data Constraints
As AI models advance, the limiting factor is increasingly not compute or algorithms, but data. Public datasets are becoming saturated, while many of the most valuable sources of information—such as healthcare records, media archives, audio data, and motion capture—remain fragmented, proprietary, or difficult to access responsibly.
Protege sits at the intersection of data holders and AI developers, enabling licensed access to real-world datasets while ensuring they are structured, curated, and optimized for modern AI workflows. Demand for this type of data is rising quickly across industries, particularly as AI systems move from experimentation into production environments.
A Licensing-First Model for Real-World Data
Rather than relying on scraping or unstructured aggregation, Protege works directly with trusted data providers through licensing agreements. These partners contribute private and proprietary datasets that may include de-identified health records, medical imaging, audio recordings, and media content.
Protege applies technical expertise to clean, curate, and package this data so it can be used effectively for training and evaluation. Data providers participate through revenue-sharing arrangements tied to usage, creating a repeatable model that aligns incentives around responsible data access and reuse.
The company works with AI organizations and institutions globally, including many of the world’s largest technology companies, supporting the development of next-generation AI systems across multiple domains.
Experienced Leadership and Strategic Backing
Protege is led by CEO and co-founder Bobby Samuels, with Travis May, previously CEO of Datavant and LiveRamp, serving as chairman and co-founder. The leadership team brings deep experience navigating data-intensive industries where privacy, compliance, and scale are critical. In a recent interview with Unite.AI, Samuels discussed how his background in data governance and privacy shaped his vision for a platform that connects data holders and AI developers in a transparent, ethical way. That framing underscores the growing importance of governed data access in the broader AI ecosystem.
From an investor standpoint, Andreessen Horowitz views access to proprietary, real-world data as a defining advantage in the next phase of AI. As model architectures become more standardized, differentiated data — with clear provenance and ethical licensing — is emerging as a key driver of performance and competitive defensibility.
How Protege Plans to Use the New Capital
The Series A extension will support expanded product development, growth of Protege’s data partner network into new domains and formats, and deeper collaboration with institutions that hold valuable real-world data. The company also plans to scale its infrastructure and team to meet increasing demand from AI research and development groups.
This focus reflects a broader industry shift, where AI progress is increasingly tied to data quality, provenance, and relevance rather than model size alone.
Implications for the Future of AI
Protege’s momentum points to a structural change in how AI systems are built. As easily accessible data sources are exhausted, future breakthroughs are likely to come from responsibly unlocking private, real-world data generated through everyday activity.
Platforms like Protege suggest a future where data access is governed, compensated, and transparent. For AI developers, this could mean more reliable and domain-specific models. For data holders, it creates a sustainable path to participate in AI development without giving up control.
Over time, this approach may influence how the industry—and regulators—think about data ownership, reuse, and value creation. Instead of treating data as something to be extracted, AI development may increasingly depend on trust-based networks that balance innovation with responsibility.