Chester Leung, Co-Founder and Head of AI Platform at OPAQUE – Interview Series

Chester Leung is Co-Founder and Head of Platform Architecture at OPAQUE, a Series A startup building a confidential data and AI platform. The platform lets teams extend their enterprise data pipelines with a confidential layer, enabling faster insights with less effort, along with verifiable privacy and control.

Previously, Chester was a computer science graduate student at UC Berkeley, where he published peer-reviewed papers at top conferences and also served as a lead maintainer of the open-source MC2 project for secure collaborative analytics and machine learning.

You co-founded Opaque after your time at UC Berkeley’s RISELab, where your work bridged AI and secure systems. What specific gap in enterprise data infrastructure did you see that led to the creation of OPAQUE, and how did your academic experience inform the company’s direction?

At the time, there was an immense focus, both in academia and in industry, on leveraging machine learning for specific use cases. In the lab, we were extremely fortunate to have large enterprise sponsors who helped us researchers shape our work to solve more pressing problems that they were facing within their organizations. Our group, in particular, had a unique opportunity to work closely with tech companies as well as banking, financial services, and insurance (BFSI) companies, collaborating to solve hard privacy problems around the use of sensitive but valuable data for machine learning. Like all areas of AI, machine learning relies on large quantities of high-quality data to produce valuable and robust insights.

We came across the same pattern over and over again while collaborating with teams from the likes of Amazon, Scotiabank, and Ant Group (then Ant Financial): their machine learning-powered projects stalled out before reaching production due to concerns around the use of sensitive but critical data for these use cases. In other words, these teams were unable to use AI in projects that they knew could generate value for the company, not because of a technical problem with AI, but because they couldn’t get access to the right data.

At Opaque, we’ve been solving an identical problem: helping teams get access to the right data so they can unlock or upsell their AI capabilities. The only change since our research days has been the urgency of the problem: we’re now consistently seeing AI adoption and integration become a company-wide strategic imperative, and it continues to be bottlenecked by access to the right data.

In a landscape where enterprises are investing heavily in reasoning models and agentic AI, why do you believe secure data pipelines are more important than ever?

Secure data pipelines are the backbone upon which enterprises build reasoning models and agentic AI. Everything from training these reasoning models to deploying agentic AI involves sensitive data and relies on secure data pipelines.

For example, as an industry we’re now seeing growing investment in generating high-quality data to train these models. Some reports have even predicted that the compute investment in high-quality data generation will very soon exceed the compute investment in training the models themselves. Of course, data generation is a multi-step process powered by pipelines that produce an enterprise’s most valuable IP: high-quality, domain-specific data that can train models that generate immense value downstream. The investment in generating this data is enormous, and the generated data, given its lineage, effectively distinguishes one enterprise from its competitors, serving as its moat. An enterprise must do all it can to keep this pipeline secure.

OPAQUE’s confidential computing platform enables analytics on encrypted data. What are the core technical challenges in making this both scalable and developer-friendly for enterprise environments?

Our Confidential AI platform not only enables analytics, machine learning, and generative AI on encrypted data but also provides verifiable proof that your data was used in ways that only you expect and permit.

The core challenges with scalability, development, and management lie in making the orchestration of the workload secure and verifiable at scale. In particular, many enterprises these days use managed cloud services when they need to scale. This can be both cost-effective and convenient. However, some subset of the software powering managed cloud services is inherently, well, managed by the cloud provider. So the challenge becomes: how can an organization secure and verify software that isn’t under its control? And if the organization does take back control of all the software, what does it give up by no longer using a managed service?
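To make the "verify software you don't control" idea concrete, here is a minimal sketch of the pattern that confidential computing makes possible, written in Python with entirely hypothetical names (this is not OPAQUE's API): the data owner releases its decryption key only after checking a hardware-signed measurement of the exact code that will touch the data, even though the machine itself is managed by the cloud provider.

```python
# Illustrative sketch only -- hypothetical names, not OPAQUE's API.
# Idea: even on a cloud-managed machine, the data owner releases its key only
# after verifying a hardware-signed measurement of the code that will touch the data.
import hashlib
import hmac
from dataclasses import dataclass
from typing import Callable


@dataclass
class AttestationReport:
    code_measurement: bytes   # hash of the software loaded into the secure enclave
    signature: bytes          # produced by the hardware vendor's attestation key


# In practice this would be the published measurement of an audited pipeline build.
EXPECTED_MEASUREMENT = hashlib.sha256(b"audited-pipeline-build-v1").digest()


def release_key_if_trusted(
    report: AttestationReport,
    verify_vendor_signature: Callable[[AttestationReport], bool],
    data_key: bytes,
) -> bytes:
    """Hand over the data-decryption key only if the remote workload proves it
    is exactly the code we expect, even though we don't control the machine."""
    if not verify_vendor_signature(report):
        raise PermissionError("attestation signature invalid")
    if not hmac.compare_digest(report.code_measurement, EXPECTED_MEASUREMENT):
        raise PermissionError("unexpected code is running; refusing to release key")
    # Real systems bind the key exchange to the attested channel; omitted here.
    return data_key
```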

You’ve said that secure-by-design architecture can provide a lasting competitive edge. Can you elaborate on how this principle plays out practically for enterprise AI teams?

There are two angles to look at this from: a product angle and an engineering angle.

From a product angle, everyone understands that their data is radioactive, their moat, or both. Enterprises are becoming increasingly mature in their evaluation of solutions’ data privacy, security, and sovereignty. Consequently, any team that builds any product that processes enterprise data must provide assurances that the processed data is only visible to and used by authorized parties and entities. A secure-by-design architecture provides confidence that data privacy, security, and sovereignty were first-class considerations in the design of the product, and enables the product to explicitly provide these assurances.

From an engineering angle, a secure-by-design architecture is more extensible and future-proof. Legal, risk, and compliance teams are becoming increasingly strict in response to newfound risks and regulations. Thus, engineering organizations should want to build a secure enterprise AI system from the start so that they don’t have to rearchitect and/or patch up their system once they realize that their existing system is insufficiently secure and risk-proof. Having to re-architect and patch costs months, if not years, of valuable engineering bandwidth.

As autonomous AI systems evolve, how should organizations rethink the role of data—beyond a resource—as a defensible moat?

There is a growing consensus in the industry that data may soon be the only moat an organization has. We’ve been seeing research and engineering talent, and the brilliant technologies and products that they build, jump from organization to organization. As a result, numerous organizations are able to offer the same products, backed by the same technologies.

What cannot easily transfer from organization to organization, however, is an organization’s data—unless it’s leaked. Moreover, it is exactly that data that can make a product more compelling than its competitors—more personalized, tailored, and domain-specific. Organizations must do everything they can to secure their data, enabling them to leverage their data as the competitive advantage.

What does a resilient AI pipeline look like in practice, and how does it help companies avoid hidden costs or risks as they scale their AI deployments?

A resilient AI pipeline is one that’s reliable and fault-tolerant but, most importantly, verifiably secure end-to-end. Before processing, companies should verify both the data going into the pipeline and the pipeline itself, to ensure that there’s no possibility of the pipeline misusing the data. During processing, the AI pipeline should be tamper-proof, so that no one can steal the data it’s processing or skew the insights it provides. After processing, the AI pipeline should be verifiably auditable, so that a team can observe and explain the pipeline’s decision making and trajectory, and can see what went wrong when something does.
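As a rough illustration of the "verify before processing" step described above, here is a minimal Python sketch with hypothetical names and values (not a description of OPAQUE's implementation): both the input data and the pipeline code are fingerprinted and checked against pre-approved values before anything runs.

```python
# Minimal sketch of "verify before processing" -- hypothetical names and values.
import hashlib

# Fingerprints an organization has reviewed and approved ahead of time.
APPROVED_DATASETS = {hashlib.sha256(b"customer_records_2024_snapshot").hexdigest()}
APPROVED_PIPELINES = {hashlib.sha256(b"scoring_pipeline_v3.2_source").hexdigest()}


def fingerprint(blob: bytes) -> str:
    return hashlib.sha256(blob).hexdigest()


def run_if_verified(dataset: bytes, pipeline_code: bytes, run) -> object:
    """Refuse to start unless both the data going in and the pipeline itself
    match something that was explicitly approved."""
    if fingerprint(dataset) not in APPROVED_DATASETS:
        raise PermissionError("unrecognized dataset; refusing to process it")
    if fingerprint(pipeline_code) not in APPROVED_PIPELINES:
        raise PermissionError("unrecognized pipeline code; refusing to run it on this data")
    return run(dataset)
```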

It’s imperative to consider how an insecure, flawed AI pipeline could leak an organization’s data or proprietary model, and the implications that has on a company’s differentiating factors or reputation. What’s more important, though, is this: as companies scale their deployment of AI into more critical and impactful use cases, the risk of an insecure, unexplainable AI pipeline grows exponentially. In a world where lending decisions and hiring decisions are already AI-augmented, affecting everything from personal finances to careers, an intentional or unintentional error in an AI pipeline could have a dramatic effect on the life of an individual.

Many enterprises focus on model accuracy or latency. What are they overlooking when it comes to data integrity and long-term operational risk?

While many enterprises are focusing on the model or AI technology, I’ve long believed that data is the fundamental bottleneck to rolling out value-generating AI.

Having a model that very quickly produces an accurate response about a topic the end user doesn’t care about generates zero value. To build a uniquely compelling product, enterprises must ensure that their models, and the products they power, are trained with high-quality, relevant data. Data hygiene issues that result from a lack of high-quality input data may not surface until months later.

Second, we’ve found that enterprises generally don’t have a good story for detecting data drift, contamination, or leakage, which jeopardizes model integrity. This is tightly tied to my first point, and while detection is more of a reactive measure, it makes evals and observability even more important.
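As one small example of what such a detection story could look like, here is a minimal drift check on a single feature using a two-sample Kolmogorov–Smirnov test; the data and threshold are hypothetical, and real deployments would monitor many features with whichever statistics fit the data.

```python
# Minimal sketch of a data-drift check between a training-time reference sample
# and recent production inputs -- one feature at a time, purely illustrative.
import numpy as np
from scipy.stats import ks_2samp


def drifted(reference: np.ndarray, recent: np.ndarray, alpha: float = 0.01) -> bool:
    """Flag a feature whose recent distribution differs significantly
    from the distribution the model was trained on."""
    result = ks_2samp(reference, recent)
    return result.pvalue < alpha


# Example: the production population shifts away from the training distribution.
rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=5_000)
recent = rng.normal(loc=0.4, scale=1.0, size=5_000)   # mean shift
print(drifted(reference, recent))                      # likely True
```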

OPAQUE integrates into existing cloud stacks. What have you learned about balancing ease of adoption with strong security guarantees in enterprise deployments?

We’ve spent nearly a decade, starting from our research days, solving this problem. The provable security of AI systems, especially in an enterprise setting, is a very difficult problem. It requires systems, security, cryptography, and AI expertise. As a result, most systems that we’ve come across have not been fundamentally secure – because security is so difficult to implement.

At Opaque, we’ve built a product that’s the best of both worlds – inherently and verifiably secure from the ground up, but easily deployable through cloud marketplaces and sufficiently flexible to integrate into new and existing AI applications.

What kinds of threats or vulnerabilities are emerging around AI pipelines and data sharing that enterprise leaders may not yet fully appreciate?

What we’ve been seeing in this agentic gold rush is a blind urgency toward deploying AI agents that interact with various systems of record. While these agents can provide value, they also pose enormous risks because they touch so many systems with valuable data. Agents are inherently non-deterministic, and we’ve seen countless instances where they go off and do something we don’t expect. In a world where your data is your only moat, enterprise leaders should always question whether they can trust and rely on AI agents that have access to all of their data to not accidentally or even intentionally misuse it.

As AI regulation takes shape globally, how do you see the interplay between secure data infrastructure, model accountability, and compliance evolving in the next few years?

Verifiably secure data infrastructure enables model and agent accountability. Specifically, without verifiable proof of an agent's or model’s decision making or tool use, we can’t be certain of anything, so we won’t be able to trace accountability. As AI becomes more and more integrated into our everyday lives, to feel like we still have control, we’ll want more explainability and observability into AI. However, when AI can operate at machine speed, and we can’t, a malicious AI can easily fool us by constructing false histories. We need verifiability to hold AI accountable.
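As a toy sketch of why verifiability defeats constructed false histories (hypothetical names, not any specific product): if every agent action is appended to a hash-chained audit log, rewriting an earlier entry invalidates every later hash, so a fabricated history is detectable.

```python
# Toy sketch of a tamper-evident audit log for agent actions -- hypothetical names.
import hashlib
import json


def append_entry(log: list[dict], action: str, detail: str) -> None:
    """Each entry commits to the previous one, so rewriting history later
    changes every subsequent hash and is detectable."""
    prev_hash = log[-1]["hash"] if log else "genesis"
    body = {"action": action, "detail": detail, "prev": prev_hash}
    body["hash"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    log.append(body)


def verify(log: list[dict]) -> bool:
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev_hash = "genesis"
    for entry in log:
        body = {k: entry[k] for k in ("action", "detail", "prev")}
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest():
            return False
        prev_hash = entry["hash"]
    return True


log: list[dict] = []
append_entry(log, "tool_call", "query_crm(customer_id=42)")
append_entry(log, "decision", "approved refund")
print(verify(log))                       # True
log[0]["detail"] = "nothing happened"    # an attempt to rewrite history
print(verify(log))                       # False
```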

To me, regulatory compliance is very reactive. The development and passing of regulations move much slower than technological innovation. This will be increasingly true as AI helps us increase our pace of innovation. While compliance will ultimately drive laggards to adopt secure data infrastructure, the early adopters and early majority recognize that it’s critical for AI safety, and will adopt it far before compliance makes it mandatory. They understand that agent accountability, enabled by secure data infrastructure, is critical to the adoption of their own AI-powered products.

Thank you for the great interview. Readers who wish to learn more should visit OPAQUE.

Antoine is a visionary leader and founding partner of Unite.AI, driven by an unwavering passion for shaping and promoting the future of AI and robotics. A serial entrepreneur, he believes that AI will be as disruptive to society as electricity, and is often caught raving about the potential of disruptive technologies and AGI.

As a futurist, he is dedicated to exploring how these innovations will shape our world. In addition, he is the founder of Securities.io, a platform focused on investing in cutting-edge technologies that are redefining the future and reshaping entire sectors.