
Corey Sanders, Senior Vice President Product at CoreWeave – Interview Series


Corey Sanders, Senior Vice President Product at CoreWeave, leads product strategy and execution for one of the fastest-growing AI-focused cloud platforms. He is responsible for scaling innovation, shaping purpose-built solutions with customers, and strengthening CoreWeave’s position in the AI infrastructure market. Prior to CoreWeave, Sanders spent two decades at Microsoft in senior leadership roles spanning cloud engineering, industry-specific platforms, commercial solution strategy, and large-scale enterprise partnerships, with deep experience bridging technical execution and go-to-market strategy.

CoreWeave is an AI-native cloud provider built specifically for high-performance computing and large-scale artificial intelligence workloads. The company operates a rapidly expanding footprint of data centers across the U.S. and Europe, delivering GPU-accelerated infrastructure and software designed for AI training, inference, and advanced compute use cases. By focusing on purpose-built architecture rather than general-purpose cloud, CoreWeave has become a critical infrastructure partner for AI labs and enterprises seeking performance, scalability, and efficiency at scale.

You spent more than 20 years at Microsoft working across Windows engineering, cloud sales strategy, and Microsoft Cloud for Industry. What did that progression teach you about what truly drives enterprise adoption, and how are you applying those lessons today at CoreWeave?

Enterprise adoption starts with solving a specific customer problem. Innovation for the sake of innovation isn’t actually that crucial for the enterprise. It’s about putting yourself in their shoes to understand what truly plagues them—whether it’s the cost of support, operational complexities, connecting with customers, or managing global teams and new product lines—and then delivering services that help. They are often willing to be innovative in their approach, but the most crucial consideration is helping them solve their problem. The most frequent mistake I’ve seen in product design is getting too caught up in the coolness of a product. While that carries weight in the consumer space, enterprise customers, in the end, care much more about utility than coolness.

CoreWeave is often described as offering purpose-built AI infrastructure. In practical terms, what does purpose-built mean from a product perspective, and where do general-purpose cloud platforms struggle with AI workloads?

The biggest benefit of being purpose-built is the ability to focus and deliver services without needing to solve for every general use case. I’ll give two examples: one in software and one in hardware.

On the software side, our Object Storage offering with LOTA cache is focused specifically on caching for AI workloads. It deploys directly on the GPU nodes, delivers an S3 endpoint for the application, and responds to GPU requests by spanning its cache across multiple nodes. This increases throughput to the GPU up to 7 GB/s, far exceeding what general-purpose clouds offer. We can achieve this because we make design assumptions around AI-specific workloads, read/write splits, and cluster layouts. If a customer used this for hosting a database or an e-commerce site, it wouldn’t have the same impact. That is the definition of purpose-built software.

The hardware example is similar. Given our expansive deployment of latest-generation NVIDIA SKUs—many of which require liquid cooling—CoreWeave has built specific expertise and data center designs to support those needs. Unlike larger clouds that build for fungibility and then must retroactively add liquid cooling, CoreWeave builds data centers focused on AI from the ground up. This results in lower costs and higher availability for the latest SKU types.

[Image: architecture of the LOTA cache described above]
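To make the purpose-built point concrete, here is a minimal Python sketch of how a training process might read dataset shards through an S3-compatible endpoint such as the one LOTA exposes. The endpoint URL, credentials, bucket, and prefix are hypothetical placeholders, not CoreWeave specifics; the point is that the application code stays plain S3 while the cache layer accelerates reads behind it.

```python
# Minimal sketch: a training process reading dataset shards through an
# S3-compatible endpoint. Endpoint URL, credentials, bucket, and prefix are
# hypothetical placeholders; the application speaks plain S3 while whatever
# cache sits behind the endpoint serves the reads.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://object-store.example.internal",  # hypothetical endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

BUCKET = "training-data"      # hypothetical bucket name
PREFIX = "datasets/run-01/"   # hypothetical shard prefix

def iter_shards():
    """Yield (key, bytes) for every shard under the prefix."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"]
            yield obj["Key"], body.read()

for key, payload in iter_shards():
    print(f"loaded {key}: {len(payload)} bytes")  # hand off to the data loader here
```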

When customers first think about scaling AI, many believe they only need access to GPUs. What do they typically realize they are missing once they begin training or serving models at scale?

Given the complexity of running workloads across massive GPU clusters, the surrounding services become the true drivers of success. This includes the obvious ones, like storage and networking, but also critical operational services like observability, orchestration, and security. This is where CoreWeave really shines with our Mission Control offering. It provides customers with deep awareness of node health and runtime across their fleet, integrating that knowledge directly into the orchestration engine. This allows the customer to treat their infrastructure not as 1,000 individual GPUs, but as a single, cohesive job entity.
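As an illustration of that "single, cohesive job entity" idea, the toy Python sketch below gates a job launch on fleet-wide health signals rather than treating each GPU node in isolation. The helper functions fetch_node_health and launch_distributed_job are hypothetical stand-ins, not CoreWeave or Mission Control APIs.

```python
# Illustrative only: a toy health-gated launch. Unhealthy nodes are quarantined
# before the orchestration layer is handed one cohesive job across the rest.
from dataclasses import dataclass

@dataclass
class NodeHealth:
    name: str
    gpu_ok: bool
    nvlink_ok: bool
    thermals_ok: bool

    @property
    def healthy(self) -> bool:
        return self.gpu_ok and self.nvlink_ok and self.thermals_ok

def fetch_node_health() -> list[NodeHealth]:
    # Hypothetical: in practice this would come from a telemetry/observability service.
    return [NodeHealth(f"gpu-node-{i:04d}", True, True, i % 97 != 0) for i in range(1000)]

def launch_distributed_job(nodes: list[str]) -> None:
    # Hypothetical: hand the healthy node set to the orchestration engine as one job.
    print(f"launching job across {len(nodes)} healthy nodes")

fleet = fetch_node_health()
healthy = [n.name for n in fleet if n.healthy]
quarantined = [n.name for n in fleet if not n.healthy]
print(f"quarantining {len(quarantined)} nodes before launch")
launch_distributed_job(healthy)
```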

What are the top product priorities you are focused on right now to improve customer outcomes, whether that is performance, reliability, cost predictability, or developer experience?

In the core platform, we are constantly focused on performance, reliability, and observability. We must ensure customers can run jobs in a repeatable, predictable way while taking full advantage of every TFLOP in every GPU. Beyond that, we are working to simplify onboarding for customers who may not be familiar with every bell and whistle in a tool like SLURM (which everyone uses, but almost everyone hates). Finally, we are developing additional services and billing models to make it easier to innovate and start small. Right now, experimenting is surprisingly difficult due to high barriers to entry, such as capacity constraints, three-year commitments, and the need for specialized experts just to get started. We want to bring back the ease of innovation to the AI platform.
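For readers unfamiliar with the friction being described, here is a minimal sketch of the kind of Slurm boilerplate a multi-node GPU training job typically involves, written and submitted from Python. The resource values and the train.py script are hypothetical; the #SBATCH directives are standard Slurm options.

```python
# A minimal sketch of multi-node GPU job submission with Slurm. Resource values
# and script names are hypothetical; sbatch must be available on the login node.
import subprocess
import textwrap

job_script = textwrap.dedent("""\
    #!/bin/bash
    #SBATCH --job-name=llm-pretrain
    #SBATCH --nodes=16
    #SBATCH --ntasks-per-node=8
    #SBATCH --gres=gpu:8
    #SBATCH --time=72:00:00
    #SBATCH --output=%x-%j.log

    # One task per GPU; srun handles placement across the allocation.
    srun python train.py --config config.yaml
""")

with open("pretrain.sbatch", "w") as f:
    f.write(job_script)

# Submit the job; requires a working Slurm client on this machine.
subprocess.run(["sbatch", "pretrain.sbatch"], check=True)
```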

As more AI workloads shift from training-heavy to inference-heavy, how does that transition influence infrastructure design and product roadmap decisions?

It creates significant opportunities to apply CoreWeave’s existing differentiation to inference requirements. For example, the LOTA cache I mentioned focuses on feeding GPUs during training; however, we can take that same technology, integrate it into things like the KVCache, and turn it into a powerful inference differentiator. Similarly, tools like Mission Control become even more vital for inference, as observing GPU health is crucial for running highly available agentic applications.
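For context, the KV cache referenced here is the standard transformer inference optimization of storing the keys and values of already-generated tokens so each decode step only computes projections for the newest token. The toy NumPy sketch below illustrates that general technique only, not CoreWeave's integration of LOTA with it.

```python
# Toy NumPy sketch of a transformer KV cache: keys/values for past tokens are
# stored once and reused each decode step. Illustrates the general technique,
# not any specific product integration.
import numpy as np

d_model = 64
rng = np.random.default_rng(0)
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))

k_cache = np.empty((0, d_model))  # grows by one row per generated token
v_cache = np.empty((0, d_model))

def decode_step(x_new: np.ndarray) -> np.ndarray:
    """Attend the newest token over all cached keys/values."""
    global k_cache, v_cache
    q = x_new @ W_q
    k_cache = np.vstack([k_cache, x_new @ W_k])  # append instead of recomputing history
    v_cache = np.vstack([v_cache, x_new @ W_v])
    scores = (q @ k_cache.T) / np.sqrt(d_model)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ v_cache

for step in range(4):
    token_embedding = rng.standard_normal((d_model,))
    decode_step(token_embedding)
    print(f"step {step}: attended over {k_cache.shape[0]} cached positions")
```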

Over the next one to two years, what will define leadership in the AI cloud market, and which capabilities will matter most to customers?

I think leadership will be defined by two things. The first is delivering the ever-growing scale requirements for training. This will require advancements in observability, health monitoring, and automatic recovery. When you move from hundreds to tens of thousands of GPUs distributed globally, manual response to failures is a non-starter.

The second is delivering the right services for inference and agentic workloads. This requires global deployment capabilities and business models that encourage experimentation. This usage pattern was what helped the cloud grow originally, and it has been somewhat lost in the age of AI. We need to bring it back through better platform support, multi-cloud capabilities, and multi-region ease of use.


As AI data centers and clusters continue to scale, what operational challenges are proving hardest to solve today, and which ones are improving most rapidly?

Generational shifts in GPUs continue to create new complexities in both design and software. Each new GPU release brings greater interconnectivity, higher memory, and larger power requirements, all of which force us to revisit assumptions about how nodes are connected, how racks are managed, and how software is delivered. We will need to keep focusing on this work to ensure we maintain our leadership position. What is improving most rapidly is what customers are able to accomplish with the growing scale of compute.

In AI infrastructure, reliability goes beyond uptime. How does CoreWeave define reliability, and what indicators best reflect success from the customer’s perspective?

At scale, the biggest consideration for a customer is simply getting the job done. In massive operations, individual failures or slowdowns are expected. The key is how we detect and automatically respond to those issues to ensure the job finishes despite the challenges. This is why we integrate Mission Control into higher-level services like SUNK (Slurm on Kubernetes). It allows customers to respond to failures automatically without losing hours or weeks of work. For us, success isn’t just about node uptime; it’s about job success.
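To illustrate the "job success over node uptime" framing, here is a toy Python loop in which a simulated step failure triggers a requeue from the last checkpoint instead of failing the whole run. The run_training_step and save_checkpoint helpers are hypothetical stand-ins, not part of SUNK or Mission Control.

```python
# Illustrative toy loop: on a step failure (e.g. a node dropping out), resume
# from the last checkpoint rather than failing the job outright.
import random

random.seed(7)
TOTAL_STEPS = 50

def run_training_step(step: int) -> None:
    if random.random() < 0.1:                  # simulate an occasional node failure
        raise RuntimeError(f"node failure during step {step}")

def save_checkpoint(step: int) -> int:
    return step                                # pretend we persisted model state

last_checkpoint = 0
step = 0
while step < TOTAL_STEPS:
    try:
        run_training_step(step)
        step += 1
        if step % 10 == 0:
            last_checkpoint = save_checkpoint(step)
    except RuntimeError as err:
        print(f"{err}; requeueing from checkpoint at step {last_checkpoint}")
        step = last_checkpoint                 # resume, rather than fail the job
```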

Looking ahead, what major shift in AI infrastructure do you believe is still underappreciated, whether related to hardware evolution, specialization of stacks, sovereignty requirements, or new deployment models?

I believe the resurgence of Reinforcement Learning (RL) as part of the AI stack is still underappreciated. While not a new field of study, it was largely overshadowed during the initial wave of LLM development. RL is making a comeback and will play a vital role in making AI services more responsive to the changing landscapes of their users. Because of this, we are very excited about the serverless RL offering we have today.

Thank you for the great interview. Readers who wish to learn more should visit CoreWeave.

Antoine is a visionary leader and founding partner of Unite.AI, driven by an unwavering passion for shaping and promoting the future of AI and robotics. A serial entrepreneur, he believes that AI will be as disruptive to society as electricity, and is often caught raving about the potential of disruptive technologies and AGI.

As a futurist, he is dedicated to exploring how these innovations will shape our world. In addition, he is the founder of Securities.io, a platform focused on investing in cutting-edge technologies that are redefining the future and reshaping entire sectors.