Ken Claffey, CEO of VDURA – Interview Series: A Return Conversation

Ken Claffey, CEO and President at VDURA, is a seasoned customer-centric business and product leader with deep expertise in cloud and enterprise infrastructure, hardware and software development, and driving strategic growth across product, operations, and go-to-market functions. Throughout his career he has built and led high-performance global teams, executed corporate strategy, driven profitable revenue growth and product innovation, and turned around underperforming businesses. Before taking the helm at VDURA, Claffey held senior leadership roles at Seagate Technology, where he served as SVP and General Manager overseeing enterprise systems and P&L, and earlier leadership positions with Xyratex, Adaptec, and Eurologic, bringing decades of experience in enterprise storage and high-performance computing.

VDURA is a software-defined data infrastructure company building modern storage solutions optimized for artificial intelligence and high-performance computing workloads under the motto “velocity meets durability.” The company’s flagship VDURA Data Platform combines flash-first parallel file system performance with object storage resilience in a unified architecture that scales linearly across thousands of clients and nodes while simplifying operations and lowering total cost of ownership. Originally founded as Panasas and rebranded in 2024, VDURA’s platform supports on-premises, cloud, and hybrid environments with advanced automation, metadata acceleration, and scalable performance designed to keep GPU clusters fed and data protected for enterprise, research, and mission-critical AI and HPC use cases.

How has your journey across HPC and enterprise storage shaped your view that storage is becoming the defining constraint in AI infrastructure?

Having built storage systems for some of the world’s most demanding compute environments, you develop an intuition for where the bottlenecks actually live versus where people assume they live. At Xyratex and through the ClusterStor work at Seagate, we were solving storage problems for supercomputers where the physics were unforgiving. You either fed the compute or you didn’t.

What I see now in AI infrastructure is the same fundamental constraint, just dressed in different economics. The GPU obsession in the Neocloud market was understandable. NVIDIA created a scarce and transformative resource. But the assumption that storage would just scale alongside it, cheaply and easily, was always going to break. It has broken. Storage is now trending toward 20 to 30 percent of AI infrastructure budgets in all-flash deployments, growing faster than any other component. When you have spent a career watching storage become the binding constraint in every large-scale compute environment, you stop being surprised when the rest of the market catches up to that reality.

Why was storage planning deprioritized during the Neocloud infrastructure land grab?

A few structural assumptions converged at exactly the wrong moment. First, flash prices were temporarily favorable. NVMe SSDs were affordable and abundant enough that going all-flash felt like a reasonable default. It was not architectural wisdom. It was a product of a brief economic window that operators mistook for a permanent condition.

Second, the competitive dynamic rewarded GPU counts above everything else. The Neocloud market was being evaluated on how many NVIDIA chips you could rack. Storage was roughly a 10 percent line item, easy to wave through procurement without deep scrutiny. Third, the all-flash decision felt safe because it eliminated complexity. One tier, one media type, simple to procure and operate. The problem is that “simple” and “economically sustainable” stopped being the same thing the moment NAND supply tightened and prices surged. By then, the infrastructure decisions were already locked in.

What surprises operators most when they see how storage is affecting their GPU utilization?

The relationship is more direct than most operators realize until they are staring at idle GPUs. Training runs with frequent checkpointing create burst write demands that can stall compute if the storage layer cannot absorb them fast enough. Data pipelines for preprocessing and ingestion create sustained read throughput requirements that, if unmet, starve the GPUs of work.

NVIDIA’s own DGX guidance quantifies this: text-based LLM training requires roughly 0.5 GB/s of read throughput per GPU, while physical AI and visualization workloads require approximately 4 GB/s of reads and 2 GB/s of writes per GPU. If your storage architecture cannot deliver that, you are not running your GPUs at capacity. You are running them at whatever fraction your storage allows.
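
As a rough illustration, the sketch below (Python) converts those per-GPU figures into the aggregate throughput a storage layer must sustain at cluster scale. The per-GPU numbers are the ones quoted above; the cluster sizes are arbitrary examples.

```python
# Aggregate storage throughput a GPU cluster needs, using the per-GPU figures
# quoted above from NVIDIA's DGX guidance. Cluster sizes are illustrative.

PER_GPU_READ_GBPS = {"llm_text": 0.5, "physical_ai": 4.0}
PER_GPU_WRITE_GBPS = {"physical_ai": 2.0}  # a write figure is quoted only for physical AI

def aggregate_gbps(num_gpus: int, per_gpu: dict) -> dict:
    """Scale per-GPU throughput requirements up to the whole cluster."""
    return {workload: rate * num_gpus for workload, rate in per_gpu.items()}

for gpus in (1_024, 4_096):
    reads = aggregate_gbps(gpus, PER_GPU_READ_GBPS)
    writes = aggregate_gbps(gpus, PER_GPU_WRITE_GBPS)
    print(f"{gpus} GPUs -> reads {reads} GB/s, writes {writes} GB/s")

# A 4,096-GPU text-LLM cluster already needs ~2,048 GB/s of sustained reads.
```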

The architecture matters enormously at cluster scale. A storage system that interposes an intermediary between the drive and the client may show comparable headline throughput on a single drive, but at scale you can end up needing three times as many drives to saturate the same GPU fleet. Three times the SSDs, three times the power, three times the rack space. The utilization math compounds quickly.

What cost differentials can emerge purely from SSD selection and architectural design even when headline throughput metrics appear similar?

This is where operators get into serious trouble, because the headline numbers can be genuinely misleading. Take a representative example. A 122.88 TB QLC NVMe SSD costs roughly $27,000. A 7.68 TB drive from the same generation delivers comparable sequential throughput for around $1,800. For a 4,096-GPU cluster on NVIDIA’s Enhanced specification, that single capacity selection decision produces a flash bill ranging from $600,000 to $9.6 million. The throughput is effectively identical. The only variable is how much cold data you are choosing to park on premium media that delivers no additional performance benefit.

On top of that, architectural design determines drive count at cluster scale. An architecture delivering roughly 5.8 GB/s of measured read throughput per SSD needs around 353 drives to saturate a 4,096-GPU cluster. An architecture delivering approximately 1.9 GB/s per SSD, due to intermediary overhead, needs over 1,000. At $12,000 per 30 TB drive, that difference is not a rounding error – it’s a business model question.
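
A back-of-the-envelope version of both effects is sketched below. It assumes the drive count is set purely by saturating aggregate read throughput at the 0.5 GB/s-per-GPU figure quoted earlier; real sizing also has to cover capacity, redundancy, and write bandwidth.

```python
# Back-of-the-envelope sketch of the two cost effects described above.
# Assumes drive count is set purely by saturating aggregate read throughput
# at 0.5 GB/s per GPU (quoted earlier); real sizing must also cover
# capacity, redundancy, and write bandwidth.
import math

GPUS = 4_096
aggregate_read_gbps = GPUS * 0.5                     # ~2,048 GB/s

# Effect 1: per-SSD efficiency of the architecture sets the drive count.
for label, per_ssd_gbps in [("direct-to-drive", 5.8), ("intermediary", 1.9)]:
    drives = math.ceil(aggregate_read_gbps / per_ssd_gbps)
    print(f"{label:16s}: ~{drives} drives "
          f"(~${drives * 12_000:,} at $12,000 per 30 TB drive)")

# Effect 2: at a fixed drive count, the capacity point sets the flash bill.
drives = math.ceil(aggregate_read_gbps / 5.8)        # ~353-354 drives
for label, price in [("7.68 TB drives", 1_800), ("122.88 TB QLC drives", 27_000)]:
    print(f"{label:22s}: ~${drives * price:,} flash bill")
```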

How should operators rethink all-flash versus tiered storage as flash prices rise and NAND supply remains constrained?

The starting point is accepting that the economic premise behind all-flash AI infrastructure was always contingent, not foundational. Phison’s CEO has described NAND production capacity as effectively allocated through 2026. Goldman Sachs projects DRAM prices rising double digits quarter-over-quarter through the same period. The all-flash default made sense when flash was cheap and abundant. It no longer is.

The right framework is to ask what flash is actually for. Flash is a performance medium. It should be sized to saturate GPU throughput requirements, no more. Everything else, including cold data, checkpoints that are not actively being read, and archived training sets, belongs on high-density HDD, which remains orders of magnitude cheaper per TB.
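
A minimal sketch of that framing: size flash for the hot working set that feeds the GPUs, keep the rest on HDD, and compare the bill. The dataset size, hot-data fraction, and per-TB prices here are illustrative assumptions, not figures from this interview.

```python
# Illustrative tiering sketch: flash for the hot working set, high-density
# HDD for everything else. Dataset size, hot fraction, and $/TB figures
# below are assumptions for illustration only.

TOTAL_DATASET_TB = 10_000        # assumed total corpus plus checkpoints
HOT_FRACTION = 0.10              # assumed share that must live on flash
FLASH_USD_PER_TB = 220           # assumed enterprise NVMe price per TB
HDD_USD_PER_TB = 15              # assumed high-density HDD price per TB

hot_tb = TOTAL_DATASET_TB * HOT_FRACTION
cold_tb = TOTAL_DATASET_TB - hot_tb

all_flash = TOTAL_DATASET_TB * FLASH_USD_PER_TB
tiered = hot_tb * FLASH_USD_PER_TB + cold_tb * HDD_USD_PER_TB

print(f"All-flash: ${all_flash:,.0f}")
print(f"Tiered:    ${tiered:,.0f}  ({tiered / all_flash:.0%} of the all-flash bill)")
```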

The trap operators fall into is treating tiering as a bolt-on: buy an all-flash primary layer, add a separate object store for cold data, and connect them with external data movers. That introduces a second software stack, a second data plane, networking complexity, and operational overhead. The hyperscaler approach, running SSD and HDD within the same software stack with native high-performance tiering and no external data movers, keeps storage closer to 10 percent of the infrastructure budget while still saturating every GPU.

What lessons can the Neocloud tier learn from hyperscaler storage design choices?

The most important lesson is that Google, Meta, and Microsoft do not run all-flash, and they have more AI workload experience than anyone. They deploy mixed-tier architectures with intelligent tiering: enough NVMe flash to saturate GPU throughput, then drain to high-density HDDs as fast as the physics allow. This is not a philosophical preference. It is an economic imperative driven by a clear-eyed understanding of AI workload physics.

The second lesson is architectural integration. Hyperscalers do not solve tiering by bolting together separate systems. They run SSD and HDD on the same software stack, the same data plane, with tiering as a first-class operation inside the storage system, not a batch job managed by a separate tool. That integration is what allows them to keep storage economical at enormous scale while maintaining the performance guarantees their GPU fleets require.

The third lesson is durability underwriting. AWS S3 delivers 11 nines of durability. Azure Blob delivers 12 or more. Legacy HPC architectures built on local RAID can fall well short of that at scale. If you cannot underwrite an SLA against your storage durability, you cannot compete for enterprise customers, and enterprise customers are where the sustainable Neocloud revenue lives.

How should infrastructure teams quantify the economic impact of storage availability on GPU fleets?

The math is sobering when you run it honestly. Shared storage failure does not produce a proportional SLA shortfall. It produces simultaneous breach across every GPU rack connected to that storage. A 5,000-GPU cluster with 98 percent storage availability does not deliver a 2 percent performance miss. It produces 876,000 GPU-hours of lost compute per year. At representative GPU-hour costs, that translates to millions of dollars in idle compute annually, plus SLA credits owed on every affected rack simultaneously.
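
The arithmetic behind that figure, with an illustrative dollar conversion (the GPU-hour rate is an assumption; substitute a contracted rate):

```python
# The availability math from above, made explicit. The $/GPU-hour figure is
# an illustrative assumption; plug in your own contracted rate.

GPUS = 5_000
STORAGE_AVAILABILITY = 0.98      # shared storage availability
HOURS_PER_YEAR = 8_760
GPU_HOUR_USD = 2.50              # assumed blended rental rate, illustrative

lost_gpu_hours = GPUS * HOURS_PER_YEAR * (1 - STORAGE_AVAILABILITY)
print(f"Lost GPU-hours per year: {lost_gpu_hours:,.0f}")          # 876,000
print(f"Idle-compute cost:       ${lost_gpu_hours * GPU_HOUR_USD:,.0f}")
# SLA credits and churn risk come on top of this, because a shared storage
# outage breaches every rack attached to it at once.
```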

The blast radius of storage failure in a large cluster is the entire cluster. Infrastructure teams need to model this explicitly: what is the annualized cost of idle compute at your current storage availability figure, what are the SLA credit obligations that attach to each availability tier, and what is the customer churn risk from SLA failures? CoreWeave and Oracle are already offering 99 percent rack-level uptime. Providers who cannot match that are losing deals today, and the deals they are losing are increasingly the high-value enterprise contracts that the Neocloud market needs to prove its long-term economics.

How do different storage architectures compare on performance per watt in power-constrained environments?

Power efficiency comes up in almost every serious infrastructure conversation now, and the differential is not marginal. It is multiplicative. Based on published specifications for comparable configurations delivering approximately 1,340 GB/s of read throughput, one architecture burns 55 kW while another achieves similar output at roughly 16 kW. That is a 3.4x difference in performance per watt. In a data center where AI workloads are consuming 40 to 250 kilowatts per rack against a fixed grid connection, wasted storage watts are GPUs you cannot deploy. NVIDIA’s own BlueField-4 documentation states explicitly that power availability is the primary constraint for scaling AI factories.
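
Expressed numerically, using the figures quoted above; the per-GPU power budget used to translate saved watts into deployable GPUs is an illustrative assumption.

```python
# Performance-per-watt comparison from the figures quoted above, plus a rough
# translation of the saved watts into deployable GPUs. The ~1.2 kW all-in
# per-GPU power budget (accelerator plus its share of cooling and overhead)
# is an illustrative assumption.

READ_GBPS = 1_340
ARCH_A_KW = 55.0
ARCH_B_KW = 16.0
KW_PER_GPU = 1.2                 # assumed all-in power budget per GPU

for name, kw in [("architecture A", ARCH_A_KW), ("architecture B", ARCH_B_KW)]:
    print(f"{name}: {READ_GBPS / kw:,.1f} GB/s per kW")

saved_kw = ARCH_A_KW - ARCH_B_KW
print(f"Power saved: {saved_kw:.0f} kW -> roughly {saved_kw / KW_PER_GPU:.0f} "
      f"more GPUs on the same grid connection")
```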

There is also a second-order effect that operators rarely account for. Some storage architectures require 5 GB of DRAM and one to four dedicated CPU cores permanently locked per GPU node just to achieve peak storage performance. Across a 500-node cluster, that is 2.5 TB of DRAM and up to 2,000 CPU cores permanently unavailable to AI workloads. When you are paying $30,000 or more per GPU, every stolen core and every locked gigabyte is a direct tax on the compute investment that is supposedly the entire point of the infrastructure.
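
A quick accounting of that host-resource tax across a 500-node cluster; the per-node core count used for the percentage is an illustrative assumption.

```python
# Host-resource overhead of a client-heavy storage stack, using the per-node
# figures quoted above. The total core count per node is an assumption.

NODES = 500
DRAM_PER_NODE_GB = 5             # locked by the storage client, per the text
CORES_PER_NODE = 4               # upper end of the 1-4 core range quoted
NODE_TOTAL_CORES = 128           # assumed cores per GPU node, illustrative

locked_dram_tb = NODES * DRAM_PER_NODE_GB / 1_000
locked_cores = NODES * CORES_PER_NODE
print(f"DRAM locked across the cluster:  {locked_dram_tb:.1f} TB")
print(f"CPU cores locked across cluster: {locked_cores:,} "
      f"({CORES_PER_NODE / NODE_TOTAL_CORES:.0%} of each node's cores)")
```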

How does storage architecture directly affect SLA competitiveness as uptime guarantees approach 99 percent?

Storage is the single largest blast radius in any GPU cluster, which makes it the single most important variable in any honest SLA commitment. The SemiAnalysis ClusterMAX 2.0 rating system, which is becoming an influential benchmark in Neocloud procurement, makes SLAs an explicit factor in pricing negotiations. Providers without competitive SLAs are losing deals now.

The durability dimension is equally important and less discussed. Enterprise customers have been conditioned by AWS S3 and Azure Blob to expect 11 to 12 nines of durability. Legacy HPC storage built on local RAID can fall below 5 nines at scale depending on drive failure rates and rebuild windows, potentially thousands of files lost per year across a billion-file corpus. Modern network erasure coding with multi-level protection can push past 11 nines. The gap between those two realities is the difference between a storage system you can actually underwrite an SLA against and one you cannot.
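
To see what those nines mean in practice, the sketch below treats “N nines of durability” as an annual per-object survival probability, which is the usual convention, and computes the expected annual loss across a billion-file corpus.

```python
# Expected annual object loss at different durability levels, across a
# one-billion-file corpus as in the example above. Treats "N nines" as an
# annual per-object survival probability.

FILES = 1_000_000_000

def expected_annual_loss(nines: int) -> float:
    annual_loss_probability = 10 ** (-nines)
    return FILES * annual_loss_probability

for nines in (5, 11, 12):
    print(f"{nines:2d} nines: ~{expected_annual_loss(nines):,.3f} files lost per year")

# 5 nines -> ~10,000 files/year; 11 nines -> ~0.01. The gap is six orders
# of magnitude, which is what separates an underwritable SLA from a hope.
```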

Which storage capabilities are most likely to determine long-term Neocloud survivability through consolidation?

The operators who survive will be those who have solved the total cost of ownership equation across the full infrastructure stack, not just the GPU procurement equation. That means several specific capabilities.

First, a unified software-defined architecture that runs flash and disk on a single data plane with native high-performance tiering, no external data movers, no second software stack, no operational complexity introduced by bolting together separate systems. Second, storage that can ride independent cost curves for flash and disk as those markets move independently of each other, which they will. Third, self-healing systems that maintain high availability without specialized administrators performing manual recovery at 3 AM. Operationally complex storage is an invisible cost that compounds at scale. Fourth, durability that can be credibly underwritten in an SLA against hyperscaler benchmarks.

The broader point is that the consolidation wave is separating infrastructure built for day-one benchmarks from infrastructure built for year-three economics. H100 rental rates have dropped more than 60 percent from peak. The market is no longer rewarding GPU accumulation. It is demanding proof of return on invested capital. Storage architecture is where that proof lives, because it is where GPU utilization rates, SLA commitments, power efficiency, and long-term cost structure all converge.

What is your message to Neocloud operators evaluating their storage strategy today?

Do not let the storage decision be the one you made by default. Every other part of the infrastructure stack gets rigorous engineering and financial scrutiny. Storage should be no different. The operators who are going to be here in three years are the ones who took a hard look at their true cost per GPU-hour of useful compute, understood their real availability posture, and made sure they were sized for the workload rather than for a procurement shortcut.

The window to get this right is narrowing. Consolidation is already underway, and the economics are unforgiving. But for operators who are willing to rethink the storage layer with the same rigor they applied to GPU selection, the opportunity is significant. Storage done right does not just reduce cost. It unlocks the full value of every GPU in the rack.

Thank you for the great interview. Readers who wish to learn more about this tech stack should visit VDURA. They may also read our previous interview with Ken Claffey.

Antoine is a visionary leader and founding partner of Unite.AI, driven by an unwavering passion for shaping and promoting the future of AI and robotics. A serial entrepreneur, he believes that AI will be as disruptive to society as electricity, and is often caught raving about the potential of disruptive technologies and AGI.

As a futurist, he is dedicated to exploring how these innovations will shape our world. In addition, he is the founder of Securities.io, a platform focused on investing in cutting-edge technologies that are redefining the future and reshaping entire sectors.