Ben Koska, Founder and CEO of SF Tensor – Interview Series

Ben Koska, Founder and CEO of SF Tensor, is an AI researcher and systems engineer known for his work on high-performance compute, kernel optimization, and efficient model training. His background spans developing low-level AI infrastructure, improving training throughput, and designing tools that make advanced model development accessible without heavyweight engineering overhead. He focuses on building systems that push the limits of speed, portability, and reliability across heterogeneous hardware.
SF Tensor is the company he leads to turn that philosophy into a practical platform. It introduces a unified programming model, a kernel optimizer, and a cross-cloud orchestration layer designed to remove the complexity of distributed AI workloads. The platform aims to give engineers a clean, hardware-agnostic environment where they can write once, deploy anywhere, and automatically achieve high performance. SF Tensor’s mission is to make AI compute dramatically faster, easier to manage, and free from vendor lock-in.
You founded SF Tensor at just 19 years old after already leading engineering at multiple startups. What inspired you to take on the challenge of reinventing AI infrastructure so early in your career?
The problem we are solving is one that I care deeply about, because it's one that I encountered myself. When we developed what is now SF Tensor's core stack, we were not working on a commercial project; it was actually an academic endeavor. We had received a grant to conduct some really interesting research but spent the vast majority of our time wrangling infrastructure and optimizations instead of doing research. We found that people were universally more interested in our infrastructure tech than in our research project.
SF Tensor is tackling one of the toughest problems in AI — breaking free from NVIDIA’s CUDA dominance. How did you approach designing a system that could achieve true hardware portability without compromising performance?
At the end of the day, all of AI boils down to simple mathematics. Every model is essentially a set of mathematical operations whose results we need to compute. By treating it primarily as a mathematical problem rather than a computer science problem, we can identify the smallest set of constraints on the calculations, then generate millions to billions of different ways to turn those calculations into machine code and find the fastest one. That's easier said than done: we can't actually run billions of different programs to find the fastest one. To prune our search space, we had to come up with an accurate mathematical model that estimates the speed of a given program on a given piece of hardware, and that model is one of the core innovations that makes what we do possible today.
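The flow he describes, enumerating many candidate lowerings and pruning them with an estimated cost before ever running one, can be sketched roughly as follows. The names (Candidate, generate_candidates, estimated_cost), the tile sizes, and the cost heuristic are illustrative placeholders, not SF Tensor's actual implementation.

```python
# Illustrative sketch of search-based kernel selection (hypothetical names,
# not SF Tensor's actual code): enumerate candidate schedules for one matmul,
# rank them with a cheap analytical cost estimate, and only the few survivors
# would ever be compiled and benchmarked on real hardware.
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Candidate:
    tile_m: int
    tile_n: int
    tile_k: int
    unroll: int

def generate_candidates():
    # A real search space has millions of entries; this toy one has 4*4*3*3 = 144.
    for tm, tn, tk, u in product([32, 64, 128, 256], [32, 64, 128, 256],
                                 [16, 32, 64], [1, 2, 4]):
        yield Candidate(tm, tn, tk, u)

def estimated_cost(c: Candidate) -> float:
    # Toy stand-in for a real analytical model: penalize tile shapes whose
    # working set overflows a 192 KB scratchpad and reward larger tiles.
    tile_bytes = 4 * (c.tile_m * c.tile_k + c.tile_k * c.tile_n)
    spill_penalty = max(0, tile_bytes - 192 * 1024) * 1e-6
    work_per_tile = c.tile_m * c.tile_n * c.tile_k
    return 1.0 / (work_per_tile * min(c.unroll, 4)) + spill_penalty

# Rank every candidate by estimated cost and keep only the top ten.
survivors = sorted(generate_candidates(), key=estimated_cost)[:10]
for c in survivors:
    print(c, f"estimated cost: {estimated_cost(c):.3e}")
```

The point of the sketch is the shape of the pipeline: a huge generated search space, a cheap estimate that ranks it, and real benchmarking reserved for a tiny shortlist.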
The company’s blog highlights innovations around compiler optimization and cross-cloud orchestration. Can you explain how SF Tensor’s approach differs from existing frameworks like PyTorch or JAX?
We haven’t written a technical blog about it yet, but we actually support frameworks like PyTorch and JAX, allowing code written in them to be optimized by our stack. There are several architectural decisions that JAX and PyTorch made that differentiate them from our stack, but the most significant one is that we treat the entire model as a single calculation to be solved, instead of individual modules that must be individually and then jointly optimized. Rather than applying traditional compiler optimization techniques one by one, we create a search space of millions to sometimes billions of potential kernels. Our claim is that no human can possibly come up with a set of rules that transforms any given code into the fastest version, so we must instead generate every combination and identify the fastest one.
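SF Tensor's own entry point isn't shown in this interview, but the distinction he draws, optimizing the whole model as one calculation rather than module by module, is the same one that whole-program tracing exposes in JAX. A minimal illustration using the public jax.jit API as a stand-in:

```python
# Minimal illustration of per-op vs whole-program optimization using JAX.
# jax.jit captures the entire function as one computation graph, so a compiler
# sees every operation at once instead of optimizing them one by one.
import jax
import jax.numpy as jnp

def mlp(params, x):
    w1, b1, w2, b2 = params
    h = jnp.tanh(x @ w1 + b1)   # run eagerly, each op is a separate kernel
    return h @ w2 + b2

mlp_whole = jax.jit(mlp)        # the whole function becomes a single program
                                # that can be fused and reordered globally

key = jax.random.PRNGKey(0)
params = (jnp.ones((64, 128)), jnp.zeros(128), jnp.ones((128, 10)), jnp.zeros(10))
x = jax.random.normal(key, (32, 64))
print(mlp_whole(params, x).shape)  # (32, 10)
```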
Many startups are focused on training efficiency, but you’ve emphasized the “infrastructure tax” — the time researchers lose managing compute instead of innovating. How does SF Tensor address this imbalance?
We believe that both problems must be tackled, and a lot of our work goes toward training efficiency, but the most acute problem we can solve right now, without depending on any future innovations, is the infrastructure tax, because it's a problem we've already solved for ourselves.
You’ve mentioned achieving up to 80% reductions in training costs. What specific optimizations or architectural breakthroughs make that possible?
Our entire software stack is built on the idea that a search-based compiler will always beat human-crafted rules. So far, the largest constraint on these compilers has been that it's not possible to benchmark and rank millions, let alone billions, of kernels. It was therefore necessary for us to create a mathematical model of compute that accurately estimates how long a given calculation, or set of calculations, will take on a given piece of hardware. By doing this, we are able to expand our search space and then trim it down, which is a necessity if you want to find the fastest kernels consistently.
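As a concrete, deliberately simplified example of what such an estimate looks like, a roofline-style model bounds a kernel's runtime by whichever of compute and memory traffic dominates. The hardware figures below are round illustrative numbers, not the specs of any particular chip or a description of SF Tensor's actual model.

```python
# Back-of-the-envelope roofline estimate for a 4096x4096x4096 FP16 matmul.
# Peak numbers are illustrative round figures, not a specific GPU's specs.
M = N = K = 4096
flops = 2 * M * N * K                      # each multiply-add counted as 2 FLOPs
bytes_moved = 2 * (M * K + K * N + M * N)  # FP16 = 2 bytes, assuming ideal reuse

peak_flops = 300e12   # 300 TFLOP/s, illustrative
peak_bw = 2e12        # 2 TB/s, illustrative

t_compute = flops / peak_flops
t_memory = bytes_moved / peak_bw
estimate = max(t_compute, t_memory)   # the slower bound determines runtime

print(f"compute-bound time: {t_compute * 1e3:.3f} ms")
print(f"memory-bound time:  {t_memory * 1e3:.3f} ms")
print(f"estimated runtime:  {estimate * 1e3:.3f} ms")
```

A usable cost model is far more detailed than this, but the principle is the same: an estimate that is cheap to evaluate lets the compiler score enormous numbers of candidates that could never all be run.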
How does your background in building the Emma programming language influence SF Tensor’s architecture and philosophy toward performance and abstraction?
Don’t tell my investors, but at heart, I’m still a compiler engineer. I have always been interested in finding ways to make things even incrementally faster. In developing Emma, we threw out the whole compiler four or five times and started from scratch, each time because we ran into an optimization that we couldn’t implement given the current constraints. That forced us to re-engineer the system to be even more general, while still letting us drop down to the lowest level of optimization when necessary, often going against common principles of compiler and language design. Those learnings and the resulting architecture, nearly two years of what looked to many like minor optimizations and wrong bets, have compounded into a system that lets us iterate faster and optimize better than systems built on the common principles, because those principles are fundamentally designed for CPUs, not for GPUs and AI models.
You’ve worked on large-scale training runs across 4,000+ GPUs — what were some of the biggest lessons learned from managing compute at that scale?
A big one is that hardware failure is a lot more prevalent and a lot more problematic than one might assume. Having spent a lot of time working with traditional programs and compilers, I was used to the idea that, generally speaking, a computer does exactly as it is told, and if something goes wrong, it is almost always the fault of the person who wrote the code. With GPUs, on the other hand, hardware failure is a common occurrence, especially in distributed training runs on extremely large clusters. Going hand in hand with that is the fact that unlike CPUs, which generally act in a fairly deterministic and predictable manner, GPUs will sometimes inexplicably do things like lower their clock speeds for no apparent reason, slowing down the entire training process because a single chip is running slower.
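One practical consequence of that last point is that large runs typically watch per-GPU clocks alongside step times so a throttled chip is caught before it drags down the whole job. A minimal monitoring sketch using the nvidia-ml-py (pynvml) bindings; the 10% threshold is an arbitrary illustrative choice, not something from the interview:

```python
# Sketch of a straggler check: flag any GPU whose SM clock has dropped
# noticeably below the fastest GPU in the node.
import pynvml

pynvml.nvmlInit()
try:
    count = pynvml.nvmlDeviceGetCount()
    clocks = []
    for i in range(count):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        sm_mhz = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_SM)
        clocks.append((i, sm_mhz))

    fastest = max(mhz for _, mhz in clocks)
    for i, mhz in clocks:
        if mhz < 0.9 * fastest:  # more than 10% below the fastest device
            print(f"GPU {i} may be throttling: {mhz} MHz vs {fastest} MHz")
        else:
            print(f"GPU {i} OK: {mhz} MHz")
finally:
    pynvml.nvmlShutdown()
```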
Y Combinator has backed some of the most transformative infrastructure companies in tech. How has that experience shaped your approach to scaling SF Tensor’s product and vision?
Going into Y Combinator, I thought the bet we wanted to make was already ambitious. After just a few weeks, our definition of ambitious had drastically changed, and we doubled down on an even bigger bet. Beyond that, the sense of community, and the knowledge that I can pick up the phone or send an email to pretty much any company or person out there and receive a response and advice within hours or days, has changed the way we think about tackling problems and pushed us toward a significantly more collaborative approach.
Looking ahead, you’ve expressed interest in non-LLM models, robotics, and synthetic data. How do these areas fit into your long-term vision for the company?
LLMs are absolutely an interesting technology and will play an integral part in how the world looks in the future, but the reason they are so much more advanced than any other area of AI stems mainly from the fact that a lot of money is being invested in their development, and enough people are collaborating on the problem that they have gotten fairly optimized. If we can lower the barrier to entry and allow researchers all around the country and the planet, even those with limited resources and little-to-no knowledge of optimization, to do their research as cheaply and efficiently as possible, then I think we will see a whole new generation of models crop up that tackle problems LLMs are not suited for, whether because they interact with the physical world or because they are problems that cannot be properly expressed in language.
What do you think the AI infrastructure stack will look like five years from now — and where do you see SF Tensor’s role within it?
Five years from now, I hope that many more companies will have developed and released their own specialized chips, and that researchers will be able to harness and utilize them without needing to write code specifically for them, ideally without even needing to know that they exist. That is the future that we are working towards and that I believe we will have a significant role in shaping.
Thank you for the great interview. Readers who wish to learn more should visit SF Tensor.












