Alex Holovach, Co-founder of Kubiks – Interview Series

Alex Holovach, co-founder of Kubiks, is an experienced software engineer specializing in scalable, high-performance systems. He has led digital transformations, built fault-tolerant microservices, and developed enterprise integrations at Prove, TAG – The Aspen Group, airSlate, and Google. Today, he’s channeling this expertise into reinventing observability with AI at Kubiks.
Kubiks is an AI-native observability platform that helps engineering teams monitor, diagnose, and resolve issues faster. It automatically captures logs, traces, queries, and LLM calls without manual setup, then uses AI to pinpoint root causes, send contextual alerts, and even suggest fixes. With real-time service maps, historical snapshots, and integrations across popular tools and cloud providers, Kubiks streamlines incident response and improves system reliability.
You’ve built and scaled infrastructure at companies like airSlate, Prove, and Google. Which of those roles most shaped your perspective on the challenges of scaling systems, and how did those lessons ultimately inspire you to co-found Kubiks?
I learned firsthand what it’s like to maintain reliability when over 100 engineers are pushing changes every day. In those setups, the bus factor, the risk if key team members are suddenly unavailable, is high, and the key is automating everything possible to keep the service running smoothly. But you can’t always predict what’ll break next. These experiences highlighted the limitations of traditional approaches, which is why having AI agents constantly monitoring every part in real time changes everything. They’re always on, alerting you instantly and handling root cause analysis when something goes off. That’s what drove me to co-found Kubiks.ai, to make that intelligent, always-on oversight accessible to more teams.
Kubiks launched in May 2025 with a bold promise: one-minute setup and AI-powered fixes. What gap in the market did you see that convinced you now was the right time to start this company?
There’s a huge gap right now because AI can finally add a self-healing layer to the internet. Our mission is straightforward: have AI monitor your production systems, run automatic root-cause analysis on breaks, and prepare safe fixes, so teams can react in seconds. With AI taking on the constant proactive monitoring, engineers can focus on swift reactions instead of endless checks. That’s the big shift we’re enabling.
Kubiks uniquely captures complete requests and LLM calls, automatically generates fixes, and delivers pull requests for review. What technical breakthroughs enabled this frictionless detection-to-resolution flow? Was it hard to balance thoroughness with simplicity?
Our breakthrough is end-to-end correlation and context engineering: we automatically pull key IDs from every request, like payments, users, sessions, databases, queues, models, and versions, and weave them into a single timeline. With the full chain connected, the AI pinpoints the first failing call, the inputs that caused it, and exactly what needs fixing. This draws from Facebook’s Scuba, their internal observability tool. Once you’ve used something like that, you can’t go back to just metrics and aggregates.
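The correlation idea described above can be illustrated with a minimal sketch. This is not Kubiks code: the `Event` record, its fields, and the sample data are all hypothetical, and the sketch assumes every event already carries a shared request ID. It groups events by that ID, orders each group into a timeline, and returns the earliest failing call.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical telemetry record; field names are illustrative only."""
    ts: float          # timestamp in seconds
    request_id: str    # correlation ID shared by all events of one request
    service: str       # service that emitted the event
    ok: bool           # False if the call failed
    detail: str = ""


def build_timelines(events):
    """Group events by request ID and sort each group chronologically."""
    timelines = defaultdict(list)
    for event in events:
        timelines[event.request_id].append(event)
    for timeline in timelines.values():
        timeline.sort(key=lambda e: e.ts)
    return dict(timelines)


def first_failure(timeline) -> Optional[Event]:
    """Return the earliest failing event in a sorted timeline, if any."""
    return next((e for e in timeline if not e.ok), None)


# Events arrive out of order, as they often do from separate services.
events = [
    Event(3.0, "req-1", "api", False, "500 from payments"),
    Event(1.0, "req-1", "api", True, "received"),
    Event(2.0, "req-1", "payments", False, "db timeout"),
    Event(1.5, "req-2", "api", True, "received"),
]

timelines = build_timelines(events)
culprit = first_failure(timelines["req-1"])
print(culprit.service, culprit.detail)
```

In this toy data the API error at the end of the request is only a symptom; sorting by time within one request ID surfaces the earlier failure in the payments service.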
Kubiks offers real-time visualizations, service maps, and relationship-centric views. How does combining logs, traces, metrics, and mapping into one unified dashboard radically change how teams detect and resolve issues?
Modern systems are like driving a car at highway speeds. If you had to parse every raw sensor reading, you’d surely crash. Instead, you need one dashboard that flags what’s wrong and where. That’s why we combine logs, traces, metrics, and a live map: one quick look gives you the full picture, and one click takes you to the fix. It turns scattered debugging into focused, efficient resolution.
Time travel and snapshot annotations sound powerful for historical debugging. In practical terms, what are some use cases where this has uncovered issues that real-time views alone couldn’t?
Imagine your core service goes down, and the live map turns red everywhere with system-wide errors, but you can’t tell what failed first amid the chaos. For example, we once had an Airflow job with a misconfigured retry policy; it was scheduled to run overnight but triggered midday during peak traffic, suffocating the database. Real-time views just showed widespread failures, but time travel let us rewind and watch the incident start with that job’s misfire, revealing the root cause that wasn’t clear live.
How does your AI analyze telemetry to detect anomalies and craft suggested fixes? Can you share examples where Kubiks caught subtle or silent issues that traditional monitoring would miss?
An engineer deployed new logic behind a feature flag, and production stayed stable for two weeks with the flag off. Then, enabling it for a user segment caused errors just for those users. In standard dashboards, it looked random and tough to trace back to the deploy. Kubiks connects each request to the code version, flag state, user segment, and downstream calls. When errors spiked, the AI matched them to the flag activation and the specific code path. It highlighted the failing function and triggering inputs. By linking observability to code and flags, the AI identifies causes quickly and suggests targeted fixes, catching what traditional tools overlook.
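A rough sketch of why tagging each request with flag state matters, under stated assumptions: the request records, their keys (`version`, `flag_on`, `segment`, `error`), and the numbers are invented for illustration and do not reflect Kubiks’ actual data model. Breaking the error rate down by one dimension at a time shows which dimension separates failing requests from healthy ones.

```python
from collections import Counter


def error_rate_by(requests, key):
    """Compute the error rate for each distinct value of `key`."""
    totals, errors = Counter(), Counter()
    for request in requests:
        totals[request[key]] += 1
        if request["error"]:
            errors[request[key]] += 1
    return {value: errors[value] / totals[value] for value in totals}


# Hypothetical request records, each tagged with deploy and flag context.
requests = [
    {"version": "v42", "flag_on": False, "segment": "free", "error": False},
    {"version": "v42", "flag_on": False, "segment": "pro", "error": False},
    {"version": "v42", "flag_on": True, "segment": "beta", "error": True},
    {"version": "v42", "flag_on": True, "segment": "beta", "error": True},
    {"version": "v42", "flag_on": True, "segment": "beta", "error": False},
]

by_version = error_rate_by(requests, "version")
by_flag = error_rate_by(requests, "flag_on")
print(by_version)
print(by_flag)
```

Grouped by code version alone, the errors look like a diffuse 40% failure rate on a single deploy. Grouped by flag state, they fall entirely on requests with the flag enabled, which is the kind of signal the answer above describes.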
Users say Kubiks is “zero setup hassle” and “captures everything out of the box.” What measures did you take to ensure both trust and usability—from installation to daily workflow?
We designed Kubiks to feel familiar right from local development, so you build trust before production ever heats up. Our CLI runs your app locally, auto-instruments HTTP, DB, queues, and LLM calls, and streams clean telemetry; no manual logging or tracing needed. It feeds rich context to your AI code editor via MCP, with the exact same views you’ll see in staging and prod. You learn it once, in your natural flow while building features, making it seamless and reliable when it matters most.
Many AI startups today are wrestling with observability as their systems scale rapidly. How does Kubiks help smaller teams operate with the same reliability standards as billion-dollar companies?
Startups move fast. You can’t halt a sprint to add logs and traces everywhere. That’s why we emphasize automatic instrumentation. With one install, Kubiks captures the full picture out of the box: HTTP routes, database calls, LLM interactions. It lets small teams achieve enterprise-level reliability without the overhead.
With growing complexity in AI-powered systems, what role do you see Kubiks playing in ensuring reliability, observability, and actionability across distributed AI workloads?
Traditional microservices were complex but predictable. You could map the call graph and anticipate flows. Distributed AI flips that: agents interact dynamically, launch tools, adapt plans on the fly, and route based on context. It’s innovative but a nightmare for debugging. Kubiks auto-instruments the entire setup (every agent, tool, queue, webhook, and model call), then creates a live causal graph of who did what, when, and with what data. Our AI monitors this in real time, catching drifts, loops, missed handoffs, and poor decisions as they occur, not later in logs.
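To make the idea of a causal graph concrete, here is a small sketch of one check mentioned above, loop detection. It assumes agent and tool invocations are available as hypothetical `(caller, callee)` pairs; the agent names are made up, and the depth-first search is a standard textbook technique, not a description of how Kubiks works internally.

```python
from collections import defaultdict


def build_graph(calls):
    """Build a directed graph from (caller, callee) pairs."""
    graph = defaultdict(set)
    for caller, callee in calls:
        graph[caller].add(callee)
    return graph


def find_cycle(graph):
    """Return one cycle as a list of nodes (first == last), or None."""
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    color = defaultdict(int)
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for nxt in sorted(graph.get(node, ())):
            if color[nxt] == GRAY:
                # Reached a node already on the current path: a loop.
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in sorted(graph):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Hypothetical agent interactions: the critic hands work back to the planner.
calls = [
    ("planner", "researcher"),
    ("researcher", "search_tool"),
    ("researcher", "critic"),
    ("critic", "planner"),
]
cycle = find_cycle(build_graph(calls))
print(" -> ".join(cycle))
```

A real system would also need timestamps and payloads on each edge to distinguish a healthy retry from a runaway loop; this sketch covers only the structural part.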
Looking ahead, how do you envision the evolution of observability in AI-powered, cloud-native environments? What roadmap are you pursuing—more automation, deeper intelligence, or expanded integration—for Kubiks.ai over the next few years?
Soon, companies will run millions of agents simultaneously across clouds, needing clear visibility into what called what, when, and with what data. Observability will evolve to provide real-time insights into these dynamic systems, peering inside LLMs to understand their decisions. For Kubiks, we’re focusing on end-to-end tracing at the agent level: prompts, parameters, modes, tools, inputs, and outputs. This will help engineers detect threats, edge cases, and anomalies early, making complex AI environments more reliable and actionable.
Thank you for the great interview. Readers who wish to learn more should visit Kubiks.