Artificial Intelligence
Thinking Machines Lab Ships First Model With 200ms Real-Time Interaction

Thinking Machines Lab, the AI startup founded by former OpenAI CTO Mira Murati, released a research preview of its first in-house model on May 11, 2026, ending more than a year of silence on what the lab would actually build. The company calls the system an “interaction model” — a multimodal architecture trained from scratch to process audio, video, and text in 200-millisecond chunks rather than waiting for users to finish a turn.
The model, named TML-Interaction-Small, is a 276-billion-parameter mixture-of-experts system with 12 billion active parameters. According to the company’s announcement blog post, it is the first model from a lab that has raised about $2 billion at a $12 billion valuation without shipping anything beyond a fine-tuning tool. The release lands amid sustained pressure from talent departures and a stalled follow-on funding round.
What an Interaction Model Actually Does
Thinking Machines argues that today’s frontier models — including OpenAI’s GPT-Realtime and Google’s Gemini Live — bolt real-time behavior onto turn-based architectures using a “harness” of external components like voice-activity detection. Those components decide when the user has stopped speaking, then hand a finished utterance to the model. While the model generates a reply, its perception of the world freezes.
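To make the contrast concrete, here is a minimal, purely illustrative sketch of the turn-based "harness" pattern described above. It is not any vendor's actual pipeline; the detector, the model call, and the energy check are all hypothetical stand-ins. The key property it demonstrates is structural: while `respond()` runs, no new input is consumed.

```python
# Illustrative sketch (not any vendor's actual pipeline): a turn-based
# harness that gates a model behind voice-activity detection. While
# respond() runs, no new input is consumed -- perception freezes.

def vad_is_speech(chunk: bytes) -> bool:
    """Hypothetical voice-activity detector: True while the user speaks."""
    return len(chunk.strip(b"\x00")) > 0  # placeholder energy check

def respond(utterance: bytes) -> bytes:
    """Hypothetical turn-based model call on one finished utterance."""
    return b"reply-to:" + utterance

def harness_loop(audio_chunks):
    """Buffer chunks until VAD signals end-of-turn, then call the model."""
    utterance, replies = b"", []
    for chunk in audio_chunks:
        if vad_is_speech(chunk):
            utterance += chunk                  # mid-utterance: keep buffering
        elif utterance:
            replies.append(respond(utterance))  # turn over: model replies
            utterance = b""                     # input arriving now is dropped
    return replies
```

In this arrangement the model only ever sees completed utterances, which is exactly the scaffolding the interaction model is meant to replace.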
The interaction model replaces that scaffolding with what the company calls time-aligned micro-turns. The system continuously processes 200 milliseconds of input while generating 200 milliseconds of output, with both token streams interleaved on the same clock cycle. That structure lets the model interrupt a user mid-sentence, react to visual cues without being asked, or speak simultaneously with the user for tasks like live translation.
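A rough sketch of the micro-turn structure, assuming nothing beyond what the announcement describes (all function names here are hypothetical): on every 200-millisecond tick, the model both folds in an input chunk and emits an output chunk, so perception never pauses while the model speaks.

```python
# Minimal sketch (names hypothetical) of time-aligned micro-turns:
# every 200 ms tick, the model both ingests an input chunk and emits
# an output chunk, so it never stops perceiving while speaking.

CHUNK_MS = 200  # clock period from the announcement

def step(state, input_chunk):
    """Hypothetical single micro-turn: fold input into state, emit output."""
    state = state + [input_chunk]                    # perceive this 200 ms
    output = f"out@{(len(state) - 1) * CHUNK_MS}ms"  # speak this 200 ms
    return state, output

def interaction_loop(input_chunks):
    """Input and output streams interleaved on the same 200 ms clock."""
    state, outputs = [], []
    for chunk in input_chunks:        # one iteration == one clock tick
        state, out = step(state, chunk)
        outputs.append(out)           # output emitted even mid-user-speech
    return outputs
```

Because input keeps arriving during generation, behaviors like interrupting the user or speaking over them fall out of the loop structure rather than requiring an external controller.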
The architecture skips heavy standalone encoders. Audio is fed in as dMel features through a lightweight embedding layer, images are split into 40-by-40 patches, and all components are co-trained from scratch with the transformer. A separate background model runs asynchronously, handling deeper reasoning, tool calls, and web browsing while the interaction model stays present in the conversation.
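The two-model split can be sketched with standard asynchronous plumbing. This is an assumption-laden illustration, not the company's implementation: the tool call, the tick loop, and the shared context list are all hypothetical, and only the general shape (a fast loop that keeps talking while a slow worker runs) comes from the source.

```python
# Sketch of the two-model split (all names hypothetical): the interaction
# model keeps ticking every 200 ms while a background worker handles a
# slow tool call asynchronously and posts the result into shared context.
import asyncio

async def background_task(query: str) -> str:
    """Stand-in for deeper reasoning / web browsing: slow and async."""
    await asyncio.sleep(0.5)  # pretend this is a tool call
    return f"result for {query}"

async def interaction_model(ticks: int, context: list) -> list:
    """Stays present in the conversation: one utterance per 200 ms tick."""
    said = []
    for t in range(ticks):
        said.append(f"tick {t}: context has {len(context)} item(s)")
        await asyncio.sleep(0.2)  # 200 ms micro-turn clock
    return said

async def main():
    context = []
    task = asyncio.create_task(background_task("weather"))
    said = await interaction_model(4, context)  # never blocks on the task
    context.append(await task)  # background result lands when ready
    return said, context

said, context = asyncio.run(main())
```

The point of the split is latency isolation: the conversational loop never blocks on the slow path, which is consistent with the 0.40-second turn-taking figure the company reports.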
On the company’s reported benchmarks, TML-Interaction-Small posts a turn-taking latency of 0.40 seconds on FD-bench V1, compared with 1.18 seconds for GPT-Realtime-2.0 in its minimal-thinking mode and 0.57 seconds for Gemini-3.1-flash-live. On FD-bench V1.5, which scores interaction quality across user interruptions, backchannels, and background speech, the model scores 77.8 against 46.8 for GPT-Realtime-2.0 minimal and 45.5 for Gemini-3.1-flash-live in its high-thinking mode. The figures are self-reported.
A Long-Awaited First Ship
The release closes a long gap between funding and product. Thinking Machines was founded in February 2025 and in July of that year closed a $2 billion seed round at a $12 billion valuation — widely reported as the largest seed round on record. The round was led by Andreessen Horowitz with participation from Nvidia, AMD, Cisco, Accel, ServiceNow, and Jane Street. Until now, the company’s only shipped product was Tinker, an API for fine-tuning open-weight models that launched in October 2025.
The intervening months brought turbulence. Co-founders Barret Zoph and Luke Metz left in January 2026 to return to OpenAI, with Murati announcing that the company had “parted ways” with Zoph. Andrew Tulloch decamped for Meta’s Superintelligence Labs after Mark Zuckerberg’s reported $1 billion offer to acquire the company outright was rebuffed. Meta has since hired five founding members of the lab. Murati responded by promoting Soumith Chintala, a co-creator of PyTorch, to CTO. A reported follow-on round at a roughly $50 billion valuation did not close by the end of 2025.
The compute story moved in the opposite direction. In March, Thinking Machines announced a partnership with Nvidia covering an undisclosed investment and the deployment of at least one gigawatt of next-generation Vera Rubin systems. The lab also expanded its Google Cloud relationship to cover frontier model training on Nvidia GB300 hardware.
What to Watch
The interaction model is not yet available to enterprises or the public. Thinking Machines says a limited research preview will open to selected partners in the coming months, with a wider release later in 2026. The company also plans to release larger interaction models, noting that the current 276B-parameter version is the smallest variant it can serve at the required latency.
Independent verification of the benchmark claims is the immediate question. FD-bench is one of the few public benchmarks targeting interaction quality, and Thinking Machines’ scores have not yet been reproduced by third parties under realistic load. The proactivity tests the company introduced for visual cues, including adapted versions of RepCount-A, ProactiveVideoQA, and Charades, are new instruments without an established baseline.
The strategic bet is more pointed. While OpenAI, Anthropic, and Google have spent the past year pushing autonomous agent capabilities, Thinking Machines is wagering that the next axis of competition will be how humans communicate with AI — closer to a continuous conversation than a series of prompts. The interaction model competes most directly with the real-time voice AI systems shipping from OpenAI, Google, and a growing tier of speech-focused startups. Whether the architecture survives contact with production workloads — long sessions, unreliable connectivity, and the safety constraints of real-time refusal — is the test the next preview round will impose.