What it is
Thinking Machines Lab’s May 11, 2026 post introducing “interaction models,” a real-time, multimodal approach designed to make interaction a native property of the model rather than external scaffolding.
Gabriel’s notes
This is Thinking Machines Lab’s announcement post (dated May 11, 2026) for a “research preview” of what they call interaction models: models designed to handle real-time collaboration natively (instead of relying on external, turn-based UX scaffolding). They frame the core idea as: interaction should scale alongside intelligence—so the interface isn’t a bolt-on afterthought.
Concretely, the post describes continuous, multimodal I/O (audio, video, and text) and a time-aligned “micro-turn” design (200ms chunks) intended to support interruptions, overlap, silence, and other normal conversation dynamics as first-class context.
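To make the micro-turn idea concrete, here is a minimal sketch of slicing a conversation timeline into fixed 200ms chunks and labeling each one as overlap, silence, or single-speaker. Only the 200ms figure comes from the post. The `MicroTurn` class, the `to_micro_turns` function, and the interval-based input are assumptions invented for illustration; this is not TML's architecture or API.

```python
from dataclasses import dataclass

CHUNK_MS = 200  # micro-turn length cited in the post


@dataclass
class MicroTurn:
    """One time-aligned slice of the conversation (hypothetical structure)."""
    index: int
    start_ms: int
    end_ms: int
    user_speaking: bool
    model_speaking: bool

    @property
    def overlap(self) -> bool:
        # Both parties are active in the same chunk.
        return self.user_speaking and self.model_speaking

    @property
    def silence(self) -> bool:
        # Nobody is active in this chunk.
        return not (self.user_speaking or self.model_speaking)


def _active(intervals, start_ms, end_ms):
    """True if any (begin, end) interval intersects [start_ms, end_ms)."""
    return any(begin < end_ms and end > start_ms for begin, end in intervals)


def to_micro_turns(user_speech, model_speech, total_ms, chunk_ms=CHUNK_MS):
    """Slice a timeline into fixed-size chunks and label who is active in each."""
    turns = []
    for i, start in enumerate(range(0, total_ms, chunk_ms)):
        end = min(start + chunk_ms, total_ms)
        turns.append(
            MicroTurn(
                index=i,
                start_ms=start,
                end_ms=end,
                user_speaking=_active(user_speech, start, end),
                model_speaking=_active(model_speech, start, end),
            )
        )
    return turns


if __name__ == "__main__":
    # User talks from 0 to 500ms; the model starts at 400ms and stops at 900ms.
    for t in to_micro_turns([(0, 500)], [(400, 900)], total_ms=1200):
        label = "overlap" if t.overlap else "silence" if t.silence else "speech"
        print(t.index, t.start_ms, t.end_ms, label)
```

With these example intervals, the timeline splits into six chunks: chunk 2 (400 to 600ms) is the overlap and chunk 5 (1000 to 1200ms) is silence. The point is that overlap and silence become explicit labels on the timeline rather than things a turn-based pipeline has to guess at.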
Quick take: If you’ve ever felt like voice assistants are basically email threads wearing a headset, this is the “make it a phone call” school of thought. It’s a serious attempt to treat interaction as a model capability—not just a UI trick.
Thinking Machines Lab (TML) positions itself as an AI research and product company focused on making AI more widely understood, customizable, and generally capable—and they explicitly say they want broader access to the knowledge and tools people need to make AI work for their goals.
On May 11, 2026, TML announced what they call “interaction models.” Their stated bet is straightforward: interactivity should scale alongside intelligence, and human-in-the-loop work shouldn’t be penalized just because the underlying systems are built around turn-taking.
They claim their interaction models are trained from scratch and designed for real-time responsiveness via a multi-stream, micro-turn architecture. They also report benchmark results for a model they name TML-Interaction-Small, including a reported turn-taking latency of 0.40s on FD-bench v1 (streaming).
I saved this under AI because this is less about a single demo and more about a possible new “default shape” for multimodal assistants: always-on perception, always-on response, and fewer brittle glue layers between the two.
Good fit if you want to:
- Design assistants that can be interrupted (and can interrupt) without the whole experience becoming a laggy mess.
- Build voice/video-first collaboration flows (pair-programming, tutoring, live troubleshooting, translation).
- Think about “time” as a core part of model context (silence, overlap, backchannels), not a rounding error.
- Evaluate whether your product needs an orchestration harness… or whether the harness is the product.
Pricing snapshot (auto-enriched):
The blog post is free to read. Product pricing for interaction-model access is unknown (not confirmed). The post says they plan to open a limited research preview in the coming months, with a wider release later in 2026.
Work-use / compliance snapshot (auto-enriched):
For simply reading the article: standard website/privacy considerations apply (device/log/usage data, cookies, etc.).
If you later use TML’s hosted services (separate from this article), their Service Terms of Use describe paid services and include language stating they will not use customer content to develop or improve their technologies (i.e., not for training/fine-tuning) and that they may collect and use de-identified/aggregated usage analytics. Review with counsel for your specific regulatory context.
Alternatives (auto-enriched):
- OpenAI Realtime API: A production-facing way to do low-latency, real-time multimodal conversations over WebRTC/WebSocket—more “ship it now,” less “new paradigm manifesto.”
- Kyutai Moshi: A research/open implementation of full-duplex spoken dialogue; great if you want to study or prototype full-duplex behavior without waiting on a closed preview.
Before you adopt it:
- Decide what “real time” means for your use case (e.g., <500ms turn latency vs. truly simultaneous backchanneling).
- Budget for the unsexy parts: streaming infra, jitter/packet-loss handling, and long-session context management.
- Write down your interruption policy (when should the assistant jump in?) before you let the model improvise.
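One way to act on the checklist above is to write the latency budget and the interruption policy down as explicit, testable rules before any model is involved. The sketch below is a hypothetical example: `InterruptionPolicy`, `should_speak`, `within_budget`, and every default value are assumptions chosen for illustration, not anything defined by TML or the other tools listed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterruptionPolicy:
    """Hypothetical, product-level rules for when the assistant may talk."""
    min_user_silence_ms: int = 400  # assumed pause length before taking a turn
    allow_barge_in: bool = False    # may the assistant talk over the user?
    urgent_barge_in: bool = True    # ...except for urgent messages?


def should_speak(policy, user_speaking, silence_ms, urgent=False):
    """Decide whether the assistant may start speaking right now."""
    if user_speaking:
        # Talking over the user is only allowed by explicit policy.
        if urgent and policy.urgent_barge_in:
            return True
        return policy.allow_barge_in
    # User is quiet: wait for a long enough pause before jumping in.
    return silence_ms >= policy.min_user_silence_ms


def within_budget(measured_latency_ms, budget_ms=500):
    """Check a measured turn latency against your own 'real time' budget."""
    return measured_latency_ms < budget_ms


if __name__ == "__main__":
    policy = InterruptionPolicy()
    print(should_speak(policy, user_speaking=False, silence_ms=200))
    print(should_speak(policy, user_speaking=True, silence_ms=0, urgent=True))
    # The post's reported 0.40s turn-taking latency against a 500ms budget.
    print(within_budget(400, budget_ms=500))
```

Keeping the policy as plain data makes it easy to review with non-engineers and to unit-test independently of whichever model or API ends up behind it.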
Sources
- https://thinkingmachines.ai/blog/interaction-models
- https://thinkingmachines.ai/
- https://thinkingmachines.ai/legal/tos/
- https://thinkingmachines.ai/legal/privacy/
- https://platform.openai.com/docs/api-reference/realtime
- https://arxiv.org/abs/2410.00037