Thinking Machines Lab dropped a research preview this week: a native multimodal architecture that listens, thinks, and responds in overlapping 200ms micro-turns, with no silent pause while it processes your words. The model, TML-Interaction-Small, has 276B parameters in a mixture-of-experts design (12B active) and posts 0.40s latency against Gemini Flash's 0.57s and GPT-realtime's 1.18s.
The framing is the interesting part. Instead of bolting real-time voice on top of a language model, they've split the system in two: a foreground Interaction Model that manages presence and turn-taking, and a Background Model that handles sustained reasoning and tool use, streaming results back into the conversation as they land. It's a meaningful architectural bet, and one that matters if you're building products on top of live AI conversations, where any pause for reasoning is heard as dead air.
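
To make the foreground/background split concrete, here is a minimal sketch of the pattern in Python's asyncio. It is not Thinking Machines' implementation: the function names, the queue hand-off, and the simulated delays are all assumptions made for illustration, and only the 200ms micro-turn figure comes from the announcement. The foreground loop emits something every micro-turn and never blocks on the slow task; the background task posts its result whenever it finishes.

```python
import asyncio

MICRO_TURN_S = 0.2  # 200 ms micro-turn, per the announcement; everything else here is hypothetical


async def background_model(question: str, results: asyncio.Queue, work_s: float) -> None:
    """Stand-in for sustained reasoning or tool use; posts its result when done."""
    await asyncio.sleep(work_s)  # simulated slow work
    await results.put(f"result for {question!r}")


async def interaction_model(
    question: str,
    turns: int,
    tick_s: float = MICRO_TURN_S,
    work_s: float = 0.5,
) -> list[str]:
    """Foreground loop: stays present every micro-turn and folds in background results as they land."""
    results: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(background_model(question, results, work_s))

    transcript = []
    for turn in range(turns):
        await asyncio.sleep(tick_s)  # one micro-turn of listening/responding
        try:
            # Non-blocking check: pick up a finished result if there is one.
            transcript.append(f"turn {turn}: {results.get_nowait()}")
        except asyncio.QueueEmpty:
            # Nothing ready yet, so keep the conversation alive instead of going silent.
            transcript.append(f"turn {turn}: (still listening)")

    await worker  # make sure the background task has finished before returning
    return transcript


if __name__ == "__main__":
    for line in asyncio.run(interaction_model("weather in Oslo", turns=5)):
        print(line)
```

The design choice worth noticing is that the foreground loop's cadence is set by the tick, not by how long the reasoning takes, which is the property the two-model split is meant to buy.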