Qwen3.5-Omni processes text, images, audio, and video natively
From Week 14, 2026
Alibaba's Qwen team released an omni-modal model that handles text, images, audio, and video in a single architecture, with support for 113 languages and dialects.
From Week 13, 2026
Google launched its real-time voice model offering low-latency conversational AI across APIs and products, positioning it as infrastructure for voice-first agent experiences.
From Week 13, 2026
Mistral released a 4B-parameter multilingual TTS model that supports nine languages, is optimized for on-device deployment, and needs only minimal audio samples for voice cloning.
From Week 12, 2026
Google's Stitch is a vibe design tool with an infinite canvas, voice input, Gemini-powered UI generation, and Design.md for portable design systems. Figma's stock dropped 8% on the news.
From Week 11, 2026
Claude Chat gains the ability to generate interactive charts and diagrams directly in conversation, making data exploration more visual without leaving the thread.
From Week 11, 2026
Open-source voice AI tool that lets you control macOS apps by speaking. Small project, surprisingly polished, and a glimpse at ambient voice interfaces beyond Siri.
From Week 10, 2026
Luma released multimodal agents built on its Uni-1 model that coordinate text, image, video, and audio generation end-to-end, launching with enterprise partners Publicis Groupe and Serviceplan Group.
From Week 10, 2026
Nick Tikhonov built a voice agent achieving roughly 400ms end-to-end latency by orchestrating STT, LLM, and TTS with optimized streaming, outperforming Vapi's equivalent setup by 2x.