Text generated by artificial intelligence often has a particular vibe that gives it away as machine-generated, but those idiosyncrasies have become harder to pick out as the tech has improved. We may be seeing a similar evolution in generative AI audio. Google has announced a new AI audio model called Gemini 3.1 Flash Live, and as the name implies, it’s designed for real-time conversation. It’s rolling out in some Google products starting today, and developers will be able to start building their own chatty robots with the model, too.
Google says this AI is much faster and produces speech with a more natural cadence, aiming to solve a long-running issue with AI-generated speech. As with a text chatbot, there’s always a delay between input and output in a generative audio system, and longer delays and unnatural inflection make conversations feel sluggish and harder to follow. Researchers generally cite around 300 milliseconds of latency as the upper limit for speech to feel natural, but Google has not specified a figure for Gemini 3.1 Flash Live. It just vaguely has the speed you need.
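To make the 300-millisecond figure concrete, here is a minimal sketch of how you might check a speech-to-speech round trip against that budget. The `respond()` function below is a hypothetical stand-in (not a real Google API); any callable that takes audio in and returns audio out could be timed the same way.

```python
import time

# Hypothetical stand-in for a speech-to-speech model call.
# Here it just simulates 120 ms of inference time.
def respond(audio_chunk: bytes) -> bytes:
    time.sleep(0.12)
    return audio_chunk

# The roughly 300 ms ceiling researchers cite for natural turn-taking.
LATENCY_BUDGET_MS = 300

def measure_latency_ms(audio_chunk: bytes) -> float:
    """Time one input-to-output round trip in milliseconds."""
    start = time.perf_counter()
    respond(audio_chunk)
    return (time.perf_counter() - start) * 1000

# 100 ms of 16 kHz, 16-bit mono silence as a dummy input chunk.
latency = measure_latency_ms(b"\x00" * 3200)
verdict = "within" if latency <= LATENCY_BUDGET_MS else "over"
print(f"round trip: {latency:.0f} ms ({verdict} budget)")
```

In a real deployment the number that matters is time-to-first-audio rather than time for a complete reply, since streaming models can begin speaking before the full response is generated.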
But benchmark numbers? Google has plenty of those, which it claims show that 3.1 Flash Live is a more reliable way to have audio-to-audio AI conversations. For example, a big gain on ComplexFuncBench Audio suggests the new model is better at complex, multi-step tasks. Gemini 3.1 Flash Live also tops the charts on Big Bench Audio, a test that evaluates reasoning with a set of 1,000 audio questions.