On May 19, 2026, during the Google I/O keynote at Shoreline Amphitheatre, Demis Hassabis takes the stage and drops a sentence that will grab the attention of everyone who creates video with AI: your character keeps its face, voice, and identity from one scene to the next. No re-prompting, no lost coherence. That is the promise of Gemini Omni Flash, the first model in the new Gemini Omni family, available today in Google Flow, the Gemini app, and YouTube Shorts.
The missing piece for AI filmmakers
If you've ever tried to produce a short film with a generative video tool, you know the wall. Scene 1, your hero has brown eyes. Scene 2, he's blond with blue eyes. Scene 3, he speaks with a different voice. Until now, the only options were tedious workflows built on persona LoRAs in ComfyUI, or hoping that Sora would eventually stabilize its character locks in production.
Gemini Omni Flash attacks this problem directly. According to the official Google blog:
“Omni Flash also improves character consistency, meaning identity and voice are preserved across every scene.”
In practice: you create a character once, cast them into as many scenes as you want, and the model maintains their appearance, distinctive traits, and voice consistently. That's the missing piece for narrative AI production, following camera consistency and physics simulation.
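The "create once, cast many times" workflow can be planned on the creator's side as a shot list in which scenes refer to characters by id instead of re-describing them. The sketch below is only a planning aid: the `Character` and `Scene` structures, the ids, and the `undefined_cast` helper are hypothetical and do not reflect any Flow or Gemini API, which the source does not document.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Character:
    """A character described once; scenes refer to it by id (hypothetical structure)."""
    character_id: str
    appearance: str
    voice: str


@dataclass
class Scene:
    """One scene of the shot list, with the ids of the characters cast in it."""
    description: str
    cast: list[str] = field(default_factory=list)


def undefined_cast(characters: dict[str, Character], scenes: list[Scene]) -> set[str]:
    """Return every character id used in a scene but never defined."""
    return {cid for scene in scenes for cid in scene.cast if cid not in characters}


# Made-up example data: one defined character, two scenes.
characters = {
    "mara": Character("mara", "brown eyes, short black hair, red raincoat", "low and calm"),
}
scenes = [
    Scene("Mara waits at a bus stop in the rain.", cast=["mara"]),
    Scene("Mara argues with a shopkeeper.", cast=["mara", "shopkeeper"]),
]

# The shopkeeper is cast in scene 2 but was never defined as a character.
print(undefined_cast(characters, scenes))
```

Running a check like this before generating anything catches scenes that depend on a character nobody has described yet.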

"Create anything from anything": what the model can actually do
The Gemini Omni family adopts a concise tagline on the DeepMind page: "Create anything from anything." Behind that marketing slogan, the documented capabilities are concrete.
Here are the capabilities officially demonstrated at I/O 2026, drawn from the DeepMind product page:
- Multi-turn conversational video editing (you refine by talking, not by rewriting the prompt from scratch)
- Motion and style transfer from reference material
- Character or object swap via natural language
- Camera angle adjustments
- Sketch-to-realistic-video generation
- Stop motion and claymation
- Pose transfer and drawing-guided motion capture
- Character transformation with dialogue preservation
- Text synchronization with on-screen action (historically very hard for video models)
On the physics side, Demis Hassabis emphasized during the keynote that Gemini Omni integrates "Gemini's reasoning powers with a better grasp of physics concepts such as kinetic energy and gravity."
With world models, AI is moving from predicting text to simulating reality.
That's the paradigm shift Google has been framing for months: a model that no longer just predicts text, but models the physical world. The technical name for this framing is "world model," and Gemini Omni is Google's first consumer-facing implementation of the concept.
6 official prompts to understand the full scope
Google published six demonstration prompts on the DeepMind page. They communicate the model's input range better than any description:
Visual physics effect
"When the person touches the mirror, make the mirror ripple beautifully like liquid, and the person's arm turns into reflective mirror material" - physics simulation + real-time object transformation
Kinetic simulation
"A marble rolling fast on a chain reaction style track, continuous smooth shot" - applied physics + camera constraint (continuous shot)
Scientific stop motion
"Claymation explainer of protein folding, everything is made out of clay, no hands, stop motion, accurate" - style + scientific accuracy constraint simultaneously
Explainer with voiceover
"A skeuomorphism stop motion explainer about how the brain hippocampus works with a compelling voiceover" - audio + image sync + artistic style
Motion transfer
"Apply the pose and motion from input video to provided character from this image. Apply style from image reference to the new video" - combined multimodal input (video + image)
Text rendering
"word by word, one word on the screen at a time: did, you, know, that, this, model, can, do, pretty, good, text!?" - solving a historically hard problem for video models
These six examples deliberately cover the full spectrum: from visually spectacular effects to educational use cases, passing through motion capture and scientific explainers. Google is showing that Omni Flash isn't a niche cinema model, but a versatile production tool.
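If you want to rerun these six prompts against Omni Flash or any other video tool, it can help to keep them in a small tagged catalogue. The sketch below is one possible layout: the prompt texts are copied from the list above, the tags are condensed from this article's own annotations, and the `DemoPrompt` type and `labels_with_tag` helper are hypothetical names, not part of any Google tooling.

```python
from typing import NamedTuple


class DemoPrompt(NamedTuple):
    label: str
    prompt: str
    tags: frozenset[str]


# Prompt texts as quoted in the article; tags condensed from its annotations.
OFFICIAL_PROMPTS = [
    DemoPrompt(
        "Visual physics effect",
        "When the person touches the mirror, make the mirror ripple beautifully "
        "like liquid, and the person's arm turns into reflective mirror material",
        frozenset({"physics", "object transformation"}),
    ),
    DemoPrompt(
        "Kinetic simulation",
        "A marble rolling fast on a chain reaction style track, continuous smooth shot",
        frozenset({"physics", "camera constraint"}),
    ),
    DemoPrompt(
        "Scientific stop motion",
        "Claymation explainer of protein folding, everything is made out of clay, "
        "no hands, stop motion, accurate",
        frozenset({"style", "scientific accuracy"}),
    ),
    DemoPrompt(
        "Explainer with voiceover",
        "A skeuomorphism stop motion explainer about how the brain hippocampus "
        "works with a compelling voiceover",
        frozenset({"audio", "style"}),
    ),
    DemoPrompt(
        "Motion transfer",
        "Apply the pose and motion from input video to provided character from "
        "this image. Apply style from image reference to the new video",
        frozenset({"multimodal input"}),
    ),
    DemoPrompt(
        "Text rendering",
        "word by word, one word on the screen at a time: did, you, know, that, "
        "this, model, can, do, pretty, good, text!?",
        frozenset({"text rendering"}),
    ),
]


def labels_with_tag(tag: str) -> list[str]:
    """Return the labels of all prompts carrying the given tag, in catalogue order."""
    return [p.label for p in OFFICIAL_PROMPTS if tag in p.tags]


print(labels_with_tag("physics"))
```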


Distribution: the Flow + Gemini + YouTube Shorts trio
Omni Flash's distribution strategy is as significant as the model itself. Google chose to launch simultaneously across three surfaces:
Google Flow: the creative video studio launched at I/O 2025, now available in 140+ countries. Flow now integrates a dedicated AI agent and Custom Tools that any creator can build in natural language (and share or remix). One already-public example: pixelBento by Laszlo Gaal, which generates lo-fi and glitch effects without code.
Gemini app: the consumer-facing interface, with Omni Flash access for AI Pro and Ultra subscribers. Mobile Flow apps (Android in beta, iOS "coming soon") are also announced.
YouTube Shorts: direct integration for creators who publish short-form content.
The AI Ultra price drop warrants a critical look. At $100/month, the plan becomes competitive with Runway Pro and potentially with mid-tier creator subscriptions. For creators producing video content regularly, the math changes.
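To make "the math changes" concrete, here is a minimal sketch of the per-video cost of a flat subscription. The 100/month figure is the one quoted above. The monthly video counts are hypothetical, and the calculation ignores any generation quotas or credits, which this article does not cover.

```python
def cost_per_video(monthly_price: float, videos_per_month: int) -> float:
    """Flat subscription price spread evenly over the videos produced in a month."""
    if videos_per_month <= 0:
        raise ValueError("videos_per_month must be positive")
    return monthly_price / videos_per_month


# 100 is the monthly price quoted above; the video counts are made-up examples.
for videos in (4, 20, 50):
    print(f"{videos} videos/month: {cost_per_video(100, videos):.2f} per video")
```

The more regularly you publish, the smaller the subscription's share of each video's cost, which is why the price matters most to high-volume creators.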
Safety and traceability: SynthID + C2PA
Every Omni Flash output is invisibly watermarked via SynthID (DeepMind technology) and carries C2PA (Content Credentials) metadata, the open standard for digital content provenance.
That's concrete progress in a domain that remains a blind spot for most AI video tools: traceability. Knowing whether a video was AI-generated, and by whom, is becoming a regulatory requirement in several countries (see the ongoing discussions in the European Parliament under the AI Act). Google is getting ahead of this by embedding this metadata from day one.
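As a rough illustration of what "carries C2PA metadata" means at the file level, the sketch below scans a file's bytes for the markers of an embedded manifest store. It assumes the manifest is embedded as a JUMBF box labeled `c2pa`, as the C2PA specification describes. It is a crude presence heuristic, not verification: it does not validate signatures, it can produce false positives, it reads the whole file into memory, and it says nothing about the SynthID watermark, which cannot be detected this way. The function names are hypothetical.

```python
from pathlib import Path

# Byte strings expected in a JUMBF box carrying a C2PA manifest store
# (assumption based on the C2PA specification; presence is not proof of validity).
C2PA_MARKERS = (b"jumb", b"c2pa")


def looks_like_c2pa(data: bytes) -> bool:
    """Heuristic: True if every marker occurs somewhere in the raw bytes."""
    return all(marker in data for marker in C2PA_MARKERS)


def file_looks_like_c2pa(path: str) -> bool:
    """Apply the same heuristic to a file on disk (reads the whole file)."""
    return looks_like_c2pa(Path(path).read_bytes())
```

For anything beyond a quick sanity check, use a proper Content Credentials validator that parses the manifest and checks its signature chain.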
What we don't know yet
The DeepMind page is deliberately marketing-forward. Several technical details remain unpublished and it would be inaccurate to invent them:
- Model size, parameter count, latency
- Public benchmarks against Sora 2, Veo 3, or Runway Gen-4
- Maximum duration of generated videos
- Exact technical mechanism behind character consistency (LoRA-like? persistent latent embedding? something else?)
- Timeline for the image and text outputs promised for the Omni family
These gaps are normal for a day-one announcement. Third-party benchmarks typically arrive in the weeks following a launch. For now, the keynote demos and six official prompts are the only verifiable performance indicators.
Frequently asked questions
Is Gemini Omni Flash available without a paid subscription?
No. Access to Gemini Omni Flash requires a Google AI Pro or Google AI Ultra subscription. The AI Ultra price dropped to $100/month at I/O 2026. No free access or limited tier has been announced at this stage.
What's the difference between Gemini Omni Flash and Veo 3?
Veo 3 was already in Google Flow before I/O 2026, focused on high-quality video generation. Gemini Omni Flash brings multimodal input (combining image, text, audio, and video), cross-scene character consistency, and multi-turn conversational editing. Both models coexist in Flow for now. Google has not published a convergence roadmap.
Does character consistency work with real faces?
Google has not specified the limits on real faces in its public documentation. SynthID and C2PA protections apply to all outputs. The European AI Act and Google's content policies also apply.
Is Google Flow available in the US?
Yes. Google Flow is now available in 140+ countries. The Flow page lists no geographic exception for the United States or the European Union as of May 19, 2026.
Going further
The full Google I/O 2026 keynote is the primary reference for verifying quotes from Pichai and Hassabis. The Gemini Omni section starts around the 35-minute mark.
If you want to place AI video generation in the context of the broader AI landscape, we've broken down other recent announcements: "Grok Build vs Claude Code: the May 2026 comparison" and our piece on Meta's layoffs around its AI division together give a panorama of the forces at play.