Four Resolution Tiers
Vidu Q3 generates at four resolution tiers. Use 360p / 540p for fast drafts and social loops, 720p for everyday output, and 1080p when you need clarity at full size. Credit cost scales with resolution, not duration.
Vidu Q3 — text-to-video and image-to-video at 360p, 540p, 720p, and 1080p across five aspect ratios. Integer durations from 1 to 16 seconds, optional synchronized audio. Hit the resolution you actually need, no upscaling tax.
Vidu Q3 is a flexible AI video generation model that lets you pick the exact resolution and aspect ratio you need before you hit generate. It accepts a text prompt and optionally a single first-frame image, and produces clips from 1 to 16 seconds at 360p, 540p, 720p, or 1080p across five aspect ratios.
What makes Vidu Q3 useful in a real workflow: the four resolution tiers mean you can iterate fast on 360p / 540p and only spend credits on 720p / 1080p once the prompt and shot are locked. The five aspect ratios cover every common social and cinematic frame without a second pass. Audio is off by default and one toggle away when you want it.
4×5
Resolutions × aspect ratios
1–16s
Integer duration range
T2V + I2V
Text or single-frame input
Tap the play button to view each clip. Audio is off by default in the model and in these previews; toggle volume on if a clip was generated with audio enabled.
Vidu Q3 generates at four resolution tiers. Use 360p / 540p for fast drafts and social loops, 720p for everyday output, and 1080p when you need clarity at full size. Credit cost scales with resolution, not duration.
Horizontal cinematic (16:9), classic broadcast (4:3), square social, portrait reels (3:4 / 9:16). One model, every common output frame — no upscaling or reformatting downstream.
Vidu Q3 generates audio only when you toggle it on. Useful when you want a clean silent clip for editing or are post-scoring later. Flip Audio: On to get synced ambient + diegetic sound layered with the visuals.
Upload one image — Vidu Q3 uses it as the first frame and writes motion onto it from your prompt. The source aspect drives the output frame, so portrait photos animate as portrait videos automatically.
Each example shows the exact prompt that produced the clip and the settings used. Copy any prompt with one click and tweak from there.
1080p · 16:9 · 8s · audio on
Vibrant cinematic travel montage of a host exploring a futuristic coastal city at golden hour. Slow aerial establishing shot drifting forward over neon-lined boardwalks, glass towers reflecting amber light, distant ocean waves crashing. Warm sunset color grade. Sounds of distant traffic, sea breeze, and faint city ambience.
Lead with the genre (cinematic travel), pick a single dominant camera move (slow aerial drift), and end with explicit color grade + sound design notes. Vidu Q3 follows that structure cleanly.
720p · 3:4 · 5s · audio on
Medium close-up of a young woman with rain-soaked hair sitting in a Tokyo coffee shop window seat, neon signs reflecting on the wet glass behind her. She slowly looks up from her phone, eyes glistening, and a small smile breaks across her face. Shallow depth of field, warm interior lighting against cool blue rain outside. Quiet ambient cafe sounds, distant rain on glass.
Portrait aspect (3:4) plus a precise emotional micro-beat (looking up, eyes glisten, smile breaks) gives a single character moment. Vidu reads these specific action verbs better than vague mood descriptions.
720p · 1:1 · 5s · audio off
Studio product shot of a matte-black ceramic coffee mug on a polished walnut surface, steam curling upward. Camera does a slow 90-degree rotational pan around the mug. Soft top-down rim lighting picks out the rim and the steam against a dark gradient backdrop. No text. Photoreal, advertising quality.
Square aspect (1:1) is purpose-built for product social. Specify the camera move (rotational pan, dolly in, push back) and the lighting style (rim, key, top-down) — Vidu's product output gets crisper with both.
720p · 16:9 · 10s · audio on
10-second action sequence. Shot 1: low-angle of a parkour runner sprinting toward a rooftop edge, wind whipping their jacket. Shot 2: mid-air shot of the runner leaping across a gap between buildings, city lights blurred behind. Shot 3: landing roll into a sprint that continues offscreen. Dynamic handheld camera throughout. Sounds of footsteps, fabric wind, and impact on landing.
Shot 1 / Shot 2 / Shot 3 explicit structure works at 720p+ and durations of 8s or more. State what changes per shot (angle, distance, motion) instead of describing the whole sequence as one beat.
1080p · 16:9 · 8s · audio on
Wide cinematic landscape of a misty Norwegian fjord at dawn. Mountain peaks emerge through low fog as the camera does a very slow forward dolly across glassy water. A single distant fishing boat leaves a thin wake. Cool blue-green palette with a soft warm glow on one mountain peak. Sounds of gentle water, distant gulls, and wind across mountains.
Atmospheric wides reward 1080p — the extra resolution preserves the fog, water surface, and distant detail that compress to mush at 540p. Pair the wide aspect with a single very slow camera move.
720p · 9:16 · 5s · from source image · audio off
Animate this portrait photo. Subtle breathing motion, hair catches a gentle breeze, eyes blink naturally, a small smile forms over 3 seconds. No camera movement — the subject animates in place. Maintain exact identity, lighting, and background from the source.
For I2V on a portrait, write motion that respects the source: subtle breathing, blink, hair catching wind, small expression shift. Adding 'maintain exact identity, lighting, background' prevents drift.
Six habits that consistently produce better Vidu Q3 output.
Vidu Q3 handles a single, clearly named camera move better than a sequence of moves stuffed into one shot. 'Slow forward dolly', 'rotational pan', 'static locked-off', 'handheld follow' — pick one and let the subject motion do the rest.
Adding 'warm sunset color grade', 'cool teal-and-orange palette', or 'desaturated documentary look' steers the look in a single phrase. Vidu reads grade language much better than generic 'cinematic' alone.
For durations 8s and up, structure your prompt as Shot 1 / Shot 2 / Shot 3 with one action and one camera angle per shot. The model uses the explicit structure to schedule cuts cleanly.
16:9 for cinematic and landscape, 9:16 for short-form vertical, 1:1 for product and feed posts, 3:4 / 4:3 when you want a slight portrait or vintage feel. Generating at the right aspect saves crop loss later.
If the clip is for an edit with a soundtrack you'll add later, leave audio off — it's cheaper and gives you a clean visual track. Turn audio on only when the generated soundscape will ship with the clip.
When you do turn audio on, end the prompt with a 'Sounds of …' line. Vidu Q3 hears those cues and lays them in: 'Sounds of distant traffic, sea breeze, footsteps on gravel.' Specific beats much better than 'with audio.'
Switch the mode toggle to Image-to-Video and upload a single source image. Vidu Q3 uses your image as the first frame and writes motion onto it from your prompt. The source aspect drives the output frame — portrait photos become portrait videos, square photos become square videos, automatically.
The cleanest I2V prompts respect the source. Describe motion that could plausibly happen inside the existing scene: subtle breathing, hair in a gentle breeze, eyes blink, steam curling off coffee, water rippling, light shifting. Avoid prompts that ask the model to fundamentally change the composition — that fights the first-frame anchor and produces drift.
A solid pattern: "Animate this image. [Motion 1], [Motion 2], [Motion 3]. Maintain exact identity, lighting, and background from the source." That last sentence is the single most effective drift guard for I2V.
| Setting | Values | Notes |
|---|---|---|
| Resolution | 360p · 540p · 720p · 1080p | Credits scale with resolution. Default 720p. |
| Aspect ratio | 16:9 · 4:3 · 1:1 · 3:4 · 9:16 | For I2V the source image drives aspect. |
| Duration | 1–16 seconds (integer) | UI exposes 3 / 5 / 8 / 10 / 16; API accepts any integer. |
| Audio | Off (default) / On | When on, end the prompt with a "Sounds of …" line for best results. |
| Seed | Integer 0–2,147,483,647 | Public API only. Same seed + prompt + settings reproduces the clip. |
| Mode | text-to-video / image-to-video | I2V requires an image_url; T2V requires only a prompt. |
Four tiers — 360p, 540p, 720p, and 1080p — across five aspect ratios (16:9, 4:3, 1:1, 3:4, 9:16). 360p and 540p are great for fast iteration and social-only output; 720p is the everyday default; 1080p is for hero clips and shots where atmospheric detail matters.