Vidu Q3 Prompting Guide

Four resolutions. Five aspects. One model.

Vidu Q3 offers text-to-video and image-to-video at 360p, 540p, 720p, and 1080p across five aspect ratios, with integer durations from 1 to 16 seconds and optional synchronized audio. Generate at the resolution you actually need, with no upscaling step afterward.

Try Vidu Q3 on PixelDojo

Overview

Vidu Q3 is a flexible AI video generation model that lets you pick the exact resolution and aspect ratio you need before you hit generate. It accepts a text prompt and optionally a single first-frame image, and produces clips from 1 to 16 seconds at 360p, 540p, 720p, or 1080p across five aspect ratios.

What makes Vidu Q3 useful in a real workflow: the four resolution tiers mean you can iterate fast on 360p / 540p and only spend credits on 720p / 1080p once the prompt and shot are locked. The five aspect ratios cover every common social and cinematic frame without a second pass. Audio is off by default and one toggle away when you want it.
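The draft-then-final workflow can be sketched in a few lines of Python. This is a minimal illustration, not Vidu's API: `ClipSettings`, its field names, the default duration of 5, and `promote_to_final` are all hypothetical. The idea is that once a low-resolution draft is locked, only the resolution changes for the final pass.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ClipSettings:
    # Hypothetical container; field names are illustrative, not Vidu's API.
    prompt: str
    resolution: str = "720p"      # the guide lists 720p as the default
    aspect_ratio: str = "16:9"
    duration: int = 5             # assumed default, not stated in the guide
    audio: bool = False           # audio is off by default


def promote_to_final(draft: ClipSettings, resolution: str = "1080p") -> ClipSettings:
    """Return a copy of a locked draft with only the resolution raised."""
    if resolution not in ("720p", "1080p"):
        raise ValueError("final passes use 720p or 1080p")
    return replace(draft, resolution=resolution)


# Iterate cheaply at 360p, then re-run the same prompt and shot at 1080p.
draft = ClipSettings(
    prompt="Slow forward dolly across a misty fjord at dawn.",
    resolution="360p",
    duration=8,
)
final = promote_to_final(draft)
print(draft.resolution, "->", final.resolution)
```

Keeping the settings immutable makes it harder to change the prompt or duration by accident between the draft and the final render.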

4×5: resolutions × aspect ratios

1–16s: integer duration range

T2V + I2V: text or single-frame input

Key Features

360p · 540p · 720p · 1080p

Four Resolution Tiers

Vidu Q3 generates at four resolution tiers. Use 360p / 540p for fast drafts and social loops, 720p for everyday output, and 1080p when you need clarity at full size. Credit cost scales with resolution, not duration.

16:9 · 4:3 · 1:1 · 3:4 · 9:16

Five Aspect Ratios

Horizontal cinematic (16:9), classic broadcast (4:3), square social (1:1), and portrait reels (3:4 / 9:16). One model covers every common output frame, with no cropping or reformatting downstream.

Audio off by default

Optional Synchronized Audio

Vidu Q3 generates audio only when you toggle it on. Leaving it off is useful when you want a clean silent clip for editing or plan to score the clip later. Set Audio to On to get synced ambient and diegetic sound layered with the visuals.

Animate any source still

Image-to-Video From a Single Frame

Upload one image — Vidu Q3 uses it as the first frame and writes motion onto it from your prompt. The source aspect drives the output frame, so portrait photos animate as portrait videos automatically.

Example Videos

Each example shows the exact prompt that produced the clip and the settings used. Use any prompt as a starting point and tweak from there.

Cinematic Aerial Establishing Shot

1080p · 16:9 · 8s · audio on

Vibrant cinematic travel montage of a host exploring a futuristic coastal city at golden hour. Slow aerial establishing shot drifting forward over neon-lined boardwalks, glass towers reflecting amber light, distant ocean waves crashing. Warm sunset color grade. Sounds of distant traffic, sea breeze, and faint city ambience.

Lead with the genre (cinematic travel), pick a single dominant camera move (slow aerial drift), and end with explicit color grade + sound design notes. Vidu Q3 follows that structure cleanly.
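The ordering described above (genre first, one camera move, then grade and sound notes) can be captured in a small helper. This is a sketch under assumptions: `build_prompt` and its parameter names are made up for illustration, and the guide does not prescribe any particular code for assembling prompts.

```python
def build_prompt(genre: str, subject: str, camera_move: str,
                 color_grade: str, sounds: list[str]) -> str:
    """Assemble a prompt in the order: genre + subject, camera move, grade, sound."""
    sentences = [
        f"{genre} of {subject}.",                  # lead with the genre
        f"{camera_move}.",                         # one dominant camera move
        f"{color_grade}.",                         # explicit color grade
        "Sounds of " + ", ".join(sounds) + ".",    # sound design notes last
    ]
    return " ".join(sentences)


prompt = build_prompt(
    genre="Vibrant cinematic travel montage",
    subject="a host exploring a futuristic coastal city at golden hour",
    camera_move="Slow aerial establishing shot drifting forward over neon-lined boardwalks",
    color_grade="Warm sunset color grade",
    sounds=["distant traffic", "sea breeze", "faint city ambience"],
)
print(prompt)
```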

Character Close-Up With Emotional Beat

720p · 3:4 · 5s · audio on

Medium close-up of a young woman with rain-soaked hair sitting in a Tokyo coffee shop window seat, neon signs reflecting on the wet glass behind her. She slowly looks up from her phone, eyes glistening, and a small smile breaks across her face. Shallow depth of field, warm interior lighting against cool blue rain outside. Quiet ambient cafe sounds, distant rain on glass.

Portrait aspect (3:4) plus a precise emotional micro-beat (looking up, eyes glisten, smile breaks) gives a single character moment. Vidu reads these specific action verbs better than vague mood descriptions.

Product Spot, Square Social

720p · 1:1 · 5s · audio off

Studio product shot of a matte-black ceramic coffee mug on a polished walnut surface, steam curling upward. Camera does a slow 90-degree rotational pan around the mug. Soft top-down rim lighting picks out the rim and the steam against a dark gradient backdrop. No text. Photoreal, advertising quality.

Square aspect (1:1) is purpose-built for product social. Specify the camera move (rotational pan, dolly in, push back) and the lighting style (rim, key, top-down) — Vidu's product output gets crisper with both.

Multi-Shot Action With Beats

720p · 16:9 · 10s · audio on

10-second action sequence. Shot 1: low-angle of a parkour runner sprinting toward a rooftop edge, wind whipping their jacket. Shot 2: mid-air shot of the runner leaping across a gap between buildings, city lights blurred behind. Shot 3: landing roll into a sprint that continues offscreen. Dynamic handheld camera throughout. Sounds of footsteps, fabric wind, and impact on landing.

An explicit Shot 1 / Shot 2 / Shot 3 structure works at 720p and above and at durations of 8s or more. State what changes per shot (angle, distance, motion) instead of describing the whole sequence as one beat.

Atmospheric Wide Landscape

1080p · 16:9 · 8s · audio on

Wide cinematic landscape of a misty Norwegian fjord at dawn. Mountain peaks emerge through low fog as the camera does a very slow forward dolly across glassy water. A single distant fishing boat leaves a thin wake. Cool blue-green palette with a soft warm glow on one mountain peak. Sounds of gentle water, distant gulls, and wind across mountains.

Atmospheric wides reward 1080p — the extra resolution preserves the fog, water surface, and distant detail that compress to mush at 540p. Pair the wide aspect with a single very slow camera move.

Portrait Vertical, Image-to-Video

720p · 9:16 · 5s · from source image · audio off

Animate this portrait photo. Subtle breathing motion, hair catches a gentle breeze, eyes blink naturally, a small smile forms over 3 seconds. No camera movement — the subject animates in place. Maintain exact identity, lighting, and background from the source.

For I2V on a portrait, write motion that respects the source: subtle breathing, blink, hair catching wind, small expression shift. Adding 'maintain exact identity, lighting, background' prevents drift.

Prompting Tips

Six habits that consistently produce better Vidu Q3 output.

Pick one dominant camera move

Vidu Q3 handles a single, clearly named camera move better than a sequence of moves stuffed into one shot. 'Slow forward dolly', 'rotational pan', 'static locked-off', 'handheld follow' — pick one and let the subject motion do the rest.

State the color grade explicitly

Adding 'warm sunset color grade', 'cool teal-and-orange palette', or 'desaturated documentary look' steers the look in a single phrase. Vidu reads grade language much better than generic 'cinematic' alone.

Use shot numbering for sequences

For durations 8s and up, structure your prompt as Shot 1 / Shot 2 / Shot 3 with one action and one camera angle per shot. The model uses the explicit structure to schedule cuts cleanly.
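A shot-numbered prompt is easy to generate from a list of per-shot descriptions. The helper below is a hypothetical sketch (`build_shot_prompt` is not part of any Vidu tooling); it follows the "N-second sequence. Shot 1: ... Shot 2: ..." pattern and refuses durations under 8 seconds, where the guide does not recommend this structure.

```python
def build_shot_prompt(duration: int, shots: list[str], camera_style: str = "") -> str:
    """Format per-shot descriptions as 'Shot 1: ... Shot 2: ...' in one prompt."""
    if not 8 <= duration <= 16:
        raise ValueError("shot numbering is suggested for 8 to 16 second clips")
    if len(shots) < 2:
        raise ValueError("use shot numbering only for two or more shots")
    parts = [f"{duration}-second sequence."]
    # One action and one camera angle per shot, numbered from 1.
    parts += [f"Shot {i}: {text.rstrip('.')}." for i, text in enumerate(shots, start=1)]
    if camera_style:
        parts.append(f"{camera_style.rstrip('.')}.")
    return " ".join(parts)


print(build_shot_prompt(
    10,
    ["low-angle of a runner sprinting toward a rooftop edge",
     "mid-air shot of the runner leaping across a gap",
     "landing roll into a sprint that continues offscreen"],
    camera_style="Dynamic handheld camera throughout",
))
```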

Match aspect to use case before generating

16:9 for cinematic and landscape, 9:16 for short-form vertical, 1:1 for product and feed posts, 3:4 / 4:3 when you want a slight portrait or vintage feel. Generating at the right aspect saves crop loss later.
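If you generate clips from a script, the mapping above fits in a lookup table. The use-case labels below are illustrative names, not Vidu settings, and assigning 3:4 to "slight portrait" and 4:3 to "vintage" is one reading of the guidance rather than something the guide spells out.

```python
# Use-case labels are hypothetical; the aspect ratios are the five Vidu Q3 supports.
ASPECT_BY_USE_CASE = {
    "cinematic": "16:9",
    "landscape": "16:9",
    "short-form vertical": "9:16",
    "product": "1:1",
    "feed post": "1:1",
    "slight portrait": "3:4",   # assumed split of the "3:4 / 4:3" advice
    "vintage": "4:3",           # assumed split of the "3:4 / 4:3" advice
}


def pick_aspect(use_case: str) -> str:
    """Look up the suggested aspect ratio for a use case before generating."""
    key = use_case.strip().lower()
    if key not in ASPECT_BY_USE_CASE:
        raise KeyError(f"no suggested aspect ratio for {use_case!r}")
    return ASPECT_BY_USE_CASE[key]


print(pick_aspect("Product"))
```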

Audio off when you'll post-score

If the clip is for an edit with a soundtrack you'll add later, leave audio off — it's cheaper and gives you a clean visual track. Turn audio on only when the generated soundscape will ship with the clip.

Add sound design notes when audio is on

When you do turn audio on, end the prompt with a 'Sounds of …' line. Vidu Q3 picks up those cues and lays them in: 'Sounds of distant traffic, sea breeze, footsteps on gravel.' Specific cues work much better than a bare 'with audio.'
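To keep the prompt and the audio toggle in sync, a script can append the sound line only when audio is enabled. This is a minimal sketch; `with_sound_notes` is a hypothetical helper, and the check for an existing "Sounds of" line is a simple substring match.

```python
def with_sound_notes(prompt: str, audio_on: bool, sounds: list[str]) -> str:
    """Append a 'Sounds of ...' line only when audio is on and cues are given."""
    if not audio_on or not sounds:
        return prompt                      # silent clip: leave the prompt alone
    if "sounds of" in prompt.lower():
        return prompt                      # prompt already carries sound notes
    return f"{prompt.rstrip()} Sounds of {', '.join(sounds)}."


base = "Wide cinematic landscape of a misty fjord at dawn."
print(with_sound_notes(base, audio_on=True, sounds=["gentle water", "distant gulls"]))
print(with_sound_notes(base, audio_on=False, sounds=["gentle water", "distant gulls"]))
```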

Image-to-Video

Switch the mode toggle to Image-to-Video and upload a single source image. Vidu Q3 uses your image as the first frame and writes motion onto it from your prompt. The source aspect drives the output frame — portrait photos become portrait videos, square photos become square videos, automatically.
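To preview which frame a source image is likely to produce, you can compare its dimensions against the five supported ratios. The guide only says the source aspect drives the output; how an image that sits between two ratios is handled is not documented, so the nearest-ratio rule below is an assumption for planning purposes, and `nearest_aspect` is a hypothetical helper.

```python
import math

SUPPORTED_ASPECTS = {
    "16:9": 16 / 9,
    "4:3": 4 / 3,
    "1:1": 1.0,
    "3:4": 3 / 4,
    "9:16": 9 / 16,
}


def nearest_aspect(width: int, height: int) -> str:
    """Return the supported aspect ratio closest to width/height (log scale)."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    ratio = width / height
    # Comparing on a log scale treats "twice as wide" and "twice as tall" symmetrically.
    return min(SUPPORTED_ASPECTS,
               key=lambda name: abs(math.log(ratio / SUPPORTED_ASPECTS[name])))


print(nearest_aspect(1080, 1920))   # a portrait phone photo
```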

The cleanest I2V prompts respect the source. Describe motion that could plausibly happen inside the existing scene: subtle breathing, hair in a gentle breeze, eyes blinking, steam curling off coffee, water rippling, light shifting. Avoid prompts that ask the model to fundamentally change the composition, which fights the first-frame anchor and produces drift.

A solid pattern: "Animate this image. [Motion 1], [Motion 2], [Motion 3]. Maintain exact identity, lighting, and background from the source." That last sentence is the single most effective drift guard for I2V.
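The pattern above translates directly into a template function. `build_i2v_prompt` is a hypothetical helper written for illustration; the opening sentence and the drift-guard sentence are taken verbatim from the pattern.

```python
DRIFT_GUARD = "Maintain exact identity, lighting, and background from the source."


def build_i2v_prompt(motions: list[str]) -> str:
    """Fill the 'Animate this image. [motions]. [drift guard]' pattern."""
    if not motions:
        raise ValueError("describe at least one motion")
    cleaned = [m.strip().rstrip(".") for m in motions]
    body = ", ".join(cleaned)
    body = body[0].upper() + body[1:]      # capitalize the motion sentence
    return f"Animate this image. {body}. {DRIFT_GUARD}"


print(build_i2v_prompt([
    "subtle breathing motion",
    "hair catches a gentle breeze",
    "eyes blink naturally",
]))
```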

Settings Reference

Setting	Values	Notes
Resolution	360p · 540p · 720p · 1080p	Credits scale with resolution. Default 720p.
Aspect ratio	16:9 · 4:3 · 1:1 · 3:4 · 9:16	For I2V the source image drives aspect.
Duration	1–16 seconds (integer)	UI exposes 3 / 5 / 8 / 10 / 16; API accepts any integer from 1 to 16.
Audio	Off (default) / On	When on, end the prompt with a "Sounds of …" line for best results.
Seed	Integer 0–2,147,483,647	Public API only. Same seed + prompt + settings reproduces the clip.
Mode	text-to-video / image-to-video	I2V requires an image_url; T2V requires only a prompt.
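The constraints in the table can be checked before a request is sent. The sketch below is not the real API client: the dictionary keys other than `image_url` are assumed names, and requiring a prompt in image-to-video mode is an assumption based on the guide's description of I2V prompts. It only encodes the ranges and values listed above.

```python
RESOLUTIONS = {"360p", "540p", "720p", "1080p"}
ASPECTS = {"16:9", "4:3", "1:1", "3:4", "9:16"}
MODES = {"text-to-video", "image-to-video"}
MAX_SEED = 2_147_483_647


def _is_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_settings(settings: dict) -> list[str]:
    """Return a list of problems; an empty list means the settings look valid."""
    errors = []
    mode = settings.get("mode")
    if mode not in MODES:
        errors.append("mode must be text-to-video or image-to-video")
    if not settings.get("prompt"):
        errors.append("prompt is required")            # assumed for both modes
    if mode == "image-to-video" and not settings.get("image_url"):
        errors.append("image-to-video requires image_url")
    if settings.get("resolution", "720p") not in RESOLUTIONS:
        errors.append("resolution must be 360p, 540p, 720p, or 1080p")
    # For I2V the source image drives the aspect, so only check it for T2V.
    if mode == "text-to-video" and settings.get("aspect_ratio") not in ASPECTS:
        errors.append("aspect_ratio must be one of the five supported ratios")
    duration = settings.get("duration")
    if not _is_int(duration) or not 1 <= duration <= 16:
        errors.append("duration must be an integer from 1 to 16")
    seed = settings.get("seed")
    if seed is not None and (not _is_int(seed) or not 0 <= seed <= MAX_SEED):
        errors.append("seed must be an integer from 0 to 2,147,483,647")
    return errors


print(validate_settings({
    "mode": "text-to-video",
    "prompt": "Slow forward dolly across a misty fjord at dawn.",
    "resolution": "1080p",
    "aspect_ratio": "16:9",
    "duration": 8,
    "seed": 42,
}))
```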

FAQ

What resolutions and aspect ratios does Vidu Q3 support?

Four tiers (360p, 540p, 720p, and 1080p) across five aspect ratios (16:9, 4:3, 1:1, 3:4, 9:16). 360p and 540p are great for fast iteration and social-only output; 720p is the everyday default; 1080p is for hero clips and shots where atmospheric detail matters.