Video Prompting Guide
Learn the motion-first workflow that improves consistency across PixelDojo video models. Build better prompts, control camera movement, and get cleaner outputs.
Overview
AI video models interpret prompts differently than image models. If you need precise control over character look, wardrobe, and scene composition, treat video generation as a motion step, not a scene-design step.
Key distinction: Image models excel at generating precise characters and scenes. Video models excel at turning those scenes into motion.
Recommended Workflow
Step 1 - Lock The Look
- Generate an image first in any PixelDojo image tool.
- Use that output as your exact character/scene reference.
- Prioritize composition and identity before video generation.
Step 2 - Animate The Scene
- Upload the image as the first frame in a video tool.
- Prompt camera movement and subject action/dialogue.
- Do not spend most of the prompt re-describing static visuals.
If you are using text-to-video, expect more varied, less predictable results. For precision and consistency, image-to-video is usually the better choice.
Prompt Framework
Keep prompts focused and structured. A reliable baseline is:
Scene anchor -> Camera move -> Subject action -> Timing/style cues
Example skeleton
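[Scene anchor: what is in frame]. Camera [one clear movement]. [Subject] [one clear action]. [Timing/style cues: pacing, mood, light].
For instance: "A courier waits under a flickering neon sign. Camera slowly dollies in as he lowers his hood and glances at the lens. Slow, moody pacing, handheld drift."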
Camera Language Cheat Sheet
Movement Verbs
- pan left/right, tilt up/down
- dolly in/out, truck left/right
- orbit clockwise/counterclockwise
- slow push, handheld drift, lock-off
Action Verbs
- walks, turns, reaches, sits, stands
- speaks softly, shouts, laughs, whispers
- glances at camera, nods, pauses
- door opens, cloth moves, smoke drifts
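A strong prompt usually pairs one movement verb with one action verb, for example: "Camera slowly dollies in as she glances at the camera and pauses."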
Visual Camera Examples
These example clips are reused from the WAN prompting guide to provide concrete motion references.
Pan Example
Directional camera control from the WAN guide
A low angle shot of a jazz pianist in a dimly lit 1920s jazz bar, playing the piano with concentration. He wears a white shirt with suspenders and black trousers, his hands move rapidly on the keys. Camera pans left to low angle shot of a cute girl with pigtails and glasses playing the trumpet.
Dolly Out Example
Clear push/pull motion language
In the style of an American drama promotional poster, Walter White sits in a metal folding chair wearing a yellow protective suit. Camera dollies out while the scene reveals an abandoned, dim factory with light filtering through windows.
Tracking Shot Example
Long-form movement through a complex scene
A sprawling cyberpunk metropolis, neon lights reflecting off rain-soaked streets. The camera follows a hooded figure in a long tracking shot, weaving through a crowded market.
Prompt Templates
Image-to-Video Template
Use this when you already have a reference frame.
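Camera [movement verb] as [subject] [action verb]. [Timing/style cue].
Example: "Camera slowly pans left as the pianist leans into the keys and closes his eyes. Warm low light, unhurried pacing." The reference frame already defines the look, so keep the text focused on motion.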
Text-to-Video Template
Use this when no reference image is available.
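[Scene anchor: setting, subject, framing]. Camera [movement verb]. [Subject action]. [Timing/style cues].
Example: "A rain-soaked neon market at night, shot from shoulder height. Camera tracks forward as a hooded courier weaves through the crowd. Steady pacing, reflections on wet streets."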
Sample Prompt Gallery
Additional visual prompt examples from the WAN guide, useful as starting points in other video models.
Neon Drift
Sample prompt clip from WAN gallery
A rainy night in a dense cyberpunk market, neon kanji signs flicker overhead. The camera starts shoulder-height behind a hooded courier, steadily tracking forward as he weaves through crowds of holographic umbrellas.
Alpine Reveal
Cinematic pullback with landscape reveal
Extreme close-up of a mountaineer's ice axe biting into frozen rock. Camera dollies back and tilts up simultaneously, revealing the climber and a vast sunrise-lit alpine ridge behind him.
Aquatic Ballet
Orbit movement and atmospheric timing
An orca breaches in crystal-clear Arctic waters. Slow 360-degree orbital shot around the soaring whale as droplets hang suspended under soft polar sunset light.
Text-to-Video vs Image-to-Video
Text-to-Video
- Fast for rough ideation.
- Good for testing general movement concepts.
- More variance in identity and scene layout.
Image-to-Video
- Best for consistent characters and composition.
- Better control over continuity between shots.
- Recommended for production-ready results.
Frequently Asked Questions
How long should my prompt be?
Keep it concise but specific. One clear motion path and one clear subject action usually work better than an overly long, descriptive prompt.
Should I describe clothing and environment every time?
Only when needed. In image-to-video, the reference image already defines most visual detail, so focus your text on movement and behavior.
Why isn't the model following my prompt?
Video models can reinterpret scene details, especially in text-to-video. For better prompt adherence, use image-to-video: generate your exact scene/character first, upload it as the first frame, then prompt motion, camera movement, and action.
Can I still use this guide for WAN, Kling, Sora, Veo, and others?
Yes. This guide is model-agnostic and intentionally focuses on universal prompting patterns.