
Video Prompting Guide

Learn the motion-first workflow that improves consistency across PixelDojo video models. Build better prompts, control camera movement, and get cleaner outputs.

  • Image-First: best for consistency
  • Motion-First: prompt what moves
  • All Models: reusable workflow

Overview

AI video models interpret prompts differently than image models. If you need precise control over character look, wardrobe, and scene composition, treat video generation as a motion step, not a scene-design step.

Key distinction: Image models excel at generating precise characters and scenes. Video models excel at turning those scenes into motion.

Recommended Workflow

Step 1 - Lock The Look

  • Generate an image first in any PixelDojo image tool.
  • Use that output as your exact character/scene reference.
  • Prioritize composition and identity before video generation.

Step 2 - Animate The Scene

  • Upload the image as the first frame in a video tool.
  • Prompt camera movement and subject action/dialogue.
  • Do not spend most of the prompt re-describing static visuals.

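The two steps above can be sketched in code. Everything here is hypothetical: `generate_image`, `animate_frame`, and their parameters are illustrative stand-ins, not PixelDojo's actual API.

```python
# Hypothetical sketch of the image-first workflow.
# `generate_image` and `animate_frame` are illustrative stand-ins,
# not a real PixelDojo API.

def generate_image(prompt: str) -> str:
    """Stand-in: pretend this renders a frame and returns its path."""
    return "frame_001.png"

def animate_frame(first_frame: str, motion_prompt: str) -> str:
    """Stand-in: pretend this renders a clip and returns its path."""
    return "clip_001.mp4"

# Step 1 - lock the look: identity, wardrobe, and composition
# belong in the image prompt.
frame = generate_image(
    "A confident woman in a tailored black coat, rainy neon alley, "
    "medium shot, cinematic lighting"
)

# Step 2 - animate: the video prompt describes only what moves.
clip = animate_frame(
    first_frame=frame,
    motion_prompt=(
        "Camera slowly dollies in. She turns toward camera and speaks. "
        "Drifting steam, subtle handheld shake."
    ),
)
```

The point of the split is visible in the two prompts: the static description lives entirely in step 1, so step 2 spends its whole budget on camera and subject motion.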
If you are using text-to-video, expect more variable results: identity and layout can shift between generations. For precision and consistency, image-to-video is usually the better choice.

Prompt Framework

Keep prompts focused and structured. A reliable baseline is:

Scene anchor -> Camera move -> Subject action -> Timing/style cues

Example skeleton

[Who/what is in frame]. Camera [move]. Subject [action/dialogue]. [Pacing/timing]. [Lighting/mood].
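As an illustration, the skeleton can be assembled programmatically. This helper and its parameter names are a sketch of my own, not part of any tool:

```python
def build_video_prompt(anchor, camera_move, action, timing="", mood=""):
    """Assemble a prompt as: scene anchor -> camera move ->
    subject action -> timing/style cues. Empty cues are dropped."""
    parts = [
        anchor,                   # who/what is in frame
        f"Camera {camera_move}",  # one clear camera move
        f"Subject {action}",      # one clear action or line
        timing,                   # pacing/timing, optional
        mood,                     # lighting/mood, optional
    ]
    return ". ".join(p for p in parts if p) + "."

prompt = build_video_prompt(
    anchor="A lone astronaut on a red dust plain",
    camera_move="slowly pushes in",
    action="pauses, then walks toward a distant beacon",
    timing="slow, deliberate pacing",
    mood="atmospheric haze, cinematic contrast",
)
```

The fixed ordering keeps the one-move, one-action discipline: if you cannot fill the `camera_move` and `action` slots with a single clear phrase each, the shot probably needs to be split into two clips.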

Camera Language Cheat Sheet

Movement Verbs

  • pan left/right, tilt up/down
  • dolly in/out, truck left/right
  • orbit clockwise/counterclockwise
  • slow push, handheld drift, lock-off

Action Verbs

  • walks, turns, reaches, sits, stands
  • speaks softly, shouts, laughs, whispers
  • glances at camera, nods, pauses
  • door opens, cloth moves, smoke drifts

Visual Camera Examples

These example clips are reused from the WAN prompting guide to provide concrete motion references.

Pan Example

Directional camera control from the WAN guide

A low angle shot of a jazz pianist in a dimly lit 1920s jazz bar, playing the piano with concentration. He wears a white shirt with suspenders and black trousers, his hands move rapidly on the keys. Camera pans left to low angle shot of a cute girl with pigtails and glasses playing the trumpet.

Dolly Out Example

Clear push/pull motion language

In the style of an American drama promotional poster, Walter White sits in a metal folding chair wearing a yellow protective suit. Camera dollies out while the scene reveals an abandoned, dim factory with light filtering through windows.

Tracking Shot Example

Long-form movement through a complex scene

A sprawling cyberpunk metropolis, neon lights reflecting off rain-soaked streets. The camera follows a hooded figure in a long tracking shot, weaving through a crowded market.

Prompt Templates

Image-to-Video Template

Use this when you already have a reference frame.

A confident woman in a tailored black coat stands in a rainy neon alley at night. Camera slowly dollies in while she turns her head toward camera and says: "We move at dawn." Subtle handheld micro-shake, shallow depth of field, drifting steam, foreground reflections sliding across frame.

Text-to-Video Template

Use this when no reference image is available.

A lone astronaut walks through a red dust storm on Mars. Wide shot, slow push-in, cape and dust moving in wind. The astronaut pauses, looks toward a distant beacon, then continues forward. Cinematic contrast, atmospheric haze, natural motion, no text overlays.

Additional visual prompt examples from the WAN guide, useful as starting points in other video models.

Neon Drift

Sample prompt clip from WAN gallery

A rainy night in a dense cyberpunk market, neon kanji signs flicker overhead. The camera starts shoulder-height behind a hooded courier, steadily tracking forward as he weaves through crowds of holographic umbrellas.

Alpine Reveal

Cinematic pullback with landscape reveal

Extreme close-up of a mountaineer's ice axe biting into frozen rock. Camera dollies back and tilts up simultaneously, revealing the climber and a vast sunrise-lit alpine ridge behind him.

Aquatic Ballet

Orbit movement and atmospheric timing

An orca breaches in crystal-clear Arctic waters. Slow 360-degree orbital shot around the soaring whale as droplets hang suspended under soft polar sunset light.

Text-to-Video vs Image-to-Video

Text-to-Video

  • Fast for rough ideation.
  • Good for testing general movement concepts.
  • More variance in identity and scene layout.

Image-to-Video

  • Best for consistent characters and composition.
  • Better control over continuity between shots.
  • Recommended for production-ready results.

Frequently Asked Questions

How long should my prompt be?

Keep it concise but specific. One clear motion path and one clear subject action usually works better than overly long, descriptive prompts.

Should I describe clothing and environment every time?

Only when needed. In image-to-video, the reference image already defines most visual detail, so focus your text on movement and behavior.

Why isn't the model following my prompt?

Video models can reinterpret scene details, especially in text-to-video. For better prompt adherence, use image-to-video: generate your exact scene/character first, upload it as the first frame, then prompt motion, camera movement, and action.

Can I still use this guide for Wan, Kling, Sora, Veo, and others?

Yes. This guide is model-agnostic and intentionally focuses on universal prompting patterns.