Seedance 2.0 Prompting Guide

The Next Generation of AI Video

Cinematic output with native audio, real-world physics, and director-level camera control. Accepts text, image, audio, and video inputs — up to 12 assets in a single generation.


Overview

Seedance 2.0 is the most advanced Seedance video generation model. It uses a unified multimodal architecture that processes text, images, video clips, and audio together — generating cinematic video with native audio in a single pass.

What sets it apart is the combination of multi-shot storytelling, precise camera control, realistic physics simulation, and joint audio generation. One prompt can produce a multi-camera sequence with synced sound effects, dialogue, and music.

Max video length: 15s

Input assets per generation: up to 12

Native audio-video output: A+V

Key Features

Turn on audio to hear the native sound generation. Every example below was generated in a single pass with no post-production.

Advanced Cinematography

Director-Level Camera Control

Dolly zooms, rack focuses, tracking shots, POV switches, and smooth handheld movement. Describe the shot you want, and the camera executes it.

Real-World Physics

Action That Feels Real

Fight scenes, vehicle chases, explosions, falling debris. Collisions have weight, fabric tears realistically, and characters move with physical believability even in high-action sequences.

Audio-Video Joint Generation

Cinema-Grade Sound, Built In

Seedance 2.0 generates audio natively alongside video. Music carries deep bass and cinematic warmth. Dialogue is clear with precise lip-sync. Sound effects land exactly on cue.

Example Videos

Each of these videos was generated from a single text prompt. Copy any prompt to use as a starting point for your own generations.

High-Action Chase with Dynamic Tracking

Multi-camera action with crowd physics

Camera follows a man in black sprinting through a crowded street, a group chasing close behind. The shot cuts to a side tracking angle as he panics and crashes into a roadside fruit stall, scrambles to his feet, and keeps running. Sounds of a frantic crowd

Multi-shot action sequences work best when you describe camera angle changes alongside physical interactions and ambient audio cues.

Martial Arts Choreography in Nature

Complex multi-character combat with environmental interaction

A spear-wielding warrior clashes with a dual-blade fighter in a maple leaf forest. Autumn leaves scatter on each impact. Wide shot pulls into tight close-ups of parrying blades, then cuts to a slow-motion overhead as both leap into the air

For choreographed action, layer environmental reactions (scattering leaves, dust) with camera movement transitions and slow-motion beats.

Long-Take Spy Thriller

Continuous camera with character tracking and reveals

Spy thriller style. Front-tracking shot of a female agent in a red trench coat walking forward through a busy street, pedestrians constantly crossing in front of her. She rounds a corner and disappears. A masked girl lurks at the corner, glaring after her. Camera pans forward as the agent walks into a mansion and vanishes. Single continuous take, no cuts

Specify "single continuous take" and describe spatial transitions (rounding corners, entering buildings) to get long unbroken shots.

Multi-Shot Creative Commercial

Multi-cut commercial with text overlay and varied angles

15s commercial. Shot 1: side angle, a donkey rides a motorcycle bursting through a barn fence, chickens scatter. Shot 2: close-up of spinning tires on sand, then aerial shot of the donkey doing donuts, dust clouds rising. Shot 3: snow mountain backdrop, the donkey launches off a hillside, text 'Inspire Creativity, Enrich Life' revealed behind it as dust settles

Structure multi-shot commercials with explicit shot numbers, timings, and camera angles. Include text overlay instructions directly in the prompt.

Overdescription with Quality Anchors

Dense detail and fidelity keywords for maximum realism

A weathered fisherman mending nets on a sun-bleached wooden dock at golden hour, amber sunlight catching salt spray in the air, gnarled hands threading frayed twine through sun-bleached mesh, seagulls wheeling overhead against a lavender sky streaked with coral clouds, a distant lighthouse beam sweeping across choppy slate-blue waters. Hyper-realistic, 8k. Sounds of creaking dock wood, gentle waves lapping against barnacle-crusted pilings, and distant seagull calls.

Pack every prompt with sensory detail — textures, colors, light quality, sounds. Add "hyper-realistic, 8k" as quality anchors to push the model toward maximum fidelity.
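One way to keep dense prompts consistent is to assemble them from layers: the scene, extra sensory details, quality anchors, and sound cues. The sketch below is only an illustration; `compose_prompt` and `join_with_and` are hypothetical helper names, and the sentence order (visuals, then anchors, then sounds) simply mirrors the example prompt above rather than any documented requirement.

```python
def join_with_and(items: list[str]) -> str:
    """Join items as 'a', 'a and b', or 'a, b, and c'."""
    items = [item.strip() for item in items]
    if len(items) <= 2:
        return " and ".join(items)
    return ", ".join(items[:-1]) + ", and " + items[-1]


def compose_prompt(
    scene: str,
    details: list[str],
    sounds: list[str],
    anchors: tuple[str, ...] = ("hyper-realistic", "8k"),
) -> str:
    """Assemble scene + sensory details, then quality anchors, then audio cues."""
    visual = ", ".join([scene.strip()] + [d.strip() for d in details])
    sentences = [visual + "."]
    if anchors:
        quality = ", ".join(anchors)
        # Capitalize only the first character so "8k" stays lowercase.
        sentences.append(quality[0].upper() + quality[1:] + ".")
    if sounds:
        sentences.append("Sounds of " + join_with_and(sounds) + ".")
    return " ".join(sentences)


prompt = compose_prompt(
    "A blacksmith hammering a glowing blade in a stone forge",
    ["sparks arcing through smoky air", "sweat beading on soot-streaked forearms"],
    ["ringing steel", "a crackling fire"],
)
print(prompt)
```

Keeping details and sounds as separate lists makes it easy to swap one layer at a time and compare the generations.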

Cinematographer Reference Style

Naming a director of photography (DP) to guide lighting, framing, and mood

Roger Deakins-style cinematography. A lone figure walks across a vast cracked salt flat at sunrise, their silhouette casting a razor-thin shadow across the white earth. Camera mounted low, slow dolly forward tracking the figure's boots. Warm amber backlight flares into the lens. The figure stops and looks toward the horizon. Hyper-realistic, 8k. Sounds of wind howling across the empty landscape and distant rumbling thunder.

Reference a cinematographer or director by name to steer the visual style. The model adapts lighting, lens choices, and camera movement to match their signature look.

Time-Coded Multi-Shot Sequence

Bracket timestamps for precise scene transitions

[0-5s]: Wide establishing shot of a neon-lit Tokyo alley at night, rain falling steadily, camera slowly dollying forward past glowing signs and steam vents. [5-10s]: Medium shot inside a tiny ramen shop, a chef ladles rich broth into a bowl with practiced precision, steam rising into warm overhead light. [10-15s]: Extreme close-up of the finished ramen bowl placed on the counter, chopsticks snapping apart, sounds of sizzling pork, bubbling broth, and the quiet murmur of the shop.

Use [0-Xs]: bracket notation to give the model explicit temporal structure. Each segment gets its own camera angle, subject, and audio cues for precise multi-shot control.
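The bracket notation can be generated from a list of segment durations so the timestamps always line up. This is a minimal sketch: `time_coded_prompt` is a hypothetical helper, and the 15-second cap is taken from the maximum video length stated in this guide.

```python
MAX_SECONDS = 15  # maximum video length stated in this guide


def time_coded_prompt(segments: list[tuple[int, str]]) -> str:
    """Build '[0-5s]: ... [5-10s]: ...' from (duration_seconds, description) pairs."""
    parts = []
    start = 0
    for duration, description in segments:
        if duration <= 0:
            raise ValueError("each segment needs a positive duration")
        end = start + duration
        if end > MAX_SECONDS:
            raise ValueError(f"segments run to {end}s, past the {MAX_SECONDS}s limit")
        parts.append(f"[{start}-{end}s]: {description.strip()}")
        start = end  # the next segment begins where this one ends
    return " ".join(parts)


prompt = time_coded_prompt([
    (5, "Wide establishing shot of a harbor at dawn, camera slowly craning up."),
    (5, "Medium shot of a fisherman coiling rope, sounds of gulls and lapping water."),
])
print(prompt)
```

Working from durations rather than hand-typed start and end times avoids accidental gaps or overlaps between segments.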

Wide-to-Close-Up Progression

Natural cinematic shot progression for momentum

A street violinist performing in a cobblestone European square at golden hour. Wide shot establishing the scene with passersby and autumn chestnut trees. Camera pushes in to a medium shot framing the musician's upper body and violin. Swift dolly zoom into an extreme close-up of fingers dancing across the strings, rosin dust catching the last rays of afternoon sunlight. Sounds of a melancholic violin melody echoing off sandstone walls.

Structure shots from wide establishing to medium to extreme close-up. This natural cinematic progression creates momentum, drawing the viewer deeper into the scene.

Prompting Tips

Structure multi-shot sequences

Label each shot with a number and timestamp, then give the camera angle, subject action, and cut type, for example: "Shot 1 (0-3s): Wide angle..."
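The numbered-shot format can be produced from structured data, which helps when a sequence is reordered or retimed. The sketch below assumes nothing about Seedance itself; `Shot` and `label_shots` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    start: int        # start time in seconds
    end: int          # end time in seconds
    description: str  # camera angle, subject action, cut type


def label_shots(shots: list[Shot]) -> str:
    """Render shots as 'Shot N (start-ends): description.' joined by spaces."""
    parts = []
    for number, shot in enumerate(shots, start=1):
        text = shot.description.strip().rstrip(".")
        parts.append(f"Shot {number} ({shot.start}-{shot.end}s): {text}.")
    return " ".join(parts)


prompt = label_shots([
    Shot(0, 3, "Wide angle, a cyclist crests a hill at dawn"),
    Shot(3, 7, "Hard cut to a low side tracking shot of the spinning wheels"),
])
print(prompt)
```

Because shot numbers come from list order, moving a shot renumbers the whole sequence automatically.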

Use explicit camera language

Call out movements: dolly, pan, tracking, crane, whip pan, slow push-in. Seedance responds to film terminology.

Add audio cues in the prompt

Describe the soundscape: "sounds of a frantic crowd", "autumn leaves rustling", "engine roar". Audio is generated natively.

Specify pacing and timing

Include temporal cues such as "slow-motion overhead", "15s commercial", or "single continuous take, no cuts" to control rhythm.

Layer environmental reactions

Describe how the environment responds to action: scattering leaves, rising dust, shattering glass. These reactions sell the physical realism of the scene.

Mix input types for control

Combine images for visual style, audio clips for soundtrack, and text for scene direction. The model reads each input's role automatically.

Overdescribe the scene

Pack detail into every prompt. Instead of "a car chase," describe the rain-slicked streets, neon reflections, headlight beams cutting through mist. Seedance rewards specificity.

Use quality anchors

Add phrases like "hyper-realistic, 8k" as fidelity indicators. These steer the model toward sharper, more detailed output.

Try time-coded brackets

For multi-shot sequences, use [0-5s]: ... [5-10s]: ... bracket notation. The model recognizes this as explicit temporal structure.

Reference cinematographers

Name a cinematographer or director, such as Deakins, Lubezki, or Kurosawa, for style guidance. The model adapts lighting, framing, and movement to match.

Progress from wide to close-up

Structure shots from wide establishing to medium to extreme close-up. This natural cinematic progression creates momentum and draws the viewer in.

Label reference inputs

When using multiple images, videos, or audio clips, reference them with [Image1], [Video1], [Audio1] notation in your prompt for precision control.
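When a prompt juggles several assets, it can help to generate the labels and check that every tag in the prompt points at a real input. The sketch below is illustrative only: `label_assets` and `undefined_tags` are hypothetical helpers, and numbering each asset type in upload order is an assumption, since this guide does not say how labels are assigned. The 12-asset cap comes from the limit stated at the top of this guide.

```python
import re

MAX_ASSETS = 12  # per-generation asset limit stated in this guide
TAG = re.compile(r"\[(?:Image|Video|Audio)\d+\]")


def label_assets(assets: list[tuple[str, str]]) -> dict[str, str]:
    """Map [Image1]-style tags to file names; assets are (kind, file_name) pairs.

    Assumption: tags are numbered per type, in the order assets are listed.
    """
    if len(assets) > MAX_ASSETS:
        raise ValueError(f"{len(assets)} assets given, limit is {MAX_ASSETS}")
    counters = {"image": 0, "video": 0, "audio": 0}
    labels = {}
    for kind, file_name in assets:
        if kind not in counters:
            raise ValueError(f"unknown asset kind: {kind!r}")
        counters[kind] += 1
        labels[f"[{kind.capitalize()}{counters[kind]}]"] = file_name
    return labels


def undefined_tags(prompt: str, labels: dict[str, str]) -> list[str]:
    """Return tags used in the prompt that have no matching asset."""
    return sorted(set(TAG.findall(prompt)) - set(labels))


labels = label_assets([
    ("image", "hero.png"),
    ("image", "street.png"),
    ("audio", "theme.mp3"),
])
prompt = "The character from [Image1] walks down the street in [Image2], scored with [Audio1]."
print(labels)
print(undefined_tags(prompt, labels))
```

An empty list from `undefined_tags` means every label in the prompt has a corresponding input.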

Action Sequence Template

Use for multi-camera chase, fight, or high-energy scenes.

Camera follows [subject] through [environment], [action description]. The shot cuts to a [angle] as [secondary action]. [Environmental reaction]. [Audio cue].
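The bracketed slots in a template like this can be filled programmatically. The following is a small sketch, not part of any Seedance tooling: `fill_template` is a hypothetical helper that replaces each `[placeholder]` with a supplied value and refuses to run if any slot is left empty.

```python
import re

ACTION_TEMPLATE = (
    "Camera follows [subject] through [environment], [action description]. "
    "The shot cuts to a [angle] as [secondary action]. "
    "[Environmental reaction]. [Audio cue]."
)

PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace every [name] in the template with values[name]."""
    missing = [name for name in PLACEHOLDER.findall(template) if name not in values]
    if missing:
        raise KeyError(f"no value for: {', '.join(missing)}")
    # A function replacement avoids backslash-escape handling in the values.
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


prompt = fill_template(ACTION_TEMPLATE, {
    "subject": "a courier on a bicycle",
    "environment": "a rain-soaked night market",
    "action description": "weaving between stalls",
    "angle": "low side tracking angle",
    "secondary action": "she clips a lantern rack",
    "Environmental reaction": "Paper lanterns swing and scatter sparks",
    "Audio cue": "Sounds of shouting vendors and bicycle bells",
})
print(prompt)
```

The same helper works with the other templates in this guide, since they all use the same square-bracket slot convention.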

Commercial / Ad Template

Use for multi-shot product or brand videos.

[Duration] commercial. Shot 1: [angle], [subject action]. Shot 2: [close-up detail], then [wide shot]. Shot 3: [final reveal], text '[tagline]' revealed as [transition].

Long-Take Narrative Template

Use for continuous single-take storytelling.

[Genre] style. [Camera movement] of [character description] moving through [environment]. [Character action]. [Secondary character reveal]. Single continuous take, no cuts.

Time-Coded Sequence Template

Use for precisely timed multi-shot transitions with bracket notation.

[0-Xs]: [Wide/medium/close], [camera movement], [subject action], [lighting]. [Xs-Ys]: [Cut type], [new angle], [new action]. [Ys-Zs]: [Final shot], [resolution moment], [audio cue].
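Hand-written time codes are easy to get wrong, so a quick check before submitting can save a generation. The sketch below is a hypothetical linter, `check_time_codes`, that reads the bracket timestamps out of a prompt. It assumes segments should start at 0s, run back to back, and stay within the 15-second maximum stated in this guide; the guide does not say whether the model strictly requires contiguous segments.

```python
import re

SEGMENT = re.compile(r"\[(\d+)-(\d+)s\]:")


def check_time_codes(prompt: str, max_seconds: int = 15) -> list[str]:
    """Return a list of problems found in the bracket timestamps (empty if none)."""
    spans = [(int(start), int(end)) for start, end in SEGMENT.findall(prompt)]
    if not spans:
        return ["no [start-ends]: segments found"]
    problems = []
    if spans[0][0] != 0:
        problems.append("first segment should start at 0s")
    for start, end in spans:
        if end <= start:
            problems.append(f"[{start}-{end}s] has no positive duration")
    # Compare each segment's start with the previous segment's end.
    for (_, prev_end), (start, _) in zip(spans, spans[1:]):
        if start != prev_end:
            problems.append(f"gap or overlap between {prev_end}s and {start}s")
    if spans[-1][1] > max_seconds:
        problems.append(f"sequence ends at {spans[-1][1]}s, past the {max_seconds}s limit")
    return problems


good = "[0-5s]: Wide shot of an alley. [5-10s]: Medium shot of a chef. [10-15s]: Close-up of a bowl."
bad = "[0-5s]: Wide shot of an alley. [6-20s]: Medium shot of a chef."
print(check_time_codes(good))
print(check_time_codes(bad))
```

The first call reports no problems; the second flags the one-second gap and the overrun past 15 seconds.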