Director-Level Camera Control
Dolly zooms, rack focuses, tracking shots, POV switches, and smooth handheld movement. Describe the shot you want, and the camera executes it.
Cinematic output with native audio, real-world physics, and director-level camera control. Accepts text, image, audio, and video inputs — up to 12 assets in a single generation.
Seedance 2.0 is ByteDance's most advanced video generation model. It uses a unified multimodal architecture that processes text, images, video clips, and audio together — generating cinematic video with native audio in a single pass.
What sets it apart is the combination of multi-shot storytelling, precise camera control, realistic physics simulation, and joint audio generation. One prompt can produce a multi-camera sequence with synced sound effects, dialogue, and music.
Max video length: 15s
Input assets per generation: 12
Native audio-video output: A+V
Turn on audio to hear the native sound generation. Every example below was generated in a single pass with no post-production.
Fight scenes, vehicle chases, explosions, falling debris. Collisions have weight, fabric tears realistically, and characters move with physical believability even in high-action sequences.
Seedance 2.0 generates audio natively alongside video. Music carries deep bass and cinematic warmth. Dialogue is clear with precise lip-sync. Sound effects land exactly on cue.
Each of these videos was generated from a single text prompt. Copy any prompt to use as a starting point for your own generations.
Multi-camera action with crowd physics
Camera follows a man in black sprinting through a crowded street, a group chasing close behind. The shot cuts to a side tracking angle as he panics and crashes into a roadside fruit stall, scrambles to his feet, and keeps running. Sounds of a frantic crowd
Multi-shot action sequences work best when you describe camera angle changes alongside physical interactions and ambient audio cues.
Complex multi-character combat with environmental interaction
A spear-wielding warrior clashes with a dual-blade fighter in a maple leaf forest. Autumn leaves scatter on each impact. Wide shot pulls into tight close-ups of parrying blades, then cuts to a slow-motion overhead as both leap into the air
For choreographed action, layer environmental reactions (scattering leaves, dust) with camera movement transitions and slow-motion beats.
Continuous camera with character tracking and reveals
Spy thriller style. Front-tracking shot of a female agent in a red trench coat walking forward through a busy street, pedestrians constantly crossing in front of her. She rounds a corner and disappears. A masked girl lurks at the corner, glaring after her. Camera pans forward as the agent walks into a mansion and vanishes. Single continuous take, no cuts
Specify "single continuous take" and describe spatial transitions (rounding corners, entering buildings) to get long unbroken shots.
Multi-cut commercial with text overlay and varied angles
15s commercial. Shot 1: side angle, a donkey rides a motorcycle bursting through a barn fence, chickens scatter. Shot 2: close-up of spinning tires on sand, then aerial shot of the donkey doing donuts, dust clouds rising. Shot 3: snow mountain backdrop, the donkey launches off a hillside, text 'Inspire Creativity, Enrich Life' revealed behind it as dust settles
Structure multi-shot commercials with explicit shot numbers, timings, and camera angles. Include text overlay instructions directly in the prompt.
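If you assemble prompts programmatically, a shot list can be kept as structured data and rendered into numbered, timestamped lines. The following is a minimal sketch, not an official tool: the `Shot` class, the `build_prompt` helper, and the per-shot timings are hypothetical and chosen only for illustration. The 15s ceiling comes from the max video length stated on this page.

```python
from dataclasses import dataclass

MAX_LENGTH_S = 15  # max video length stated on this page


@dataclass
class Shot:
    """One shot in a multi-shot prompt (hypothetical helper type)."""
    start_s: int     # shot start time in seconds
    end_s: int       # shot end time in seconds
    camera: str      # film term, e.g. "side angle" or "aerial shot"
    action: str      # what the subject and environment do
    audio: str = ""  # optional soundscape cue


def build_prompt(title: str, shots: list[Shot]) -> str:
    """Render shots as 'Shot N (a-bs): camera, action' lines under a title."""
    if not shots:
        raise ValueError("at least one shot is required")
    lines = [title]
    previous_end = 0
    for number, shot in enumerate(shots, start=1):
        # Shots must be in order, non-overlapping, and non-empty.
        if shot.start_s < previous_end or shot.end_s <= shot.start_s:
            raise ValueError(f"shot {number} has an invalid time range")
        if shot.end_s > MAX_LENGTH_S:
            raise ValueError(f"shot {number} runs past {MAX_LENGTH_S}s")
        line = f"Shot {number} ({shot.start_s}-{shot.end_s}s): {shot.camera}, {shot.action}"
        if shot.audio:
            line += f". {shot.audio}"
        lines.append(line)
        previous_end = shot.end_s
    return "\n".join(lines)


# Illustrative timings; the example prompt above does not specify any.
prompt = build_prompt(
    "15s commercial.",
    [
        Shot(0, 5, "side angle",
             "a donkey rides a motorcycle bursting through a barn fence, chickens scatter",
             "Engine roar"),
        Shot(5, 10, "close-up", "spinning tires on sand, dust clouds rising"),
        Shot(10, 15, "aerial shot",
             "the donkey launches off a hillside against a snow mountain backdrop"),
    ],
)
print(prompt)
```

Keeping shots as data makes it easy to reorder or retime them and regenerate the prompt text without hand-editing the labels.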
Label each shot with a number and timestamp. Include camera angle, subject action, and cut type: "Shot 1 (0-3s): Wide angle..."
Call out camera movements by name: dolly, pan, tracking, crane, whip pan, slow push-in. Seedance responds to film terminology.
Describe the soundscape: "sounds of a frantic crowd", "autumn leaves rustling", "engine roar". Audio is generated natively.
Include temporal cues: "slow-motion overhead", "15s commercial", "single continuous take, no cuts" for control over rhythm.
Describe how the environment responds to action: scattering leaves, rising dust, shattering glass. These reactions sell physical realism.
Combine images for visual style, audio clips for soundtrack, and text for scene direction. The model reads each input's role automatically.
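Before submitting a mixed set of inputs, it can help to check it against the limits this page states. Here is a small sketch under assumptions: `check_assets` and the `(kind, label)` tuple layout are hypothetical, not part of any Seedance API, and the file names are placeholders. The page does not say whether the text prompt counts toward the 12-asset limit, so this sketch conservatively counts every entry.

```python
from collections import Counter

MAX_ASSETS = 12  # per-generation limit stated on this page
ALLOWED_KINDS = {"text", "image", "audio", "video"}  # input types the page lists


def check_assets(assets: list[tuple[str, str]]) -> Counter:
    """Validate (kind, label) pairs and return how many of each kind there are."""
    if len(assets) > MAX_ASSETS:
        raise ValueError(f"{len(assets)} assets exceeds the limit of {MAX_ASSETS}")
    for kind, label in assets:
        if kind not in ALLOWED_KINDS:
            raise ValueError(f"unsupported asset kind {kind!r} for {label!r}")
    return Counter(kind for kind, _ in assets)


# Placeholder inputs: images for style, audio for soundtrack, text for direction.
assets = [
    ("image", "style_reference.png"),
    ("image", "character_reference.png"),
    ("audio", "soundtrack.mp3"),
    ("text", "Spy thriller style. Single continuous take, no cuts"),
]
counts = check_assets(assets)
print(dict(counts))
```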
Use for multi-camera chase, fight, or high-energy scenes.
Use for multi-shot product or brand videos.
Use for continuous single-take storytelling.