Seedance 2.0 by ByteDance

The Next Generation of AI Video

Cinematic output with native audio, real-world physics, and director-level camera control. Accepts text, image, audio, and video inputs — up to 12 assets in a single generation.

Overview

Seedance 2.0 is ByteDance's most advanced video generation model. It uses a unified multimodal architecture that processes text, images, video clips, and audio together — generating cinematic video with native audio in a single pass.

What sets it apart is the combination of multi-shot storytelling, precise camera control, realistic physics simulation, and joint audio generation. One prompt can produce a multi-camera sequence with synced sound effects, dialogue, and music.

Max video length: 15s
Input assets per generation: up to 12
Output: native audio and video (A+V)

Key Features

Every example below was generated in a single pass with no post-production.

Advanced Cinematography

Director-Level Camera Control

Dolly zooms, rack focuses, tracking shots, POV switches, and smooth handheld movement. Describe the shot you want, and the camera executes it.

Real-World Physics

Action That Feels Real

Fight scenes, vehicle chases, explosions, falling debris. Collisions have weight, fabric tears realistically, and characters move with physical believability even in high-action sequences.

Audio-Video Joint Generation

Cinema-Grade Sound, Built In

Seedance 2.0 generates audio natively alongside video. Music carries deep bass and cinematic warmth. Dialogue is clear with precise lip-sync. Sound effects land exactly on cue.

Example Videos

Each of these videos was generated from a single text prompt. Copy any prompt to use as a starting point for your own generations.

High-Action Chase with Dynamic Tracking

Multi-camera action with crowd physics

Camera follows a man in black sprinting through a crowded street, a group chasing close behind. The shot cuts to a side tracking angle as he panics and crashes into a roadside fruit stall, scrambles to his feet, and keeps running. Sounds of a frantic crowd

Multi-shot action sequences work best when you describe camera angle changes alongside physical interactions and ambient audio cues.

Martial Arts Choreography in Nature

Complex multi-character combat with environmental interaction

A spear-wielding warrior clashes with a dual-blade fighter in a maple leaf forest. Autumn leaves scatter on each impact. Wide shot pulls into tight close-ups of parrying blades, then cuts to a slow-motion overhead as both leap into the air

For choreographed action, layer environmental reactions (scattering leaves, dust) with camera movement transitions and slow-motion beats.

Long-Take Spy Thriller

Continuous camera with character tracking and reveals

Spy thriller style. Front-tracking shot of a female agent in a red trench coat walking forward through a busy street, pedestrians constantly crossing in front of her. She rounds a corner and disappears. A masked girl lurks at the corner, glaring after her. Camera pans forward as the agent walks into a mansion and vanishes. Single continuous take, no cuts

Specify "single continuous take" and describe spatial transitions (rounding corners, entering buildings) to get long unbroken shots.

Multi-Shot Creative Commercial

Multi-cut commercial with text overlay and varied angles

15s commercial. Shot 1: side angle, a donkey rides a motorcycle bursting through a barn fence, chickens scatter. Shot 2: close-up of spinning tires on sand, then aerial shot of the donkey doing donuts, dust clouds rising. Shot 3: snow mountain backdrop, the donkey launches off a hillside, text 'Inspire Creativity, Enrich Life' revealed behind it as dust settles

Structure multi-shot commercials with explicit shot numbers, timings, and camera angles. Include text overlay instructions directly in the prompt.

Prompting Tips

Structure multi-shot sequences

Label each shot with a number and timestamp. Include camera angle, subject action, and cut type: "Shot 1 (0-3s): Wide angle..."

Use explicit camera language

Call out movements: dolly, pan, tracking, crane, whip pan, slow push-in. Seedance responds to film terminology.

Add audio cues in the prompt

Describe the soundscape: "sounds of a frantic crowd", "autumn leaves rustling", "engine roar". Audio is generated natively.

Specify pacing and timing

Include temporal cues such as "slow-motion overhead", "15s commercial", or "single continuous take, no cuts" to control rhythm.

Layer environmental reactions

Describe how the environment responds to action: scattering leaves, rising dust, shattering glass. These reactions sell physical realism.

Mix input types for control

Combine images for visual style, audio clips for soundtrack, and text for scene direction. The model infers each input's role automatically.
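The shot-labeling convention from the first tip (number, timestamp, camera angle, action, cut type) is easy to generate mechanically. The Python sketch below is a hypothetical helper, not part of any Seedance SDK: `Shot` and `build_multishot_prompt` are illustrative names. It assigns running timestamps from per-shot durations and, as an assumption based on the 15s maximum listed in the Overview, rejects sequences that run longer.

```python
from dataclasses import dataclass

# Assumption: taken from the "Max video length: 15s" figure in the Overview.
MAX_SECONDS = 15

@dataclass
class Shot:
    seconds: int   # duration of this shot in whole seconds
    camera: str    # camera angle or movement, e.g. "Wide angle"
    action: str    # what the subject does during the shot
    cut: str = ""  # optional cut type into the next shot, e.g. "hard cut"

def build_multishot_prompt(shots: list[Shot]) -> str:
    """Join shots into 'Shot N (a-bs): ...' sentences with running timestamps."""
    total = sum(shot.seconds for shot in shots)
    if total > MAX_SECONDS:
        raise ValueError(f"total duration {total}s exceeds {MAX_SECONDS}s")
    sentences = []
    start = 0
    for number, shot in enumerate(shots, start=1):
        end = start + shot.seconds
        parts = [shot.camera, shot.action]
        if shot.cut:
            parts.append(shot.cut)
        sentences.append(f"Shot {number} ({start}-{end}s): " + ", ".join(parts) + ".")
        start = end  # the next shot begins where this one ends
    return " ".join(sentences)

print(build_multishot_prompt([
    Shot(3, "Wide angle", "a runner bursts out of a crowded market", "hard cut"),
    Shot(4, "Side tracking shot", "he vaults over a fruit stall"),
]))
# Shot 1 (0-3s): Wide angle, a runner bursts out of a crowded market, hard cut. Shot 2 (3-7s): Side tracking shot, he vaults over a fruit stall.
```

Keeping durations per shot, rather than typing timestamps by hand, means that inserting or lengthening one shot shifts every later timestamp automatically.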

Action Sequence Template

Use for multi-camera chase, fight, or high-energy scenes.

Camera follows [subject] through [environment], [action description]. The shot cuts to a [angle] as [secondary action]. [Environmental reaction]. [Audio cue].
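A minimal Python sketch of filling the action template. The function and parameter names are hypothetical; beyond plain substitution, it only tidies each fragment (trimming whitespace and trailing periods, and capitalizing the two fragments that open a sentence) so the assembled prompt reads cleanly. The sample values are adapted from the chase example; the environmental reaction is an illustrative addition.

```python
ACTION_TEMPLATE = (
    "Camera follows {subject} through {environment}, {action}. "
    "The shot cuts to a {angle} as {secondary_action}. "
    "{reaction}. {audio}."
)

def _clean(fragment: str, capitalize: bool = False) -> str:
    """Trim whitespace and trailing periods; optionally capitalize the first letter."""
    text = fragment.strip().rstrip(".")
    return text[:1].upper() + text[1:] if capitalize else text

def fill_action_template(subject: str, environment: str, action: str,
                         angle: str, secondary_action: str,
                         reaction: str, audio: str) -> str:
    """Substitute each [placeholder] of the action sequence template."""
    return ACTION_TEMPLATE.format(
        subject=_clean(subject),
        environment=_clean(environment),
        action=_clean(action),
        angle=_clean(angle),
        secondary_action=_clean(secondary_action),
        reaction=_clean(reaction, capitalize=True),  # starts a new sentence
        audio=_clean(audio, capitalize=True),        # starts a new sentence
    )

print(fill_action_template(
    subject="a man in black",
    environment="a crowded street",
    action="a group chasing close behind",
    angle="side tracking angle",
    secondary_action="he crashes into a roadside fruit stall",
    reaction="fruit scatters across the pavement",
    audio="sounds of a frantic crowd",
))
```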

Commercial / Ad Template

Use for multi-shot product or brand videos.

[Duration] commercial. Shot 1: [angle], [subject action]. Shot 2: [close-up detail], then [wide shot]. Shot 3: [final reveal], text '[tagline]' revealed as [transition].
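The commercial template can be sketched the same way. Beyond substitution, this hypothetical Python helper checks the requested duration against the 15s maximum from the Overview. As a design choice of this sketch, not documented model behavior, it also switches the tagline to double quotes when the tagline itself contains an apostrophe, so the quoted text is not closed early.

```python
# Assumption: taken from the "Max video length: 15s" figure in the Overview.
MAX_SECONDS = 15

def fill_commercial_template(duration_s: int, angle: str, subject_action: str,
                             closeup: str, wide_shot: str, final_reveal: str,
                             tagline: str, transition: str) -> str:
    """Fill the three-shot commercial template (all names are illustrative)."""
    if not 0 < duration_s <= MAX_SECONDS:
        raise ValueError(f"duration must be 1-{MAX_SECONDS}s, got {duration_s}s")
    # Use single quotes as in the template, unless the tagline has an apostrophe.
    quote = '"' if "'" in tagline else "'"
    return (
        f"{duration_s}s commercial. "
        f"Shot 1: {angle}, {subject_action}. "
        f"Shot 2: {closeup}, then {wide_shot}. "
        f"Shot 3: {final_reveal}, text {quote}{tagline}{quote} revealed as {transition}."
    )

print(fill_commercial_template(
    15,
    "side angle",
    "a donkey rides a motorcycle bursting through a barn fence",
    "close-up of spinning tires on sand",
    "aerial shot of the donkey doing donuts",
    "snow mountain backdrop, the donkey launches off a hillside",
    "Inspire Creativity, Enrich Life",
    "dust settles",
))
```

With the donkey example's values, the output closely matches the commercial prompt shown earlier.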

Long-Take Narrative Template

Use for continuous single-take storytelling.

[Genre] style. [Camera movement] of [character description] moving through [environment]. [Character action]. [Secondary character reveal]. Single continuous take, no cuts.
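For the long-take template, the main risk is wording that contradicts the closing "Single continuous take, no cuts." The Python sketch below fills the template and rejects any field that mentions a cut or a numbered shot. That check is a heuristic assumed for illustration; the function name and pattern are hypothetical, and Seedance does not document how it resolves conflicting instructions.

```python
import re

# Heuristic (assumption): "cut"/"cuts" or "Shot 2"-style labels imply an edit,
# which contradicts the template's closing "no cuts" instruction.
CUT_PATTERN = re.compile(r"\bcuts?\b|\bshot \d+\b", re.IGNORECASE)

def fill_long_take_template(genre: str, camera_movement: str, character: str,
                            environment: str, character_action: str,
                            reveal: str) -> str:
    """Fill the long-take template after screening fields for cut language."""
    for text in (genre, camera_movement, character, environment,
                 character_action, reveal):
        if CUT_PATTERN.search(text):
            raise ValueError(f"field implies a cut: {text!r}")
    return (
        f"{genre} style. {camera_movement} of {character} moving through "
        f"{environment}. {character_action}. {reveal}. "
        "Single continuous take, no cuts."
    )

print(fill_long_take_template(
    "Spy thriller",
    "Front-tracking shot",
    "a female agent in a red trench coat",
    "a busy street",
    "She rounds a corner and disappears",
    "A masked girl lurks at the corner, glaring after her",
))
```

Note that "Front-tracking shot" passes the check, because only a numbered "shot" label is treated as a cut.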

Frequently Asked Questions