Director-Level Camera Control
Dolly zooms, rack focuses, tracking shots, POV switches, and smooth handheld movement. Describe the shot you want, and the camera executes it.
Cinematic output with native audio, real-world physics, and director-level camera control. Accepts text, image, audio, and video inputs — up to 12 assets in a single generation.
Seedance 2.0 is ByteDance's most advanced video generation model. It uses a unified multimodal architecture that processes text, images, video clips, and audio together — generating cinematic video with native audio in a single pass.
What sets it apart is the combination of multi-shot storytelling, precise camera control, realistic physics simulation, and joint audio generation. One prompt can produce a multi-camera sequence with synced sound effects, dialogue, and music.
Max video length: 15s
Input assets per generation: 12
Native audio-video output: A+V
Turn on audio to hear the native sound generation. Every example below was generated in a single pass with no post-production.
Fight scenes, vehicle chases, explosions, falling debris. Collisions have weight, fabric tears realistically, and characters move with physical believability even in high-action sequences.
Seedance 2.0 generates audio natively alongside video. Music carries deep bass and cinematic warmth. Dialogue is clear with precise lip-sync. Sound effects land exactly on cue.
Each of these videos was generated from a single text prompt. Copy any prompt to use as a starting point for your own generations.
Multi-camera action with crowd physics
Camera follows a man in black sprinting through a crowded street, a group chasing close behind. The shot cuts to a side tracking angle as he panics and crashes into a roadside fruit stall, scrambles to his feet, and keeps running. Sounds of a frantic crowd
Multi-shot action sequences work best when you describe camera angle changes alongside physical interactions and ambient audio cues.
Complex multi-character combat with environmental interaction
A spear-wielding warrior clashes with a dual-blade fighter in a maple leaf forest. Autumn leaves scatter on each impact. Wide shot pulls into tight close-ups of parrying blades, then cuts to a slow-motion overhead as both leap into the air
For choreographed action, layer environmental reactions (scattering leaves, dust) with camera movement transitions and slow-motion beats.
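The layering described above can be sketched as a small prompt builder. This is only an illustration of one way to keep the three layers (action, environmental reaction, camera) together per beat. The `ActionBeat` class, its field names, and `compose_action_prompt` are hypothetical helpers, not part of Seedance or any ByteDance tooling, and the output is just a text prompt to paste into whatever interface you use.

```python
from dataclasses import dataclass


@dataclass
class ActionBeat:
    """One beat of an action prompt; all field names are illustrative."""
    action: str       # what the characters do
    environment: str  # how the surroundings react
    camera: str       # camera framing or movement for this beat


def compose_action_prompt(setting, beats, audio=""):
    """Join the setting, each beat's three layers, and an optional audio cue."""
    sentences = [setting.rstrip(".") + "."]
    for beat in beats:
        sentences.append(f"{beat.camera}: {beat.action}, {beat.environment}.")
    if audio:
        sentences.append(f"Sounds of {audio}.")
    return " ".join(sentences)


action_prompt = compose_action_prompt(
    "A spear-wielding warrior clashes with a dual-blade fighter in a maple leaf forest",
    [
        ActionBeat("blades parry", "autumn leaves scatter on each impact", "Tight close-up"),
        ActionBeat("both fighters leap into the air", "dust rises below them", "Slow-motion overhead"),
    ],
    audio="autumn leaves rustling",
)
print(action_prompt)
```

Keeping each beat as a unit makes it harder to write an action with no environmental reaction or no camera direction attached.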
Continuous camera with character tracking and reveals
Spy thriller style. Front-tracking shot of a female agent in a red trench coat walking forward through a busy street, pedestrians constantly crossing in front of her. She rounds a corner and disappears. A masked girl lurks at the corner, glaring after her. Camera pans forward as the agent walks into a mansion and vanishes. Single continuous take, no cuts
Specify "single continuous take" and describe spatial transitions (rounding corners, entering buildings) to get long unbroken shots.
Multi-cut commercial with text overlay and varied angles
15s commercial. Shot 1: side angle, a donkey rides a motorcycle bursting through a barn fence, chickens scatter. Shot 2: close-up of spinning tires on sand, then aerial shot of the donkey doing donuts, dust clouds rising. Shot 3: snow mountain backdrop, the donkey launches off a hillside, text 'Inspire Creativity, Enrich Life' revealed behind it as dust settles
Structure multi-shot commercials with explicit shot numbers, timings, and camera angles. Include text overlay instructions directly in the prompt.
Dense detail and fidelity keywords for maximum realism
A weathered fisherman mending nets on a sun-bleached wooden dock at golden hour, amber sunlight catching salt spray in the air, gnarled hands threading frayed twine through sun-bleached mesh, seagulls wheeling overhead against a lavender sky streaked with coral clouds, a distant lighthouse beam sweeping across choppy slate-blue waters. Hyper-realistic, 8k. Sounds of creaking dock wood, gentle waves lapping against barnacle-crusted pilings, and distant seagull calls.
Pack every prompt with sensory detail: textures, colors, light quality, sounds. Add "hyper-realistic, 8k" as a quality anchor to push the model toward maximum fidelity.
Naming a director of photography (DP) to guide lighting, framing, and mood
Roger Deakins-style cinematography. A lone figure walks across a vast cracked salt flat at sunrise, their silhouette casting a razor-thin shadow across the white earth. Camera mounted low, slow dolly forward tracking the figure's boots. Warm amber backlight flares into the lens. The figure stops and looks toward the horizon. Hyper-realistic, 8k. Sounds of wind howling across the empty landscape and distant rumbling thunder.
Reference a cinematographer or director by name to steer the visual style. The model adapts lighting, lens choices, and camera movement to match their signature look.
Bracket timestamps for precise scene transitions
[0-5s]: Wide establishing shot of a neon-lit Tokyo alley at night, rain falling steadily, camera slowly dollying forward past glowing signs and steam vents. [5-10s]: Medium shot inside a tiny ramen shop, a chef ladles rich broth into a bowl with practiced precision, steam rising into warm overhead light. [10-15s]: Extreme close-up of the finished ramen bowl placed on the counter, chopsticks snapping apart, sounds of sizzling pork, bubbling broth, and the quiet murmur of the shop.
Use [0-Xs]: bracket notation to give the model explicit temporal structure. Each segment gets its own camera angle, subject, and audio cues for precise multi-shot control.
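The bracket notation can be generated from a list of segment durations, which avoids arithmetic slips in the start and end times. The sketch below is a hypothetical helper, not an official tool. It assumes whole-second segments and uses the 15s maximum length quoted on this page as its limit.

```python
MAX_SECONDS = 15  # maximum video length stated on this page


def build_timed_prompt(segments):
    """Join (duration_seconds, description) pairs into "[start-ends]: ..." notation."""
    parts = []
    start = 0
    for duration, description in segments:
        if duration <= 0:
            raise ValueError("each segment needs a positive duration")
        end = start + duration
        if end > MAX_SECONDS:
            raise ValueError(f"segments run to {end}s, over the {MAX_SECONDS}s limit")
        parts.append(f"[{start}-{end}s]: {description.strip()}")
        start = end
    return " ".join(parts)


timed_prompt = build_timed_prompt([
    (5, "Wide establishing shot of a neon-lit alley at night, rain falling."),
    (5, "Medium shot inside a tiny ramen shop, steam rising."),
    (5, "Extreme close-up of the finished bowl, sounds of bubbling broth."),
])
print(timed_prompt)
```

Each segment's description should still carry its own camera angle, subject, and audio cue. The helper only handles the timestamps.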
Natural cinematic shot progression for momentum
A street violinist performing in a cobblestone European square at golden hour. Wide shot establishing the scene with passersby and autumn chestnut trees. Camera pushes in to a medium shot framing the musician's upper body and violin. Swift dolly zoom into an extreme close-up of fingers dancing across the strings, rosin dust catching the last rays of afternoon sunlight. Sounds of a melancholic violin melody echoing off sandstone walls.
Structure shots from wide establishing to medium to extreme close-up. This natural cinematic progression creates momentum, drawing the viewer deeper into the scene.
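As a rough illustration of the wide, medium, extreme close-up progression, the sketch below sorts shot notes into that order before rendering them as a prompt. The function names and the `SCALE_ORDER` list are assumptions made for this example. The model itself has no such API and only sees the final text.

```python
# Shot scales named in the tip above, from widest to tightest.
SCALE_ORDER = ["wide", "medium", "extreme close-up"]


def order_by_scale(shots):
    """Sort (scale, description) pairs from wide to extreme close-up."""
    unknown = [scale for scale, _ in shots if scale not in SCALE_ORDER]
    if unknown:
        raise ValueError(f"unknown shot scale(s): {unknown}")
    return sorted(shots, key=lambda shot: SCALE_ORDER.index(shot[0]))


def describe_progression(shots):
    """Render the ordered shots as one sentence per shot."""
    return " ".join(
        f"{scale.capitalize()} shot: {description}."
        for scale, description in order_by_scale(shots)
    )


progression_prompt = describe_progression([
    ("extreme close-up", "fingers dancing across the violin strings"),
    ("wide", "a street violinist in a cobblestone square at golden hour"),
    ("medium", "the musician's upper body and violin"),
])
print(progression_prompt)
```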
Label each shot with a number and timestamp. Include camera angle, subject action, and cut type: "Shot 1 (0-3s): Wide angle..."
Call out movements: dolly, pan, tracking, crane, whip pan, slow push-in. Seedance responds to film terminology.
Describe the soundscape: "sounds of a frantic crowd", "autumn leaves rustling", "engine roar". Audio is generated natively.
Include temporal cues such as "slow-motion overhead", "15s commercial", or "single continuous take, no cuts" to control rhythm.
Describe how the environment responds to action: scattering leaves, rising dust, shattering glass. These reactions sell physical realism.
Combine images for visual style, audio clips for soundtrack, and text for scene direction. The model reads each input's role automatically.
Pack detail into every prompt. Instead of "a car chase," describe the rain-slicked streets, neon reflections, headlight beams cutting through mist. Seedance rewards specificity.
Add phrases like "hyper-realistic, 8k" as fidelity indicators. These push the model toward maximum visual quality and sharpness.
For multi-shot sequences, use [0-5s]: ... [5-10s]: ... bracket notation. The model recognizes this as explicit temporal structure.
Name a cinematographer or director such as Deakins, Lubezki, or Kurosawa for style guidance. The model adapts lighting, framing, and movement to match.
Structure shots from wide establishing to medium to extreme close-up. This natural cinematic progression creates momentum and draws the viewer in.
When using multiple images, videos, or audio clips, reference them with [Image1], [Video1], [Audio1] notation in your prompt for precision control.
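The [Image1], [Video1], [Audio1] notation can be kept consistent with a small lookup that assigns tags in upload order. This is a minimal sketch under stated assumptions: `tag_assets` is a hypothetical helper, numbering is assumed to restart per asset type, and the 12-asset limit quoted on this page is assumed to count images, videos, and audio clips but not the text prompt.

```python
MAX_ASSETS = 12  # input assets per generation, as stated on this page


def tag_assets(images=(), videos=(), audio=()):
    """Map each file name to a tag such as [Image1], numbered per asset type."""
    total = len(images) + len(videos) + len(audio)
    if total > MAX_ASSETS:
        raise ValueError(f"{total} assets given, limit is {MAX_ASSETS}")
    tags = {}
    for label, files in (("Image", images), ("Video", videos), ("Audio", audio)):
        for index, name in enumerate(files, start=1):
            tags[name] = f"[{label}{index}]"
    return tags


tags = tag_assets(images=["style.png", "hero.png"], audio=["theme.mp3"])
asset_prompt = (
    f"Use {tags['style.png']} for visual style and {tags['hero.png']} for the main character. "
    f"Score the scene with {tags['theme.mp3']}."
)
print(asset_prompt)
```

The file names here are placeholders. What matters is that the tag written in the prompt matches the position of the asset you actually attach.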
Use for multi-camera chase, fight, or high-energy scenes.
Use for multi-shot product or brand videos.
Use for continuous single-take storytelling.
Use for precisely timed multi-shot transitions with bracket notation.