Grok Imagine Video Prompting Guide

Grok Imagine Video.
xAI cinematic motion.

Grok Imagine Video is xAI's video model — cinematic motion with strong scene composition, two resolution tiers (480p / 720p), variable durations, and both text-to-video and image-to-video modes. Pair with Grok Image for end-to-end xAI visual content.

Overview

Grok Imagine Video is xAI's video generation model — the video sibling of Grok Image. Strong on cinematic compositions, naturalistic motion, and atmospheric scenes. Useful when you want a coherent xAI aesthetic across both static and motion content.

The model offers two resolution tiers (480p for iteration, 720p for production), variable clip durations, six aspect ratios ranging from cinematic to vertical, and both text-to-video and image-to-video modes. Standard Grok-family prompt language carries over between Image and Video.

Resolution tiers: 2
T2V + I2V modes: Yes
xAI: same family as Grok Image

Key Features

Smooth, naturalistic

Cinematic Motion

Grok Imagine Video produces smooth, naturalistic motion — strong on physically-plausible camera moves (tracking shots, push-ins, pull-backs) and coherent character animation across the duration of a clip. Less stylized than some competitors; more directed.

Wind, water, light

Strong on Atmospheric Scenes

Particularly good at atmospheric motion — wind catching fabric or hair, water surface motion, light shifts. The model reads atmospheric language ('windswept', 'soft natural light', 'ambient ocean sounds') as motion targets, not just visual style.

480p iteration / 720p production

720p Production Tier

Two resolution tiers: 480p for cheap iteration when prompting, 720p for production output. Default to 720p for keepers; 480p is for when you're stress-testing prompts and don't care about final fidelity.

Compose or animate

Text-to-Video + Image-to-Video

Text-to-video composes from a prompt alone. Image-to-video animates a starting frame (great for animating Grok Image outputs into video). Same aesthetic family — switching between Image and Video preserves the look.

Example Videos

Each example shows the exact prompt that produced the result.

Cinematic Driving Shot

720p · 16:9 · 6s

A black 1969 Mustang fastback drives down an empty Mojave Desert highway at sunset, heat shimmer rising from the asphalt, slow tracking shot from behind and above, distant engine rumble, cinematic warm grade

"Slow tracking shot from behind and above" specifies angle and motion direction together. Grok Video executes this cleanly. Heat shimmer + warm grade + distant engine rumble = three layered sensory cues that elevate the clip beyond raw motion.

Character Beat

720p · 16:9 · 6s

A young woman in a yellow rain slicker stands on a windswept Irish cliff at dawn, looking out over the Atlantic, wind whipping her hair, the camera slowly pushes in toward her face, ambient ocean sounds, soft natural lighting

Character close-ups on Grok Video reward atmospheric motion ("wind whipping her hair", "windswept Irish cliff"). The slow push-in is a classic emotional-beat camera move; Grok handles it without jitter. Specific outerwear ("yellow rain slicker") gives character identity.

Product Macro

720p · 1:1 · 5s

Macro slow-motion of dark chocolate truffles being lifted with a silver fork from a velvet-lined box, soft warm spotlight, premium confectionery commercial mood, gentle ambient kitchen sounds

1:1 square at 5s is the sweet spot for product clips on social. Macro slow-motion product shots reward Grok's smooth motion behavior: the fork action, velvet drape, and soft spotlight bloom all read as coherent. "Premium confectionery commercial" anchors the aesthetic without over-specifying.

Vertical Snowboarding

720p · 9:16 · 6s

A snowboarder carves down a fresh powder slope at sunrise, vertical handheld follow shot from behind, golden alpenglow on the peaks, snow spray catching the light

9:16 vertical at 720p suits mobile social feeds. "Vertical handheld follow shot from behind" specifies the perspective; Grok composes the framing accordingly. Pair it with golden alpenglow and snow spray for atmospheric polish that elevates the action.

Prompting Tips

Default to 720p

720p is the production tier; 480p is purely for cheap iteration when you're stress-testing prompts. For any keeper output, generate at 720p. The cost delta is worth the visible fidelity gain.

Name one dominant camera move

"Slow tracking shot from behind", "the camera pushes in toward her face", "vertical handheld follow shot from behind" — Grok reads camera language literally and executes ONE move cleanly. Combining two motion types usually produces compromised motion.

Use atmospheric language as motion direction

Grok Video treats atmospheric language ("windswept", "wind whipping", "heat shimmer rising", "snow spray catching") as motion targets, not just visual style. These prompts hit harder here than on stricter literal-motion models.
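The example prompts above share a repeatable order: subject and setting, atmospheric detail, one camera move, named sounds, then a grade or lighting note. A minimal sketch of that assembly follows, assuming a hypothetical `VideoPrompt` helper that is not part of any xAI library and only builds the prompt string:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VideoPrompt:
    """Hypothetical container for prompt parts; not an xAI type."""
    subject: str                                          # who or what, plus the setting
    camera_move: str                                      # exactly one dominant camera move
    atmosphere: List[str] = field(default_factory=list)   # motion-bearing detail
    ambient_sounds: List[str] = field(default_factory=list)  # named sounds, not "with audio"
    look: str = ""                                        # optional grade or lighting note

    def render(self) -> str:
        # Order mirrors the examples: subject, atmosphere, camera, sound, look.
        parts = [self.subject, *self.atmosphere, self.camera_move,
                 *self.ambient_sounds, self.look]
        # Drop empty parts so optional fields do not leave stray commas.
        return ", ".join(p.strip() for p in parts if p and p.strip())


driving_shot = VideoPrompt(
    subject=("A black 1969 Mustang fastback drives down an empty "
             "Mojave Desert highway at sunset"),
    atmosphere=["heat shimmer rising from the asphalt"],
    camera_move="slow tracking shot from behind and above",
    ambient_sounds=["distant engine rumble"],
    look="cinematic warm grade",
)
print(driving_shot.render())
```

Keeping `camera_move` as a single string rather than a list is a deliberate choice in this sketch: it makes it awkward to stack two motion types in one prompt.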

End with ambient cues

"Distant engine rumble", "ambient ocean sounds", "gentle ambient kitchen sounds" — Grok layers ambient and diegetic sound based on explicit cues. Generic "with audio" doesn't help; named sounds get rendered.

Use I2V to extend Grok Image work

Pair Grok Image (for static composition) with Grok Imagine Video in image-to-video mode (for the animation). Same aesthetic family — the I2V handoff preserves color, character identity, and composition while adding motion.

Aspect by channel

16:9 for cinematic / YouTube. 9:16 for TikTok / Reels / Shorts. 1:1 for square social spots. Generate at the right aspect; Grok composes differently per ratio.
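If you generate for several channels, the mapping above can live in a small lookup so the aspect ratio is chosen before generation rather than cropped afterwards. This is a sketch only; the channel keys and the `aspect_for` function are hypothetical names chosen for illustration:

```python
# Channel-to-aspect lookup taken from the guidance above.
# The keys are hypothetical labels, not values from any xAI API.
ASPECT_BY_CHANNEL = {
    "cinematic": "16:9",
    "youtube": "16:9",
    "tiktok": "9:16",
    "reels": "9:16",
    "shorts": "9:16",
    "square_social": "1:1",
}


def aspect_for(channel: str) -> str:
    """Return the aspect ratio for a channel, or raise if it is unmapped."""
    key = channel.strip().lower()
    if key not in ASPECT_BY_CHANNEL:
        raise ValueError(f"No aspect ratio mapped for channel: {channel!r}")
    return ASPECT_BY_CHANNEL[key]


print(aspect_for("TikTok"))
```

Raising on an unknown channel, instead of silently falling back to 16:9, keeps a typo from producing a clip composed for the wrong frame.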

Settings Reference

Setting | Values | Notes
Mode | Text-to-video · Image-to-video | I2V requires a starting frame. T2V composes from text alone.
Duration | Variable seconds | Credit cost scales with duration. Default sweet spot around 5-6s.
Resolution | 720p · 480p | 720p for production. 480p for cheap iteration only.
Aspect ratio | 16:9 · 9:16 · 1:1 · 4:3 · 3:4 + extras | Standard aspect coverage. Match to downstream channel.
Source for I2V | Pass image as starting frame | Anchors composition, identity, and palette. Prompt describes motion.
Provider family | xAI (same as Grok Image) | Shared aesthetic across Grok Image and Grok Imagine Video.
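The constraints in this table can be checked locally before spending credits. The sketch below assumes a hypothetical `GenerationSettings` record; the field names are illustrative and are not taken from any xAI API. It only encodes what the table states: two resolutions, the five named aspect ratios, and the rule that image-to-video needs a starting frame.

```python
from dataclasses import dataclass
from typing import List, Optional

MODES = {"text-to-video", "image-to-video"}
RESOLUTIONS = {"480p", "720p"}
# Only the ratios named in the table; the unnamed "extras" are not covered here.
NAMED_ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}


@dataclass
class GenerationSettings:
    """Hypothetical settings record; not an xAI request schema."""
    mode: str = "text-to-video"
    resolution: str = "720p"            # production default per the table
    aspect_ratio: str = "16:9"
    duration_s: int = 6                 # within the 5-6s sweet spot
    source_image: Optional[str] = None  # starting frame, image-to-video only

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means usable."""
        issues: List[str] = []
        if self.mode not in MODES:
            issues.append(f"unknown mode: {self.mode}")
        if self.resolution not in RESOLUTIONS:
            issues.append(f"unknown resolution: {self.resolution}")
        if self.aspect_ratio not in NAMED_ASPECT_RATIOS:
            issues.append(f"aspect ratio not in the named set: {self.aspect_ratio}")
        if self.duration_s <= 0:
            issues.append("duration must be positive")
        if self.mode == "image-to-video" and not self.source_image:
            issues.append("image-to-video requires a starting frame")
        return issues


draft = GenerationSettings(resolution="480p")  # cheap iteration pass
print(draft.problems())
```

A typical flow under these assumptions is to iterate with `resolution="480p"` until the prompt behaves, then rerun the same settings at the 720p default for the keeper.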

FAQ

Grok Video is part of the xAI family — strong cinematic motion with naturalistic camera moves. Runway Gen-4.5 has stronger camera-language fluency. Veo 3 has native audio + dialogue. Kling 2.6 Pro has native audio + premium aesthetic. Grok is the right pick when you want shared aesthetic with Grok Image, or when you specifically want xAI's tuning.