Grok Imagine Prompting Guide

Master Grok Imagine for Images & Video

From your first generation to extended multi-scene narratives — prompt structure, camera language, creative workflows, and copy-ready templates for Grok Imagine.

Adapted from the @XCreators official Grok Imagine guide.

Overview

Grok Imagine turns text and images into high-quality images and video. You describe what you want, pick your settings, hit generate, and have a clip or image ready to use. This guide covers every step: how to set up your first generation, how to write prompts that produce great results, how to extend clips into longer sequences, and how to build a repeatable workflow.

Prompt formula: Subject + style/mood + lighting + camera angle + finishing details

Generation Modes

Text-to-Image

Type a description and generate an image. Grok Imagine produces vivid, high-quality results from natural language prompts — no keyword stuffing needed.

Text-to-Video

Describe a scene and generate a video clip directly. Write like you're describing the scene to a friend: what is happening, what it looks like, how the camera moves. Works best when you have a clear scene in mind.

Image-to-Video

Start from any image and animate it into a short clip with synced audio. The image can be one you generated or one you upload. Because you already know what the base image looks like, the output is more predictable.

Pro Tip: The Two-Step Workflow

Generate an image first using a detailed prompt. Review it. If the composition, lighting, and subject look right, animate it with a short motion prompt. Keep the video prompt focused on describing movement, camera, and mood — the model already has the visual context.

Image Prompt

Two otters in aquamarine water, viewed from above with a vintage film aesthetic.

Video Prompt

Calm organic movement, subject is still and pulls out slowly. Otters slightly drifting, mostly calm.

Reference-to-Video

Upload up to 7 reference images as the visual foundation or style reference. Grok blends your references with your text prompt, so you get output that matches a specific look, brand style, color palette, or visual tone.

Choosing Your Settings

Aspect Ratio

Choose based on where you plan to post or use the output.

16:9, 9:16, 1:1, 4:3, 3:4, 3:2, 2:3

Duration (video)

Select 1–15 seconds for your clip. Shorter clips generate faster and cost fewer credits. Use Extend Video to chain clips into longer sequences.

Quality (video)

480p for quick drafts and iteration. 720p for polished, share-ready output.
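The settings above can be written down and checked before you spend credits. The following is a minimal Python sketch for planning purposes only: `VideoSettings` and its field names are hypothetical and do not correspond to any official Grok Imagine API, and the 6-second default is an arbitrary placeholder. The allowed values come from the options listed in this section.

```python
from dataclasses import dataclass

# Allowed values as listed in this guide.
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4", "3:2", "2:3"}
QUALITIES = {"480p", "720p"}


@dataclass(frozen=True)
class VideoSettings:
    """Hypothetical container for the choices made before generating."""
    aspect_ratio: str = "16:9"
    duration_seconds: int = 6   # placeholder default, not from the guide
    quality: str = "480p"       # draft quality by default

    def __post_init__(self):
        if self.aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"unsupported aspect ratio: {self.aspect_ratio}")
        if not 1 <= self.duration_seconds <= 15:
            raise ValueError("duration must be between 1 and 15 seconds")
        if self.quality not in QUALITIES:
            raise ValueError(f"unsupported quality: {self.quality}")


# Iterate at draft quality, then re-run the keeper at 720p.
draft = VideoSettings(aspect_ratio="9:16", duration_seconds=5)
final = VideoSettings(aspect_ratio="9:16", duration_seconds=5, quality="720p")
```

Keeping the draft and final settings as separate objects makes it harder to accidentally iterate at the more expensive quality.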

Writing Prompts That Work

The quality of your output starts with the prompt. Specific prompts give the model clear anchors, but stop short of prescribing every detail so the model keeps some creative freedom.

The Reliable Structure

Subject + Style / Mood + Lighting + Camera Angle + Finishing Details

Vague

"a city at night"

Specific

"futuristic Tokyo street at 2am, rain-slicked asphalt, neon reflections, low-angle wide shot, cinematic fog, Blade Runner mood"
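If you assemble prompts in a script or spreadsheet, the five-part structure maps naturally onto a small helper. This is an illustrative sketch, not part of Grok Imagine: `build_prompt` and its parameter names are made up here, and the function only joins text. How the example phrases are assigned to slots is one reasonable reading, not a rule.

```python
def build_prompt(subject, style_mood="", lighting="", camera="", finishing=""):
    """Join the five parts of the structure, skipping any left empty."""
    parts = [subject, style_mood, lighting, camera, finishing]
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = build_prompt(
    subject="futuristic Tokyo street at 2am, rain-slicked asphalt",
    style_mood="Blade Runner mood",
    lighting="neon reflections",
    camera="low-angle wide shot",
    finishing="cinematic fog",
)
print(prompt)
```

Filling in only `subject` reproduces the vague version, which makes it easy to see which slots you have left to chance.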

Image Examples

Each image below was generated with Grok Imagine on PixelDojo. Copy any prompt to use as a starting point, then iterate by changing one variable at a time.

Cinematic World Building

Subject + style + lighting + camera angle

Futuristic Tokyo street at 2am, rain-slicked asphalt, neon reflections, low-angle wide shot, cinematic fog, Blade Runner mood

Each detail eliminates ambiguity. Location, time, lighting, camera position, and tonal reference give the model concrete anchors.

Focused Composition

Clear subject against a defined background

A single vendor arranging flowers at a market stall, soft morning light, blurred background, shallow depth of field, warm natural tones

A clear subject against a defined background consistently beats a busy, crowded scene. When in doubt, simplify.

Dramatic Portrait

Name your lighting for complete control

Close-up portrait of a woman with wind-swept hair, golden hour backlight creating rim light, cinematic color grade, 85mm lens bokeh, soft film grain

"Golden hour backlight," "overcast diffused light," and "hard rim light from the left" produce completely different results. Lighting is a high-leverage detail.

Product Photography

Clean composition for commercial use

Minimalist product shot of a matte black ceramic coffee mug on a marble countertop, soft studio lighting from above, subtle shadow, clean white background, commercial quality

Product shots work best with explicit lighting direction, surface materials, and background descriptions.
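The advice to iterate by changing one variable at a time can be made mechanical. The sketch below splits the Dramatic Portrait prompt into named parts and produces one variant per lighting option while holding everything else fixed. The `vary` helper and the part names are hypothetical conveniences, not a Grok Imagine feature.

```python
# The Dramatic Portrait prompt from this guide, split into named parts.
base = {
    "subject": "Close-up portrait of a woman with wind-swept hair",
    "lighting": "golden hour backlight creating rim light",
    "style": "cinematic color grade",
    "camera": "85mm lens bokeh",
    "finish": "soft film grain",
}


def vary(base_parts, key, options):
    """Return one prompt per option, changing only the part named `key`."""
    if key not in base_parts:
        raise KeyError(key)
    prompts = []
    for option in options:
        # Overriding an existing key keeps its original position in the dict.
        parts = {**base_parts, key: option}
        prompts.append(", ".join(parts.values()))
    return prompts


lighting_variants = vary(base, "lighting", [
    "golden hour backlight creating rim light",
    "overcast diffused light",
    "hard rim light from the left",
])
for p in lighting_variants:
    print(p)
```

Because only one part changes per variant, any difference between the resulting images can be attributed to that part, apart from normal run-to-run variation.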

Video Examples

Videos generated with Grok Imagine on PixelDojo. Notice how text-to-video prompts include full scene descriptions while image-to-video prompts stay short and focused on motion.

Text → Video

Epic Aerial Shot

Text-to-video world building

Video Prompt

Aerial drone slowly descending over an ancient stone temple reclaimed by jungle, golden hour shafts of light cutting through the canopy, muted greens and amber, birds scattering from the treetops, epic scale, ambient jungle sounds

For text-to-video, include camera movement, lighting conditions, and ambient sound descriptions for cinematic results.

Text → Video

Action Sequence

Dynamic motion with cinematic camera

Video Prompt

A surfer dropping into a massive wave at golden hour, low-angle tracking shot following the board, water spray catching the light, cinematic slow motion, ocean roar and muffled underwater sounds

Name your camera movement explicitly — "low-angle tracking shot" translates directly into how the scene is animated.

Image → Video

Portrait Animation

Image-to-video with subtle motion

Image Prompt

Close-up portrait of a person looking into the distance, soft natural light, shallow depth of field

Video Motion Prompt

Wind gently moving their hair, camera slowly pulling back, ambient city sounds, soft film grain

For image-to-video, keep prompts short — the model already has the visual context. Just describe what should move and how.

Image → Video

Mood Piece

Atmospheric image-to-video

Image Prompt

Someone sitting alone at a rain-streaked cafe window at night with warm interior lighting

Video Motion Prompt

Barely-there movement, steam rising from the coffee cup, rain streaking down the window, lo-fi ambient soundtrack

Minimal motion prompts create atmospheric mood pieces. Let the scene breathe rather than forcing action.

Extending Your Videos

A single generation gives you a short clip. Grok Imagine's Extend Video feature lets you go further: select any frame as the starting point for an extension. The model carries forward motion, character positioning, lighting, and audio. Each extension adds more seconds, and you can keep chaining extensions together.

Extension Prompting Tip

Write a continuation prompt that describes what happens next in the scene, not a full re-description. The model already knows what the scene looks like — just tell it where to go.

Too much

"A woman in a red dress sitting at a rain-streaked cafe window at night with neon reflections, she stands up and walks toward the door"

Better

"She stands up slowly, grabs her coat, and walks toward the door. The camera follows."
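When planning a longer sequence, it helps to estimate how many continuation prompts you will need to write. This guide does not state how many seconds each extension adds, so the sketch below takes that as a parameter you supply from what you observe in the app. The function name and the example numbers are placeholders.

```python
import math


def extensions_needed(target_seconds, first_clip_seconds, extension_seconds):
    """Number of extensions to chain after the first clip to reach the target."""
    if first_clip_seconds <= 0 or extension_seconds <= 0:
        raise ValueError("clip lengths must be positive")
    remaining = target_seconds - first_clip_seconds
    if remaining <= 0:
        return 0  # the first clip already covers the target length
    return math.ceil(remaining / extension_seconds)


# Placeholder numbers: a 30-second target, a 10-second first clip,
# and an assumed 6 seconds gained per extension.
count = extensions_needed(30, 10, 6)
print(f"Write {count} continuation prompts")
```

Each of those continuation prompts should describe only what happens next, as in the "Better" example above.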

Pro Tips

Keep compositions focused

A clear subject against a defined background consistently beats a busy, crowded scene. When in doubt, simplify.

Go wider for people in video

Wider shots and slower movements produce the cleanest results when there are people in the frame. Pull the camera back and let the motion breathe.

Reference moods, not just adjectives

"Blade Runner mood" or "Studio Ghibli feel" gives the model a rich visual library to draw from. Single adjectives like "dark" or "soft" are too open-ended.

Name your lighting

"Golden hour backlight," "overcast diffused light," and "hard rim light from the left" produce completely different results. Lighting is a high-leverage detail you can specify.

Name your camera movement

"Slow dolly in," "pan right," "static wide" translate directly into how the scene is animated. If you don't specify, you're leaving one of the most important creative decisions to chance.

Keep image-to-video prompts short

When animating an existing image, the model already has the visual context. Your prompt just needs to describe what should move and how.

Run the same prompt more than once

Results vary between runs, even from the same prompt. If the first generation doesn't land, try it again before rewriting. The second or third attempt often nails it.

Save prompts that worked

When something lands, keep the exact prompt along with the mode and settings you used, so you can build on it and iterate from a known-good baseline.
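One low-effort way to keep known-good prompts is an append-only log file. The sketch below uses a JSON Lines file on disk. The file layout, function names, and record fields are all assumptions made for illustration, and nothing here talks to Grok Imagine.

```python
import json
from pathlib import Path


def save_prompt(log_path, prompt, mode, notes=""):
    """Append one known-good prompt to a JSON Lines file."""
    record = {"mode": mode, "prompt": prompt, "notes": notes}
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_prompts(log_path, mode=None):
    """Read saved prompts back, optionally filtered by generation mode."""
    path = Path(log_path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    return [r for r in records if mode is None or r["mode"] == mode]
```

For example, `save_prompt("prompts.jsonl", "Two otters in aquamarine water, viewed from above with a vintage film aesthetic.", "text-to-image", notes="landed on the second run")` stores the exact wording, and `load_prompts("prompts.jsonl", mode="text-to-image")` retrieves it later.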

Prompt Templates

World Building (Text-to-Video)

Cinematic landscapes and environments with atmosphere.

Aerial drone slowly descending over [location], [time of day] [light quality] cutting through [environment detail], [color palette], [ambient element], epic scale, [ambient sound description]

Action Sequence (Text-to-Video)

Dynamic motion with explicit camera language.

[Subject] [action] at [time/light], [camera angle] [camera movement], [detail catching light], cinematic [speed], [sound description]

Product Shot (Image-to-Video)

Animate product images for polished commercial content.

Camera doing a slow 360 orbit, sharp shadow rotating with the light, subtle ambient tone

Portrait Animation (Image-to-Video)

Bring portraits to life with subtle, natural motion.

Wind gently moving their hair, camera slowly [direction], [ambient sound], soft film grain

Mood Piece (Image-to-Video)

Atmospheric scenes with minimal, intentional motion.

Barely-there movement, [subtle motion detail], [environmental effect], [ambient soundtrack style]

Reference Video (Reference-to-Video)

Use reference images to lock a visual style or brand tone.

Match the color palette, lighting style, and framing of the reference. [Subject description], [action], [camera movement], maintaining the reference aesthetic throughout
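The bracketed placeholders in these templates can be filled programmatically if you generate many variations. The following is a minimal sketch, assuming placeholders are always written as `[name]` with no nesting. `fill_template` is a hypothetical helper, not part of any Grok Imagine tooling.

```python
import re

# Matches a single [placeholder] with no nested brackets.
PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")


def fill_template(template, values):
    """Replace each [placeholder] with its value; fail if any are missing."""
    missing = [name for name in PLACEHOLDER.findall(template) if name not in values]
    if missing:
        raise KeyError(f"no value for: {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


portrait = (
    "Wind gently moving their hair, camera slowly [direction], "
    "[ambient sound], soft film grain"
)
filled = fill_template(portrait, {
    "direction": "pulling back",
    "ambient sound": "ambient city sounds",
})
print(filled)
```

Raising an error on a missing value is deliberate: a prompt that still contains a literal `[placeholder]` would otherwise be sent to the model unnoticed.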