Hailuo 2.3 Prompting Guide

Smooth natural motion.
Reliable output.

MiniMax Hailuo 2.3 generates video with notably smooth, natural motion: characters move with weight, and environments stay coherent across frames. It offers Standard (text-to-video) and Fast (image-to-video) modes, 768p and 1080p resolutions, and 6- or 10-second clips. Default settings already produce strong results.

Overview

Hailuo 2.3 is MiniMax's general-purpose video model. Its consistent strengths are smooth character motion (no rubber-banding limbs; weight transfer reads naturally), environmental coherence across the clip (no morphing backgrounds), and reliable output from short prompts. It's the model to pick when you want a clip that just works on the first try.

There are two modes: Standard (`minimax/hailuo-2.3`) for text-to-video and Fast (`minimax/hailuo-2.3-fast`) for image-to-video from a starting frame. Both 768p and 1080p are available in 16:9, 9:16, and 1:1 aspect ratios. Clips run 6s or 10s (10s is available at 768p only). Pricing scales with resolution and duration, from 4 to 20 credits per clip.
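The constraints above can be checked before a job is submitted. The following is a minimal sketch, not a MiniMax API: the two model IDs come from this guide, while `validate_settings`, its parameters, and the dictionary layout are hypothetical names chosen for illustration.

```python
# Model IDs are the ones named in this guide. Everything else here
# (function name, parameters, dict layout) is hypothetical.
MODEL_IDS = {
    "standard": "minimax/hailuo-2.3",       # text-to-video
    "fast": "minimax/hailuo-2.3-fast",      # image-to-video
}
RESOLUTIONS = {"768p", "1080p"}
DURATIONS = {6, 10}                          # seconds
ASPECTS = {"16:9", "9:16", "1:1"}


def validate_settings(mode, resolution, duration, aspect="16:9", has_start_frame=False):
    """Return a list of problems; an empty list means the combination is allowed."""
    problems = []
    if mode not in MODEL_IDS:
        problems.append(f"unknown mode: {mode!r}")
    if resolution not in RESOLUTIONS:
        problems.append(f"unknown resolution: {resolution!r}")
    if duration not in DURATIONS:
        problems.append(f"unsupported duration: {duration!r}")
    if aspect not in ASPECTS:
        problems.append(f"unsupported aspect ratio: {aspect!r}")
    # The guide states that 10s clips exist only at 768p.
    if resolution == "1080p" and duration == 10:
        problems.append("10s clips are only available at 768p")
    # Fast mode is image-to-video, so it needs a source image.
    if mode == "fast" and not has_start_frame:
        problems.append("Fast mode needs a starting frame")
    return problems
```

Calling `validate_settings("standard", "1080p", 10)` returns a single problem about the 768p-only limit, which is cheaper to catch locally than after a failed request.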

Modes: 2 (Standard · Fast)
Clip durations: 6s · 10s
Resolution tiers: 768p · 1080p

Key Features

Weight and coherence

Smooth Natural Motion

Characters move with weight, environments stay coherent frame-to-frame, and there are no rubber-banding limbs or morphing backgrounds. Hailuo handles everyday human motion (walking, gesturing, turning) more reliably than most video models.

Text-to-video and image-to-video

Standard + Fast Modes

Standard mode generates from text alone — describe the scene and Hailuo composes the shot. Fast mode is image-to-video: upload a starting frame and Hailuo writes motion onto it. Same prompt language for both; same quality bar.

Pick by use case, not aspiration

768p and 1080p

768p (default) is the right choice for social, internal review, and most web use. 1080p is for hero shots and final deliverables. 10-second clips are only available at 768p; 1080p caps at 6s.

Short prompts work

Strong Defaults

Even a 15-word prompt produces a watchable clip with intentional camera work and natural pacing. Hailuo fills in cinematography decisions without you having to over-prompt the lighting, color, or camera direction.

Example Videos

Each example shows the exact prompt that produced the result.

Cinematic Wide Shot

Standard · 1080p · 16:9 · 6s

Wide cinematic shot of a lone fishing boat drifting across a glassy lake at golden hour, mist clinging to distant pine-covered shores, the camera does a very slow forward dolly, warm amber color grade, quiet atmospheric mood

Lead with one dominant camera move ("slow forward dolly", "static locked-off", "handheld follow") and let Hailuo handle the rest. The model picks pacing well on its own — overspecifying camera language can fight its defaults.

Character Action Beat

Standard · 768p · 16:9 · 6s

Medium shot of a parkour runner sprinting toward a rooftop edge, jacket flapping in the wind, the runner leaps the gap between buildings in mid-frame, lands in a roll and keeps running offscreen, dynamic handheld camera, city lights blurred behind

For character action, narrate the beat sequentially: setup → leap → landing. Hailuo executes the motion arc cleanly — the runner's weight on the landing is one of the things this model does better than most.

Product Reveal

Standard · 1080p · 1:1 · 6s

Studio product shot of a matte-black ceramic coffee mug on a polished walnut surface, steam curling upward, the camera does a slow 90-degree rotational pan around the mug, soft top-down rim lighting against a dark gradient backdrop, advertising quality

Square 1:1 with a single explicit camera move (rotational pan, dolly in, push back) is the canonical product-spot setup. Hailuo at 1080p handles material textures (matte ceramic, polished walnut) very cleanly.

Atmospheric Landscape

Standard · 1080p · 16:9 · 6s

Wide cinematic landscape of a misty Norwegian fjord at dawn, mountain peaks emerging through low fog, a single distant fishing boat leaves a thin wake on glassy water, the camera does a very slow forward dolly, cool blue-green palette with warm amber highlight on one peak

Atmospheric landscapes reward 1080p — the resolution preserves fog, water surface, and distant detail that compress badly at 768p. Pair a single slow camera move with a 'warm spot' element to break the cool palette.

Portrait Vertical, 9:16

Standard · 768p · 9:16 · 10s

Medium close-up portrait of a woman with copper-red hair sitting in a Tokyo coffee shop window seat, neon signs reflecting on wet glass behind her, she slowly looks up from her phone, eyes glistening, a small smile breaks across her face, shallow depth of field, warm interior light against cool blue rain

9:16 vertical at 768p with a 10-second duration is ideal for short-form social. Hailuo handles emotional micro-beats (looking up, smile breaking, eyes glistening) better with specific verbs than with a generic 'happy expression'.

Prompting Tips

Short prompts are fine

Unlike some models, Hailuo doesn't reward 200-word prompts. 15-30 words with a clear subject, action, camera move, and mood are usually enough. The model fills in the cinematography with reasonable defaults.
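One way to keep prompts inside that range is to assemble them from the four parts and count words before submitting. This is a small sketch under stated assumptions: `build_prompt`, `word_count`, and `in_suggested_range` are hypothetical helpers, and the 15-30 bounds simply restate the guidance above rather than any limit enforced by the model.

```python
def build_prompt(subject, action, camera, mood):
    """Join the four parts into one comma-separated prompt, skipping empty parts."""
    parts = [p.strip() for p in (subject, action, camera, mood) if p and p.strip()]
    return ", ".join(parts)


def word_count(prompt):
    """Count whitespace-separated words."""
    return len(prompt.split())


def in_suggested_range(prompt, low=15, high=30):
    """True if the prompt length falls in the range suggested by this guide."""
    return low <= word_count(prompt) <= high


prompt = build_prompt(
    subject="Wide shot of a lone fishing boat on a glassy lake at golden hour",
    action="the boat drifts slowly toward the far shore",
    camera="slow forward dolly",
    mood="quiet atmospheric mood",
)
print(prompt)
print(word_count(prompt), in_suggested_range(prompt))
```

The range is a starting point rather than a rule; several of the example prompts in this guide run slightly longer and still work.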

Pick one camera move

Name one dominant camera move ('slow forward dolly', 'static locked-off', 'handheld follow', '90-degree rotational pan') and let it run. Trying to combine moves in a 6-second clip usually produces a confused result.
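A quick pre-flight check can flag prompts that name more than one move. The sketch below is illustrative only: `CAMERA_MOVES` is a hypothetical vocabulary seeded with the moves named in this tip, and the matching is naive substring search, so it will miss moves phrased differently.

```python
# Hypothetical vocabulary: the four moves named in this tip, not an exhaustive list.
CAMERA_MOVES = [
    "forward dolly",
    "static locked-off",
    "handheld follow",
    "rotational pan",
]


def find_camera_moves(prompt):
    """Return the known camera moves mentioned in the prompt, in vocabulary order."""
    text = prompt.lower()
    return [move for move in CAMERA_MOVES if move in text]


def has_single_camera_move(prompt):
    """True if exactly one known camera move appears in the prompt."""
    return len(find_camera_moves(prompt)) == 1


ok = "Wide shot of a misty lake, the camera does a very slow forward dolly"
mixed = "Slow forward dolly, then a 90-degree rotational pan around the boat"
print(find_camera_moves(ok))      # one move found
print(find_camera_moves(mixed))   # two moves found, worth splitting into two clips
```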

Specify motion verbs, not adjectives

'She slowly looks up and smiles' beats 'happy woman'. 'The runner leaps the gap and lands in a roll' beats 'parkour action'. Concrete verbs give Hailuo a motion target; adjectives leave it guessing.

Default to 768p for iteration

768p costs less and generates faster. Use it to nail the prompt and composition. Switch to 1080p only for the final deliverable — atmospheric scenes and material-heavy product shots benefit most.

Fast mode needs a strong starting frame

Fast mode is image-to-video — Hailuo uses your image as the first frame and writes motion onto it. The frame should already be visually rich (composition, lighting, subject pose). A blurry or generic source produces a generic clip.

10s only at 768p

Longer clips (10 seconds) are 768p-only. If you need 1080p, cap at 6s and let the editor crossfade if you need length. Don't fight the constraint.
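When planning a longer 1080p sequence from 6-second clips, the number of clips depends on how much each crossfade overlaps. The arithmetic is sketched below; the 0.5-second crossfade default is an assumption for illustration, not a value from MiniMax, and both function names are hypothetical.

```python
import math


def stitched_length(n_clips, clip_seconds=6.0, crossfade_seconds=0.5):
    """Total runtime of n clips joined end to end with overlapping crossfades."""
    return n_clips * clip_seconds - (n_clips - 1) * crossfade_seconds


def clips_needed(target_seconds, clip_seconds=6.0, crossfade_seconds=0.5):
    """Smallest number of clips whose stitched length reaches the target."""
    if not 0 <= crossfade_seconds < clip_seconds:
        raise ValueError("crossfade must be non-negative and shorter than a clip")
    if target_seconds <= clip_seconds:
        return 1
    # Each clip after the first adds (clip - crossfade) seconds of new runtime.
    return math.ceil(
        (target_seconds - crossfade_seconds) / (clip_seconds - crossfade_seconds)
    )


n = clips_needed(15)
print(n, stitched_length(n))
```

With these assumed values, a 15-second target needs three clips, because two clips joined by one crossfade run only 11.5 seconds.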

Settings Reference

| Setting | Values | Notes |
| --- | --- | --- |
| Mode | Standard (text-to-video) · Fast (image-to-video) | Fast requires a starting frame; Standard generates from text alone. |
| Resolution | 768p · 1080p | 10s clips only at 768p. 1080p caps at 6s. |
| Duration | 6s · 10s | 10s requires 768p resolution. |
| Aspect ratio | 16:9 · 9:16 · 1:1 | UI hint; actual aspect comes from source frame in Fast mode. |
| Prompt optimizer | On / Off | Optional MiniMax-side prompt expansion. Try both; sometimes off produces cleaner output for already-detailed prompts. |
| Pricing | 4–20 credits per clip | Standard: 8 cr (768p 6s), 12 cr (768p 10s / 1080p 6s), 20 cr (1080p 10s, N/A). Fast: 4–7 credits. |
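The Standard-mode prices above can be turned into a small lookup for budgeting an iterate-then-finalize workflow. This is a sketch, not an official price calculator: `standard_cost` and `iteration_budget` are hypothetical names, the credit values are copied from the table, and Fast mode is left out because the table gives only a 4–7 credit range without a per-setting breakdown.

```python
# Standard-mode prices as listed in the Settings Reference table (credits per clip).
# The 1080p 10s combination is listed as N/A, so it is deliberately absent.
STANDARD_CREDITS = {
    ("768p", 6): 8,
    ("768p", 10): 12,
    ("1080p", 6): 12,
}


def standard_cost(resolution, duration):
    """Credits for one Standard clip, or ValueError for an unavailable combination."""
    try:
        return STANDARD_CREDITS[(resolution, duration)]
    except KeyError:
        raise ValueError(
            f"{resolution} at {duration}s is not an available Standard combination"
        ) from None


def iteration_budget(drafts, final_resolution="1080p", final_duration=6):
    """Credits for `drafts` 768p 6s test clips plus one final render."""
    return drafts * standard_cost("768p", 6) + standard_cost(final_resolution, final_duration)


print(iteration_budget(4))  # four 768p drafts plus one 1080p 6s final
```

Four drafts at 768p plus one 1080p final come to 44 credits under these listed prices, compared with 60 credits for rendering all five clips at 1080p.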

FAQ

Should I use Standard or Fast?

Use Standard for text-to-video from scratch: describe the scene and Hailuo composes it. Use Fast for image-to-video: upload a starting frame and Hailuo writes motion onto it. Fast is cheaper (4–7 credits vs 8–20) and gives you more control over the look of frame one.