Happy Horse 1.0 Prompting Guide

#1 on Video Arena

Direct the Camera, Not Just the Subject

Happy Horse 1.0 is Alibaba's newest video model — physics-aware, multi-shot, and unusually faithful to long structured prompts. Below: how to write for it, what makes it different, and prompt patterns you can copy.

Resolution tiers: 720p · 1080p

Clip length: 2 – 15s

Generation modes: T2V + I2V

Cost: 4 / 6 cr/s (720p / 1080p)


Overview

Happy Horse 1.0 is the latest video generation model out of Alibaba's ATH AI Innovation Unit. The headline traits: it follows long, structured prompts more literally than most video models, it animates motion that obeys physics instead of drifting through it, and it currently sits at #1 on the Artificial Analysis Video Arena leaderboard for blind side-by-side preference.

On PixelDojo, Happy Horse runs in two modes — text-to-video and image-to-video — at 720p or 1080p, with clip lengths from 2 to 15 seconds. Pricing is 4 credits per second at 720p and 6 at 1080p, billed only on success.
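The pricing above is simple enough to budget before you render. A minimal sketch, assuming clip lengths are whole seconds; the function name `estimate_credits` is illustrative and not part of any PixelDojo API:

```python
# Credits per second, as stated in this guide.
CREDITS_PER_SECOND = {"720p": 4, "1080p": 6}
MIN_SECONDS, MAX_SECONDS = 2, 15  # supported clip length range


def estimate_credits(resolution: str, seconds: int) -> int:
    """Return the credit cost of one successful clip (hypothetical helper)."""
    if resolution not in CREDITS_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(
            f"clip length must be {MIN_SECONDS}-{MAX_SECONDS}s, got {seconds}"
        )
    return CREDITS_PER_SECOND[resolution] * seconds
```

Under these numbers, a 5-second test at 720p comes to 20 credits and a 15-second final at 1080p comes to 90.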

This guide walks through what the model is good at, a prompt framework that takes advantage of those strengths, and ready-to-copy examples for the most common shot types.

Why Happy Horse Is Different

Six things Happy Horse does meaningfully better than the previous generation. Lean into these in your prompts and you'll see the difference immediately.

Physics-Aware Motion

Believable, not floaty

Trained on a motion engine that respects real-world physics. Collisions have weight, fabric folds, and limbs swing on natural arcs. Fewer of the warped, slow-motion-y movements that plague older video models.

Multi-Shot Direction

Cuts inside one prompt

Happy Horse is unusually good at following long, structured prompts that describe more than one camera angle or beat. You can write a three-shot mini-sequence and the model will actually cut between them inside a single 10-15s clip.

Crisp Texture & Color

Pleasant out-of-the-box look

Outputs land with sharp textures, balanced contrast, and pleasing color science before you ever touch a LUT. Skin tones, fabric weaves, and surface materials read naturally — fewer plastic-y faces and muddy blacks.

Strong I2V Consistency

Your starting frame stays itself

When you animate a still, Happy Horse holds the subject's identity, outfit, and lighting through the entire clip. Faces don't morph mid-scene and props don't drift — well-suited for character-driven shots.

Multilingual Prompts

EN, ZH, JA

Native understanding of English, Chinese, and Japanese prompts. You can mix languages in a single prompt — useful if you have a precise term in one language that doesn't translate cleanly.

Top-Ranked Quality

#1 on Video Arena

Currently sits at the top of the Artificial Analysis Video Arena leaderboard for blind side-by-side preference. Translation: humans pick its outputs over most competing models.

Prompt Framework

A reliable structure for Happy Horse prompts. Not all five blocks are required, but in this order they let the model build the shot the way a director would.

Subject + state

Who or what is in frame, and what they're doing right now. Be specific about clothing, age, posture, expression — these anchor the rest of the scene.

Environment + light

Where and when. Time of day matters more than you'd think — "golden hour", "blue hour", "overcast noon" each give the model a complete lighting direction.

Camera move

Name the move explicitly. Push-in, pull-back, side-tracking, low-angle dolly, slow orbit. Static shots are fine too — just say "locked off" or "static camera".

Beats / action

What changes during the clip. For multi-shot prompts, number them: "Shot 1… Shot 2…". For single shots, name the micro-action (a blink, a turn, a step forward).

Style & quality anchors

Closing tags steer the look: "hyper-realistic", "cinematic", "anime style", "film grain", "8k", "shallow depth of field". Happy Horse responds to film vocabulary.
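If you assemble prompts programmatically, the five blocks can be kept as separate fields and joined in framework order. This is only a sketch: the `ShotPrompt` class and its field names are hypothetical, and the output is ordinary prose that you would paste into the prompt box.

```python
from dataclasses import dataclass


@dataclass
class ShotPrompt:
    """Hypothetical container for the five framework blocks."""

    subject: str = ""      # who or what is in frame, and what they are doing
    environment: str = ""  # where and when, including time of day
    camera: str = ""       # the named camera move
    beats: str = ""        # what changes during the clip
    style: str = ""        # closing style and quality anchors

    def render(self) -> str:
        """Join the non-empty blocks, in framework order, as plain prose."""
        blocks = [self.subject, self.environment, self.camera, self.beats, self.style]
        sentences = []
        for block in blocks:
            text = block.strip()
            if not text:
                continue  # not all five blocks are required
            if text[-1] not in ".!?":
                text += "."
            sentences.append(text)
        return " ".join(sentences)
```

Keeping the blocks separate makes it easy to swap one element, such as the camera move, while holding the rest of the shot constant.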

Sample Prompts

Six prompts spanning the most common shot types. Each one is ready to copy and run as-is — or use the takeaway as a starting point for your own version.

Slow push-in, golden hour

Coastal Establishing Shot

A weathered fisherman in a navy wool sweater stands at the edge of a stone harbor at golden hour, mending a net. Camera starts wide showing the bay and lighthouse, then slowly pushes in to a medium shot framing his hands working the twine. Late sun catches salt spray drifting through the air. Gulls wheel overhead. Hyper-realistic, cinematic.

Takeaway: Open wide, push in. Spelling out the camera move ("slowly pushes in to a medium shot") gives Happy Horse a clear arc to animate against — much stronger than a static shot of the same subject.

Multi-shot inside one prompt

Three-Shot Action Cut

Shot 1: side-tracking shot of a stunt rider on a motocross bike launching off a dirt ramp, slow motion, dust kicking up behind the rear tire. Shot 2: low angle as the bike soars over a rusted school bus, sun behind the rider, lens flare. Shot 3: rider lands hard, suspension compressing, dirt sprays toward camera, rider raises a fist. Gritty, sun-baked color grade.

Takeaway: Number your shots. Happy Horse handles 2-3 explicit camera changes well inside a 10s clip — give each beat its own camera angle and physical action so the cut points are unambiguous.

Single locked shot

Character Close-Up with Subtle Motion

Tight close-up portrait of a woman with copper hair and freckles, looking directly into the lens. Soft window light from camera-left, warm fall tones. She blinks, the corner of her mouth lifts into a half-smile, and a single autumn leaf drifts past her cheek. Shallow depth of field, subtle film grain.

Takeaway: For portrait work, name the micro-actions you want (blink, half-smile, leaf drift). Happy Horse renders those small motions with believable timing instead of either freezing the subject or over-animating them.

Stylized, hand-drawn feel

Anime / 2D Animation Style

Studio anime style. A schoolgirl in a navy uniform stands on a rooftop at sunset, wind lifting her hair and ribbon. Camera arcs around her from a three-quarter back angle to a profile. Cherry blossom petals drift sideways. Soft cel-shaded color, gentle bloom on the highlights, thin clean linework.

Takeaway: Lead with the style descriptor ("Studio anime style", "cel-shaded", "thin clean linework"). Happy Horse holds 2D aesthetics steadily through camera moves — you don't need to repeat the style every sentence.

15s commercial structure

Product Spot

15-second product commercial. Shot 1: extreme close-up of water beading and rolling off a matte black running shoe, slow motion. Shot 2: cut to a runner mid-stride on a wet city street at dawn, side tracking shot, neon shop signs reflecting in puddles. Shot 3: hero shot of the shoe rotating slowly on a clean white pedestal, soft studio rim light. Clean, premium commercial energy.

Takeaway: For ads, structure as detail → action → hero shot. Premium adjectives like "clean, premium" steer color and pacing without naming brands.

Environment-first prompting

Atmospheric Wide

A vast salt flat at blue hour, cracked white earth stretching to a flat horizon. A single figure walks slowly toward camera from a hundred meters out, silhouetted against the residual sunset. Wind kicks up a fine dust haze across the foreground. Camera is low and static. Cold cyan shadows, warm coral on the horizon. Hyper-realistic, anamorphic, 8k.

Takeaway: When the environment is the star, build it before the subject. Happy Horse will keep small distant figures coherent if the world around them has weight, light direction, and a named time of day.

Image-to-Video Tips

Happy Horse is unusually consistent when animating a still. These habits get the most out of I2V mode.

Describe the motion, not the image

The model can see your reference. Don't restate what's already in frame — focus the prompt entirely on what should happen next.

Start with a clean still

Sharp focus, well-lit, no compression artifacts. Happy Horse holds detail well, but it inherits the source image — garbage in, garbage out.

Match the aspect ratio

Crop the source to the orientation you want (16:9, 9:16, 1:1) before uploading. The model takes its aspect ratio from the source image rather than from a separate setting.
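For the crop itself, the arithmetic is the same in any image tool. A minimal sketch, assuming a centered crop is what you want; `center_crop_box` is an illustrative name, and it returns a (left, top, right, bottom) box in pixels, which is the order Pillow's `Image.crop` expects:

```python
def center_crop_box(
    width: int, height: int, ratio_w: int, ratio_h: int
) -> tuple[int, int, int, int]:
    """Return the largest centered crop box with the target aspect ratio."""
    if min(width, height, ratio_w, ratio_h) <= 0:
        raise ValueError("dimensions and ratio terms must be positive")
    # Compare width/height against ratio_w/ratio_h using integers only.
    if width * ratio_h > height * ratio_w:
        # Source is too wide: keep the full height, trim the sides.
        crop_w, crop_h = height * ratio_w // ratio_h, height
    else:
        # Source is too tall (or already matches): keep the full width.
        crop_w, crop_h = width, width * ratio_h // ratio_w
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h
```

For example, a 4000×3000 still cropped to 16:9 gives the box (0, 375, 4000, 2625). If the subject is off-center, adjust the box by hand rather than trusting the centered default.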

Anchor camera language

"Slow push-in", "slight handheld sway", "the wind picks up". Even gentle named motions read more naturally than "make it move".

Use I2V for character work

If you need a specific face, build it as a still in your favorite image model first, then animate with Happy Horse — identity drift is dramatically lower than T2V from scratch.

Keep it short for reactive shots

5 seconds is the sweet spot for "alive portrait" style clips. Longer durations let drift accumulate; multi-shot prompts shine more in T2V.

Reference-to-Video

Happy Horse Reference-to-Video takes 1–9 reference images plus a prompt. Each image is mapped to a token — character1, character2… — and the model preserves each subject's identity through motion. Use it for multi-character scenes, product placement with a specific reference object, or locking a face across multiple shots.

Three references → one shot

Multi-character scene

A woman in a red qipao character1 stands in a moonlit bamboo grove. She slowly opens a folding fan character2 with one graceful motion as a breeze stirs the leaves. Tassel earrings character3 swing as she turns her head toward the camera. Slow orbital shot from medium to close-up. Cinematic, soft volumetric moonlight, hyper-realistic.

Takeaway: Number every reference. Order matches your upload grid: first image is character1, second is character2, etc. The model places each one in the role you've named.
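Before submitting, it can help to confirm that every uploaded reference is actually named in the prompt and that the prompt names no token without an upload. A rough sketch, assuming the token format described above (character1 through character9, in upload order); both function names are hypothetical:

```python
import re

MAX_REFERENCES = 9  # the guide states 1-9 reference images


def reference_tokens(image_names: list[str]) -> dict[str, str]:
    """Map each image, in upload order, to character1, character2, ..."""
    if not 1 <= len(image_names) <= MAX_REFERENCES:
        raise ValueError(
            f"expected 1-{MAX_REFERENCES} references, got {len(image_names)}"
        )
    return {f"character{i}": name for i, name in enumerate(image_names, start=1)}


def check_prompt(prompt: str, tokens: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (uploaded tokens the prompt never uses, prompt tokens with no upload)."""
    used = set(re.findall(r"\bcharacter[1-9]\b", prompt))
    unused = sorted(set(tokens) - used)
    unknown = sorted(used - set(tokens))
    return unused, unknown
```

An unused token usually means a wasted reference slot; an unknown token means the prompt points at an image that was never uploaded.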

One reference → cinematic restage

Single character, new world

A young woman character1 walks slowly through a rain-slicked cyberpunk alley at night, neon signs reflecting in puddles. Side-tracking shot follows her. Magenta and cyan lighting, lens flare, light haze in the air. Cinematic, hyper-realistic.

Takeaway: With a single reference, you can drop a known face or character into any environment. Identity stays steady; outfit, lighting, and setting follow the prompt.

Use clean reference shots

Sharp, well-lit photos with the subject filling most of the frame. Avoid heavy crops, extreme angles, or busy backgrounds — the model picks up everything that's in the reference.

One subject per reference

Don't combine multiple characters into one image. Upload them as separate references and label them character1 / character2 in the prompt.

Don't repeat the description

If character1 has red hair, you don't need to write "red-haired" in the prompt — the model already knows. Save your prompt budget for action and camera direction.

Props work too

Reference doesn't have to be a face. Clothing, vehicles, weapons, signature objects all work. Tag them with character tokens just like people.

Video Edit

Happy Horse Video Edit transforms an existing clip with a text instruction. The mode follows from what you write: style transfer (a complete restyle of the whole video) or local replacement (swapping an outfit, prop, or object using a reference image). It has the same look and feel as the base model, physics-aware and multi-shot-aware, but applied to footage you already have.

No reference, prompt-only

Style transfer — watercolor

Restyle the entire video as a flowing watercolor painting. Soft bleeding pigments, visible brushstrokes, warm cyan and coral palette, edges dissolve into the background. Maintain the original motion and composition.

Takeaway: For style transfer, name the medium (watercolor, charcoal, oil, anime, claymation), the palette, and one or two texture cues ("visible brushstrokes", "grainy paper"). Tell the model to preserve motion and composition.

Named-reference style

Style transfer — Studio Ghibli

Restyle as a Studio Ghibli animated film. Soft cel-shading, hand-drawn linework, watercolor sky, warm golden-hour palette. Maintain the original camera move and the fisherman character.

Takeaway: Naming a studio or director as shorthand ("Studio Ghibli", "Wes Anderson", "Roger Deakins") points the model at a familiar visual language, but a bare name-drop rarely lands on its own, so pair it with a few concrete cues (here: cel-shading, hand-drawn linework, a golden-hour palette). Combine with "maintain the original camera move" to preserve the shot's intent.

Style transfer: name the medium

"Watercolor", "oil painting", "charcoal sketch", "Studio Ghibli", "comic book", "clay animation". One specific term beats five vague adjectives.

Local replacement: pair with a reference

For "swap the sweater", "change the car to this one", or "give the character this hairstyle", upload the target object as a reference image and write the swap as a single instruction.

Audio: Auto vs Original

Default Auto lets the model decide. Pick Original when your input clip has dialogue, music, or signature audio you need to keep — the visuals change but the audio track is preserved verbatim.

Source clip rules

MP4 or MOV, 3-60 seconds, max 100MB. Output is up to 15 seconds; longer source clips are trimmed to their first 15 seconds. For best results, use a clip with a clear primary subject.
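These rules are easy to check locally before an upload. A minimal sketch under stated assumptions: "100MB" is read as decimal megabytes, the format is judged by file extension only, and you already know the clip's duration and size; the function names are illustrative.

```python
from pathlib import PurePath

ALLOWED_EXTENSIONS = {".mp4", ".mov"}
MIN_SECONDS, MAX_SECONDS = 3, 60
MAX_BYTES = 100 * 1000 * 1000  # assumption: "100MB" means decimal megabytes
MAX_OUTPUT_SECONDS = 15


def check_source_clip(filename: str, duration_s: float, size_bytes: int) -> list[str]:
    """Return a list of rule violations; an empty list means the clip passes."""
    problems = []
    if PurePath(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("format must be MP4 or MOV")
    if not MIN_SECONDS <= duration_s <= MAX_SECONDS:
        problems.append(f"duration must be {MIN_SECONDS}-{MAX_SECONDS} seconds")
    if size_bytes > MAX_BYTES:
        problems.append("file must be 100MB or smaller")
    return problems


def output_seconds(duration_s: float) -> float:
    """Only the first 15 seconds of a longer source clip are used."""
    return min(duration_s, MAX_OUTPUT_SECONDS)
```

A 42-second clip passes the checks, but `output_seconds(42.0)` is a reminder that only its first 15 seconds will be edited, so trim to the section you care about first.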

Pro Tips

Number your shots for cuts

Write "Shot 1: …Shot 2: …Shot 3: …". Happy Horse treats each numbered block as its own camera angle and will cut between them inside one clip.

Name the camera move

Be explicit. "Slow dolly in", "side-tracking shot", "low-angle push", "locked-off static". Vague motion = inconsistent results.

Use time-of-day shorthand

"Golden hour", "blue hour", "overcast noon", "night, neon-lit". These trigger complete lighting setups instead of you having to name colors.

Add quality anchors at the end

Closing the prompt with "hyper-realistic, cinematic, 8k, shallow depth of field" pushes the model toward its highest fidelity output.

One main motion per clip

For single-shot prompts, give the camera one job. Mixing too many simultaneous moves (zoom + orbit + pan) usually produces messy motion.

Mix languages when useful

Specific Japanese terms (e.g. 木漏れ日 for dappled sunlight) can sharpen mood that English struggles to name precisely. Happy Horse reads them natively.

Short for tests, long for hero

Iterate at 720p × 5s (cheapest path), then re-roll the winning prompt at 1080p × 10-15s for the final.

Lock the seed when iterating

Open Advanced Settings and pin a seed. You can then tweak the prompt and isolate what each change does without random variation getting in the way.

Use timecoded shot brackets

For tightly-cut multi-shot prompts, pin each beat to a time range: "Shot 1 (0-1s): wide establishing… Shot 2 (1-4s): mid tracking… Shot 3 (4-5s): slow push-in close." The brackets give the model unambiguous cut points and let you control pacing inside a single clip.
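Writing the brackets by hand invites off-by-one gaps between shots. One way to avoid that is to specify each shot's duration and let a small helper compute the running time ranges. This is a sketch with a hypothetical function name, assuming whole-second durations:

```python
def timecoded_shots(shots: list[tuple[int, str]]) -> str:
    """Render (duration_seconds, description) pairs as 'Shot N (start-end s): ...'."""
    parts = []
    start = 0
    for number, (duration, description) in enumerate(shots, start=1):
        if duration <= 0:
            raise ValueError("each shot needs a positive duration")
        end = start + duration  # each shot begins where the previous one ended
        text = description.strip().rstrip(".")
        parts.append(f"Shot {number} ({start}-{end}s): {text}.")
        start = end
    return " ".join(parts)
```

Passing `[(1, "wide establishing"), (3, "mid tracking"), (1, "slow push-in close")]` reproduces the 0-1s, 1-4s, 4-5s pacing from the example above. The last end time should match the clip duration you select.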

Trim slop adjectives if outputs feel generic

If a prompt comes back flat or plastic-faced, drop the filler superlatives first: "beautiful", "stunning", "amazing", "gorgeous", "masterpiece", "epic", "breathtaking", "insane detail", "ultra detailed". Replace them with one specific technical cue ("35mm telephoto", "shallow depth of field", "warm amber backlight") and re-roll.
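A quick way to audit a prompt for these terms is a small lint pass. The sketch below only flags the words listed above; it deliberately does not rewrite the prompt, since the replacement cue is a creative choice. The names `SLOP_TERMS` and `find_slop` are illustrative.

```python
import re

# The filler terms listed in this guide.
SLOP_TERMS = [
    "beautiful", "stunning", "amazing", "gorgeous", "masterpiece",
    "epic", "breathtaking", "insane detail", "ultra detailed",
]

# Whole-word, case-insensitive match on any listed term.
_SLOP_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in SLOP_TERMS) + r")\b",
    re.IGNORECASE,
)


def find_slop(prompt: str) -> list[str]:
    """Return the flagged terms in the order they appear, lowercased."""
    return [match.group(0).lower() for match in _SLOP_PATTERN.finditer(prompt)]
```

An empty result does not guarantee a good prompt; it only confirms that none of the listed terms are present.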

Plain English, not tags

Skip booru-style tag lists, JSON payloads, weighted parentheses, and stacked color synonyms. Happy Horse rewards prose. One precise word beats five vague ones.

What Works, What Struggles

A practical map of where Happy Horse 1.0 punches above its weight and where you'll need to compensate. Lean into the first list; route around the second.

Renders cleanly

Camera moves. Steadicam glides, slow dolly-ins, locked-off wides, helicopter aerials all hold their geometry through the take.
Mirrors and reflections. Reflected figures stay in sync with the source — useful for window shots, puddles, vanity mirrors.
Fire and embers. Strong flame physics with realistic ember arcs. Pulling the camera back keeps the fire in frame instead of dissolving it.
Cloth and fabric in wind. Capes, flags, hair, dresses hold convincing secondary motion when you call it out.
Chrome and metal on vehicles. Highlights and reflections on cars, bikes, weapons read sharply.
Short legible text. Two- to three-word signage and simple labels render accurately.
Atmospheric lighting. Blue hour, neon noir with mist + puddle reflections, single hard top-down keys all register clearly.
Wide establishing shots. Drone aerials and landscape framing carry on their own without a strong subject.

Watch out for

Multi-step action in plain prose. Sequences flatten into a single motion. Reformat as numbered shots or timecoded brackets and the cuts come back.
Extreme slow-motion. Asking for "1000fps" or "bullet-time freeze" produces modest time dilation, not a dramatic freeze. Use 5-10s clips and let the model pace naturally.
Wardrobe details during fast action. Specific patterns and small accessories drop out in motion. Keep the wardrobe-critical beats slower or static.
Long on-screen text. Anything past 2-3 words tends to hallucinate letters. Use short labels or none.
Bare director name-drops. "Roger Deakins cinematography" alone rarely lands. Pair the reference with the visual technique you actually want (telephoto compression, single hard key, deep falloff).
Stacked color synonyms. "Crimson, scarlet, ruby, deep red" muddies output. Pick one color word and move on.

Parameters

Resolution: 720p · 1080p (aspect derived from source / chosen ratio)

Duration: 2 – 15s (5s for portraits, 10-15s for multi-shot)

Cost: 4 cr/s · 6 cr/s (720p · 1080p, refunded on failure)

Languages: EN · ZH · JA (mix freely in one prompt)

FAQ

Happy Horse 1.0 is Alibaba's latest video generation model. On PixelDojo it ships in four flavors: text-to-video, image-to-video, reference-to-video (1-9 character/prop images), and video edit (style transfer + local replacement on existing clips). All run at 720p or 1080p with clip lengths up to 15 seconds. The base model currently sits at #1 on the Artificial Analysis Video Arena leaderboard for blind preference.
