AI Image and Video API Platform
Compare models, inspect endpoints, and access docs in one place.
API Reference
Public docs
Endpoints, schemas, and examples.
OpenAPI 3.1
Typed integrations
SDK generation and machine-readable specs.
LLM Docs
Agent-ready docs
`llm.txt` and agent integration help.
API Keys
Authentication
Create keys for apps and agents.
Usage Dashboard
Requests and credits
Monitor volume, logs, and balance.
Buy Credits
Prepaid capacity
Top up for image and video usage.
Available Models
135 AI models for image and video generation behind one async control plane
Grok Imagine R2V
xAI Grok Imagine reference-to-video via Replicate. Takes 1 to 7 reference images plus a prompt and produces 1 to 10 second clips at 480p or 720p.
/models/grok-r2v/run
Grok Video Extend
xAI Grok Imagine video extension. Continue an existing MP4 with a prompt-directed extension (2 to 10 seconds).
/models/grok-video-extend/run
Hailuo Standard
Premium quality text-to-video and image-to-video
/models/hailuo-standard/run
Hailuo Fast
Fast image-to-video generation
/models/hailuo-fast/run
Happy Horse Reference
Alibaba Happy Horse reference-to-video (1.0 or 1.1) — multi-reference image input that preserves subject characters, driven by a text prompt. 720p / 1080p, 3-15 second clips. Version 1.1 runs at a lower per-second credit rate.
/models/happyhorse-1.0-r2v/run
Happy Horse 1.0 Text-to-Video
Text-to-video with 720p/1080p output and 2-15 second durations
/models/happyhorse-1.0-t2v/run
Happy Horse 1.0 Image-to-Video
Image-to-video animation with 720p/1080p output and 2-15 second durations
/models/happyhorse-1.0-i2v/run
Happy Horse 1.0 Video Edit
Alibaba Happy Horse 1.0 video edit — apply style transfer or local replacement to a source video using text prompts and optional reference images. 720p / 1080p, 3-15 second output.
/models/happyhorse-1.0-video-edit/run
Heygen Avatar
Heygen Avatar 4 via fal.ai. Animate a portrait with prompt-driven speech or an audio track, with optional background and captions.
/models/heygen-avatar/run
Kling Motion Control v3 Standard
Kling Video v3 Standard motion control endpoint
/models/kling-motion-control/run
Kling Motion Control v3 Pro
Kling Video v3 Pro motion control endpoint
/models/kling-motion-control-pro/run
Kling Reference to Video
Kling O3 reference-driven video generation. Image or video references, Standard or Pro tier.
/models/kling-reference-to-video/run
Kling 2.6 Pro
Kling Video v2.6 Pro (fal.ai). Text-to-video or image-to-video, 5 or 10 seconds, with audio generation.
/models/kling-v2-6/run
Kling Video v3 Standard (Text)
Standard text-to-video with native audio
/models/kling-video-v3-standard-text/run
Kling Video v3 Standard (Image)
Standard image-to-video with native audio
/models/kling-video-v3-standard-image/run
Kling Video v3 Pro (Text)
Pro text-to-video with cinematic quality and native audio
/models/kling-video-v3-pro-text/run
Kling Video v3 Pro (Image)
Pro image-to-video with cinematic quality and native audio
/models/kling-video-v3-pro-image/run
Kling Video Edit
Kling O3 video-to-video edit. Standard or Pro, with optional reference images and audio preservation.
/models/kling-video-edit/run
Lip Sync
Replicate sync/lipsync-2. Align mouth movements in a video to a separate audio track.
/models/lip-sync/run
LTX 2.3 Fast Text-to-Video
Fast text-to-video generation (6-20s, 1080p-2160p).
/models/ltx-2-fast-t2v/run
LTX 2.3 Fast Image-to-Video
Fast image-to-video generation (6-20s, 1080p-2160p).
/models/ltx-2-fast-i2v/run
LTX 2.3 Pro Text-to-Video
Higher quality text-to-video generation (6-10s, 1080p-2160p).
/models/ltx-2-pro-t2v/run
LTX 2.3 Pro Image-to-Video
Higher quality image-to-video generation (6-10s, 1080p-2160p).
/models/ltx-2-pro-i2v/run
LTX 2.3 Pro Extend Video
Extend an existing video clip from the start or end (1-20s, Pro tier only).
/models/ltx-2-pro-extend/run
OmniHuman 1.5
ByteDance OmniHuman 1.5 via Replicate. Audio-driven talking-head video with lip sync.
/models/omnihuman/run
P-Video
Pruna P-Video — video generation with text/image/audio conditioning, draft mode, and 720p/1080p outputs.
/models/p-video/run
P Video Avatar
Pruna P Video Avatar — animate a portrait into a talking avatar from a script or an audio file. 30 voices, 10 languages, 720p / 1080p.
/models/p-video-avatar/run
Pixverse v5.6
Pixverse v5.6 video generation via Replicate — text-to-video or image-to-video with optional audio, at 360p–1080p.
/models/pixverse/run
Pixverse V6
Pixverse V6 video generation via Runware. Text-to-video, image-to-video (start frame), or multi-clip (start + end frame).
/models/pixverse-v6/run
Runway Gen-4.5 Video
Runway Gen-4.5 video generation. Text-to-video or image-to-video, 5 or 10 seconds.
/models/runway-gen4-video/run
Runway
Canonical, version-agnostic API ID for Runway video generation.
/models/runway-video/run
Runway Gen-4 (Legacy API ID)
Legacy alias for clients pinned to runway-gen4; maps to the current Runway model.
/models/runway-gen4/run
Seedance 1
ByteDance Seedance 1 video generation. Text-to-video or image-to-video with optional end frame.
/models/seedance-1.5/run
Seedance 2 High
Higher-quality Seedance 2.0 video generation (supports 1080p)
/models/seedance-2-high/run
Seedance 2 Reference to Video
Seedance 2.0 multimodal reference-to-video. Combine up to 9 images, 3 video clips, and 3 audio tracks to guide characters, motion, and sound.
/models/seedance-2-reference/run
Seedance 2 Video Edit
Edit source videos with Seedance 2.0 using prompted changes, optional reference images, and 480p, 720p, or 1080p output.
/models/seedance-video-edit/run
VEO 3.1 Fast
Faster generation at 3 credits per second
/models/veo-3.1-fast/run
VEO 3.1 Standard
Higher quality at 8 credits per second
/models/veo-3.1-standard/run
VEO 3.1 Lite
Runware-powered Lite variant at 1.5 credits/sec for 720p and 2 credits/sec for 1080p. No reference images, no audio generation, no 1:1 aspect ratio.
/models/veo-3.1-lite/run
Video Autocaption
TikTok-style auto-captioning via Replicate.
/models/video-autocaption/run
Video Reframe
Luma Reframe Video via Replicate. Change a video's aspect ratio intelligently.
/models/video-reframe/run
Video to Sound
ThinkSound via Replicate. Generate a sound effect track from a video.
/models/video-to-sound/run
Video Transform
Runway Aleph 2.0 via Replicate. Transform up to 30 seconds of video with a prompt.
/models/video-transform/run
Video Upscaler
Topaz Labs Video Upscale via Replicate. Upscale video resolution and FPS.
/models/video-upscaler/run
WAN 2.2 Standard
Premium quality with enhanced detail
/models/wan-2.2-standard/run
WAN 2.2 Plus
Official Alibaba model with 1080p support
/models/wan-2.2-plus/run
WAN 2.2 Extended
fal.ai WAN 2.2 with up to 10-second videos and dual LoRA support
/models/wan-2.2-extended/run
WAN 2.2 Animate
WAN 2.2 video animation. Drive a character image with a motion reference video.
/models/wan-2.2-animate/run
WAN 2.2 Spicy Image-to-Video
Image-to-video with WAN 2.2 Spicy. Animate a starting image. 480p or 720p, 5s or 8s clips.
/models/wan-2.2-i2v-spicy/run
WAN 2.2 Replace
WAN 2.2 character replacement. Swap a character in a source video while preserving scene and motion.
/models/wan-2.2-replace/run
WAN 2.6 Standard
Higher quality, 720p/1080p support
/models/wan-2.6-standard/run
WAN 2.6 Flash
Fast and affordable image-to-video
/models/wan-2.6-flash/run
WAN 2.7 Spicy Image-to-Video
Image-to-video with WAN 2.7 Spicy. Animate a starting image with optional driving audio. 720p or 1080p, 2–15 second clips.
/models/wan-2.7-i2v-spicy/run
WAN 2.7 Text-to-Video
Text-to-video with audio sync, 720p/1080p output, and 2-15 second durations
/models/wan-2.7-t2v/run
WAN 2.7 Image-to-Video
Image-to-video and video continuation with optional last-frame control and audio sync
/models/wan-2.7-i2v/run
WAN Reference to Video
Alibaba WAN reference-to-video. Up to 5 image/video references with multi-shot support.
/models/wan-reference-to-video/run
WAN Video Character Swap
Alibaba WAN character swap. Combine a character image with a reference video to produce a new clip.
/models/wan-video-character-swap/run
WAN 2.7 Video Edit
Alibaba WAN 2.7 video editing. Modify an existing clip via prompt with optional reference images.
/models/wan-video-edit/run
Grok Imagine Video
xAI Grok Imagine video. Text-to-video or image-to-video, 1-15 seconds at 480p or 720p. Image-to-video can use the Grok Imagine 1.5 backbone for natively synchronized audio.
/models/xai-video/run
Grok Video Edit
xAI Grok Imagine Video edit. Transform short clips via Replicate.
/models/xai-video-edit/run
Quick start:
https://pixeldojo.ai/api/v1
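As a rough illustration of how the catalog entries above might be used, the sketch below joins the base URL with a model's `/models/<id>/run` path and estimates VEO 3.1 credit cost from the per-second rates quoted in the catalog. It assumes the endpoint paths are appended directly to the base URL and that credits are billed as rate times duration with no rounding or minimum charge; neither is confirmed here. The helper names `run_url` and `veo_credits` are hypothetical, and the HTTP method, authentication header, and request body are not described in this section, so the sketch makes no network calls.

```python
from urllib.parse import quote

# Base URL as listed under "Quick start".
BASE_URL = "https://pixeldojo.ai/api/v1"


def run_url(model_id: str) -> str:
    """Build the run endpoint for a model ID from the catalog.

    Assumption: the /models/<id>/run paths are relative to BASE_URL.
    """
    # Percent-encode the ID so a stray slash or space cannot alter the path.
    return f"{BASE_URL}/models/{quote(model_id, safe='')}/run"


def veo_credits(model_id: str, seconds: float, resolution: str = "720p") -> float:
    """Estimate credits for a VEO 3.1 clip from the catalog's per-second rates.

    Assumption: cost is simply rate * seconds; actual billing may differ.
    """
    if model_id == "veo-3.1-fast":
        rate = 3.0
    elif model_id == "veo-3.1-standard":
        rate = 8.0
    elif model_id == "veo-3.1-lite":
        # The Lite variant is the only one with a resolution-dependent rate.
        lite_rates = {"720p": 1.5, "1080p": 2.0}
        if resolution not in lite_rates:
            raise ValueError(f"no Lite rate listed for resolution: {resolution}")
        rate = lite_rates[resolution]
    else:
        raise ValueError(f"no per-second rate listed for model: {model_id}")
    return rate * seconds


if __name__ == "__main__":
    print(run_url("veo-3.1-lite"))
    print(veo_credits("veo-3.1-lite", 8, "1080p"))
```

For the actual request schema, authentication, and how asynchronous jobs report results, the API Reference and OpenAPI 3.1 spec linked at the top of the page are the places to check.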