grok-r2v | Grok Imagine R2V | 10 | xAI Grok Imagine reference-to-video via Replicate. Takes 1 to 7 reference images plus a prompt and produces 1 to 10 second clips at 480p or 720p. |
grok-video-extend | Grok Video Extend | 12 | xAI Grok Imagine video extension. Continue an existing MP4 with a prompt-directed extension (2 to 10 seconds). |
hailuo-standard | Hailuo Standard | 8 | Premium quality text-to-video and image-to-video |
hailuo-fast | Hailuo Fast | 4 | Fast image-to-video generation |
happyhorse-1.0-r2v | Happy Horse 1.0 Reference to Video | 4/sec | Alibaba Happy Horse 1.0 reference-to-video. Multi-reference image input that preserves subject characters, driven by a text prompt. 720p / 1080p, 3-15 second clips. |
happyhorse-1.0-t2v | Happy Horse 1.0 Text-to-Video | 4/sec | Text-to-video with 720p/1080p output and 2-15 second durations |
happyhorse-1.0-i2v | Happy Horse 1.0 Image-to-Video | 4/sec | Image-to-video animation with 720p/1080p output and 2-15 second durations |
happyhorse-1.0-video-edit | Happy Horse 1.0 Video Edit | 4/sec | Alibaba Happy Horse 1.0 video edit. Apply style transfer or local replacement to a source video using text prompts and optional reference images. 720p / 1080p, 3-15 second output. |
heygen-avatar | HeyGen Avatar | 2/sec | HeyGen Avatar 4 via fal.ai. Animate a portrait with prompt-driven speech or an audio track, with optional background and captions. |
kling-motion-control | Kling Motion Control v3 Standard | 3/sec | Kling Video v3 Standard motion control endpoint |
kling-motion-control-pro | Kling Motion Control v3 Pro | 4/sec | Kling Video v3 Pro motion control endpoint |
kling-reference-to-video | Kling Reference to Video | 15 | Kling O3 reference-driven video generation. Image or video references, Standard or Pro tier. |
kling-v2-6 | Kling 2.6 Pro | 15 | Kling Video v2.6 Pro (fal.ai). Text-to-video or image-to-video, 5 or 10 seconds, with audio generation. |
kling-video-v3-standard-text | Kling Video v3 Standard (Text) | 6/sec | Standard text-to-video with native audio |
kling-video-v3-standard-image | Kling Video v3 Standard (Image) | 6/sec | Standard image-to-video with native audio |
kling-video-v3-pro-text | Kling Video v3 Pro (Text) | 8/sec | Pro text-to-video with cinematic quality and native audio |
kling-video-v3-pro-image | Kling Video v3 Pro (Image) | 8/sec | Pro image-to-video with cinematic quality and native audio |
kling-video-edit | Kling Video Edit | 40 | Kling O3 video-to-video edit. Standard or Pro, with optional reference images and audio preservation. |
lip-sync | Lip Sync | 5 | Replicate sync/lipsync-2. Align mouth movements in a video to a separate audio track. |
ltx-2-fast-t2v | LTX 2.3 Fast Text-to-Video | 2/sec | Fast text-to-video generation (6-20s, 1080p-2160p). |
ltx-2-fast-i2v | LTX 2.3 Fast Image-to-Video | 2/sec | Fast image-to-video generation (6-20s, 1080p-2160p). |
ltx-2-pro-t2v | LTX 2.3 Pro Text-to-Video | 2/sec | Higher quality text-to-video generation (6-10s, 1080p-2160p). |
ltx-2-pro-i2v | LTX 2.3 Pro Image-to-Video | 2/sec | Higher quality image-to-video generation (6-10s, 1080p-2160p). |
ltx-2-pro-extend | LTX 2.3 Pro Extend Video | 2/sec | Extend an existing video clip from the start or end (1-20s, Pro tier only). |
omnihuman | OmniHuman 1.5 | 45 | ByteDance OmniHuman 1.5 via Replicate. Audio-driven talking-head video with lip sync. |
p-video | P-Video | 0.5/sec | Pruna P-Video. Video generation with text/image/audio conditioning, draft mode, and 720p/1080p outputs. |
p-video-avatar | P Video Avatar | 1/sec | Pruna P Video Avatar. Animate a portrait into a talking avatar from a script or an audio file. 30 voices, 10 languages, 720p / 1080p. |
pixverse | Pixverse v5.6 | 7.5 | Pixverse v5.6 video generation via Replicate. Text-to-video or image-to-video with optional audio, at 360p–1080p. |
pixverse-v6 | Pixverse V6 | 10 | Pixverse V6 video generation via Runware. Text-to-video, image-to-video (start frame), or multi-clip (start + end frame). |
runway-gen4-video | Runway Gen-4.5 Video | 15 | Runway Gen-4.5 video generation. Text-to-video or image-to-video, 5 or 10 seconds. |
runway-video | Runway | 15 | Canonical version-agnostic Runway video API ID. |
runway-gen4 | Runway Gen-4 (Legacy API ID) | 15 | Legacy alias for clients pinned to runway-gen4; maps to the current Runway model. |
seedance-1.5 | Seedance 1 | 8 | ByteDance Seedance 1 video generation. Text-to-video or image-to-video with optional end frame. |
seedance-2-high | Seedance 2 High | 4/sec | Higher-quality Seedance 2.0 video generation (supports 1080p) |
seedance-2-reference | Seedance 2 Reference to Video | 20 | Seedance 2.0 multimodal reference-to-video. Combine up to 9 images, 3 video clips, and 3 audio tracks to guide characters, motion, and sound. |
seedance-video-edit | Seedance 2 Video Edit | 25 | Edit source videos with Seedance 2.0 using prompted changes, optional reference images, and 480p, 720p, or 1080p output. |
text-to-music | Text to Music | 2 | ElevenLabs Music via Replicate. Generate music from a text prompt. |
veo-3.1-fast | VEO 3.1 Fast | 3/sec | Faster generation at 3 credits per second |
veo-3.1-standard | VEO 3.1 Standard | 8/sec | Higher quality at 8 credits per second |
veo-3.1-lite | VEO 3.1 Lite | 1.5/sec | Runware-powered Lite variant at 1.5 credits/sec for 720p and 2 credits/sec for 1080p. No reference images, no audio generation, no 1:1 aspect ratio. |
video-autocaption | Video Autocaption | 5 | TikTok-style auto-captioning via Replicate. |
video-reframe | Video Reframe | 8 | Luma Reframe Video via Replicate. Change a video's aspect ratio intelligently. |
video-to-sound | Video to Sound | 2 | ThinkSound via Replicate. Generate a sound effect track from a video. |
video-transform | Video Transform | 20 | Runway Gen4 Aleph via Replicate. Transform the first 5 seconds of a video with a prompt. |
video-upscaler | Video Upscaler | 10 | Topaz Labs Video Upscale via Replicate. Upscale video resolution and FPS. |
wan-2.2-standard | WAN 2.2 Standard | 3 | Premium quality with enhanced detail |
wan-2.2-plus | WAN 2.2 Plus | 10 | Official Alibaba model with 1080p support |
wan-2.2-extended | WAN 2.2 Extended | 1.2/sec | fal.ai WAN 2.2 with up to 10-second videos and dual LoRA support |
wan-2.2-animate | WAN 2.2 Animate | 2 | WAN 2.2 video animation. Drive a character image with a motion reference video. |
wan-2.2-i2v-spicy | WAN 2.2 Spicy Image-to-Video | 10 | Image-to-video with WAN 2.2 Spicy. Animate a starting image. 480p or 720p, 5s or 8s clips. |
wan-2.2-replace | WAN 2.2 Replace | 2 | WAN 2.2 character replacement. Swap a character in a source video while preserving scene and motion. |
wan-2.6-standard | WAN 2.6 Standard | 2.5/sec | Higher quality, 720p/1080p support |
wan-2.6-flash | WAN 2.6 Flash | 1/sec | Fast and affordable image-to-video |
wan-2.7-i2v-spicy | WAN 2.7 Spicy Image-to-Video | 20 | Image-to-video with WAN 2.7 Spicy. Animate a starting image with optional driving audio. 720p or 1080p, 2–15 second clips. |
wan-2.7-t2v | WAN 2.7 Text-to-Video | 2.5/sec | Text-to-video with audio sync, 720p/1080p output, and 2-15 second durations |
wan-2.7-i2v | WAN 2.7 Image-to-Video | 2.5/sec | Image-to-video and video continuation with optional last-frame control and audio sync |
wan-reference-to-video | WAN Reference to Video | 4 | Alibaba WAN reference-to-video. Up to 5 image/video references with multi-shot support. |
wan-video-character-swap | WAN Video Character Swap | 20 | Alibaba WAN character swap. Combine a character image with a reference video to produce a new clip. |
wan-video-edit | WAN 2.7 Video Edit | 6 | Alibaba WAN 2.7 video editing. Modify an existing clip via prompt with optional reference images. |
xai-video | Grok Imagine Video | 10 | xAI Grok Imagine video. Text-to-video or image-to-video, 1-15 seconds at 480p or 720p. |
xai-video-edit | Grok Video Edit | 15 | xAI Grok Imagine Video edit. Transform short clips via Replicate. |
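
The credits column mixes two formats: a bare number (for example `10`) and a per-second rate (for example `4/sec`). The sketch below shows one way to turn either format into an estimated cost for a clip. It assumes that a bare number is a flat charge per generation and that a `/sec` rate is multiplied by the output duration in seconds; the table does not state either rule explicitly, and it does not say whether billed seconds are rounded. The `CREDITS` dictionary and the `estimate_credits` helper are hypothetical names used for illustration, not part of any documented API.

```python
from decimal import Decimal

# Credit values copied from a few rows of the table above.
CREDITS = {
    "hailuo-fast": "4",
    "happyhorse-1.0-t2v": "4/sec",
    "p-video": "0.5/sec",
    "pixverse": "7.5",
}

PER_SECOND_SUFFIX = "/sec"


def estimate_credits(model_id: str, seconds: float = 0) -> Decimal:
    """Estimate the credit cost of one generation (hypothetical helper).

    Assumption: values ending in "/sec" scale with clip duration, and
    bare values are a flat per-generation charge regardless of duration.
    """
    raw = CREDITS[model_id]
    if raw.endswith(PER_SECOND_SUFFIX):
        rate = Decimal(raw[: -len(PER_SECOND_SUFFIX)])
        # Go through str() so float durations do not pick up binary noise.
        return rate * Decimal(str(seconds))
    return Decimal(raw)


if __name__ == "__main__":
    for model_id, seconds in [("happyhorse-1.0-t2v", 5), ("p-video", 10), ("pixverse", 8)]:
        print(model_id, seconds, estimate_credits(model_id, seconds))
```

Some rows carry pricing details that a single rate cannot express. `veo-3.1-lite`, for instance, lists 1.5 credits/sec for 720p and 2 credits/sec for 1080p, so a fuller estimator would also need the output resolution as an input.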