Image & Video tools Claude Code can actually call.
Install once. Use any MCP client or hit REST. 139+ image and video models, async by default.
# 1. Install in your Claude Code / Cursor / OpenClaw project
npx @pixeldojo/mcp init
# 2. Set your API key
export PIXELDOJO_API_KEY=pd_your_api_key
# 3. Restart your agent. It now has these tools:
#   pixeldojo:campaign         URL or product -> hero + lifestyle + video
#   pixeldojo:campaign_status  Poll a campaign by id
#   pixeldojo:from_url         Paste a URL, get product profile
#   pixeldojo:generate         Any prompt -> image or video (with preset)
#   pixeldojo:edit             Edit an image with a text instruction
#   pixeldojo:upload           Upload a local file -> 24h public URL
#   pixeldojo:character        Consistent characters across shots
#   pixeldojo:storyboard       Multi-shot scenes from one brief
#   pixeldojo:upscale          Enhance any image
#   pixeldojo:status           Poll a long-running job
# Get your key: https://pixeldojo.ai/api-platform/api-keys
Quick Start
Install
Claude Code · Cursor · Codex
One command. Restart your editor. All skills appear.
npx @pixeldojo/mcp init
Then set PIXELDOJO_API_KEY in your environment.
Claude Desktop
Edit one JSON file, restart the app.
{
  "mcpServers": {
    "pixeldojo": {
      "command": "npx",
      "args": ["-y", "@pixeldojo/mcp"],
      "env": { "PIXELDOJO_API_KEY": "pd_..." }
    }
  }
}
File: ~/Library/Application Support/Claude/claude_desktop_config.json
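If the config file already lists other MCP servers, pasting the JSON over it would drop them. A minimal Python sketch of merging the entry instead; the helper name `add_pixeldojo_server` is hypothetical (it is not part of PixelDojo's tooling), and the config path is supplied by the caller.

```python
import json
from pathlib import Path


def add_pixeldojo_server(config_path: Path, api_key: str) -> dict:
    """Merge the pixeldojo entry into the config, keeping any existing servers.

    Hypothetical helper for illustration; not shipped by PixelDojo.
    """
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)

    # Create the "mcpServers" map if it is missing, then add or overwrite our entry.
    servers = config.setdefault("mcpServers", {})
    servers["pixeldojo"] = {
        "command": "npx",
        "args": ["-y", "@pixeldojo/mcp"],
        "env": {"PIXELDOJO_API_KEY": api_key},
    }

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Called with the path shown above (for example `Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"`), it leaves other entries untouched. Restart the app afterwards.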
Cowork
Drag-and-drop plugin. No JSON to edit.
1. Open the archive, paste your PIXELDOJO_API_KEY.
2. Drag the file into Cowork.
3. All skills appear.
Named Skills
One install, every skill your agent needs. Your LLM picks the right one per task.
pixeldojo:generate
Any prompt.
Best model, automatically.
Your agent describes what it needs in plain English. PixelDojo routes to the right model (photorealism, text rendering, video) and hands back a URL.
- ✓ 100+ models, one skill to call
- ✓ Images, video, editing. Same call shape
- ✓ Credits deducted only on success
>_ Generate a cinematic portrait, Tokyo rain, neon reflections
PixelDojo
Routing to flux-2...
Job queued: job_k9mXpQ2r
✓ output: https://pixeldojo.ai/r/…/portrait.png
✓ 1024×1024 PNG · 1 credit
>_ _
>_ Alex presenting a new phone, marble desk, soft studio light
PixelDojo
Loading ref: alex_character.png...
Routing to flux-edit...
Job queued: job_3vNaL8wK
✓ output: https://pixeldojo.ai/r/…/alex-desk.png
✓ Consistency preserved · 2 credits
>_ _
pixeldojo:character
Same character.
Any scene.
Pass a reference image once. Your agent reuses the character across any number of scenes (different backgrounds, poses, lighting) while preserving their face and features.
- ✓ Prompt evolves the scene; the character stays locked
- ✓ Works with Ideogram Character, Flux Edit, and more
- ✓ No LoRA training required. Just a reference image URL
pixeldojo:upload
Local file in.
Public URL out.
Your agent has a reference image on disk — dragged into the chat, a screenshot, a saved render. pixeldojo:upload reads the file and returns a public URL you can pass straight to :generate or :edit. Storage auto-expires after 24 hours, so nothing lingers.
- ✓ Images up to 50 MB, video up to 200 MB
- ✓ Or skip it: pass image_path to :generate / :edit and the skill uploads automatically
- ✓ Hosted on temp.pixeldojo.ai, deletes itself in 24h, no cleanup
>_ Use this reference photo and generate a cinematic version
PixelDojo
Calling pixeldojo:upload { path: "~/Desktop/ref.png" }
✓ url: https://temp.pixeldojo.ai/…/ref.png (24h)
Calling pixeldojo:generate { prompt, image_url }
✓ output: https://pixeldojo.ai/r/…/cinematic.png
>_ _
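An agent can check the size limits above before attempting an upload. The following is a rough Python sketch, not PixelDojo code: the function names are hypothetical, the file type is guessed from the extension, and treating 1 MB as 1024 × 1024 bytes is an assumption (the page does not say which definition applies).

```python
import mimetypes

# Limits quoted above. The byte interpretation of "MB" is an assumption.
IMAGE_LIMIT_BYTES = 50 * 1024 * 1024
VIDEO_LIMIT_BYTES = 200 * 1024 * 1024


def upload_limit_for(path: str) -> int:
    """Return the size limit for a file, based on its guessed MIME type."""
    mime, _ = mimetypes.guess_type(path)
    if mime and mime.startswith("image/"):
        return IMAGE_LIMIT_BYTES
    if mime and mime.startswith("video/"):
        return VIDEO_LIMIT_BYTES
    raise ValueError(f"not an image or video: {path}")


def fits_upload_limit(path: str, size_bytes: int) -> bool:
    """True if a file of this size is within the documented limit for its type."""
    return size_bytes <= upload_limit_for(path)
```

For a file on disk, pass `os.path.getsize(path)` as `size_bytes`.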
pixeldojo:storyboard
One brief.
N shots, planned and generated.
Your agent writes the brief. PixelDojo breaks it into shots, decides which are images and which are video, generates them in parallel, and returns an ordered array of output URLs.
- ✓ Mix image and video shots in the same storyboard
- ✓ Shot planning included. No need to prompt each frame individually
- ✓ Returns an ordered array your agent can pass to an editor or exporter
>_ 60s product reveal: teaser, unboxing, close-up, lifestyle
PixelDojo
Planning 4 shots...
Shot 1/4 ✓ teaser (image)
Shot 2/4 ✓ unboxing (video)
Shot 3/4 ✓ close-up (image)
Shot 4/4 ✓ lifestyle (video)
✓ outputs: [4 URLs] · 6 credits
>_ _
>_ Upscale this product photo to 4K, enhance detail
PixelDojo
Analyzing: 1024×1024 → 4096×4096
Routing to magnific-upscaler...
Job queued: job_8tHjR5mN
✓ output: https://pixeldojo.ai/r/…/upscaled.png
✓ 4096×4096 PNG · 2 credits
>_ _
pixeldojo:upscale
Any image.
Up to 16× sharper.
Pass any image URL. Your agent gets back a high-res version. No upload step, no format conversion. Conservative mode preserves the original; creative mode can enhance textures and fine detail.
- ✓ 2× to 16× magnification depending on model
- ✓ Works on any image URL. No upload required
- ✓ Conservative and creative upscale tiers
Agentic Skills
One call, whole campaign
Higher-level skills that compose the named tools above. Drop a URL, get back a full launch package. No prompt-engineering, no chaining by hand.
pixeldojo:campaign
Campaign
One URL or product profile, one MCP call. Returns a hero image, N lifestyle variants, and an optional vertical video. Submits in parallel, polls under one budget.
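"Submits in parallel, polls under one budget" can be pictured as a set of concurrent jobs sharing a single deadline. The sketch below is a generic asyncio illustration of that pattern, not PixelDojo's implementation; `run_under_budget` and its return shape are hypothetical.

```python
import asyncio


async def run_under_budget(jobs, budget_seconds):
    """Run all jobs concurrently and stop waiting once the shared budget is spent.

    jobs: dict mapping an asset name to a zero-argument coroutine function.
    Returns (assets, in_flight): results of the jobs that finished in time, and
    the names of the jobs that were still running when the budget ran out.
    Jobs that raised an exception appear in neither.
    """
    tasks = {name: asyncio.ensure_future(make_job()) for name, make_job in jobs.items()}
    done, pending = await asyncio.wait(list(tasks.values()), timeout=budget_seconds)

    # Cancelling only stops local waiting; a real remote job would keep running.
    for task in pending:
        task.cancel()

    assets = {
        name: task.result()
        for name, task in tasks.items()
        if task in done and task.exception() is None
    }
    in_flight = [name for name, task in tasks.items() if task in pending]
    return assets, in_flight
```

The `in_flight` list plays the role of a handoff: the caller knows which assets to poll for later rather than blocking past the budget.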
pixeldojo:campaign({
  productUrl: "https://shop.example/atomic"
})
pixeldojo:from_url
From URL
Paste a product page, get back { name, description, images } extracted via JSON-LD, OpenGraph, or heuristic fallback. The cold-start fix for any agentic flow.
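The fallback order named above (JSON-LD, then OpenGraph, then a heuristic) can be sketched with the Python standard library. This is an illustration of the general technique, not PixelDojo's extractor: the `source` key, the use of `<title>` as the heuristic, and all function names are assumptions.

```python
import json
from html.parser import HTMLParser


class _MetaParser(HTMLParser):
    """Collects JSON-LD script bodies, og:* meta tags, and the <title> text."""

    def __init__(self):
        super().__init__()
        self.json_ld, self.open_graph, self.title = [], {}, ""
        self._in_json_ld = self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_json_ld = True
            self.json_ld.append("")
        elif tag == "meta" and (attrs.get("property") or "").startswith("og:"):
            self.open_graph[attrs["property"]] = attrs.get("content") or ""
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_json_ld = False
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_json_ld:
            self.json_ld[-1] += data
        elif self._in_title:
            self.title += data


def _image_urls(value):
    """Normalise a JSON-LD image value (string, object, or list) to a list of URLs."""
    items = value if isinstance(value, list) else [value]
    urls = []
    for item in items:
        if isinstance(item, str):
            urls.append(item)
        elif isinstance(item, dict) and isinstance(item.get("url"), str):
            urls.append(item["url"])
    return urls


def extract_product(html: str) -> dict:
    parser = _MetaParser()
    parser.feed(html)
    parser.close()

    # 1. JSON-LD: prefer a schema.org Product object if one parses cleanly.
    for raw in parser.json_ld:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue
        for item in data if isinstance(data, list) else [data]:
            if isinstance(item, dict) and item.get("@type") == "Product":
                return {"name": item.get("name", ""),
                        "description": item.get("description", ""),
                        "images": _image_urls(item.get("image")),
                        "source": "json-ld"}

    # 2. OpenGraph meta tags.
    og = parser.open_graph
    if og.get("og:title"):
        return {"name": og["og:title"],
                "description": og.get("og:description", ""),
                "images": [og["og:image"]] if og.get("og:image") else [],
                "source": "opengraph"}

    # 3. Heuristic fallback (assumed here to be the page title only).
    return {"name": parser.title.strip(), "description": "", "images": [],
            "source": "heuristic"}
```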
pixeldojo:from_url({
  url: "https://shop.example/atomic"
})
pixeldojo:campaign_status
Campaign status
Poll a campaign by ID. Returns assets when every sub-job is terminal, or a handoff describing what is still in flight. Mirrors the per-job pixeldojo:status pattern.
pixeldojo:campaign_status({
  campaignId: "campaign_abc123"
})
Preset Library
Try a preset, no prompt-writing required
Curated starting points across 60 workflows. Refresh for a different mix.
Flat icon
Real estate interior
Dance loop
Product with copy
Streetwear lookbook
Storefront signage
Logo monogram
Badge design
Product spin
Real estate listing
Travel postcard
Event poster
Character walk
Magazine cover with title
Flat lay tools
Magazine cover
Social quote card
Cinematic portrait
Fashion runway
Minimal poster
Cinematic drone
Pet action
Children storybook
Editorial coffee shop
API Design
Built for automation
Every detail is designed for machines that call APIs, not humans clicking buttons.
image, video, upscale, edit
Submit a job, get a job ID. Poll the status URL or register a webhook. Every model has a JSON schema endpoint, so your agent knows the request shape before calling. No headless browsers, no UI scraping, no screenshots. Credits are deducted on success, not before.
- Async + webhook
- JSON schema per model
- llm.txt + OpenAPI 3.1
- Credit-based pricing
- One auth
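Because every model publishes a schema, an agent can sanity-check a request before spending a call. The sketch below assumes the schema endpoint returns a standard JSON Schema object with `required`, `properties`, and optionally `additionalProperties`; that shape is not confirmed by this page, and `check_payload` is a hypothetical helper that covers only these keywords, not full JSON Schema validation.

```python
def check_payload(schema: dict, payload: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found.

    Assumes a JSON Schema style object. Only `required`, `properties`, and
    `additionalProperties: false` are checked.
    """
    problems = []

    # Every field the schema marks as required must be present.
    for name in schema.get("required", []):
        if name not in payload:
            problems.append(f"missing required field: {name}")

    # Unknown fields are only an error when the schema forbids extras.
    if schema.get("additionalProperties") is False:
        known = schema.get("properties", {})
        for name in payload:
            if name not in known:
                problems.append(f"unknown field: {name}")

    return problems
```

For anything beyond a quick pre-flight check, a full JSON Schema validator is the safer choice.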
REST API
Endpoint reference
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/models | List all available models |
| GET | /api/v1/models/{apiId}/schema | Get the JSON schema for a model |
| POST | /api/v1/models/{apiId}/run | Submit a generation job |
| GET | /api/v1/jobs/{jobId} | Check job status and get output URLs |
| POST | /api/v1/jobs/{jobId}/webhook | Register a webhook for completion |
| POST | /api/v1/upload | Upload a local file, get a 24-hour public URL for use as a reference image |
Full reference: API Documentation · OpenAPI Spec · llm.txt
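The submit-then-poll flow from the table can be sketched in Python with only the standard library. The paths come from the table; everything else is an assumption to verify against the API documentation: the host `pixeldojo.ai`, Bearer authentication, the response field names `jobId` and `status`, and the terminal values `completed` and `failed`.

```python
import json
import os
import time
import urllib.request

BASE_URL = "https://pixeldojo.ai/api/v1"  # assumption: host is not stated in the table


def http_json(method, path, body=None):
    """Send one JSON request. The Bearer auth scheme is an assumption."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={
            "Authorization": "Bearer " + os.environ.get("PIXELDOJO_API_KEY", ""),
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def run_and_wait(api_id, payload, send=http_json, interval=2.0, timeout=300.0,
                 sleep=time.sleep):
    """Submit a job, then poll its status until it is terminal or the timeout passes."""
    job = send("POST", f"/models/{api_id}/run", payload)
    job_id = job["jobId"]  # assumed field name

    deadline = time.monotonic() + timeout
    while True:
        status = send("GET", f"/jobs/{job_id}")
        if status.get("status") in ("completed", "failed"):  # assumed values
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still running after {timeout}s")
        sleep(interval)
```

A call such as `run_and_wait("boogu-image", {"prompt": "..."})` would block until the job finishes; the `send` and `sleep` parameters exist so the loop can be exercised without network access. Registering a webhook avoids polling altogether.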
139+ models, one API
Same endpoint pattern for every model. Your agent picks the model, we handle the rest.
Boogu Image
Boogu Image: bilingual (EN/ZH) text-to-image generation with crisp detail and 2K output.
/models/boogu-image/run
Boogu Image Edit
Boogu Image instruction-based editing. Provide a source image and an edit instruction.
/models/boogu-image-edit/run
Bria 3.2
Bria 3.2: text-to-image with 9 aspect ratio presets at 1K resolution, optional image and prompt enhancement, and photography/art medium hints.
/models/bria-3-2/run
Change Camera Angle
Camera-aware editing via fal.ai Qwen Image Edit 2511 with multi-angle LoRA. 360° orbit, tilt, and zoom.
/models/change-camera-angle/run
Clarity Pro Upscaler
Clarity Pro Upscaler via Replicate. Photorealistic upscaling with identity preservation and creative control, up to 16× and 64 megapixels.
/models/clarity-pro-upscaler/run
Consistent Characters
Generate consistent character variations with FLUX Kontext, Nano Banana Pro/2, Flux 2 Dev, Qwen Image 2 Pro, or Grok Imagine.
/models/consistent-characters/run
Creative Upscale
Clarity Upscaler (creative upscale) via Replicate. Boost detail with stable-diffusion refinement.
/models/creative-upscale/run
Ernie
Baidu Ernie text-to-image (fal.ai). Multilingual prompts and built-in prompt expansion.
/models/ernie/run
Google Gemini Omni Flash
Google Gemini Omni Flash: text, image, or video into 3–10s 720p clips with native audio. Image-to-video, reference images, and video editing.
/models/google-gemini-omni-flash/run
Grok Imagine R2V
xAI Grok Imagine reference-to-video via Replicate. 1 to 7 reference images plus prompt for 1 to 10 second clips at 480p or 720p.
/models/grok-r2v/run
Grok Video Extend
xAI Grok Imagine video extension. Continue an existing MP4 with a prompt-directed extension (2 to 10 seconds).
/models/grok-video-extend/run
Hailuo Standard
Premium quality text-to-video and image-to-video.
/models/hailuo-standard/run
Hailuo Fast
Fast image-to-video generation.
/models/hailuo-fast/run
Happy Horse Reference
Alibaba Happy Horse reference-to-video (1.0 or 1.1): multi-reference image input that preserves subject characters, driven by a text prompt. 720p / 1080p, 3-15 second clips. Version 1.1 runs at a lower per-second credit rate.
/models/happyhorse-1.0-r2v/run
Works with
