Image & Video tools Claude Code can actually call.

Install once. Use any MCP client or hit REST. 139+ image and video models, async by default.

# 1. Install in your Claude Code / Cursor / OpenClaw project
npx @pixeldojo/mcp init

# 2. Set your API key
export PIXELDOJO_API_KEY=pd_your_api_key

# 3. Restart your agent. It now has these tools:
#    pixeldojo:campaign        URL or product -> hero + lifestyle + video
#    pixeldojo:campaign_status Poll a campaign by id
#    pixeldojo:from_url        Paste a URL, get product profile
#    pixeldojo:generate        Any prompt -> image or video (with preset)
#    pixeldojo:edit            Edit an image with a text instruction
#    pixeldojo:upload          Upload a local file -> 24h public URL
#    pixeldojo:character       Consistent characters across shots
#    pixeldojo:storyboard      Multi-shot scenes from one brief
#    pixeldojo:upscale         Enhance any image
#    pixeldojo:status          Poll a long-running job

# Get your key: https://pixeldojo.ai/api-platform/api-keys

Quick Start

Install

01

Claude Code · Cursor · Codex

One command. Restart your editor. All skills appear.

npx @pixeldojo/mcp init

Then set PIXELDOJO_API_KEY in your environment.

02

Claude Desktop

Edit one JSON file, restart the app.

{
  "mcpServers": {
    "pixeldojo": {
      "command": "npx",
      "args": ["-y", "@pixeldojo/mcp"],
      "env": { "PIXELDOJO_API_KEY": "pd_..." }
    }
  }
}

File: ~/Library/Application Support/Claude/claude_desktop_config.json
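
If the config file already lists other MCP servers, pasting the snippet over it would drop them. The sketch below merges the `pixeldojo` entry into whatever is already there. It assumes the macOS path shown above and an existing file that is valid JSON; the helper names are illustrative and not part of any PixelDojo tooling.

```python
import json
from pathlib import Path

# Path taken from the page (macOS). Other platforms keep this file elsewhere.
CONFIG_PATH = Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"


def merge_pixeldojo_entry(config: dict, api_key: str) -> dict:
    """Return a copy of config with the pixeldojo server added or replaced."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["pixeldojo"] = {
        "command": "npx",
        "args": ["-y", "@pixeldojo/mcp"],
        "env": {"PIXELDOJO_API_KEY": api_key},
    }
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path, api_key: str) -> None:
    """Read the config if it exists, merge the entry, and write it back."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    merged = merge_pixeldojo_entry(existing, api_key)
    path.write_text(json.dumps(merged, indent=2) + "\n")
```

Calling `update_config_file(CONFIG_PATH, "pd_...")` and then restarting the app should have the same effect as the manual edit, while leaving unrelated keys untouched.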

03

Cowork

Drag-and-drop plugin. No JSON to edit.

Download pixeldojo.plugin
  1. Open the archive, paste your PIXELDOJO_API_KEY.
  2. Drag the file into Cowork.
  3. All skills appear.

Named Skills

One install, every skill your agent needs. Your LLM picks the right one per task.

pixeldojo:generate

Any prompt.
Best model, automatically.

Your agent describes what it needs in plain English. PixelDojo routes to the right model (photorealism, text rendering, video) and hands back a URL.

  • 139+ models, one skill to call
  • Images, video, editing. Same call shape
  • Credits deducted only on success

Terminal

>_ Generate a cinematic portrait, Tokyo rain, neon reflections

PixelDojo

Routing to flux-2...

Job queued: job_k9mXpQ2r

output: https://pixeldojo.ai/r/…/portrait.png

1024×1024 PNG · 1 credit

>_ _

Terminal

>_ Alex presenting a new phone, marble desk, soft studio light

PixelDojo

Loading ref: alex_character.png...

Routing to flux-edit...

Job queued: job_3vNaL8wK

output: https://pixeldojo.ai/r/…/alex-desk.png

Consistency preserved · 2 credits

>_ _

pixeldojo:character

Same character.
Any scene.

Pass a reference image once. Your agent reuses the character across any number of scenes (different backgrounds, poses, lighting) while preserving their face and features.

  • Prompt evolves the scene; the character stays locked
  • Works with Ideogram Character, Flux Edit, and more
  • No LoRA training required. Just a reference image URL

pixeldojo:upload

Local file in.
Public URL out.

Your agent has a reference image on disk — dragged into the chat, a screenshot, a saved render. pixeldojo:upload reads the file and returns a public URL you can pass straight to :generate or :edit. Storage auto-expires after 24 hours, so nothing lingers.

  • Images up to 50 MB, video up to 200 MB
  • Or skip it: pass image_path to :generate / :edit and the skill uploads automatically
  • Hosted on temp.pixeldojo.ai, deletes itself in 24h, no cleanup

Terminal

>_ Use this reference photo and generate a cinematic version

PixelDojo

Calling pixeldojo:upload { path: "~/Desktop/ref.png" }

url: https://temp.pixeldojo.ai/…/ref.png (24h)

Calling pixeldojo:generate { prompt, image_url }

output: https://pixeldojo.ai/r/…/cinematic.png

>_ _
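
Since uploads are capped at 50 MB for images and 200 MB for video, an agent can check a file locally before calling pixeldojo:upload. This is a minimal sketch, not PixelDojo's own validation: the extension lists are hypothetical (the page does not enumerate accepted formats), and MB is read as 10^6 bytes, the more conservative interpretation.

```python
from pathlib import Path

MB = 1_000_000  # assumption: the page does not say whether MB means 10^6 or 2^20 bytes
IMAGE_LIMIT = 50 * MB
VIDEO_LIMIT = 200 * MB

# Hypothetical extension lists, used only to decide which limit applies.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}
VIDEO_EXTS = {".mp4", ".mov", ".webm"}


def upload_limit_for(path: Path) -> int:
    """Return the size limit in bytes that applies to this file type."""
    ext = path.suffix.lower()
    if ext in IMAGE_EXTS:
        return IMAGE_LIMIT
    if ext in VIDEO_EXTS:
        return VIDEO_LIMIT
    raise ValueError(f"unrecognized file type: {ext or '(none)'}")


def check_upload_size(path: Path, size_bytes: int) -> None:
    """Raise ValueError if the file exceeds the limit for its type."""
    limit = upload_limit_for(path)
    if size_bytes > limit:
        raise ValueError(f"{path.name} is {size_bytes} bytes, over the {limit}-byte limit")
```

For a real file, pass `path.stat().st_size` as `size_bytes`.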

pixeldojo:storyboard

One brief.
N shots, planned and generated.

Your agent writes the brief. PixelDojo breaks it into shots, decides which are images and which are video, generates them in parallel, and returns an ordered array of output URLs.

  • Mix image and video shots in the same storyboard
  • Shot planning included. No need to prompt each frame individually
  • Returns an ordered array your agent can pass to an editor or exporter

Terminal

>_ 60s product reveal: teaser, unboxing, close-up, lifestyle

PixelDojo

Planning 4 shots...

Shot 1/4 teaser (image)

Shot 2/4 unboxing (video)

Shot 3/4 close-up (image)

Shot 4/4 lifestyle (video)

outputs: [4 URLs] · 6 credits

>_ _

Terminal

>_ Upscale this product photo to 4K, enhance detail

PixelDojo

Analyzing: 1024×1024 -> 4096×4096

Routing to magnific-upscaler...

Job queued: job_8tHjR5mN

output: https://pixeldojo.ai/r/…/upscaled.png

4096×4096 PNG · 2 credits

>_ _

pixeldojo:upscale

Any image.
Up to 16× sharper.

Pass any image URL. Your agent gets back a high-res version. No upload step, no format conversion. Conservative mode preserves the original; creative mode can enhance textures and fine detail.

  • 2× to 16× magnification depending on model
  • Works on any image URL. No upload required
  • Conservative and creative upscale tiers

Agentic Skills

One call, whole campaign

Higher-level skills that compose the named tools above. Drop a URL, get back a full launch package. No prompt-engineering, no chaining by hand.

pixeldojo:campaign

Campaign

One URL or product profile, one MCP call. Returns a hero image, N lifestyle variants, and an optional vertical video. Submits in parallel, polls under one budget.

pixeldojo:campaign({
  productUrl: "https://shop.example/atomic"
})

pixeldojo:from_url

From URL

Paste a product page, get back { name, description, images } extracted via JSON-LD, OpenGraph, or heuristic fallback. The cold-start fix for any agentic flow.

pixeldojo:from_url({
  url: "https://shop.example/atomic"
})

pixeldojo:campaign_status

Campaign status

Poll a campaign by ID. Returns assets when every sub-job is terminal, or a handoff describing what is still in flight. Mirrors the per-job pixeldojo:status pattern.

pixeldojo:campaign_status({
  campaignId: "campaign_abc123"
})
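
The poll-until-terminal pattern described above can be sketched as a loop with a time budget. The status call is injected as a function, so nothing here depends on the real response shape. Treating a response that contains an `assets` key as finished is an assumption for illustration, as are the default budget and interval.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],
    is_done: Callable[[dict], bool],
    budget_seconds: float = 300.0,
    interval_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_status until is_done accepts the result or the budget runs out."""
    deadline = clock() + budget_seconds
    while True:
        result = fetch_status()
        if is_done(result):
            return result
        # Stop if another wait would push past the deadline.
        if clock() + interval_seconds > deadline:
            raise TimeoutError("polling budget exhausted before the campaign finished")
        sleep(interval_seconds)
```

In an agent, `fetch_status` would wrap the pixeldojo:campaign_status call for one campaign ID, and `is_done` might be `lambda r: "assets" in r` if that matches the actual payload.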

Canvas

Chain models in Canvas.

Generate, edit, upscale, animate. All in one freeform session. Hand the chain off to your LLM, or drive it yourself in the browser.

API Design

Built for automation

Every detail is designed for machines that call APIs, not humans clicking buttons.

139+ models: image, video, upscale, edit

Submit a job, get a job ID. Poll the status URL or register a webhook. Every model has a JSON schema endpoint, so your agent knows the request shape before calling. No headless browsers, no UI scraping, no screenshots. Credits are deducted on success, not before.
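
One way an agent might use a per-model schema is to check a payload locally before submitting it. The sketch below covers only a small subset of JSON Schema (`required`, top-level `type`, and `enum`) and assumes the schema endpoint returns an object schema with a `properties` map. The page does not show the real schema shape, so field names such as `prompt` in the usage note are hypothetical.

```python
# Maps JSON Schema type names to the Python types that satisfy them.
_JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def find_payload_problems(schema: dict, payload: dict) -> list:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    for name in schema.get("required", []):
        if name not in payload:
            problems.append(f"missing required field: {name}")
    properties = schema.get("properties", {})
    for name, value in payload.items():
        spec = properties.get(name)
        if not isinstance(spec, dict):
            continue  # unknown fields are left for the server to judge
        type_name = spec.get("type")
        expected = _JSON_TYPES.get(type_name) if isinstance(type_name, str) else None
        if expected is not None:
            # bool is a subclass of int in Python, so reject it for numeric fields.
            bool_as_number = isinstance(value, bool) and type_name in ("integer", "number")
            if bool_as_number or not isinstance(value, expected):
                problems.append(f"{name}: expected {type_name}")
                continue
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name}: value not in enum")
    return problems
```

For example, with a schema that requires a string `prompt`, `find_payload_problems(schema, {})` reports the missing field without spending a request. A full validator would be the better choice if the schemas use nested objects or combinators.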

  • Async + webhook
  • JSON schema per model
  • llm.txt + OpenAPI 3.1
  • Credit-based pricing
  • One auth

REST API

Endpoint reference

Method  Endpoint                       Description
GET     /api/v1/models                 List all available models
GET     /api/v1/models/{apiId}/schema  Get the JSON schema for a model
POST    /api/v1/models/{apiId}/run     Submit a generation job
GET     /api/v1/jobs/{jobId}           Check job status and get output URLs
POST    /api/v1/jobs/{jobId}/webhook   Register a webhook for completion
POST    /api/v1/upload                 Upload a local file, get a 24-hour public URL for use as a reference image

Full reference: API Documentation · OpenAPI Spec · llm.txt
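
For clients that call REST directly, the endpoint pattern above maps onto a small request builder. This is a sketch under stated assumptions: the page lists paths only, so the `https://pixeldojo.ai` host, the bearer-token `Authorization` header, and the `prompt` payload field in the usage note are guesses to be checked against the API documentation. The functions only build requests; nothing is sent.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import quote

BASE_URL = "https://pixeldojo.ai"  # assumption: the API host is not stated on the page


def _request(method: str, path: str, api_key: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) an authenticated request."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Authorization": f"Bearer {api_key}"}  # assumption: bearer-token auth
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def run_model_request(api_id: str, payload: dict, api_key: str) -> urllib.request.Request:
    """POST /api/v1/models/{apiId}/run"""
    return _request("POST", f"/api/v1/models/{quote(api_id, safe='')}/run", api_key, payload)


def job_status_request(job_id: str, api_key: str) -> urllib.request.Request:
    """GET /api/v1/jobs/{jobId}"""
    return _request("GET", f"/api/v1/jobs/{quote(job_id, safe='')}", api_key)
```

A call such as `run_model_request("boogu-image", {"prompt": "a red kite"}, key)` yields a request that `urllib.request.urlopen` can send. Keeping construction separate from sending makes the URL and headers easy to inspect in tests.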

139+ models, one API

Same endpoint pattern for every model. Your agent picks the model, we handle the rest.

Boogu Image

1 credit
Image · Editing

Boogu Image — bilingual (EN/ZH) text-to-image generation with crisp detail and 2K output.

/models/boogu-image/run

Boogu Image Edit

1 credit
Image · Editing

Boogu Image instruction-based editing. Provide a source image and an edit instruction.

/models/boogu-image-edit/run

Bria 3.2

1 credit
Image

Bria 3.2 — text-to-image with 9 aspect ratio presets at 1K resolution, optional image and prompt enhancement, and photography/art medium hints.

/models/bria-3-2/run

Change Camera Angle

1 credit
Image

Camera-aware editing via fal.ai Qwen Image Edit 2511 with multi-angle LoRA. 360° orbit, tilt, and zoom.

/models/change-camera-angle/run

Clarity Pro Upscaler

4 credits
Image

Clarity Pro Upscaler via Replicate. Photorealistic upscaling with identity preservation and creative control — up to 16× and 64 megapixels.

/models/clarity-pro-upscaler/run

Consistent Characters

1 credit
Image

Generate consistent character variations with FLUX Kontext, Nano Banana Pro/2, Flux 2 Dev, Qwen Image 2 Pro, or Grok Imagine.

/models/consistent-characters/run

Creative Upscale

0.5 credits
Image · LoRA

Clarity Upscaler (creative upscale) via Replicate. Boost detail with stable-diffusion refinement.

/models/creative-upscale/run

Ernie

1 credit
Image

Baidu Ernie text-to-image (fal.ai). Multilingual prompts and built-in prompt expansion.

/models/ernie/run

Google Gemini Omni Flash

16 credits
Video · Audio · Editing

Google Gemini Omni Flash: text, image, or video into 3–10s 720p clips with native audio. Image-to-video, reference images, and video editing.

/models/google-gemini-omni-flash/run

Grok Imagine R2V

10 credits
Video

xAI Grok Imagine reference-to-video via Replicate. 1 to 7 reference images plus prompt for 1 to 10 second clips at 480p or 720p.

/models/grok-r2v/run

Grok Video Extend

12 credits
Video

xAI Grok Imagine video extension. Continue an existing MP4 with a prompt-directed extension (2 to 10 seconds).

/models/grok-video-extend/run

Hailuo Standard

8 credits
Video

Premium quality text-to-video and image-to-video

/models/hailuo-standard/run

Hailuo Fast

4 credits
Video

Fast image-to-video generation

/models/hailuo-fast/run

Happy Horse Reference

4 credits/sec
Video

Alibaba Happy Horse reference-to-video (1.0 or 1.1) — multi-reference image input that preserves subject characters, driven by a text prompt. 720p / 1080p, 3-15 second clips. Version 1.1 runs at a lower per-second credit rate.

/models/happyhorse-1.0-r2v/run

Works with

Any agent that can make an HTTP request

Claude Code · Claude Desktop · Cursor · Codex · Cline · Windsurf · Zed · LangChain · AutoGPT · n8n · Zapier · Custom MCP servers · Any HTTP client

Your agent. Every model.

One install. 139+ models. First generation in under a minute.

npx @pixeldojo/mcp init