Chinese documentation: README.zh-CN.md
Turn any article into a social-ready carousel with copy and visuals.
clawvisual AI is an open-source URL-to-social-carousel generator for creators, growth teams, and builders of agent workflow automation.
Turn a long article or URL into a social-ready carousel (for X/Twitter, Instagram, Xiaohongshu, and similar platforms) with hooks, captions, hashtags, slide copy, and generated visuals. It runs as an MCP-compatible service, so other agents can call it as a reusable skill.
- URL or long-form text in, finished carousel structure out
- Generates real slide images and prompts, not just text summaries
- Async job pipeline with progress events, revisions, and downloadable output
- Supports portrait, square, story, and landscape output ratios
- MCP-compatible, so other agents can call it as a tool
- Topic-matched visual style generation (children/education, tech reporting, business/news), with consistent icon and typography style across slides
- Current production focus is the `longform_digest` mode
- Quickstart Guide for URL to Social Carousel
- MCP Integration Guide for clawvisual
- Use Cases for Agent Workflow Automation
Default output constraints (fast mode):

- `post_title`: one-sentence hook
- `post_caption`: concise body, normalized to 100-300 characters
- `hashtags`: 1-5 tags
- `aspect_ratios`: choose from `4:5`, `1:1`, `9:16`, `16:9`
- `slides`: generated visual slides are required, not text-only output
  - each slide should include `image_url` and `visual_prompt`
  - the cover slide (`slide_id: 1`) should prioritize first-glance clarity and hook strength
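The constraints above can be checked mechanically. A minimal validator sketch (field names follow this README's schema; the function itself is illustrative and not part of the clawvisual codebase):

```python
# Sketch of a validator for the fast-mode output constraints above.
# Field names match this README; the function name is hypothetical.
ALLOWED_RATIOS = {"4:5", "1:1", "9:16", "16:9"}

def validate_carousel(post: dict) -> list[str]:
    errors = []
    if not post.get("post_title"):
        errors.append("post_title: one-sentence hook is required")
    caption = post.get("post_caption", "")
    if not 100 <= len(caption) <= 300:
        errors.append("post_caption: must be 100-300 characters")
    hashtags = post.get("hashtags", [])
    if not 1 <= len(hashtags) <= 5:
        errors.append("hashtags: expected 1-5 tags")
    bad = set(post.get("aspect_ratios", [])) - ALLOWED_RATIOS
    if bad:
        errors.append(f"aspect_ratios: unsupported {sorted(bad)}")
    slides = post.get("slides", [])
    if not slides:
        errors.append("slides: generated visual slides are required")
    for slide in slides:
        if not (slide.get("image_url") and slide.get("visual_prompt")):
            errors.append(f"slide {slide.get('slide_id')}: needs image_url and visual_prompt")
    return errors
```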
Tested locally against this public article:
Generated output (`output_language: en-US`, `max_slides: 4`):
```json
{
  "post_title": "Why 90% of New Year’s resolutions fail (and how to fix yours).",
  "post_caption": "Most people don't actually want to change—they just want to impress others. True transformation isn't about discipline; it's about digging into your psyche to uncover what you actually want.",
  "hashtags": ["#Psychology", "#AI", "#Productivity", "#MindsetShift", "#IdentityDesign"]
}
```

Generated slide previews:
- Install dependencies:
npm install- Create local env file:
cp .env.local.template .env.local- Fill required key in
.env.local:
LLM_API_KEY
LLM_API_URL and LLM_MODEL already default to OpenRouter + Gemini Flash:
LLM_API_URL=https://openrouter.ai/api/v1/chat/completionsLLM_MODEL=google/gemini-3-flash-preview
Important local-dev notes:

- `.env.local.template` now leaves `CLAWVISUAL_API_KEYS` empty by default.
- Local requests do not require `x-api-key` unless you explicitly configure `CLAWVISUAL_API_KEYS`.
- If you enable API-key validation, send the same configured value in the `x-api-key` header.
- For real image generation instead of fallback gradients/SVGs, also set a valid `GEMINI_API_KEY` and `NANO_BANANA_MODEL`.
- If `LLM_COPY_POLISH_MODEL` is unavailable on your provider, the copy-polish stage may be skipped.
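Taken together, a minimal `.env.local` sketch for local development (values are placeholders; variable names come from this README, and only `LLM_API_KEY` is strictly required):

```
LLM_API_KEY=your_openrouter_api_key
# Optional overrides (these already default to OpenRouter + Gemini Flash):
# LLM_API_URL=https://openrouter.ai/api/v1/chat/completions
# LLM_MODEL=google/gemini-3-flash-preview
# For real image generation instead of fallback gradients/SVGs:
# GEMINI_API_KEY=your_gemini_api_key
# NANO_BANANA_MODEL=gemini-3.1-flash-image-preview
# Leave empty for local no-auth mode:
# CLAWVISUAL_API_KEYS=
```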
- Start the dev server:

  ```bash
  npm run dev
  ```

- Open in browser: http://localhost:3000
If 3000 is already occupied, Next.js will move to another port such as 3001. Use the actual port shown in the terminal.
Install the global CLI:

```bash
npm install -g clawvisual
```

Then run:

```bash
clawvisual help
clawvisual set CLAWVISUAL_LLM_API_KEY "your_openrouter_key"
# optional
clawvisual set CLAWVISUAL_LLM_MODEL "google/gemini-3-flash-preview"
clawvisual initialize
clawvisual stop
clawvisual restart
clawvisual status
clawvisual tools
clawvisual convert --input "Paste long-form text or URL here" --slides auto
clawvisual status --job <job_id>
```

`clawvisual initialize` will auto-start a local service when `CLAWVISUAL_MCP_URL` points to localhost. It prints the web URL after startup, and you can then continue with the other `clawvisual` commands.
`clawvisual stop` stops the local service started by the CLI's managed process tracking. `clawvisual restart` performs stop + initialize.

`clawvisual status` checks the service identity (it must be clawvisual) and avoids false positives from other local MCP servers on the same port.

`clawvisual set`/`get`/`unset`/`config` store local CLI config at `~/.clawvisual/config.json` (keys are case-insensitive, e.g. `clawvisual set clawvisual_llm_api_key ...`).
CLI environment variables:

- `CLAWVISUAL_MCP_URL` (default: `http://localhost:3000/api/mcp`)
- `CLAWVISUAL_API_KEY` (required only when API key validation is enabled)
- `CLAWVISUAL_LLM_API_KEY` / `CLAWVISUAL_LLM_API_URL` / `CLAWVISUAL_LLM_MODEL` (CLI-level aliases mapped to the server `LLM_*` envs)
- `CLAWVISUAL_GEMINI_API_KEY` (CLI-level alias mapped to the server `GEMINI_API_KEY`)

If `GEMINI_API_KEY` is not configured, image generation falls back to OpenRouter and is usually slower.
Build the image:

```bash
docker build -t clawvisual:1.0.0 .
```

Run the container:

```bash
docker run --rm -p 3000:3000 \
  -e LLM_API_KEY=your_openrouter_api_key \
  -e GEMINI_API_KEY=your_gemini_api_key \
  -e NANO_BANANA_MODEL=gemini-3.1-flash-image-preview \
  clawvisual:1.0.0
```

GHCR release-style run command:

```bash
docker run --rm -p 3000:3000 \
  -e LLM_API_KEY=your_openrouter_api_key \
  ghcr.io/<owner>/clawvisual:<tag>
```

- Input a URL or long-form text
- Run the pipeline: `skill_input_processor` -> `skill_content_planner` -> `skill_visual_prompt_planner` -> `skill_asset_generator` -> `skill_viral_optimizer`
- Poll async job status until completion
- Review the output and optionally run revision actions (`rewrite_copy_style`, `regenerate_cover`, `regenerate_slides`)
- Export/download the final assets
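A revision action maps onto an MCP `tools/call` request against the `revise` tool. A sketch of the request envelope (the tool and action names come from this README; treat the exact argument schema as an assumption to verify against `/api/openapi.json`):

```python
import json

def build_revise_request(job_id: str, action: str, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 `tools/call` payload for a revision action.
    Argument names (`job_id`, `action`) are assumptions for illustration."""
    allowed = {"rewrite_copy_style", "regenerate_cover", "regenerate_slides"}
    if action not in allowed:
        raise ValueError(f"unknown revision action: {action}")
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "revise",
            "arguments": {"job_id": job_id, "action": action},
        },
    })
```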
In the web composer, use the Aspect ratio selector to switch between portrait, square, story, and landscape (16:9) outputs.
In the left sidebar, click Your chats -> Clear all to remove all chat sessions at once.
After npm run dev, confirm the service is healthy before testing the full UI.
- Open OpenAPI:

  ```bash
  curl http://localhost:3000/api/openapi.json
  ```

- List MCP tools:

  ```bash
  curl -X POST http://localhost:3000/api/mcp \
    -H 'content-type: application/json' \
    --data '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'
  ```

- Create a conversion job:

  ```bash
  curl -X POST http://localhost:3000/api/v1/convert \
    -H 'content-type: application/json' \
    --data '{
      "input_text": "Open source projects grow faster when onboarding is simple and the value is visible on first use.",
      "max_slides": 4,
      "aspect_ratios": ["16:9"]
    }'
  ```

- Poll the returned `status_url` until `status` becomes `completed` or `failed`.
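The polling step can be sketched as a small loop. `fetch_status` stands in for an HTTP GET against the job's `status_url` (e.g. via `urllib.request`) returning the parsed JSON body; the `completed`/`failed` status values come from this README, while the interval and timeout numbers are assumptions:

```python
import time
from typing import Callable

def poll_job(fetch_status: Callable[[], dict],
             interval_s: float = 2.0,
             timeout_s: float = 300.0) -> dict:
    """Poll until the job reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        job = fetch_status()
        if job.get("status") in ("completed", "failed"):
            return job
        time.sleep(interval_s)
    raise TimeoutError("job did not finish before the timeout")
```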
Expected first-run behavior:

- The job should be accepted immediately and return `202`.
- In `fast` mode, the pipeline appends `skill_quality_copy_polish`.
- In `full` mode, the pipeline additionally runs `skill_quality_post_copy_loop` and `skill_quality_final_audit`.
- Without fully working external model/image credentials, some quality/image steps may degrade or fall back.
- Leaving `NANO_BANANA_MODEL` as the template placeholder can trigger image-generation retries and fallback placeholder outputs.
clawvisual can be integrated into OpenClaw as a workspace/local skill via MCP.
- Run the clawvisual service:

  ```bash
  npm install
  cp .env.local.template .env.local
  npm run dev
  ```

- Install this skill into OpenClaw by copying `skills/clawvisual` to either:
  - `<openclaw-workspace>/skills/clawvisual` (workspace scope), or
  - `~/.openclaw/skills/clawvisual` (shared local scope)

- Configure the skill runtime env:

  ```
  CLAWVISUAL_MCP_URL=http://localhost:3000/api/mcp
  CLAWVISUAL_API_KEY=<your_clawvisual_api_key_if_enabled>
  ```

If the dev server starts on 3001 or another port, update `CLAWVISUAL_MCP_URL` accordingly.

If you explicitly configure `CLAWVISUAL_API_KEYS`, set `CLAWVISUAL_API_KEY` to one of those accepted values.

- Test the skill client locally:

  ```bash
  npm run skill:clawvisual -- tools
  ```

- Endpoint: `POST /api/mcp`
- Methods: `initialize`, `tools/list`, `tools/call`
- Tools: `convert`, `job_status`, `revise`, `regenerate_cover`
Yes. This repository is self-hostable with Node.js and environment variables in .env.local.
Yes. Use API or MCP calls from scripts/workflows and submit multiple conversion jobs asynchronously.
Yes. The MCP interface is designed for automation and agent workflow orchestration.
Yes. Use the skills/clawvisual package and point it to your CLAWVISUAL_MCP_URL.
- Better batch orchestration and queue controls
- Expanded template/style presets
- Stronger evaluation cases and regression gates
- More granular asset export formats
- Weekly release notes in GitHub Releases
- Framework: Next.js App Router + TypeScript
- API:
  - `POST /api/v1/convert` starts an async conversion pipeline and returns `job_id`
  - `GET /api/v1/jobs/:id` returns status/progress/result
  - `POST /api/mcp` JSON-RPC MCP endpoint (`initialize`, `tools/list`, `tools/call`)
  - `GET /api/openapi.json` exports the OpenAPI schema
- Skill system: `src/lib/skills` currently keeps the core longform chain:
  - `input-processor.ts`
  - `content-planner.ts`
  - `visual-prompt-planner.ts`
  - `asset-generator.ts`
  - `viral-optimizer.ts`
- Pipeline registry: `src/lib/pipeline/registry.ts` defines mode-driven stage lists:
  - `longform_digest.fast`
  - `longform_digest.full`
- Prompt templates: `src/lib/prompts/index.ts`
- Orchestration: `src/lib/orchestrator.ts`
- Queue: local in-memory job queue for immediate development
- API key validation: `src/lib/auth/api-key.ts`
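The mode-driven stage registry described above can be sketched as a mapping from `content_mode.run_mode` keys to stage lists. Stage names come from this README; the real registry is TypeScript (`src/lib/pipeline/registry.ts`), so this Python dict is illustrative only:

```python
# Core longform chain, shared by both run modes.
_LONGFORM_CORE = [
    "skill_input_processor",
    "skill_content_planner",
    "skill_visual_prompt_planner",
    "skill_asset_generator",
    "skill_viral_optimizer",
]

PIPELINE_REGISTRY = {
    # fast mode appends the copy-polish quality stage
    "longform_digest.fast": _LONGFORM_CORE + ["skill_quality_copy_polish"],
    # full mode additionally runs the post-copy loop and final audit
    "longform_digest.full": _LONGFORM_CORE + [
        "skill_quality_copy_polish",
        "skill_quality_post_copy_loop",
        "skill_quality_final_audit",
    ],
}

def stages_for(content_mode: str, run_mode: str) -> list[str]:
    """Resolve the stage list for e.g. ("longform_digest", "fast")."""
    return PIPELINE_REGISTRY[f"{content_mode}.{run_mode}"]
```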
- `src/app/page.tsx`: clawvisual dashboard UI
- `src/app/api/v1/convert/route.ts`: conversion entrypoint
- `src/app/api/v1/jobs/[id]/route.ts`: job status endpoint
- `src/app/api/openapi.json/route.ts`: OpenAPI export
- `src/lib/types`: standard interfaces and context objects
- `src/lib/skills`: core longform skill modules
- `src/lib/pipeline/registry.ts`: pipeline stage registry by content mode and run mode (fast/full)
Existing keys are reusable. The current scaffold reads:

- `LLM_API_URL` (optional, default `https://openrouter.ai/api/v1/chat/completions`)
- `LLM_API_KEY`
- `LLM_MODEL` (optional, default `google/gemini-3-flash-preview`)
- `LLM_TIMEOUT_MS` (optional, default `25000`)
- `LLM_COPY_FALLBACK_MODEL` (optional, default `google/gemini-2.5-flash`)
- `LLM_COPY_POLISH_MODEL` (optional, default `openai/gpt-5.1-mini`)
- `GEMINI_API_KEY`
- `NANO_BANANA_MODEL`
- `NANO_BANANA_TIMEOUT_MS` (optional, default `60000`)
- `NANO_BANANA_TRANSIENT_RETRY_MAX` (optional, default `2`)
- `NANO_BANANA_RETRY_BASE_DELAY_MS` (optional, default `450`)
- `QUALITY_LOOP_ENABLED` (optional, default `true`)
- `QUALITY_AUDIT_THRESHOLD` (optional, default `78`)
- `QUALITY_IMAGE_COVER_THRESHOLD` (optional, default `85`)
- `QUALITY_IMAGE_INNER_THRESHOLD` (optional, default `78`)
- `QUALITY_COVER_FIRST_GLANCE_THRESHOLD` (optional, default `82`)
- `QUALITY_COVER_NOVELTY_THRESHOLD` (optional, default `80`)
- `QUALITY_COVER_CANDIDATE_COUNT` (optional, default `1`)
- `QUALITY_MAX_COPY_ROUNDS` (optional, default `1`)
- `QUALITY_MAX_IMAGE_ROUNDS` (optional, default `0`)
- `QUALITY_MAX_EXTRA_IMAGES` (optional, default `1`)
- `QUALITY_IMAGE_LOOP_MAX_MS` (optional, default `120000`)
- `QUALITY_IMAGE_AUDIT_SCOPE` (optional, `cover` or `all`, default `cover`)
- `PIPELINE_MODE` (optional, `fast` or `full`, default `fast`)
- `PIPELINE_MAX_DURATION_MS` (optional, default `300000`)
Runtime observability:

- The Thinking & Actions event timeline now includes per-step token usage deltas (`in`/`out`/`total`) when provider `usage` is returned.
- The final `skill_logs` includes `llm_usage_summary` for total request-level token aggregation.

Additional provider keys:

- `OPENROUTER_API_KEY`
- `TAVILY_API_KEY`
- `SERPER_API_KEY`
- `JINA_API_KEY`
API security controls:

- `CLAWVISUAL_API_KEYS`: comma-separated accepted keys
- `CLAWVISUAL_ALLOW_NO_KEY`: default `true` in local development
- This project includes an async conversion pipeline, a revision engine, and an MCP-compatible JSON-RPC endpoint.
- Real integrations (Flux/Midjourney, Redis/BullMQ worker process, PostgreSQL persistence, satori rendering) are left as plug-in points.

`POST /api/mcp` supports:

- `convert`: create a conversion job
- `job_status`: fetch the current job status/result
- `revise`: create a revision job for copy/image changes
- `regenerate_cover`: regenerate the cover via job revision or a direct prompt image call
Reusable external skill package:
Convenience command:

```bash
npm run skill:clawvisual -- tools
```
- Missing `x-api-key`
  - Cause: API-key validation was explicitly enabled by setting `CLAWVISUAL_API_KEYS`.
  - Fix: send `x-api-key`, or clear `CLAWVISUAL_API_KEYS` for local no-auth mode.
- MCP client points to the wrong service
  - Cause: `npm run dev` switched to `3001`, but the client default is still `http://localhost:3000/api/mcp`.
  - Fix: set `CLAWVISUAL_MCP_URL` to the real local port.
- Next.js workspace-root warning during `dev` or `build`
  - Cause: another lockfile exists above this repo, so Next.js infers a higher workspace root.
  - Fix: set `turbopack.root` in `next.config.ts` or remove the unrelated parent lockfile.




