Video Driven Skill is split into a Spring Boot backend and a React frontend. The two services communicate through REST APIs for ordinary operations and WebSocket/SSE channels for long-running generation and execution logs.
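To make the streaming side concrete, here is a minimal sketch of how a client might split a Server-Sent Events text buffer into events. It is a simplified parser for the generic SSE wire format, not the project's actual frontend code; the `parseSseEvents` helper, the `log` event name, and the sample payloads are all hypothetical.

```javascript
// Simplified parser for a Server-Sent Events text buffer.
// Event names and payloads below are hypothetical, not the backend's real stream.
function parseSseEvents(buffer) {
  const events = [];
  // Events are separated by a blank line.
  for (const block of buffer.split(/\r?\n\r?\n/)) {
    let event = "message"; // default event type in the SSE format
    const data = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith(":")) continue; // comment or keep-alive line
      const sep = line.indexOf(":");
      if (sep === -1) continue; // simplification: ignore lines without a colon
      const field = line.slice(0, sep);
      const value = line.slice(sep + 1).replace(/^ /, ""); // strip one leading space
      if (field === "event") event = value;
      if (field === "data") data.push(value);
    }
    // Only blocks that carry at least one data line produce an event.
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}

const sample =
  "event: log\ndata: installing dependencies\n\n: keep-alive\n\ndata: done\n\n";
const parsed = parseSseEvents(sample);
console.log(parsed);
```

In a browser, the built-in `EventSource` API does this parsing itself; a hand-written parser like this is mainly useful when reading the stream through `fetch`.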
video upload
-> frame extraction
-> annotation and requirement
-> multimodal skill generation
-> code review and editing
-> local runner
-> export, deploy, or regenerate
The backend owns persistence, file storage, model calls, video processing, and skill execution.
Important modules:
- controller/: REST and WebSocket entry points.
- service/VideoService.java: upload handling, FFmpeg frame extraction, and video streaming.
- service/AIService.java: prompt construction and OpenAI-compatible multimodal API calls.
- service/SkillService.java: skill CRUD, import/export, ordering, regeneration, and versioning.
- service/SkillRunnerService.java: temporary workspace creation, dependency setup, runtime injection, script execution, and log collection.
- service/KnowledgeService.java: per-skill reference files and manifest handling.
- model/ and repository/: SQLite-backed domain records.
Runtime data defaults to ~/video-driven-skill/:
- uploads/: uploaded videos and extracted frames.
- skills/: generated skill source files.
- archives/: reusable video/frame/requirement resources.
- video-driven-skill.db: SQLite database.
With Docker Compose, data is stored in the app-data volume at /data in the backend container (VIDEO_DRIVEN_SKILL_HOME=/data).
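The layout above can be sketched as a small path resolver. This is an illustration of the documented directory structure, not the backend's actual code: `resolveDataPaths` is a hypothetical helper, and it assumes that `VIDEO_DRIVEN_SKILL_HOME` overrides the default root outside Docker as well.

```javascript
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper mirroring the documented layout.
// Assumption: VIDEO_DRIVEN_SKILL_HOME, when set, replaces ~/video-driven-skill/.
function resolveDataPaths(env = process.env, homeDir = os.homedir()) {
  // path.posix keeps the output identical across platforms for this example.
  const root =
    env.VIDEO_DRIVEN_SKILL_HOME || path.posix.join(homeDir, "video-driven-skill");
  return {
    root,
    uploads: path.posix.join(root, "uploads"),
    skills: path.posix.join(root, "skills"),
    archives: path.posix.join(root, "archives"),
    database: path.posix.join(root, "video-driven-skill.db"),
  };
}

// Docker Compose case: everything lives under /data.
const dockerPaths = resolveDataPaths({ VIDEO_DRIVEN_SKILL_HOME: "/data" }, "/home/demo");
// Local case: fall back to the home directory.
const localPaths = resolveDataPaths({}, "/home/demo");
console.log(dockerPaths.database, localPaths.uploads);
```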
The frontend is a Vite application that provides a studio-like workflow:
- HomePage.jsx: upload, import, and recent resources.
- PlaygroundPage.jsx: frame annotation and skill workspace layout.
- FrameTimeline.jsx, FrameAnnotator.jsx, FrameList.jsx: visual evidence collection.
- AIProcessor.jsx: generation control and streamed status.
- SkillList.jsx: skill repository with manual drag ordering.
- SkillEditor.jsx, SkillExport.jsx, SkillRunner.jsx: review, export, and execution.
- RegeneratePanel.jsx, PartialRegeneratePanel.jsx, CodeComparisonView.jsx: iteration workflow.
- KnowledgeBasePanel.jsx: extra context attached to a skill.
A generated skill is a small folder that can be exported as a ZIP archive:
SKILL.md
package.json
variables.json
scripts/main.js
knowledge/
SKILL.md explains the skill intent and variables. scripts/main.js is the executable entrypoint. variables.json defines user-editable runtime inputs.
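As a rough illustration of how an entrypoint could consume these inputs, the sketch below merges defaults from variables.json with caller-supplied overrides. The schema of variables.json is not specified here, so the flat name/value shape, the `resolveVariables` and `loadDefaults` helpers, and the sample variable names are all assumptions rather than the generated code's real structure.

```javascript
// Hypothetical sketch of a scripts/main.js entrypoint. The flat name/value
// shape of variables.json is an assumption; the real schema may differ.
const fs = require("node:fs");
const path = require("node:path");

// Merge declared defaults with caller overrides; keys that are not
// declared in the defaults are ignored.
function resolveVariables(defaults, overrides = {}) {
  const resolved = { ...defaults };
  for (const key of Object.keys(defaults)) {
    if (overrides[key] !== undefined) resolved[key] = overrides[key];
  }
  return resolved;
}

// Read variables.json from the skill folder (the parent of scripts/).
function loadDefaults(skillDir = path.join(__dirname, "..")) {
  const file = path.join(skillDir, "variables.json");
  return JSON.parse(fs.readFileSync(file, "utf8"));
}

// In-memory example so the sketch runs without a skill folder on disk.
const resolvedExample = resolveVariables(
  { targetUrl: "https://example.com", retries: 2 },
  { retries: 5, unknown: true }
);
console.log(resolvedExample);
```

Inside a real skill folder, `resolveVariables(loadDefaults(), overrides)` would apply the same merge to the file on disk.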
The backend expects an OpenAI-compatible chat completions API. Configure it with:
AI_API_KEY=...
AI_BASE_URL=https://api.openai.com/v1
AI_MODEL=gpt-4o-mini
Providers with compatible request and response shapes can be used by overriding AI_BASE_URL and AI_MODEL.
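To show what "OpenAI-compatible" means in practice, here is a minimal sketch that builds (but does not send) a multimodal chat completions request from the three settings. It is not the backend's AIService implementation: `buildChatRequest` is a hypothetical helper, the prompt text and the base64 frame are placeholders, and sending frames as `image_url` data URLs is an assumption about how frames are attached.

```javascript
// Hypothetical helper: assembles an OpenAI-compatible chat completions
// request from the documented settings without performing any network call.
function buildChatRequest(env, requirement, frameDataUrls) {
  // Tolerate a trailing slash in AI_BASE_URL.
  const baseUrl = env.AI_BASE_URL.replace(/\/+$/, "");
  // One text part followed by one image part per frame (assumed encoding).
  const content = [
    { type: "text", text: requirement },
    ...frameDataUrls.map((url) => ({ type: "image_url", image_url: { url } })),
  ];
  return {
    url: `${baseUrl}/chat/completions`,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${env.AI_API_KEY}`,
      },
      body: JSON.stringify({
        model: env.AI_MODEL,
        messages: [{ role: "user", content }],
      }),
    },
  };
}

const chatRequest = buildChatRequest(
  {
    AI_API_KEY: "sk-example", // placeholder, not a real key
    AI_BASE_URL: "https://api.openai.com/v1/",
    AI_MODEL: "gpt-4o-mini",
  },
  "Describe the steps shown in these frames.",
  ["data:image/jpeg;base64,AAAA"] // placeholder frame
);
console.log(chatRequest.url);
```

On Node.js 18 or later, the result could be passed to the built-in `fetch(chatRequest.url, chatRequest.options)`. Switching providers then only changes the values of AI_BASE_URL and AI_MODEL, not the request shape.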
The project is local-first by default, but recordings and generated scripts can contain sensitive information. Keep these files out of version control:
- .env
- SQLite databases
- uploaded videos
- extracted frames
- generated skills
- logs
- build outputs
- dependency folders