An Agentic ETL data development platform powered by data governance and RAG
- Zero-ETL
- Data governance and AI
- Keep metadata, lineage graph, and semantic assets synchronized across services.
- Metadata governance based on Datapillar Gravitino (customized from Apache Gravitino).
- Agentic ETL and workflow execution with SQL-centric development.
- OpenLineage ingestion and graph persistence for lineage/knowledge analysis.
- RAG-ready AI service with vector retrieval and SQL summary/embedding processing.
- Multi-service local startup script for full-stack debugging.
- Java 21, Spring Boot 3, Spring Cloud Gateway
- Dubbo 3 (RPC communication)
- Nacos (configuration and service discovery)
- Python 3.11+, FastAPI (AI service)
- MySQL (business DB datapillar, metadata DB gravitino)
- Redis (gateway rate limiting, sessions, and cache)
- Neo4j (data-warehouse knowledge graph and lineage graph)
- Milvus (RAG document vector retrieval)
- Apache Flink (SQL execution)
- Datapillar Gravitino (customized extensions based on Apache Gravitino)
- React 19 + TypeScript + Vite
- React Router, Zustand, Tailwind CSS
- Vitest, Playwright, ESLint, Stylelint, Prettier
- JDK 21+
- Maven 3.9+
- Python 3.11+ with uv
- Node.js 20+ with npm
- Nacos 3.x (local default 127.0.0.1:8848)
- MySQL 8.x, Redis, Neo4j, Milvus
Make sure the following services are reachable locally (default ports):
- Nacos: 127.0.0.1:8848
- MySQL: 127.0.0.1:3306
- Redis: 127.0.0.1:6379
- Neo4j: 127.0.0.1:7687
- Milvus: 127.0.0.1:19530
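Before running the startup script, it can help to confirm that each dependency accepts TCP connections. The following is a minimal sketch, not part of the repository: the `is_reachable` and `check_dependencies` helpers are hypothetical names, and the check only proves that a port is open, not that the service behind it is healthy or correctly configured.

```python
import socket

# Default local endpoints, taken from the list above.
DEPENDENCIES = {
    "Nacos": ("127.0.0.1", 8848),
    "MySQL": ("127.0.0.1", 3306),
    "Redis": ("127.0.0.1", 6379),
    "Neo4j": ("127.0.0.1", 7687),
    "Milvus": ("127.0.0.1", 19530),
}


def is_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def check_dependencies(deps: dict = DEPENDENCIES) -> dict:
    """Map each dependency name to whether its endpoint accepted a connection."""
    return {name: is_reachable(host, port) for name, (host, port) in deps.items()}


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        host, port = DEPENDENCIES[name]
        status = "reachable" if ok else "NOT reachable"
        print(f"{name:<7} {host}:{port} {status}")
```

If any entry reports NOT reachable, start that service (or adjust the host and port in the dictionary to match your local setup) before continuing.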
The startup script auto-syncs config/nacos/dev/DATAPILLAR/*.yaml to Nacos (namespace=dev, group=DATAPILLAR).
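To preview which files a sync would pick up, the directory layout can be read as config/nacos/NAMESPACE/GROUP/FILE.yaml. The sketch below only lists files and derives their coordinates; it does not talk to Nacos. It assumes that the Nacos data ID equals the file name, which the startup script may or may not do, and `NacosConfig` and `plan_sync` are hypothetical names used for illustration.

```python
from pathlib import Path
from typing import NamedTuple


class NacosConfig(NamedTuple):
    namespace: str
    group: str
    data_id: str  # Assumption: data ID is the YAML file name.
    path: Path


def plan_sync(root: Path = Path("config/nacos")) -> list:
    """Derive Nacos coordinates from <root>/<namespace>/<group>/<file>.yaml."""
    plan = []
    for path in sorted(root.glob("*/*/*.yaml")):
        namespace, group = path.parts[-3], path.parts[-2]
        plan.append(NacosConfig(namespace, group, path.name, path))
    return plan


if __name__ == "__main__":
    for cfg in plan_sync():
        print(f"namespace={cfg.namespace} group={cfg.group} dataId={cfg.data_id}")
```

Run from the project root, every entry under config/nacos/dev/DATAPILLAR/ should be listed with namespace=dev and group=DATAPILLAR.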
Run from project root:
./scripts/start-local-all.sh
The startup script auto-syncs Nacos configs and starts all backend services in local debug mode.
This script compiles and starts:
- datapillar-auth (7001)
- datapillar-studio-service (7002)
- datapillar-api-gateway (7000)
- datapillar-ai (7003)
- datapillar-openlineage (7004)
- datapillar-gravitino (8090)
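Because compilation and startup can take a while, a small polling helper can tell you when each service begins listening. This is a sketch under assumptions: `wait_for_port` is a hypothetical helper, the default deadline and interval are arbitrary, and an open port does not guarantee that the service has finished initializing.

```python
import socket
import time

# Service ports, taken from the list above.
SERVICES = {
    "datapillar-api-gateway": 7000,
    "datapillar-auth": 7001,
    "datapillar-studio-service": 7002,
    "datapillar-ai": 7003,
    "datapillar-openlineage": 7004,
    "datapillar-gravitino": 8090,
}


def wait_for_port(port: int, host: str = "127.0.0.1",
                  deadline_s: float = 60.0, interval_s: float = 1.0) -> bool:
    """Poll host:port until it accepts a TCP connection or the deadline passes."""
    end = time.monotonic() + deadline_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= end:
                return False
            time.sleep(interval_s)
```

For example, `wait_for_port(SERVICES["datapillar-api-gateway"])` blocks for up to a minute and returns False if the gateway never starts listening, at which point the startup logs are the next place to look.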
Log directory:
/tmp/datapillar-logs
Start the frontend:
cd web/datapillar-studio
npm install
npm run dev
Frontend default URL:
http://localhost:3001
Stop all local services:
./scripts/stop-local-all.sh
.
├── config/ # Nacos templates (dev/prod)
├── docs/ # Documentation and architecture assets
├── scripts/ # Local start/stop scripts
├── datapillar-api-gateway/ # Gateway service (Spring Cloud Gateway)
├── datapillar-auth/ # Authentication service
├── datapillar-studio-service/ # Core business service (multi-tenant/SQL/workflow)
├── datapillar-ai/ # AI service (FastAPI/RAG/Agent)
├── datapillar-openlineage/ # OpenLineage sink service
├── datapillar-gravitino/ # Gravitino metadata extensions
└── web/datapillar-studio/ # Frontend app (React + Vite)
- datapillar-api-gateway: API ingress, routing, JWT validation, and traffic control.
- datapillar-auth: Identity bootstrap, tenant membership governance, and key management.
- datapillar-studio-service: Core business domain (tenant/workflow/invitation and studio APIs).
- datapillar-ai: AI-side orchestration, model config access, and RAG/agent runtime.
- datapillar-openlineage: OpenLineage event ingest, async task dispatch, and storage write.
- datapillar-gravitino: Datapillar-specific metadata governance and tenant-scoped extensions.
- Port conflict (7000-7004, 8090): run ./scripts/stop-local-all.sh, then retry startup.
- Nacos sync/auth issues: verify NACOS_SERVER_ADDR, NACOS_NAMESPACE, NACOS_USERNAME, NACOS_PASSWORD in scripts/start-local-all.sh.
- Build/start failure: inspect /tmp/datapillar-logs/*.startup.log.
- Service reachable but frontend errors: verify gateway at http://localhost:7000 and frontend at http://localhost:3001.
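When a build or start fails, scanning every startup log for likely failure lines is often faster than opening each file. The sketch below is illustrative only: `find_failures` is a hypothetical helper, and the marker strings are assumptions about what the logs contain, so adjust them to the actual log format.

```python
from pathlib import Path

LOG_DIR = Path("/tmp/datapillar-logs")
# Assumption: failures appear on lines containing one of these markers.
MARKERS = ("ERROR", "Exception", "BUILD FAILURE")


def find_failures(log_dir: Path = LOG_DIR, markers: tuple = MARKERS,
                  limit: int = 5) -> dict:
    """Map each *.startup.log file name to its last `limit` lines containing a marker."""
    hits = {}
    for log in sorted(log_dir.glob("*.startup.log")):
        lines = log.read_text(errors="replace").splitlines()
        matched = [line for line in lines if any(m in line for m in markers)]
        if matched:
            hits[log.name] = matched[-limit:]
    return hits


if __name__ == "__main__":
    for name, lines in find_failures().items():
        print(f"== {name}")
        for line in lines:
            print(f"   {line}")
```

Files with no matching lines are omitted from the result, so an empty output means none of the markers were found, not necessarily that every service started cleanly.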
Please read CONTRIBUTING.md before opening issues or pull requests.
Licensed under the Apache License 2.0.


