SunnyX6/Datapillar

Datapillar


An Agentic ETL data development platform powered by data governance and RAG


[Image: Datapillar demo]

What Datapillar Solves

  • Zero-ETL
  • Data governance and AI
  • Keep metadata, lineage graph, and semantic assets synchronized across services.

Core Capabilities

  • Metadata governance based on Datapillar Gravitino (customized from Apache Gravitino).
  • Agentic ETL and workflow execution with SQL-centric development.
  • OpenLineage ingestion and graph persistence for lineage/knowledge analysis.
  • RAG-ready AI service with vector retrieval and SQL summary/embedding processing.
  • Multi-service local startup script for full-stack debugging.

Tech Stack

Backend and Service Frameworks

  • Java 21, Spring Boot 3, Spring Cloud Gateway
  • Dubbo 3 (RPC communication)
  • Nacos (configuration and service discovery)
  • Python 3.11+, FastAPI (AI service)

Data and Compute Engines

  • MySQL (business DB datapillar, metadata DB gravitino)
  • Redis (gateway rate limiting, sessions, and cache)
  • Neo4j (data-warehouse knowledge graph and lineage graph)
  • Milvus (RAG document vector retrieval)
  • Apache Flink (SQL execution)
  • Datapillar Gravitino (customized extensions based on Apache Gravitino)

Frontend and Tooling

  • React 19 + TypeScript + Vite
  • React Router, Zustand, Tailwind CSS
  • Vitest, Playwright, ESLint, Stylelint, Prettier

Technical Architecture

[Image: Datapillar technical architecture diagram]

Local Development Quick Start (Debug)

1. Prerequisites

  • JDK 21+
  • Maven 3.9+
  • Python 3.11+ with uv
  • Node.js 20+ with npm
  • Nacos 3.x (local default 127.0.0.1:8848)
  • MySQL 8.x, Redis, Neo4j, Milvus
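A quick way to confirm that the command-line prerequisites are installed is sketched below. It only checks that each executable is on PATH and that the running interpreter is Python 3.11 or newer; it does not verify the JDK, Maven, or Node.js versions. The helper names (`missing_tools`, `python_is_supported`) and the list of executable names are assumptions made for illustration, not part of the repository.

```python
import shutil
import sys

# Executable names assumed for each prerequisite; adjust if your setup differs.
REQUIRED_TOOLS = ["java", "mvn", "uv", "node", "npm"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_is_supported(version=sys.version_info, minimum=(3, 11)):
    """Return True if the given interpreter version meets the README minimum."""
    return tuple(version[:2]) >= minimum
```

Calling `missing_tools()` returns an empty list when every tool is present, and `python_is_supported()` reports whether the current interpreter satisfies the Python 3.11+ requirement.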

2. Start Required Dependencies

Make sure the following services are reachable locally (default ports):

  • Nacos: 127.0.0.1:8848
  • MySQL: 127.0.0.1:3306
  • Redis: 127.0.0.1:6379
  • Neo4j: 127.0.0.1:7687
  • Milvus: 127.0.0.1:19530
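The reachability check can be sketched as a plain TCP probe against the default endpoints listed above. This is a minimal illustration, not something shipped with the project: the names `DEPENDENCIES`, `is_reachable`, and `check_dependencies` are hypothetical, and a successful TCP connection only shows that something is listening on the port, not that the service is healthy.

```python
import socket

# Default local endpoints taken from the list above.
DEPENDENCIES = {
    "Nacos": ("127.0.0.1", 8848),
    "MySQL": ("127.0.0.1", 3306),
    "Redis": ("127.0.0.1", 6379),
    "Neo4j": ("127.0.0.1", 7687),
    "Milvus": ("127.0.0.1", 19530),
}


def is_reachable(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_dependencies(deps=DEPENDENCIES, timeout=1.0):
    """Map each dependency name to whether its port accepted a connection."""
    return {
        name: is_reachable(host, port, timeout)
        for name, (host, port) in deps.items()
    }
```

Running `check_dependencies()` before the startup script gives a name-to-boolean map, so any dependency reported as `False` can be started first.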

The startup script auto-syncs config/nacos/dev/DATAPILLAR/*.yaml to Nacos (namespace=dev, group=DATAPILLAR).
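To preview which templates would be pushed, the directory layout described above can be listed as follows. This is only an illustration of that layout, not the script's actual logic: the function name `plan_nacos_sync` is hypothetical, and using each file name as the Nacos data ID is an assumption.

```python
from pathlib import Path


def plan_nacos_sync(root, env="dev", group="DATAPILLAR"):
    """List the YAML templates under config/nacos/<env>/<group>.

    Each entry pairs a template file with the namespace and group named in
    the README. The data ID is ASSUMED to be the file name.
    """
    config_dir = Path(root) / "config" / "nacos" / env / group
    return [
        {"namespace": env, "group": group, "data_id": path.name, "path": path}
        for path in sorted(config_dir.glob("*.yaml"))
    ]
```

Calling `plan_nacos_sync(".")` from the project root returns one entry per template, which is handy for comparing against what is visible in the Nacos console.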

3. Start Backend Services (One Command)

Run from project root:

./scripts/start-local-all.sh

The script syncs the Nacos configs, then compiles and starts the following backend services in local debug mode (port in parentheses):

  • datapillar-auth (7001)
  • datapillar-studio-service (7002)
  • datapillar-api-gateway (7000)
  • datapillar-ai (7003)
  • datapillar-openlineage (7004)
  • datapillar-gravitino (8090)
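Because each service uses a fixed port, a pre-flight check can flag conflicts before the script runs. The sketch below tries to bind each port on 127.0.0.1 and reports the services whose port is already taken. The names `SERVICE_PORTS`, `port_is_free`, and `find_conflicts` are hypothetical, and a port that was released only moments ago may still be reported as busy by the operating system.

```python
import socket

# Service ports taken from the list above.
SERVICE_PORTS = {
    "datapillar-api-gateway": 7000,
    "datapillar-auth": 7001,
    "datapillar-studio-service": 7002,
    "datapillar-ai": 7003,
    "datapillar-openlineage": 7004,
    "datapillar-gravitino": 8090,
}


def port_is_free(port, host="127.0.0.1"):
    """Return True if the port can be bound, i.e. nothing else holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
            return True
        except OSError:
            return False


def find_conflicts(ports=SERVICE_PORTS):
    """Return the names of services whose port is already in use."""
    return [name for name, port in ports.items() if not port_is_free(port)]
```

An empty result from `find_conflicts()` means all six ports are available; otherwise the returned names show which services would fail to bind.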

Log directory:

/tmp/datapillar-logs

4. Start Frontend

cd web/datapillar-studio
npm install
npm run dev

Frontend default URL:

  • http://localhost:3001

5. Stop Backend Services

./scripts/stop-local-all.sh

Project Structure

.
├── config/                     # Nacos templates (dev/prod)
├── docs/                       # Documentation and architecture assets
├── scripts/                    # Local start/stop scripts
├── datapillar-api-gateway/     # Gateway service (Spring Cloud Gateway)
├── datapillar-auth/            # Authentication service
├── datapillar-studio-service/  # Core business service (multi-tenant/SQL/workflow)
├── datapillar-ai/              # AI service (FastAPI/RAG/Agent)
├── datapillar-openlineage/     # OpenLineage sink service
├── datapillar-gravitino/       # Gravitino metadata extensions
└── web/datapillar-studio/      # Frontend app (React + Vite)

Module Responsibilities

  • datapillar-api-gateway: API ingress, routing, JWT validation, and traffic control.
  • datapillar-auth: Identity bootstrap, tenant membership governance, and key management.
  • datapillar-studio-service: Core business domain (tenant/workflow/invitation and studio APIs).
  • datapillar-ai: AI-side orchestration, model config access, and RAG/agent runtime.
  • datapillar-openlineage: OpenLineage event ingest, async task dispatch, and storage write.
  • datapillar-gravitino: Datapillar-specific metadata governance and tenant-scoped extensions.

Troubleshooting (Local Debug)

  • Port conflict (7000-7004, 8090): run ./scripts/stop-local-all.sh, then retry startup.
  • Nacos sync/auth issues: verify NACOS_SERVER_ADDR, NACOS_NAMESPACE, NACOS_USERNAME, NACOS_PASSWORD in scripts/start-local-all.sh.
  • Build/start failure: inspect /tmp/datapillar-logs/*.startup.log.
  • Service reachable but frontend errors: verify gateway at http://localhost:7000 and frontend at http://localhost:3001.
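When a build or start fails, scanning the startup logs for error lines is often faster than opening each file. The sketch below is a rough helper under stated assumptions: the function name `scan_startup_logs` is hypothetical, and the markers it searches for (`ERROR`, `FATAL`, `Exception`) are a guess at what the services log, not a documented format.

```python
import re
from pathlib import Path

# ASSUMED failure markers; adjust to match the real log format.
ERROR_PATTERN = re.compile(r"ERROR|FATAL|Exception")


def scan_startup_logs(log_dir="/tmp/datapillar-logs", max_hits=5):
    """Return {log file name: [(line number, line), ...]} for suspicious lines.

    Only files matching *.startup.log are read, and at most `max_hits`
    lines are collected per file.
    """
    hits = {}
    for log in sorted(Path(log_dir).glob("*.startup.log")):
        matches = []
        with log.open(errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                if ERROR_PATTERN.search(line):
                    matches.append((lineno, line.rstrip()))
                    if len(matches) >= max_hits:
                        break
        if matches:
            hits[log.name] = matches
    return hits
```

Files with no matching lines are omitted from the result, so an empty dictionary suggests the failure is not visible in the startup logs.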

Contributing

Please read CONTRIBUTING.md before opening issues or pull requests.

License

Licensed under the Apache License 2.0.

