ArqiSoft/leanda-ng

Leanda.io - Open Science Data Repository Platform

Leanda.io is an extensible open science data repository that enables researchers to consume, process, visualize, and analyze diverse scientific data types, formats, and volumes. Unlike traditional file stores or narrow-purpose databases, it features a modular microservices architecture designed for seamless extension with new domain-specific services.

What is Leanda.io?

Leanda.io addresses key deficiencies in existing open science tools by providing:

  • Real-time automated and manual data curation with AI-powered metadata extraction
  • Ontology-based property assignment and complex semantic searches
  • On-the-fly data mining, text extraction, and format conversion during deposition
  • Granular security model supporting private, shared, and public data
  • Rapid ML training dataset composition from integrated sources and processed data
  • Embedded ML framework for research and drug discovery pipelines

Supported Data Domains and Formats

Leanda.io handles a wide range of scientific formats with automatic import/export conversions:

  • Generic images (PNG, GIF, TIFF, BMP)
  • Documents (PDF, MS Office, OpenOffice)
  • Tabular data (CSV, TSV, Excel)
  • Chemical structures (SDF, MOL, SMILES, CDX)
  • Chemical reactions (RXN)
  • Crystallographic data (CIF)
  • Spectra (JDX)
  • Microscopy imaging files
  • Machine learning models & weights

Current Phase

⚠️ IMPORTANT: This project is currently in the planning/design phase. NOTHING is runnable yet.

Infrastructure is designed, documentation is comprehensive, and contracts are defined, but NO code is implemented or runnable yet. The ~92% completion refers to design/planning work, not implementation.

See Project Summary Report for detailed progress.

Journey: The Journey Master Summary gives a single narrative from Week 1 through Week 5, covering discovery, the reality check, front-end progress, tool limits, and the DynamoDB/S3 decision. It summarizes wins and lessons, and links to all journey docs and images.

Repository Structure

Current (what exists)

leanda-ng/
├── LICENSE
├── infrastructure/       # CDK config (cdk.json, package.json, tsconfig.json), iam/ (IAM policy JSONs)
├── docs/                 # architecture, adr, deployment, security, monitoring, testing, finops, agents, journey, phases, frontend, infrastructure
├── shared/               # contracts/ (AsyncAPI events, blob-storage-api), specs/ (OpenAPI core-api, models, events, implementation, tests)
├── docker/               # docker-compose.yml, Grafana/Prometheus config, test runners (not runnable without services)
└── scripts/              # agents/ (QA and automation scripts)

Planned (once implementation starts)

leanda-ng/
├── services/             # Java/Quarkus microservices (core-api, parsers, blob-storage, etc.)
├── frontend/             # Angular 21 app
├── ml-services/          # Python/FastAPI ML pipelines
└── tests/                # Integration and E2E

Quick Start

⚠️ Nothing is runnable yet. This project is in planning/design phase; implementation will begin after design is complete.

Prerequisites (For Future Implementation)

When implementation begins, you will need:

  • Java 21 LTS (for backend services)
  • Python 3.12+ (for ML services)
  • Node.js 20+ (for frontend and CDK)
  • Docker & Docker Compose (for local development)
  • AWS CLI v2 (for infrastructure deployment)
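
Until the project ships its own setup tooling, a small script can report which of these tools are already on your machine. The sketch below is not part of the repository; the `PREREQUISITES` table and `find_prerequisites` function are hypothetical, and the executable names (`java`, `python3`, `node`, `docker`, `aws`) are assumptions about a typical install. It only checks that each executable is on `PATH` and does not verify versions.

```python
import shutil

# Hypothetical mapping from each prerequisite to the executable assumed to
# provide it. Names may differ by platform (for example, "python" on Windows).
PREREQUISITES = {
    "Java 21 LTS": "java",
    "Python 3.12+": "python3",
    "Node.js 20+": "node",
    "Docker": "docker",
    "AWS CLI v2": "aws",
}


def find_prerequisites(tools=PREREQUISITES):
    """Return {label: executable path or None} using a PATH lookup only."""
    return {label: shutil.which(exe) for label, exe in tools.items()}


if __name__ == "__main__":
    for label, path in find_prerequisites().items():
        status = f"found at {path}" if path else "not found on PATH"
        print(f"{label:14} {status}")
```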

Current Status

  • ✅ Infrastructure design complete (AWS CDK stacks designed)
  • ✅ Documentation complete (architecture, ADRs, security, deployment guides)
  • ✅ API contracts defined (OpenAPI/AsyncAPI specifications)
  • ⏳ Service implementation (not started)
  • ⏳ Frontend implementation (not started)
  • ⏳ ML services implementation (not started)

See the Development Journey for progress updates.

Technology Stack

| Layer          | Technology              | Description                          |
|----------------|-------------------------|--------------------------------------|
| Frontend       | Angular 21              | Zoneless architecture, Signal Forms  |
| Backend        | Java 21, Quarkus 3.17+  | Cloud-native microservices           |
| ML Services    | Python 3.12+, FastAPI   | ML pipelines and inference           |
| Database       | MongoDB 7.0             | DocumentDB compatible                |
| Cache          | Redis 7.2               | Session and data caching             |
| Messaging      | Redpanda                | Kafka-compatible streaming           |
| Search         | OpenSearch 2.11         | Full-text and vector search          |
| Storage        | MinIO                   | S3-compatible object storage         |
| Infrastructure | AWS CDK                 | Infrastructure as Code               |
| Monitoring     | Prometheus, Grafana     | Metrics and dashboards               |

Documentation

More: Cloud architecture · ADRs · Deployment · Security · Monitoring · FinOps · Testing · Agents / coordination · Journey

Planned Services Overview

Note: These services are planned but not yet implemented. See API contracts in shared/contracts/ and shared/specs/ for specifications.

| Service             | Port | Description                        | Status     |
|---------------------|------|------------------------------------|------------|
| core-api            | 8080 | User management, events, WebSocket | ⏳ Planned |
| blob-storage        | 8084 | File storage and retrieval         | ⏳ Planned |
| chemical-parser     | 8083 | Parse SDF, MOL files               | ⏳ Planned |
| chemical-properties | 8086 | Calculate molecular properties     | ⏳ Planned |
| reaction-parser     | 8087 | Parse RXN files                    | ⏳ Planned |
| crystal-parser      | 8089 | Parse CIF files                    | ⏳ Planned |
| spectra-parser      | 8090 | Parse JDX files                    | ⏳ Planned |
| imaging             | 8091 | Image processing                   | ⏳ Planned |
| office-processor    | 8088 | Office document conversion         | ⏳ Planned |
| metadata-processing | 8098 | Metadata extraction                | ⏳ Planned |
| indexing            | 8099 | OpenSearch indexing                | ⏳ Planned |
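
The planned port assignments above could be captured in a small lookup table for local development. The following Python sketch is illustrative only: the service names and ports come from the list above, but `PLANNED_PORTS`, `service_url`, and the `localhost` default host are hypothetical, and none of these services can be reached yet.

```python
# Planned service ports, copied from the overview above.
PLANNED_PORTS = {
    "core-api": 8080,
    "blob-storage": 8084,
    "chemical-parser": 8083,
    "chemical-properties": 8086,
    "reaction-parser": 8087,
    "crystal-parser": 8089,
    "spectra-parser": 8090,
    "imaging": 8091,
    "office-processor": 8088,
    "metadata-processing": 8098,
    "indexing": 8099,
}


def service_url(name: str, host: str = "localhost") -> str:
    """Build a base URL for a planned service (the host is an assumption)."""
    if name not in PLANNED_PORTS:
        raise KeyError(f"unknown service: {name}")
    return f"http://{host}:{PLANNED_PORTS[name]}"


def duplicate_ports(ports=PLANNED_PORTS):
    """Return the sorted list of ports assigned to more than one service."""
    seen, dupes = set(), set()
    for port in ports.values():
        if port in seen:
            dupes.add(port)
        seen.add(port)
    return sorted(dupes)


if __name__ == "__main__":
    print(service_url("core-api"))
    print("duplicate ports:", duplicate_ports())
```

A check like `duplicate_ports` is useful as more services are added to the plan, since two services sharing a port would collide when run together on one machine.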

Project Status

Phase 0: Foundation & Design (Current)

  • Infrastructure Design: AWS CDK stacks designed (9 stacks)
  • Documentation: Comprehensive architecture, ADRs, security, deployment guides
  • API Contracts: OpenAPI/AsyncAPI specifications defined
  • Architecture Decisions: 12 ADRs documented
  • Planning: Multi-agent coordination framework designed

Implementation Phase (Next)

  • Phase 1: Core services implementation (not started)
  • Phase 2: Domain parsers implementation (not started)
  • Phase 3: ML services implementation (not started)
  • Phase 4: Frontend implementation (not started)
  • Phase 5: Infrastructure deployment and testing (not started)

Note: ~92% refers to design/planning work, not implementation. See Project Summary Report for details.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome!

Acknowledgments

Leanda.io was originally developed by the Science Data Software team and later by the ArqiSoft team. This modernization effort aims to revitalize the platform for the open science community using modern AWS-native technologies and best practices.


Built with care for the open science community
