LOGOS is a cutting-edge Semantic Knowledge Management Platform that transforms static documents into a visual, interactive universe of interconnected ideas. Using AI-driven vector embeddings and an event-driven architecture, LOGOS "understands" your data, creating automatic relationships between stars (fragments of knowledge) and galaxies (user-defined study areas).
Click the image below to watch the full system walkthrough and see the Galaxy Canvas in action:
LOGOS was engineered as a Multi-Cloud Distributed System with a focus on high availability, asynchronous processing, and horizontal scalability.
We use the C4 Model to provide different levels of abstraction for our architecture.
🔍 Level 1: System Context (The Big Picture)
Provides a high-level view of how users interact with the LOGOS Platform and its external dependencies like Keycloak (GCP), OpenAI, and Google Cloud Storage.
📦 Level 2: Containers (Microservices & Messaging)
Visualizes the internal container boundaries. This level highlights the Event-Driven communication via Apache Kafka (Confluent Cloud), decoupling our Java and Go services for resilient processing.
⚙️ Level 3: Components (Ingestion Service Deep Dive)
A granular view of the Ingestion Service (Golang). This shows the concurrent pipeline designed for high-performance file hashing, zero-allocation cloud streaming, and Kafka event production.
Following enterprise standards for robust distributed systems, LOGOS utilizes Kafka (via Confluent Cloud) as its central nervous system.
- Decoupling: Services never communicate directly; they react to events, ensuring that if the AI Processor is under heavy load, the rest of the system remains responsive.
- Security: Uses SASL_SSL for authenticated, encrypted cloud-native messaging.
- Core Topics:
  - `document.ingestion`: Notifies the AI engine to begin processing new data.
  - `highlight.created`: Triggers the vectorization engine for semantic indexing.
  - `star.linked`: Propagates newly discovered semantic affinities back to the database.
The binary ingestion layer was built in Go to handle high-concurrency uploads with minimal latency.
- Concurrency: Utilizes Goroutines for non-blocking SHA-256 hash generation and GCS streaming.
- Efficiency: Designed for a low memory footprint, significantly reducing cloud infrastructure costs compared to JVM-based entry points.
- Vector Engine: Powered by Pinecone (Serverless), storing high-dimensional embeddings generated by OpenAI.
- Shooting Star Logic: A proprietary algorithm where new content "finds" its place in the galaxy by calculating semantic gravity against existing clusters in real-time.
| Layer | Technologies |
|---|---|
| Backend Core | Java 21, Spring Boot 3.3, Spring Cloud Gateway, LangChain4j |
| Ingestion | Go (Golang) 1.22+ |
| Messaging | Apache Kafka (Confluent Cloud) |
| Databases | PostgreSQL (Neon), Redis (Upstash), Pinecone (Vector DB) |
| Frontend | React 18, TypeScript, Vite, D3.js (Physics Engine), React Flow |
| Cloud & Infra | AWS EC2 (m7i.large - 8GB RAM), Nginx, Google Cloud Storage |
| Identity | Keycloak (OAuth2 / OpenID Connect) |
During production deployment on AWS, we found that Pinecone Serverless indexes expose unique, hash-based host URLs rather than a fixed endpoint. We solved this by injecting the index base URL dynamically via environment variables, ensuring stable connectivity between the AWS VPC and the Pinecone SaaS host.
Running four Spring Boot services plus the Go ingestion service in containers is resource-heavy. We optimized the environment by migrating to AWS m7i.large (8 GB RAM) instances and configuring a Linux swap partition, keeping response times sub-second even during heavy AI indexing.
To comply with non-root container security policies, we developed a custom entrypoint.sh that safely injects Google Service Account JSONs into the /tmp directory at runtime, allowing the services to authenticate with GCS without violating security audits.
The project utilizes a professional CI/CD Pipeline via GitHub Actions:
- Code Validation: Automated build checks for Java and Go.
- Secure Transfer: Encrypted transfer of deployment artifacts to AWS via SCP.
- Docker Orchestration: Automated `docker-compose` rebuilds with image pruning to maintain server health.
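The pipeline stages above could look roughly like the following workflow sketch; the file path, job names, and secret names are assumptions, not the project's actual configuration:

```yaml
# .github/workflows/deploy.yml — illustrative only
name: CI/CD
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with: { distribution: temurin, java-version: "21" }
      - run: ./mvnw -q verify          # Java build check
      - uses: actions/setup-go@v5
        with: { go-version: "1.22" }
      - run: go build ./...            # Go build check
  deploy:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Prepare SSH key
        run: echo "${{ secrets.EC2_SSH_KEY }}" > key.pem && chmod 600 key.pem
      - name: Copy deployment files to EC2 over SCP
        run: scp -i key.pem docker-compose.yml ec2-user@${{ secrets.EC2_HOST }}:~/logos/
      - name: Rebuild containers and prune old images
        run: ssh -i key.pem ec2-user@${{ secrets.EC2_HOST }} "cd ~/logos && docker-compose up -d --build && docker image prune -f"
```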
Developed by Benjamim - Engineering a smarter way to connect knowledge.