Comprehensive documentation for the DestaquesGovBr Data Platform.
| Category | Document | Description |
|---|---|---|
| Getting Started | Development Setup | Set up your development environment |
| Development | PostgresManager | PostgreSQL storage manager guide |
| Architecture | Overview | System architecture and design |
| Database | Schema | Database tables, indexes, and queries |
| Database | Migrations | Setup and manage the database |
| Migration | Plan | HuggingFace → PostgreSQL migration plan |
| Migration | Progress | Migration progress log |
docs/
├── README.md               # This file
├── architecture/
│   └── overview.md         # System architecture
├── database/
│   ├── schema.md           # Database schema reference
│   └── migrations.md       # Database setup and migrations
└── development/
    └── setup.md            # Development environment setup
- Read: Development Setup
- Install: Python 3.11+, Poetry, gcloud CLI
- Setup: Run `poetry install` and `./scripts/setup_database.sh`
- Test: Run `pytest` to verify installation
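Before running the setup commands above, it can help to confirm that the listed prerequisites are present. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and it assumes that Poetry and the gcloud CLI are exposed on `PATH` as `poetry` and `gcloud`.

```python
import shutil
import sys

# Tools named in the install step; the executable names are an assumption.
REQUIRED_TOOLS = ("poetry", "gcloud")
MIN_PYTHON = (3, 11)


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if the interpreter version meets the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= tuple(minimum)


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def check_prerequisites(version=None, which=shutil.which):
    """Collect human-readable problems; an empty list means ready to set up."""
    problems = []
    if not python_ok(version):
        problems.append("Python 3.11+ is required")
    for tool in missing_tools(which=which):
        problems.append(f"'{tool}' was not found on PATH")
    return problems
```

Calling `check_prerequisites()` with no arguments inspects the current interpreter and `PATH`; the `version` and `which` parameters exist only so the check can be exercised in tests without touching the real environment.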
- Create feature branch: `git checkout -b feat/my-feature`
- Make changes and write tests
- Run quality checks: `black . && ruff check . && mypy src/ && pytest`
- Commit and push: `git commit -m "feat: description"`
- Connect: See Migrations Guide
- Schema: See Database Schema
- Queries: See Common Queries
- Architecture Overview: overview.md
- Migration Strategy: Migration Plan
- Design Decisions: ADRs
- Partial Normalization: Balance between normalization and performance
- Gradual Migration: Minimize risk with phased approach
- Dual-Write: Transition period writing to both stores
- Storage Adapter: Abstraction for swapping backends
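The Storage Adapter and Dual-Write principles above can be illustrated with a small sketch. Every name here (`StorageBackend`, `InMemoryBackend`, `DualWriteStorage`) is hypothetical and is not the platform's actual API. The sketch also assumes that reads are served from the primary store during the transition, which the plan documents may define differently.

```python
from typing import Optional, Protocol


class StorageBackend(Protocol):
    """Minimal interface a backend must satisfy (hypothetical)."""

    def save(self, key: str, record: dict) -> None: ...

    def get(self, key: str) -> Optional[dict]: ...


class InMemoryBackend:
    """Stand-in for a real backend such as PostgreSQL or HuggingFace."""

    def __init__(self) -> None:
        self._rows: dict[str, dict] = {}

    def save(self, key: str, record: dict) -> None:
        # Copy the record so later mutation by the caller does not leak in.
        self._rows[key] = dict(record)

    def get(self, key: str) -> Optional[dict]:
        return self._rows.get(key)


class DualWriteStorage:
    """Writes go to both stores; reads come from the primary (assumption)."""

    def __init__(self, primary: StorageBackend, secondary: StorageBackend) -> None:
        self.primary = primary
        self.secondary = secondary

    def save(self, key: str, record: dict) -> None:
        self.primary.save(key, record)
        self.secondary.save(key, record)

    def get(self, key: str) -> Optional[dict]:
        return self.primary.get(key)
```

Because callers depend only on the `save`/`get` interface, swapping backends or dropping the secondary store after the transition should not require changes to calling code. A production version would also need to decide what happens when the second write fails, which this sketch leaves out.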
- Schema: Database Schema
- Migrations: Migrations Guide
- Cloud SQL Docs: Cloud SQL
- Setup: Initial Setup
- Backup: Backup Database
- Monitoring: Performance Monitoring
- Troubleshooting: Troubleshooting
The migration from HuggingFace to PostgreSQL is documented in _plan/:
| Document | Purpose |
|---|---|
| README.md | Migration plan with 6 phases |
| PROGRESS.md | Progress log and timeline |
| DECISIONS.md | Architecture Decision Records (ADRs) |
| CHECKLIST.md | Verification checklist per phase |
| CONTEXT.md | Technical context for LLMs |
| SCHEMA.md | Detailed schema design |
Current Status: Phase 1 Complete ✅ (Infrastructure provisioned)
A data platform aggregating news from ~158 Brazilian government agencies:
- Scrapes RSS feeds
- Enriches with AI summaries (AWS Bedrock)
- Classifies into theme taxonomy
- Distributes via HuggingFace, Typesense, and website
Migrating from HuggingFace Dataset to PostgreSQL for:
- Better query capabilities
- Transactional support
- Reduced external dependencies
- Full-text search
- Structured schema
HuggingFace becomes output-only (daily sync for open data distribution).
destaquesgovbr/
├── infra/                  # Infrastructure (Terraform)
│   └── terraform/
│       └── cloud_sql.tf    # Cloud SQL configuration
├── data-platform/          # This repository
│   ├── docs/               # Documentation (you are here)
│   ├── _plan/              # Migration plan
│   ├── src/                # Source code
│   ├── tests/              # Tests
│   └── scripts/            # Utility scripts
└── [other repos...]
- Choose appropriate directory:
  - `architecture/`: System design, architecture
  - `database/`: Schema, migrations, SQL
  - `development/`: Dev guides, workflows
- Create markdown file with clear structure
- Update this index (`docs/README.md`)
- Use relative links
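Relative links are easy to break when files move. As a rough sketch (the helper names are hypothetical and this script is not part of the repository), the check below extracts inline Markdown link targets from a document and reports relative ones that do not resolve to an existing path. It only handles `[text](target)` links, not reference-style links.

```python
import re
from pathlib import Path

# Matches inline Markdown links and captures the target: [text](target)
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Matches a URL scheme prefix such as "https:" or "mailto:"
SCHEME_RE = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*:")


def relative_targets(markdown: str) -> list[str]:
    """Return relative link targets, with any #fragment stripped."""
    targets = []
    for target in LINK_RE.findall(markdown):
        # Skip absolute URLs, in-page anchors, and root-absolute paths.
        if SCHEME_RE.match(target) or target.startswith(("#", "/")):
            continue
        targets.append(target.split("#", 1)[0])
    return targets


def broken_links(doc: Path) -> list[str]:
    """Return relative targets in `doc` that do not exist on disk."""
    text = doc.read_text(encoding="utf-8")
    return [t for t in relative_targets(text) if not (doc.parent / t).exists()]
```

For example, `broken_links(Path("docs/README.md"))` would list any relative targets in the index that no longer point at a file.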
- Concise: Brief, to the point
- Practical: Include code examples
- Structured: Use headers, tables, code blocks
- Linked: Cross-reference related docs
Last updated: 2024-12-24