Documentation

Comprehensive documentation for the DestaquesGovBr Data Platform.


Quick Links

| Category | Document | Description |
|---|---|---|
| Getting Started | Development Setup | Set up your development environment |
| Development | PostgresManager | PostgreSQL storage manager guide |
| Architecture | Overview | System architecture and design |
| Database | Schema | Database tables, indexes, and queries |
| Database | Migrations | Set up and manage the database |
| Migration | Plan | HuggingFace → PostgreSQL migration plan |
| Migration | Progress | Migration progress log |

Documentation Structure

docs/
├── README.md                      # This file
├── architecture/
│   └── overview.md               # System architecture
├── database/
│   ├── schema.md                 # Database schema reference
│   └── migrations.md             # Database setup and migrations
└── development/
    └── setup.md                  # Development environment setup

For Developers

First Time Setup

  1. Read: Development Setup
  2. Install: Python 3.11+, Poetry, gcloud CLI
  3. Set up: Run poetry install, then ./scripts/setup_database.sh
  4. Test: Run pytest to verify the installation

Daily Workflow

  1. Create feature branch: git checkout -b feat/my-feature
  2. Make changes and write tests
  3. Run quality checks: black . && ruff check . && mypy src/ && pytest
  4. Commit and push: git commit -m "feat: description" && git push -u origin feat/my-feature

Working with Database


For Architects

System Design

Key Concepts

  1. Partial Normalization: Balance between normalization and performance
  2. Gradual Migration: Minimize risk with a phased approach
  3. Dual-Write: During the transition period, writes go to both stores
  4. Storage Adapter: Abstraction for swapping backends

For Database Administrators

Database Reference

Operations


Migration Documentation

The migration from HuggingFace to PostgreSQL is documented in _plan/:

| Document | Purpose |
|---|---|
| README.md | Migration plan with 6 phases |
| PROGRESS.md | Progress log and timeline |
| DECISIONS.md | Architecture Decision Records (ADRs) |
| CHECKLIST.md | Verification checklist per phase |
| CONTEXT.md | Technical context for LLMs |
| SCHEMA.md | Detailed schema design |

Current Status: Phase 1 Complete ✅ (Infrastructure provisioned)


Project Context

What is DestaquesGovBr?

A data platform aggregating news from ~158 Brazilian government agencies:

  • Scrapes RSS feeds
  • Enriches with AI summaries (AWS Bedrock)
  • Classifies into theme taxonomy
  • Distributes via HuggingFace, Typesense, and website
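The first stage, scraping RSS feeds, comes down to extracting `<item>` entries from each feed. Below is a minimal standard-library sketch under these assumptions:

- Feeds are RSS 2.0: `<item>` elements carry `title`, `link`, and a `pubDate` in RFC 822 format.
- `parse_rss_items` and its output keys are hypothetical. The platform's actual scraper is not described here.
- AI enrichment and theme classification are not shown.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from email.utils import parsedate_to_datetime


def parse_rss_items(feed_xml: str) -> list[dict]:
    """Return title, link, and publication time for each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    items = []
    # RSS 2.0 nests <item> under <rss><channel>; iter() finds them at any depth.
    for item in root.iter("item"):
        raw_date = item.findtext("pubDate")
        published: datetime | None = None
        if raw_date:
            # RFC 822 date, e.g. "Tue, 24 Dec 2024 10:00:00 -0300".
            # A malformed date raises, which a real scraper would need to handle.
            published = parsedate_to_datetime(raw_date.strip())
        items.append(
            {
                "title": (item.findtext("title") or "").strip(),
                "link": (item.findtext("link") or "").strip(),
                "published": published,
            }
        )
    return items
```

Missing `title` or `link` elements become empty strings, and a missing `pubDate` becomes `None`, so downstream stages can decide how to treat incomplete items.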

Why PostgreSQL?

The platform is migrating from a HuggingFace Dataset to PostgreSQL for:

  • Better query capabilities
  • Transactional support
  • Reduced external dependencies
  • Full-text search
  • Structured schema

HuggingFace becomes output-only (daily sync for open data distribution).

Repository Structure

destaquesgovbr/
├── infra/                    # Infrastructure (Terraform)
│   └── terraform/
│       └── cloud_sql.tf      # Cloud SQL configuration
├── data-platform/            # This repository
│   ├── docs/                 # Documentation (you are here)
│   ├── _plan/                # Migration plan
│   ├── src/                  # Source code
│   ├── tests/                # Tests
│   └── scripts/              # Utility scripts
└── [other repos...]

Contributing

Adding Documentation

  1. Choose appropriate directory:

    • architecture/: System design, architecture
    • database/: Schema, migrations, SQL
    • development/: Dev guides, workflows
  2. Create a Markdown file with a clear structure

  3. Update this index (docs/README.md)

  4. Use relative links
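Because the index relies on relative links, a broken path after a file is moved or renamed is easy to miss. Below is a minimal sketch of a checker, assuming inline Markdown links of the form `[text](path)`:

- `find_broken_links` is a hypothetical helper, not an existing project script.
- It skips absolute URLs, `mailto:` links, and pure `#anchor` links.
- It does not validate anchors or reference-style links.

```python
import re
from pathlib import Path

# Inline Markdown links: [text](target "optional title").
# Reference-style links ([text][ref]) are not handled.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)[^)]*\)")


def find_broken_links(docs_root: Path) -> list[tuple[Path, str]]:
    """Return (file, target) pairs whose relative target does not exist."""
    broken = []
    for md_file in sorted(docs_root.rglob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        for target in LINK_RE.findall(text):
            # External URLs, mail links, and in-page anchors are out of scope.
            if target.startswith(("http://", "https://", "mailto:", "#")):
                continue
            # Drop any "#section" suffix before checking the file exists.
            path_part = target.split("#", 1)[0]
            if not (md_file.parent / path_part).exists():
                broken.append((md_file, target))
    return broken
```

Running `find_broken_links(Path("docs"))` from the repository root would list every dangling relative link. An empty list means all checked targets resolve.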

Documentation Style

  • Concise: Brief, to the point
  • Practical: Include code examples
  • Structured: Use headers, tables, code blocks
  • Linked: Cross-reference related docs

External Resources


Last updated: 2024-12-24