bellDataSc/Municipal-Data-Extraction-and-Validation


Municipal Data Extraction and Validation

Extraction and validation of Brazilian municipal data through the official IBGE (Instituto Brasileiro de Geografia e Estatística) API, integrated with internal databases for data quality assurance and demographic analysis at FGV IBRE.


FGV IBRE - Instituto Brasileiro de Economia

Made with ☕ by Isabel Cruz | in Google Colab | in Brazil | Data from IBGE

Medium: https://belgon.medium.com/

LinkedIn: http://www.linkedin.com/in/belcruz

isabel.gon.adm@gmail.com

Overview

This repository contains scripts and notebooks for:

  • Extracting official municipal data via the IBGE public API
  • Validating municipality identifiers (ID IBGE) against internal records
  • Reconciling demographic data for analysis
  • Generating data quality reports
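The extraction step could look roughly like the following sketch. It assumes the public IBGE Localidades service at `servicodados.ibge.gov.br` (its `/estados` and `/estados/{id}/municipios` endpoints); the repository's own notebooks may use a different endpoint or different fields. The function names and output keys (`id_ibge`, `municipio`, `uf`) are hypothetical.

```python
import requests

# Assumed endpoint (IBGE Localidades service); not confirmed by this README.
IBGE_BASE = "https://servicodados.ibge.gov.br/api/v1/localidades"


def parse_municipios(payload, uf):
    """Flatten the JSON list returned for one state into plain records."""
    return [{"id_ibge": int(m["id"]), "municipio": m["nome"], "uf": uf} for m in payload]


def fetch_all_municipios(timeout=30):
    """Download every municipality, one request per federative unit."""
    with requests.Session() as session:
        resp = session.get(f"{IBGE_BASE}/estados", timeout=timeout)
        resp.raise_for_status()
        records = []
        # Request by numeric state id; keep the two-letter abbreviation as a label.
        for estado in sorted(resp.json(), key=lambda e: e["sigla"]):
            r = session.get(f"{IBGE_BASE}/estados/{estado['id']}/municipios", timeout=timeout)
            r.raise_for_status()
            records.extend(parse_municipios(r.json(), estado["sigla"]))
    return records
```

Calling `fetch_all_municipios()` returns one dict per municipality, which `pandas.DataFrame(...)` turns into a table. Parsing is kept separate from the HTTP calls so it can be checked without network access.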

Data Sources

Structure

  ```
  ├── notebooks/   # Colab notebooks for data extraction
  ├── scripts/     # Python scripts for validation and processing
  ├── data/        # Input and output data files
  └── reports/     # Generated validation reports (Excel)
  ```

Key Features

  • Automated data extraction from IBGE's public API (all 27 federative units: 26 states plus the Federal District)
  • Municipal name normalization and standardized matching
  • ID reconciliation: internal identifiers vs. official IBGE codes
  • Data quality metrics (match rates, missing values, collisions)
  • Excel reports with validation summaries
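Name normalization, matching, and ID reconciliation can be sketched in pure Python. This is a minimal illustration, not the repository's script: the record fields (`municipio`, `uf`, `id_interno`, `id_ibge`), the English status labels, and the choice to match on state plus normalized name are all assumptions.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_name(name):
    """Uppercase, strip accents and punctuation, collapse whitespace."""
    # NFKD splits "ã" into "a" + combining tilde; drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Apostrophes, hyphens and other punctuation become spaces.
    cleaned = re.sub(r"[^A-Za-z0-9]+", " ", no_accents)
    return " ".join(cleaned.split()).upper()


def build_index(ibge_records):
    """Map (UF, normalized name) -> list of official IBGE codes."""
    index = defaultdict(list)
    for rec in ibge_records:
        index[(rec["uf"], normalize_name(rec["municipio"]))].append(rec["id_ibge"])
    return index


def reconcile(internal_records, ibge_records):
    """Classify each internal record against the official IBGE list."""
    index = build_index(ibge_records)
    results = []
    for rec in internal_records:
        codes = index.get((rec["uf"], normalize_name(rec["municipio"])), [])
        if not codes:
            status = "not_found"
        elif len(codes) > 1:
            status = "collision"  # several official codes share this key
        elif codes[0] == rec["id_interno"]:
            status = "match"
        else:
            status = "id_mismatch"  # name found, internal ID differs
        official = codes[0] if len(codes) == 1 else None
        results.append({**rec, "id_ibge": official, "status": status})
    return results
```

For example, `normalize_name("São João d'Aliança")` yields `"SAO JOAO D ALIANCA"`, so internal spellings without accents or with different casing still match. Keying on the state as well as the name avoids false matches between same-named municipalities in different states.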

Installation

pip install pandas requests openpyxl

Usage

Colab Environment

  1. Upload the script: validacao_ibge_corrigido.py
  2. Upload the dataset: validacao_colisoes_IBGE - Copia.xlsx
  3. Run the script to generate validacao_ibge_resultado.xlsx

Local Environment

python validacao_ibge_corrigido.py

  • Input: validacao_colisoes_IBGE - Copia.xlsx
  • Output: validacao_ibge_resultado.xlsx

Output Format

Excel workbook with three sheets:

| Sheet              | Content                                   |
|--------------------|-------------------------------------------|
| Validacao_Completa | Full dataset with IBGE ID validation      |
| Nao_Encontrados    | Records not matched in the IBGE database  |
| Resumo             | Metrics: total records, match rate, gaps  |
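A minimal pandas sketch of how such a workbook could be assembled, assuming the reconciled records sit in a DataFrame with a hypothetical `status` column where `"match"` marks a record found in the IBGE list and every other value counts as not found. The helper names `build_report_sheets` and `write_report` are illustrative, not taken from `validacao_ibge_corrigido.py`.

```python
import pandas as pd


def build_report_sheets(full):
    """Split reconciled records into the three report sheets.

    Assumes a hypothetical `status` column where "match" means the record
    was found in the IBGE list; any other value is treated as not found.
    """
    unmatched = full[full["status"] != "match"].reset_index(drop=True)
    total = len(full)
    matched = total - len(unmatched)
    summary = pd.DataFrame({
        "metric": ["total_records", "matched", "unmatched", "match_rate_pct"],
        "value": [total, matched, len(unmatched),
                  round(100 * matched / total, 2) if total else 0.0],
    })
    return {"Validacao_Completa": full, "Nao_Encontrados": unmatched, "Resumo": summary}


def write_report(sheets, path="validacao_ibge_resultado.xlsx"):
    """Write each DataFrame to its own sheet (requires openpyxl)."""
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        for name, df in sheets.items():
            df.to_excel(writer, sheet_name=name, index=False)
```

Usage: `write_report(build_report_sheets(df))`. Building the sheets separately from writing them keeps the split logic testable without touching the file system.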

Validation Metrics

  • Total records processed
  • Records matched with IBGE database
  • Unmatched records requiring review
  • Match rate percentage
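Assuming the match rate is the share of processed records that matched the IBGE list (the README does not spell out the formula), the metrics relate as follows:

```latex
N_{\text{unmatched}} = N_{\text{total}} - N_{\text{matched}},
\qquad
\text{match rate} = 100 \times \frac{N_{\text{matched}}}{N_{\text{total}}}\,\%
```

With purely illustrative numbers, 200 processed records of which 180 matched would give 20 unmatched records and a match rate of 100 × 180 / 200 = 90%.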

Technical Stack

  • Python 3.7+
  • pandas: Data manipulation
  • requests: API calls
  • openpyxl: Excel file generation
