53 changes: 53 additions & 0 deletions statvar_imports/tuberculosis_preventive_treatment/README.md
@@ -0,0 +1,53 @@
# WHO Tuberculosis: Percentage of household contacts (or all close contacts) who were started on TB preventive treatment out of those eligible

## Overview
This dataset provides the percentage of household contacts (or close contacts) of people diagnosed with a new episode of bacteriologically confirmed pulmonary TB disease who started TB preventive treatment, out of those eligible.

## Data Source

**Source URL:**
https://data.who.int/indicators/i/45274BD/F5556F8

The data comes from the official WHO reporting database and includes comprehensive, country-level health metrics detailing annual Tuberculosis notifications and case classifications.

## How To Download Input Data
To download the data, run the provided download script `tb_data_download_who.py`. The script queries the WHO API for the indicator, merges the result with the WHO geographical master list to append standard `iso3` country codes, and saves the result as `Tuberculosis_preventive_treatment.csv` inside an `input_files` folder.
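
Because the merge with the master list is keyed on country name, a name spelled differently in the two sources would end up with an empty `iso3` value. The sketch below uses toy data, not real WHO output, to show one way such rows could be surfaced with the `indicator` option of `pandas.merge`. The helper name `find_unmatched_countries` is hypothetical and is not part of the import scripts.

```python
import pandas as pd


def find_unmatched_countries(api_df: pd.DataFrame, master_df: pd.DataFrame) -> list:
    """Return country names in api_df with no match in master_df (hypothetical helper)."""
    merged = pd.merge(
        api_df,
        master_df,
        left_on="COUNTRY",
        right_on="country",
        how="left",
        indicator=True,  # adds a `_merge` column: 'both' or 'left_only'
    )
    unmatched = merged.loc[merged["_merge"] == "left_only", "COUNTRY"]
    return sorted(unmatched.unique())


# Toy frames for illustration only; the values are made up.
api_df = pd.DataFrame({
    "COUNTRY": ["India", "Kenya", "Atlantis"],
    "YEAR": [2020, 2020, 2020],
    "VALUE": [10.0, 20.0, 30.0],
})
master_df = pd.DataFrame({
    "country": ["India", "Kenya"],
    "iso3": ["IND", "KEN"],
})

unmatched = find_unmatched_countries(api_df, master_df)
print(unmatched)
```

Any name reported this way would need a manual mapping before place resolution.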

type of place: Country.

statvars: Health / Tuberculosis.

years: 2010 to 2022

place_resolution: manually.

release_frequency: P1Y
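
The coverage listed above can be turned into a quick check on a downloaded file. This is a minimal sketch, assuming the `YEAR`, `COUNTRY` and `iso3` column names written by the download script. The function name `check_rows` and the sample rows are hypothetical, and the year range would need updating if the source later publishes newer years.

```python
import csv
import io

EXPECTED_YEARS = range(2010, 2023)  # 2010 to 2022 inclusive, per the list above


def check_rows(csv_text: str) -> list:
    """Return human-readable problems found in the CSV text (hypothetical helper)."""
    problems = []
    # Data rows start at line 2 because line 1 is the header.
    for line_no, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=2):
        if int(row["YEAR"]) not in EXPECTED_YEARS:
            problems.append(f"line {line_no}: year {row['YEAR']} outside 2010-2022")
        if not row["iso3"]:
            problems.append(f"line {line_no}: missing iso3 for {row['COUNTRY']}")
    return problems


# Made-up sample rows: one valid, one with an out-of-range year, one without iso3.
sample = (
    "IND_ID,INDICATOR_NAME,YEAR,COUNTRY,iso3,DISAGGR_1,VALUE\n"
    "X,demo,2020,India,IND,,10.0\n"
    "X,demo,2009,Kenya,KEN,,20.0\n"
    "X,demo,2021,Atlantis,,,30.0\n"
)
problems = check_rows(sample)
for problem in problems:
    print(problem)
```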

## Processing Instructions
To process the WHO Tuberculosis data and generate statistical variables, use the following commands from your root `data` directory:

**Download input file**
```bash
python3 statvar_imports/tuberculosis_preventive_treatment/tb_data_download_who.py
```

**For Test Data Run**
```bash
python3 tools/statvar_importer/stat_var_processor.py \
--input_data="statvar_imports/tuberculosis_preventive_treatment/testdata/Tuberculosis_preventive_treatment_input.csv" \
--pv_map="statvar_imports/tuberculosis_preventive_treatment/tuberculosis_PreventiveTreatment_pv_mapping.csv" \
--output_path="statvar_imports/tuberculosis_preventive_treatment/output_files/tuberculosis_PreventiveTreatment" \
--config_file="statvar_imports/tuberculosis_preventive_treatment/tuberculosis_PreventiveTreatment_metadata.csv" \
--existing_statvar_mcf="gs://unresolved_mcf/scripts/statvar/stat_vars.mcf"
```

**For Main data run**
```bash
python3 tools/statvar_importer/stat_var_processor.py \
--input_data="statvar_imports/tuberculosis_preventive_treatment/input_files/Tuberculosis_preventive_treatment.csv" \
--pv_map="statvar_imports/tuberculosis_preventive_treatment/tuberculosis_PreventiveTreatment_pv_mapping.csv" \
--output_path="statvar_imports/tuberculosis_preventive_treatment/output_files/tuberculosis_PreventiveTreatment" \
--config_file="statvar_imports/tuberculosis_preventive_treatment/tuberculosis_PreventiveTreatment_metadata.csv" \
--existing_statvar_mcf="gs://unresolved_mcf/scripts/statvar/stat_vars.mcf"
```

#### Refresh type: Fully Autorefresh
26 changes: 26 additions & 0 deletions statvar_imports/tuberculosis_preventive_treatment/manifest.json
@@ -0,0 +1,26 @@
{
  "import_specifications": [
    {
      "import_name": "WHO_TuberculosisPreventiveTreatment",
      "curator_emails": [
        "support@datacommons.org"
      ],
      "provenance_url": "https://data.who.int/indicators/i/45274BD/F5556F8",
      "provenance_description": "Tuberculosis: Percentage of household contacts (or all close contacts) who were started on TB preventive treatment out of those eligible",
      "scripts": [
        "tb_data_download_who.py",
        "../../tools/statvar_importer/stat_var_processor.py --input_data=input_files/Tuberculosis_preventive_treatment.csv --pv_map=tuberculosis_PreventiveTreatment_pv_mapping.csv --config_file=tuberculosis_PreventiveTreatment_metadata.csv --output_path=output/tuberculosis_preventive_output --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf"
      ],
      "import_inputs": [
        {
          "template_mcf": "output/tuberculosis_preventive_output.tmcf",
          "cleaned_csv": "output/tuberculosis_preventive_output.csv"
        }
      ],
      "source_files": [
        "input_files/Tuberculosis_preventive_treatment.csv"
      ],
      "cron_schedule": "0 10 10,21 * *"
    }
  ]
}
@@ -0,0 +1,66 @@
import os
import requests
import io
import pandas as pd
import logging

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# Seconds to wait for each HTTP request before giving up
REQUEST_TIMEOUT = 60


def download_tb_percentage_data():
    # 1. Get the clean data from the API using the indicator ID
    api_url = "https://xmart-api-public.who.int/DATA_/RELAY_TB_DATA"
    params = {
        "$filter": "IND_ID eq '45274BDF5556F8'",
        "$select": "IND_ID,INDICATOR_NAME,YEAR,COUNTRY,DISAGGR_1,VALUE",
        "$format": "csv"
    }

    logging.info("1. Fetching clean percentage data from WHO API...")
    api_response = requests.get(api_url, params=params, timeout=REQUEST_TIMEOUT)

    if api_response.status_code != 200:
        logging.error(f"Failed to fetch API data. HTTP {api_response.status_code}")
        return

    # Load the clean API data into a pandas table
    api_df = pd.read_csv(io.StringIO(api_response.text))

    # 2. Get ONLY the iso3 code from the master database
    logging.info("2. Fetching country iso3 codes from WHO master database...")
    master_url = "https://extranet.who.int/tme/generateCSV.asp?ds=notifications"
    master_response = requests.get(master_url, timeout=REQUEST_TIMEOUT)
    if master_response.status_code != 200:
        logging.error(f"Failed to fetch master data. HTTP {master_response.status_code}")
        return

    # We only pull the 'country' (for matching) and 'iso3' columns
    geo_columns = ['country', 'iso3']
    master_df = pd.read_csv(io.StringIO(master_response.text), usecols=geo_columns).drop_duplicates()

    # 3. Merge the two datasets together based on the country name
    logging.info("3. Merging data and formatting...")
    # The API uses uppercase 'COUNTRY', the master uses lowercase 'country'
    merged_df = pd.merge(api_df, master_df, left_on='COUNTRY', right_on='country', how='left')

    # Drop the duplicate lowercase 'country' column used for joining
    merged_df = merged_df.drop(columns=['country'])

    # Reorder columns so the iso3 code sits right next to the Country name
    final_columns = [
        'IND_ID', 'INDICATOR_NAME', 'YEAR', 'COUNTRY', 'iso3', 'DISAGGR_1', 'VALUE'
    ]
    merged_df = merged_df[final_columns]

    # 4. Save to CSV in the input_files folder next to this script, so the
    # output location does not depend on the current working directory
    output_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "input_files")
    filename = os.path.join(output_dir, "Tuberculosis_preventive_treatment.csv")

    os.makedirs(output_dir, exist_ok=True)

    # Save without the pandas index column
    merged_df.to_csv(filename, index=False)
    logging.info(f"Success! Data saved locally as '{filename}'")


if __name__ == "__main__":
    download_tb_percentage_data()