Add WHO Tuberculosis Preventive Treatment Import #2000
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,54 @@ | ||
| # WHO Tuberculosis: Percentage of household contacts (or all close contacts) who were started on TB preventive treatment out of those eligible | ||
|
|
||
| ## Overview | ||
| This dataset provides the percentage of household contacts (or close contacts) of people diagnosed with a new episode of bacteriologically confirmed pulmonary TB disease who started TB preventive treatment, out of those eligible. | ||
|
|
||
| ## Data Source | ||
|
|
||
| **Source URL:** | ||
| https://data.who.int/indicators/i/45274BD/F5556F8 | ||
|
|
||
| The data comes from the official WHO reporting database and includes comprehensive, country-level health metrics detailing annual Tuberculosis notifications and case classifications. | ||
|
|
||
| ## How To Download Input Data | ||
| To download the data, you'll need to run the provided download script `tb_data_download_who.py`. This script automatically queries the WHO API for the indicator, merges it with the WHO geographical master list to append standard `iso3` country codes, and saves the cleaned `Tuberculosis_preventive_treatment_input.csv` file inside an "input_files" folder. | ||
|
|
||
| type of place: Country. | ||
|
|
||
| statvars: Health / Tuberculosis. | ||
|
|
||
| years: 2010 to 2022 | ||
|
|
||
| place_resolution: manually. | ||
|
|
||
| release_frequency: P1Y | ||
|
|
||
| ## Processing Instructions | ||
| To process the WHO Tuberculosis data and generate statistical variables, use the following commands from your root `data` directory: | ||
|
|
||
| **Download input file** | ||
| ```bash | ||
| python3 statvar_imports/tuberculosis_preventive_treatment/tb_data_download_who.py | ||
| ``` | ||
|
|
||
| **For Test Data Run** | ||
| ```bash | ||
| python3 tools/statvar_importer/stat_var_processor.py \ | ||
| --input_data="statvar_imports/tuberculosis_preventive_treatment/source_files/Tuberculosis_preventive_treatment.csv" \ | ||
| --pv_map="statvar_imports/tuberculosis_preventive_treatment/tuberculosis_PreventiveTreatment_pv_mapping.csv" \ | ||
| --output_path="statvar_imports/tuberculosis_preventive_treatment/output_files/tuberculosis_PreventiveTreatment" \ | ||
| --config_file="statvar_imports/tuberculosis_preventive_treatment/tuberculosis_PreventiveTreatment_metadata.csv" \ | ||
| --existing_statvar_mcf="gs://unresolved_mcf/scripts/statvar/stat_vars.mcf" | ||
| ``` | ||
|
|
||
| **For Main data run** | ||
| ```bash | ||
| python3 tools/statvar_importer/stat_var_processor.py \ | ||
| --input_data="statvar_imports/tuberculosis_preventive_treatment/source_files/Tuberculosis_preventive_treatment.csv" \ | ||
| --pv_map="statvar_imports/tuberculosis_preventive_treatment/tuberculosis_PreventiveTreatment_pv_mapping.csv" \ | ||
| --output_path="statvar_imports/tuberculosis_preventive_treatment/output_files/tuberculosis_PreventiveTreatment" \ | ||
| --config_file="statvar_imports/tuberculosis_preventive_treatment/tuberculosis_PreventiveTreatment_metadata.csv" \ | ||
| --existing_statvar_mcf="gs://unresolved_mcf/scripts/statvar/stat_vars.mcf" | ||
| ``` | ||
|
Comment on lines +34 to +52
Contributor
The processing instructions have several issues that could confuse users and prevent the commands from running correctly:
Please correct the test command to use the test data and update the main data run command to use the correct path.
||
|
|
||
| #### Refresh type: Fully Autorefresh | ||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,26 @@ | ||||||
| { | ||||||
| "import_specifications": [ | ||||||
| { | ||||||
| "import_name": "WHO_TuberculosisPreventiveTreatment", | ||||||
| "curator_emails": [ | ||||||
| "support@datacommons.org" | ||||||
| ], | ||||||
| "provenance_url": "<https://data.who.int/indicators/i/45274BD/F5556F8>", | ||||||
|
Contributor
The
Suggested change
|
||||||
| "provenance_description": "Tuberculosis: Percentage of household contacts (or all close contacts) who were started on TB preventive treatment out of those eligible", | ||||||
| "scripts": [ | ||||||
| "tb_data_download_who.py", | ||||||
| "../../../tools/statvar_importer/stat_var_processor.py --input_data=input_files/Tuberculosis_preventive_treatment.csv --pv_map=tuberculosis_preventive_pvmap.csv --config_file=metadata.csv --output_path=output/tuberculosis_preventive_output --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf" | ||||||
|
Contributor
There are filename mismatches in the script command which will cause the import to fail:
Please correct these filenames to match the files in the repository. Also, ensure that file paths are not quoted as per repository rules.
Suggested change
References
|
||||||
| ], | ||||||
| "import_inputs": [ | ||||||
| { | ||||||
| "template_mcf": "output/tuberculosis_preventive_output.tmcf", | ||||||
| "cleaned_csv": "output/tuberculosis_preventive_output.csv" | ||||||
| } | ||||||
| ], | ||||||
| "source_files": [ | ||||||
| "input_files/Tuberculosis_preventive_treatment.csv" | ||||||
| ], | ||||||
| "cron_schedule": "0 10 10,21 * *" | ||||||
| } | ||||||
| ] | ||||||
| } | ||||||
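One way to catch filename mismatches in a `scripts` command before the import runs is to check that every file-valued flag points at something that exists. The sketch below is not part of this PR. The helper `missing_input_files` and the file names in the demo are hypothetical, and it assumes paths are given as `--flag=value` relative to the import directory.

```python
import os
import shlex
import tempfile


def missing_input_files(command, base_dir,
                        flags=("--input_data", "--pv_map", "--config_file")):
    """Return (flag, path) pairs whose path does not exist under base_dir.

    Hypothetical helper: only flags written as --flag=value are inspected.
    """
    missing = []
    for token in shlex.split(command):
        flag, sep, value = token.partition("=")
        if sep and flag in flags:
            if not os.path.exists(os.path.join(base_dir, value)):
                missing.append((flag, value))
    return missing


# Demo against a throwaway directory; the file names are made up.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "metadata.csv"), "w").close()
    command = ("stat_var_processor.py --pv_map=pvmap.csv "
               "--config_file=metadata.csv --output_path=output/out")
    result = missing_input_files(command, tmp)

print(result)
```

Flags that name outputs or remote locations (such as `--output_path` or a `gs://` URL) are deliberately left out of the default `flags` tuple, since they are not expected to exist locally before the run.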
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,66 @@ | ||
| import os | ||
| import requests | ||
| import io | ||
| import pandas as pd | ||
| import logging | ||
|
|
||
| # Configure logging | ||
| logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s') | ||
|
|
||
| def download_tb_percentage_data(): | ||
| # 1. Get the Clean Data from the API using the new Indicator ID | ||
| api_url = "https://xmart-api-public.who.int/DATA_/RELAY_TB_DATA" | ||
| params = { | ||
| "$filter": "IND_ID eq '45274BDF5556F8'", | ||
| "$select": "IND_ID,INDICATOR_NAME,YEAR,COUNTRY,DISAGGR_1,VALUE", | ||
| "$format": "csv" | ||
| } | ||
|
|
||
| logging.info("1. Fetching clean percentage data from WHO API...") | ||
| api_response = requests.get(api_url, params=params) | ||
|
|
||
| if api_response.status_code != 200: | ||
| logging.error(f"Failed to fetch API data. HTTP {api_response.status_code}") | ||
| return | ||
|
|
||
| # Load the clean API data into a pandas table | ||
| api_df = pd.read_csv(io.StringIO(api_response.text)) | ||
|
|
||
| # 2. Get ONLY the iso3 code from the master database | ||
| logging.info("2. Fetching country iso3 codes from WHO master database...") | ||
| master_url = "https://extranet.who.int/tme/generateCSV.asp?ds=notifications" | ||
| master_response = requests.get(master_url) | ||
| if master_response.status_code != 200: | ||
| logging.error(f"Failed to fetch master data. HTTP {master_response.status_code}") | ||
| return | ||
|
|
||
| # We only pull the 'country' (for matching) and 'iso3' columns | ||
| geo_columns = ['country', 'iso3'] | ||
| master_df = pd.read_csv(io.StringIO(master_response.text), usecols=geo_columns).drop_duplicates() | ||
|
|
||
| # 3. Merge the two datasets together based on the country name | ||
| logging.info("3. Merging data and formatting...") | ||
| # The API uses uppercase 'COUNTRY', the master uses lowercase 'country' | ||
| merged_df = pd.merge(api_df, master_df, left_on='COUNTRY', right_on='country', how='left') | ||
|
|
||
| # Drop the duplicate lowercase 'country' column used for joining | ||
| merged_df = merged_df.drop(columns=['country']) | ||
|
|
||
| # Reorder columns so the iso3 code sits right next to the Country name | ||
| final_columns = [ | ||
| 'IND_ID', 'INDICATOR_NAME', 'YEAR', 'COUNTRY', 'iso3','DISAGGR_1', 'VALUE' | ||
| ] | ||
| merged_df = merged_df[final_columns] | ||
|
|
||
| # 4. Save to CSV in a new folder | ||
| output_dir = "statvar_imports/tuberculosis_preventive_treatment/input_files" | ||
| filename = os.path.join(output_dir, "Tuberculosis_preventive_treatment.csv") | ||
|
|
||
| os.makedirs(output_dir, exist_ok=True) | ||
|
|
||
| # Save without the pandas index column | ||
| merged_df.to_csv(filename, index=False) | ||
| logging.info(f"Success! Data saved locally as '{filename}'") | ||
|
|
||
| if __name__ == "__main__": | ||
| download_tb_percentage_data() |
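Because the script joins the two sources on country name with `how='left'`, any name spelled differently in the two sources would end up with an empty `iso3` and no warning. A minimal sketch of how such rows could be surfaced with the `indicator` option of `pandas.merge` follows. It uses made-up rows rather than WHO data, and the helper `find_unmatched_countries` is hypothetical, not part of the PR.

```python
import pandas as pd


def find_unmatched_countries(api_df, master_df):
    """Return sorted COUNTRY values from api_df with no match in master_df."""
    merged = pd.merge(
        api_df, master_df,
        left_on="COUNTRY", right_on="country",
        how="left", indicator=True,  # adds a "_merge" column
    )
    # "left_only" marks rows that found no partner in master_df.
    unmatched = merged.loc[merged["_merge"] == "left_only", "COUNTRY"]
    return sorted(unmatched.unique())


# Illustrative rows only (not real WHO data).
api_df = pd.DataFrame({
    "COUNTRY": ["India", "Kenya", "Atlantis"],
    "YEAR": [2020, 2020, 2020],
    "VALUE": [10.0, 20.0, 30.0],
})
master_df = pd.DataFrame({
    "country": ["India", "Kenya"],
    "iso3": ["IND", "KEN"],
})

unmatched = find_unmatched_countries(api_df, master_df)
print(unmatched)
```

Logging this list before writing the CSV would make gaps in place resolution visible during the download step rather than later in processing.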
There appears to be a typo in the main header. The # should be #.