Add WHO Tuberculosis Preventive Treatment Import #2000

Open
pravnkumar-code wants to merge 1 commit into datacommonsorg:master from pravnkumar-code:tuberculosis_preventive_treatment

Conversation

@pravnkumar-code

Please find the CL/PR checklist: https://docs.google.com/spreadsheets/d/1fmOgPpbf3zao7ouz8elEKxTtWmOxj0OrOEkUQzk92yI/edit?usp=drive_link&resourcekey=0--OoqVHRDwzvT84pVUGSFMA

@google-cla

google-cla Bot commented May 12, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new data import for WHO Tuberculosis preventive treatment statistics, including a Python download script, a manifest for automation, and necessary mapping and metadata files. The review identified several critical issues: filename mismatches in the manifest script, an incorrect column count in the metadata configuration, and a logic error in the PV mapping where a redundant scaling factor would corrupt the percentage values. Additionally, the README requires updates to fix a header typo and correct the processing instructions, and the manifest's provenance URL needs formatting corrections.

"provenance_description": "Tuberculosis: Percentage of household contacts (or all close contacts) who were started on TB preventive treatment out of those eligible",
"scripts": [
"tb_data_download_who.py",
"../../../tools/statvar_importer/stat_var_processor.py --input_data=input_files/Tuberculosis_preventive_treatment.csv --pv_map=tuberculosis_preventive_pvmap.csv --config_file=metadata.csv --output_path=output/tuberculosis_preventive_output --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf"

critical

There are filename mismatches in the script command which will cause the import to fail:

  • --pv_map is set to tuberculosis_preventive_pvmap.csv, but the actual filename is tuberculosis_PreventiveTreatment_pv_mapping.csv.
  • --config_file is set to metadata.csv, but the actual filename is tuberculosis_PreventiveTreatment_metadata.csv.

Please correct these filenames to match the files in the repository. Also, ensure that file paths are not quoted as per repository rules.

Suggested change
- "../../../tools/statvar_importer/stat_var_processor.py --input_data=input_files/Tuberculosis_preventive_treatment.csv --pv_map=tuberculosis_preventive_pvmap.csv --config_file=metadata.csv --output_path=output/tuberculosis_preventive_output --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf"
+ "../../../tools/statvar_importer/stat_var_processor.py --input_data=input_files/Tuberculosis_preventive_treatment.csv --pv_map=tuberculosis_PreventiveTreatment_pv_mapping.csv --config_file=tuberculosis_PreventiveTreatment_metadata.csv --output_path=output/tuberculosis_preventive_output --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf"
References
  1. Do not quote arguments that represent file paths in manifest.json scripts if they are not strictly required for the command to function correctly.
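A mismatch like the one flagged above can be caught before review with a small pre-flight check. The following is a minimal sketch, not part of the repository's tooling: the helper name `missing_script_files` and the choice of which flags point at local files are assumptions made for illustration.

```python
import shlex
from pathlib import Path

# Flags whose values are expected to be local files.
# This set is an assumption for the sketch, not a rule taken from the importer.
LOCAL_FILE_FLAGS = ("--pv_map", "--config_file")


def missing_script_files(command: str, import_dir: Path) -> list[str]:
    """Return 'flag=path' entries whose path does not exist under import_dir."""
    missing = []
    for token in shlex.split(command):
        # Split "--flag=value" into its flag and value parts.
        flag, sep, value = token.partition("=")
        if sep and flag in LOCAL_FILE_FLAGS and not (import_dir / value).exists():
            missing.append(f"{flag}={value}")
    return missing
```

Run against the import directory, a check of this kind would report both `--pv_map` and `--config_file` for the original command and nothing for the suggested one, provided the files named in the review are present.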

populationType,HouseholdContact
measuredProperty,count
header_rows,1
mapped_columns,6

critical

The mapped_columns value is set to 6, but the tuberculosis_PreventiveTreatment_pv_mapping.csv file defines 8 property-value pairs for the VALUE key. This mismatch will cause the last two pairs (unit and scalingFactor) to be ignored during processing. Please update this value to 8.

mapped_columns,8
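The count of 8 can be reproduced by tallying the property-value pairs that follow the key column in a PV map row. The sketch below only mirrors that counting; `count_pv_pairs` is a hypothetical helper, and it makes no claim about how `stat_var_processor.py` itself interprets `mapped_columns`.

```python
import csv
import io


def count_pv_pairs(pv_map_row: str) -> int:
    """Count non-empty property,value pairs after the leading key column."""
    cells = next(csv.reader(io.StringIO(pv_map_row)))
    props_and_values = cells[1:]  # drop the key column, e.g. "VALUE"
    # Pair up cells as (property, value), (property, value), ...
    pairs = zip(props_and_values[0::2], props_and_values[1::2])
    # Trailing empty cells (padding commas) do not count as pairs.
    return sum(1 for prop, _value in pairs if prop.strip())
```

Applied to the `VALUE` row of the mapping file, this counts `value`, `populationType`, `measuredProperty`, `measurementDenominator`, `medicalCondition`, `medicalStatus`, `unit`, and `scalingFactor`.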

iso3,observationAbout,dcid:country/{Data},,,,,,,,,,,,,,
YEAR,observationDate,@Data,,,,,,,,,,,,,,
DISAGGR_1:0-4 years,age,dcid:Years0To4,,,,,,,,,,,,,,
VALUE,value,@Data,populationType,dcid:Person,measuredProperty,dcid:count,measurementDenominator,dcid:Count_Person_Tuberculosis_EligibleForPreventiveTreatment,medicalCondition,dcid:Tuberculosis,medicalStatus,dcid:StartedOnTBPreventiveTreatment,unit,dcid:Percent,scalingFactor,100
No newline at end of file

critical

The input data values are already percentages (e.g., 46.0). Using scalingFactor,100 will incorrectly multiply these values by 100, resulting in values like 4600%. Since the unit is already dcid:Percent, the scalingFactor is not needed and should be removed. The mapping file should be curated to handle the data as-is.

VALUE,value,@Data,populationType,dcid:Person,measuredProperty,dcid:count,measurementDenominator,dcid:Count_Person_Tuberculosis_EligibleForPreventiveTreatment,medicalCondition,dcid:Tuberculosis,medicalStatus,dcid:StartedOnTBPreventiveTreatment,unit,dcid:Percent
References
  1. Input data files, and test data files that mirror them, should not be modified to fix typos or inconsistencies. The corresponding mapping file (e.g., pvmap.csv) must be curated to handle the data as-is.

@@ -0,0 +1,54 @@
# WHO Tuberculosis: Percentage of household contacts (or all close contacts) who were started on TB preventive treatment out of those eligible

medium

There appears to be a typo in the main header. The escaped \# should be a plain #.

Suggested change
- \# WHO Tuberculosis: Percentage of household contacts (or all close contacts) who were started on TB preventive treatment out of those eligible
+ # WHO Tuberculosis: Percentage of household contacts (or all close contacts) who were started on TB preventive treatment out of those eligible

Comment on lines +34 to +52
**For Test Data Run**
```bash
python3 tools/statvar_importer/stat_var_processor.py \
--input_data="statvar_imports/tuberculosis_preventive_treatment/source_files/Tuberculosis_preventive_treatment.csv" \
--pv_map="statvar_imports/tuberculosis_preventive_treatment/tuberculosis_PreventiveTreatment_pv_mapping.csv" \
--output_path="statvar_imports/tuberculosis_preventive_treatment/output_files/tuberculosis_PreventiveTreatment" \
--config_file="statvar_imports/tuberculosis_preventive_treatment/tuberculosis_PreventiveTreatment_metadata.csv" \
--existing_statvar_mcf="gs://unresolved_mcf/scripts/statvar/stat_vars.mcf"
```

**For Main data run**
```bash
python3 tools/statvar_importer/stat_var_processor.py \
--input_data="statvar_imports/tuberculosis_preventive_treatment/source_files/Tuberculosis_preventive_treatment.csv" \
--pv_map="statvar_imports/tuberculosis_preventive_treatment/tuberculosis_PreventiveTreatment_pv_mapping.csv" \
--output_path="statvar_imports/tuberculosis_preventive_treatment/output_files/tuberculosis_PreventiveTreatment" \
--config_file="statvar_imports/tuberculosis_preventive_treatment/tuberculosis_PreventiveTreatment_metadata.csv" \
--existing_statvar_mcf="gs://unresolved_mcf/scripts/statvar/stat_vars.mcf"
```

medium

The processing instructions have several issues that could confuse users and prevent the commands from running correctly:

  • The commands for 'Test Data Run' and 'Main data run' are identical, so one of them is redundant, and the test command does not actually use test data.
  • The --input_data path points to .../source_files/..., but the download script saves to .../input_files/.... For testing, it should point to the test data in the .../testdata/ directory.
  • The main data run command is also incorrect and should point to input_files.

Please correct the test command to use the test data and update the main data run command to use the correct path.

"curator_emails": [
"support@datacommons.org"
],
"provenance_url": "<https://data.who.int/indicators/i/45274BD/F5556F8>",

medium

The provenance_url is enclosed in angle brackets (< >). These are typically not part of the URL and should be removed to ensure the link is valid.

Suggested change
- "provenance_url": "<https://data.who.int/indicators/i/45274BD/F5556F8>",
+ "provenance_url": "https://data.who.int/indicators/i/45274BD/F5556F8",
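Stray angle brackets of this kind usually come from pasting a Markdown autolink into JSON. As a rough illustration, a manifest check could normalize and validate the value along these lines; `clean_provenance_url` is a hypothetical helper, not an existing function in the repository.

```python
from urllib.parse import urlparse


def clean_provenance_url(raw: str) -> str:
    """Strip whitespace and Markdown-style angle brackets, then validate the URL."""
    url = raw.strip()
    if url.startswith("<") and url.endswith(">"):
        url = url[1:-1].strip()
    parsed = urlparse(url)
    # Require an absolute http(s) URL with a host part.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {raw!r}")
    return url
```

The bracketed value from the manifest would come back as the plain `https://data.who.int/indicators/i/45274BD/F5556F8`, while a value with no scheme or host would raise `ValueError`.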
