cmr_processing_level_report.json
generate_processing_level_report.py
Summary
The CMR search API's processing_level and processing_level_id parameters completely fail to return collections with ProcessingLevel.Id = "Not provided" (lowercase 'p') in UMM-C format. 40,121 collections (74.5% of all CMR collections) are invisible to processing level searches.
Environment
- CMR Base URL: https://cmr.earthdata.nasa.gov/search/
- API Endpoint: collections.umm_json
- Date of Analysis: December 16, 2024
- Total Collections in CMR: 53,852
Bug Description
Expected Behavior
Collections with ProcessingLevel.Id = "Not provided" (lowercase 'p') should be searchable using the processing_level_id parameter.
From an exhaustive scan of all collections:
- 40,121 collections have ProcessingLevel.Id = "Not provided" (lowercase 'p')
- 161 collections have ProcessingLevel.Id = "Not Provided" (capital 'P')
- These are distinct, non-overlapping sets (verified)
Given that the search is case-insensitive, searching with processing_level_id="Not provided" should return both sets: 40,121 + 161 = 40,282 collections.
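To make the expected count concrete, the sketch below groups the scan counts by a case-folded key. It assumes that case-insensitive matching collapses the two spellings into one bucket; the helper name `expected_case_insensitive` is hypothetical and is not part of the attached script.

```python
from collections import Counter

# Counts taken from the exhaustive scan in this report.
scan_counts = {"Not provided": 40121, "Not Provided": 161}

def expected_case_insensitive(counts):
    """Sum the counts of values that differ only by letter case (hypothetical helper)."""
    folded = Counter()
    for level, n in counts.items():
        folded[level.casefold()] += n
    return dict(folded)

print(expected_case_insensitive(scan_counts))  # {'not provided': 40282}
```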
Actual Behavior
Every case variant tested ('not provided', 'Not provided', 'NOT PROVIDED') returns only 161 collections. These are the collections with ProcessingLevel.Id = "Not Provided" (capital 'P').
The 40,121 collections with lowercase "Not provided" are completely missing from the search results.
Analysis Result (from cmr_processing_level_search_index_report.json):
{
  "search_filter_results": {
    "Not provided": {
      "processing_level_id": 161,
      "processing_level": 161,
      "expected": 40121
    },
    "Not Provided": {
      "processing_level_id": 161,
      "processing_level": 161,
      "expected": 161
    }
  }
}
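A single filtered search can be checked by hand with a sketch like the following. It is not the attached script: it uses the standard library instead of httpx, the helper names `build_hits_url` and `count_hits` are hypothetical, and it assumes that the search response reports the total match count in the `CMR-Hits` header and that `page_size=0` is accepted when only the count is needed.

```python
import urllib.parse
import urllib.request

CMR_COLLECTIONS_URL = "https://cmr.earthdata.nasa.gov/search/collections.umm_json"

def build_hits_url(level, param="processing_level_id"):
    """Build a search URL that filters on one processing level value."""
    # page_size=0 (assumed to be accepted) requests no items, only the hit count.
    query = urllib.parse.urlencode({param: level, "page_size": 0})
    return f"{CMR_COLLECTIONS_URL}?{query}"

def count_hits(level, param="processing_level_id", timeout=30):
    """Return the hit count for the filtered search (requires network access)."""
    with urllib.request.urlopen(build_hits_url(level, param), timeout=timeout) as resp:
        # Assumption: the total number of matches is reported in the CMR-Hits header.
        return int(resp.headers["CMR-Hits"])

# Example (requires network): count_hits("Not provided"), count_hits("NOT PROVIDED")
```

Calling `count_hits` with each case variant, and with `param="processing_level"`, should make it possible to compare the counts against the figures reported above.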
Impact
- Severity: CRITICAL
- Collections Affected: 40,121 collections (74.5% of all CMR collections)
- User Impact: Users searching for data by processing level miss 99.6% of the collections with "Not provided" values (40,121 of 40,282 across both spellings). This prevents implementing a fail-closed approach for processing level filtering. For example, when searching for processing_level=1, users cannot reliably include collections with unspecified/unknown processing levels by adding OR processing_level="Not provided" to their query, because 99.6% of those collections are missing from the search index.
- Scope: Only affects "Not provided" (lowercase 'p'); the other 17 processing level values work correctly
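The fail-closed query described above could be built roughly as follows. This is a sketch under the assumption that repeating the `processing_level_id[]` parameter is treated by CMR as an OR over the listed values; the function name `build_fail_closed_query` is hypothetical.

```python
import urllib.parse

def build_fail_closed_query(level, unknown_value="Not provided"):
    """Build a query string matching `level` OR the unknown/unspecified value."""
    # Assumption: repeated processing_level_id[] values are ORed by the search API.
    pairs = [
        ("processing_level_id[]", level),
        ("processing_level_id[]", unknown_value),
    ]
    return urllib.parse.urlencode(pairs)

print(build_fail_closed_query("1"))
# processing_level_id%5B%5D=1&processing_level_id%5B%5D=Not+provided
```

With the behavior reported here, such a query would still omit the 40,121 lowercase "Not provided" collections, which is why the fail-closed approach cannot currently be relied on.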
Detailed Analysis
Exhaustive Scan Results (Ground Truth)
From comprehensive analysis of all 53,852 collections:
{
  "exhaustive_scan": {
    "total_collections": 53852,
    "collections_with_levels": 53852,
    "unique_levels": 18,
    "levels": {
      "Not provided": 40121,
      "Not Provided": 161,
      "NA": 1763,
      "3": 4401,
      "2": 3476,
      "4": 1471,
      "1B": 1088,
      "1": 705,
      "1A": 234,
      "0": 221,
      "2G": 76,
      "2P": 53,
      "2B": 34,
      "1C": 20,
      "2A": 14,
      "1T": 11,
      "L2": 2,
      "Level 3": 1
    }
  }
}
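The headline figures can be re-derived from the scan counts alone. The short check below uses only the numbers listed above; the variable names are illustrative.

```python
# Per-level counts copied from the exhaustive scan above.
levels = {
    "Not provided": 40121, "Not Provided": 161, "NA": 1763, "3": 4401,
    "2": 3476, "4": 1471, "1B": 1088, "1": 705, "1A": 234, "0": 221,
    "2G": 76, "2P": 53, "2B": 34, "1C": 20, "2A": 14, "1T": 11,
    "L2": 2, "Level 3": 1,
}

total = sum(levels.values())
lowercase = levels["Not provided"]
both_spellings = lowercase + levels["Not Provided"]

print(total, len(levels))                  # 53852 18
print(f"{lowercase / total:.1%}")          # 74.5% of all collections
print(f"{lowercase / both_spellings:.1%}") # 99.6% of "Not provided"/"Not Provided"
```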
Search Filter Test Results
Testing all 18 unique processing level values:
| Processing Level | Expected | Search Returns | Status |
|---|---|---|---|
| Not provided | 40,121 | 161 | ❌ BROKEN |
| Not Provided | 161 | 161 | ✅ Works |
| NA | 1,763 | 1,763 | ✅ Works |
| 3 | 4,401 | 4,401 | ✅ Works |
| 2 | 3,476 | 3,476 | ✅ Works |
| 4 | 1,471 | 1,471 | ✅ Works |
| 1B | 1,088 | 1,088 | ✅ Works |
| 1 | 705 | 705 | ✅ Works |
| 1A | 234 | 234 | ✅ Works |
| 0 | 221 | 221 | ✅ Works |
| (8 others) | * | * | ✅ Works |
Result: 17 out of 18 processing levels work perfectly. Only "Not provided" (lowercase 'p') is broken.
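The comparison behind this table amounts to a per-level diff between the scan counts and the search counts. The sketch below shows one way to do it; `find_discrepancies` is a hypothetical helper rather than a function from the attached script, and the input is a subset of the figures above.

```python
def find_discrepancies(expected, returned):
    """Return {level: (expected, returned)} for levels whose counts differ."""
    return {
        level: (count, returned.get(level))
        for level, count in expected.items()
        if returned.get(level) != count
    }

# Subset of the figures reported in the table above.
expected = {"Not provided": 40121, "Not Provided": 161, "NA": 1763, "3": 4401}
returned = {"Not provided": 161, "Not Provided": 161, "NA": 1763, "3": 4401}

print(find_discrepancies(expected, returned))  # {'Not provided': (40121, 161)}
```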
Reproduction Steps
Prerequisites
Reproduce the Bug
Download and run the comprehensive analysis script:
# Download the attached script
# Install dependencies
pip install httpx
# Run analysis (takes ~10-15 minutes to scan all 53,852 collections)
python generate_processing_level_report.py
The resulting JSON report has two fields, exhaustive_scan and search_filter_results. exhaustive_scan holds the actual number of collections per processing level in CMR, while search_filter_results holds the number of collections returned when searching with a filter on that processing level.
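For readers who want to see how an exhaustive_scan style tally can be produced, here is a minimal sketch. It assumes each item in a collections.umm_json response carries the UMM-C record under a "umm" key; the function name and the sample items are illustrative and are not taken from the attached script or from real CMR data.

```python
from collections import Counter

def tally_processing_levels(items):
    """Count collections per exact (case-sensitive) ProcessingLevel.Id value."""
    counts = Counter()
    for item in items:
        # Assumption: the UMM-C record sits under the "umm" key of each item.
        level = item.get("umm", {}).get("ProcessingLevel", {}).get("Id")
        counts[level if level is not None else "<missing>"] += 1
    return counts

# Made-up sample items shaped like UMM-JSON search results.
sample_items = [
    {"umm": {"ProcessingLevel": {"Id": "Not provided"}}},
    {"umm": {"ProcessingLevel": {"Id": "Not Provided"}}},
    {"umm": {"ProcessingLevel": {"Id": "3"}}},
    {"umm": {"ProcessingLevel": {"Id": "Not provided"}}},
]

print(tally_processing_levels(sample_items))
```

Keeping the key case-sensitive is what lets the scan distinguish "Not provided" from "Not Provided", which the search filter does not do.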
Root Cause Analysis
What Works ✅
- All 17 other processing level values index and search correctly
- "Not Provided" (capital 'P') works perfectly (returns all 161 collections)
- Both processing_level and processing_level_id parameters behave identically
What's Broken ❌
- Collections with ProcessingLevel.Id = "Not provided" (lowercase 'p') appear NOT to be indexed for processing level search
- These 40,121 collections are completely invisible to processing level searches
- Searching for any case variant only returns the 161 "Not Provided" (capital P) collections
Attachments
Analysis Files
- generate_processing_level_report.py: Complete analysis script that:
  - Performs exhaustive scan of all 53,852 collections
  - Tests search filters for all 18 unique processing levels
  - Generates comprehensive JSON report
  - Identifies discrepancies and problematic levels
- cmr_processing_level_search_index_report.json: Full analysis results including:
  - Exhaustive scan results (ground truth)
  - Search filter test results for each processing level
Conclusion
This is a critical search indexing bug affecting 74.5% of CMR collections. The bug prevents users from discovering 40,121 collections when filtering by processing level, severely impacting data discovery for Earth science research.
The bug is:
- 100% reproducible with provided analysis script
- Well-isolated to lowercase "Not provided" only
Report Generated: December 16, 2025
Analysis Scope: All 53,852 CMR collections
Test Coverage: All 18 unique processing level values
Reproducibility: 100% (verified with comprehensive automated testing)