Make ParseDict call sites compatible with types-protobuf v7 stubs #638
kmontemayor2-sc wants to merge 4 commits into
types-protobuf v7 tightened ParseDict's js_dict parameter from an Any-compatible alias to dict[str, Any]. Validate that the YAML root is a mapping and cast before passing to ParseDict so mypy is clean under the new stubs.
types-protobuf v7 tightened ParseDict's js_dict parameter. Validate that dataset.metadata resolves to a mapping and cast before handing it to ParseDict so mypy is clean under the new stubs.
|
/all_test |
GiGL Automation@ 16:17:02UTC : 🔄 @ 16:17:07UTC : ❌ Workflow failed. |
GiGL Automation@ 16:17:03UTC : 🔄 @ 16:21:30UTC : ❌ Workflow failed. |
GiGL Automation@ 16:17:08UTC : 🔄 @ 16:21:33UTC : ❌ Workflow failed. |
GiGL Automation@ 16:17:09UTC : 🔄 @ 17:24:55UTC : ✅ Workflow completed successfully. |
GiGL Automation@ 16:17:10UTC : 🔄 @ 17:37:09UTC : ❌ Workflow failed. |
GiGL Automation@ 16:17:11UTC : 🔄 @ 16:21:53UTC : ❌ Workflow failed. |
| pb = ParseDict( | ||
| js_dict=graph_metadata_dict, message=graph_schema_pb2.GraphMetadata() | ||
| js_dict=cast(dict[str, Any], graph_metadata_dict), | ||
| message=graph_schema_pb2.GraphMetadata(), | ||
| ) |
High severity and reachable issue identified in your code:
Line 79 has a vulnerable usage of protobuf, introducing a high severity vulnerability.
ℹ️ Why this is reachable
A reachable issue is a real security risk because your project actually executes the vulnerable code. This issue is reachable because your code uses a certain version of protobuf.
Affected versions of protobuf are vulnerable to Uncontrolled Recursion. A denial-of-service vulnerability in the Python protobuf library's JSON parser allows deeply nested google.protobuf.Any messages to bypass the configured max_recursion_depth in json_format.ParseDict. Because the internal Any-handling logic does not update the recursion counter, an attacker supplying a JSON payload with repeatedly nested Any messages can exhaust Python's recursion stack (raising RecursionError) instead of a controlled ParseError, potentially crashing or disrupting services that parse untrusted JSON.
To resolve this comment:
Upgrade this dependency to at least version 5.29.6 at uv.lock.
💬 Ignore this finding
To ignore this, reply with:
/fp <comment> for false positive
/ar <comment> for acceptable risk
/other <comment> for all other reasons
You can view more details on this finding in the Semgrep AppSec Platform here.
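The finding above recommends upgrading protobuf, and that remains the actual fix. As a complementary illustration only, here is a minimal sketch of a pre-check that measures nesting depth iteratively, so a deeply nested payload is rejected with a controlled error before any recursive parser sees it. The function names and the limit of 64 are hypothetical and do not come from this PR or from protobuf.

```python
from typing import Any


def max_nesting_depth(value: Any) -> int:
    """Return the deepest level of dict/list nesting, computed without recursion."""
    deepest = 0
    # Explicit stack of (node, depth of its parent container) pairs.
    stack: list[tuple[Any, int]] = [(value, 0)]
    while stack:
        node, depth = stack.pop()
        if isinstance(node, dict):
            children = list(node.values())
        elif isinstance(node, list):
            children = node
        else:
            continue  # Scalars add no nesting.
        depth += 1
        deepest = max(deepest, depth)
        stack.extend((child, depth) for child in children)
    return deepest


def reject_if_too_deep(value: Any, limit: int = 64) -> None:
    """Raise ValueError if the payload nests deeper than `limit` (arbitrary default)."""
    depth = max_nesting_depth(value)
    if depth > limit:
        raise ValueError(f"Payload nesting depth {depth} exceeds limit {limit}.")
```

Because the walk uses an explicit stack, the check itself cannot hit Python's recursion limit, whatever the input depth. It is only relevant where the parsed input is untrusted.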
| f"ProtoUtils.read_proto_from_yaml expected a mapping at the YAML root for " | ||
| f"{uri}, got {type(obj_dict).__name__}." | ||
| ) | ||
| proto = ParseDict(js_dict=cast(dict[str, Any], obj_dict), message=proto_cls()) |
High severity and reachable issue identified in your code:
Line 36 has a vulnerable usage of protobuf, introducing a high severity vulnerability.
ℹ️ Why this is reachable
A reachable issue is a real security risk because your project actually executes the vulnerable code. This issue is reachable because your code uses a certain version of protobuf.
Affected versions of protobuf are vulnerable to Uncontrolled Recursion. A denial-of-service vulnerability in the Python protobuf library's JSON parser allows deeply nested google.protobuf.Any messages to bypass the configured max_recursion_depth in json_format.ParseDict. Because the internal Any-handling logic does not update the recursion counter, an attacker supplying a JSON payload with repeatedly nested Any messages can exhaust Python's recursion stack (raising RecursionError) instead of a controlled ParseError, potentially crashing or disrupting services that parse untrusted JSON.
To resolve this comment:
Upgrade this dependency to at least version 5.29.6 at uv.lock.
💬 Ignore this finding
To ignore this, reply with:
/fp <comment> for false positive
/ar <comment> for acceptable risk
/other <comment> for all other reasons
You can view more details on this finding in the Semgrep AppSec Platform here.
|
/all_test |
GiGL Automation@ 17:43:04UTC : 🔄 @ 17:53:17UTC : ✅ Workflow completed successfully. |
GiGL Automation@ 17:43:10UTC : 🔄 @ 18:53:11UTC : ✅ Workflow completed successfully. |
GiGL Automation@ 17:43:11UTC : 🔄 @ 19:12:34UTC : ✅ Workflow completed successfully. |
GiGL Automation@ 17:43:15UTC : 🔄 @ 17:45:45UTC : ✅ Workflow completed successfully. |
GiGL Automation@ 17:43:15UTC : 🔄 |
GiGL Automation@ 17:43:17UTC : 🔄 @ 17:52:20UTC : ✅ Workflow completed successfully. |
| f"ProtoUtils.read_proto_from_yaml expected a mapping at the YAML root for " | ||
| f"{uri}, got {type(obj_dict).__name__}." | ||
| ) | ||
| proto = ParseDict(js_dict=cast(dict[str, Any], obj_dict), message=proto_cls()) |
Are proto yaml representations guaranteed to be strings?
You mean the keys? I'm pretty sure, but it's not a big deal here either way; the important part is to ensure it's a dict.
Right, we are casting to dict[str, Any]; should it just be cast(dict)?
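One way to settle the question about keys, sketched here as an option and not as what the PR does, is to check the keys at runtime before casting, so the dict[str, Any] annotation is verified instead of asserted. The helper name as_str_keyed_dict and its source parameter are hypothetical, and only top-level keys are checked.

```python
from typing import Any, cast


def as_str_keyed_dict(obj: object, source: str) -> dict[str, Any]:
    """Return `obj` typed as dict[str, Any] after checking it at runtime.

    Hypothetical helper: only the top-level keys are validated.
    """
    if not isinstance(obj, dict):
        raise TypeError(
            f"Expected a mapping for {source}, got {type(obj).__name__}."
        )
    bad_keys = [key for key in obj if not isinstance(key, str)]
    if bad_keys:
        raise TypeError(
            f"Expected string keys for {source}, got non-string keys: {bad_keys!r}."
        )
    # cast() has no runtime effect; the checks above are what justify it.
    return cast(dict[str, Any], obj)
```

The trade-off is one extra pass over the top-level keys in exchange for an error that names the offending keys.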
Is what worth upgrading? |
Summary
types-protobuf v7 tightened google.protobuf.json_format.ParseDict's js_dict parameter from an Any-compatible alias to a strict dict[str, Any].
ParseDict call sites in this repo pass the result of OmegaConf.to_object(...) / OmegaConf.to_container(...), which mypy infers as a broad union (dict[str | bytes | int | Enum | float | bool, Any] | list[Any] | str | Any | None). With v7 stubs, both fail mypy.
Each call site now has an isinstance(..., dict) guard that raises TypeError (per the "Fail fast on invalid state" coding standard in CLAUDE.md), then applies cast(dict[str, Any], ...) before handing the value to ParseDict. The isinstance narrowing alone is insufficient because dict is invariant in its key type; the cast refines the key type from the broader union down to str.
grep -rn "ParseDict(" gigl snapchat --include="*.py" | grep -v _pb2.py returns exactly the two call sites fixed here, so the surface area is fully enumerated.
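The narrowing-then-cast pattern described in the summary can be sketched without protobuf or OmegaConf installed. In this illustration, consume is a hypothetical stand-in for a strictly typed function such as ParseDict, and the Container alias is a simplified assumption about the union mypy infers, not the exact OmegaConf return type.

```python
from typing import Any, Union, cast

# Simplified stand-in for the broad union described above (assumption).
BroadKey = Union[str, bytes, int, float, bool]
Container = Union[dict[BroadKey, Any], list[Any], str, None]


def consume(js_dict: dict[str, Any]) -> list[str]:
    """Hypothetical strictly typed consumer, standing in for ParseDict."""
    return sorted(js_dict)


def narrow_then_cast(value: Container) -> list[str]:
    # Fail fast when the value is not a mapping at all.
    if not isinstance(value, dict):
        raise TypeError(f"Expected a mapping, got {type(value).__name__}.")
    # isinstance narrows `value` to dict[BroadKey, Any]. Because dict is
    # invariant in its key type, a type checker does not accept that where
    # dict[str, Any] is required, so the key type is refined with a cast.
    return consume(cast(dict[str, Any], value))
```

For example, narrow_then_cast({"b": 1, "a": 2}) passes the guard and reaches the consumer, while narrow_then_cast([1, 2]) raises TypeError before the cast is reached. The cast is a static assertion only and performs no runtime check on the keys.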