fix(eap-items): Serialize EAPValue as untagged in attributes_array#7775
fix(eap-items): Serialize EAPValue as untagged in attributes_array#7775
Conversation
Add #[serde(untagged)] to the EAPValue enum so that array attribute
values are serialized as plain values (e.g. 123, "hello") instead of
tagged objects (e.g. {"Int": 123}, {"String": "hello"}).
Co-Authored-By: Claude <noreply@anthropic.com>
Agent transcript: https://claudescope.sentry.dev/share/0Zw0YbfHZZPQ_C-v69RIcW-5ch8xOLSf-_07vsI2fts
| @@ -177,6 +177,7 @@ macro_rules! seq_attrs { | |||
| } | |||
There was a problem hiding this comment.
Bug: Adding #[serde(untagged)] to the EAPValue enum introduces a breaking serialization change, causing downstream Python consumers that expect a tagged object to crash with an AttributeError.
Severity: CRITICAL
Suggested Fix
Either remove the #[serde(untagged)] attribute to revert to the previous tagged serialization format that downstream consumers expect, or update the Python consumer code (specifically the transform_array_value function) to correctly handle the new untagged primitive values.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: rust_snuba/src/processors/eap_items.rs#L177
Potential issue: The addition of `#[serde(untagged)]` to the `EAPValue` enum changes its
serialization format from a tagged object (e.g., `{"Int": 123}`) to a primitive value
(e.g., `123`). A downstream Python consumer, specifically the `transform_array_value`
function, expects the old tagged format and attempts to call `.items()` on each value.
When it receives a primitive type like an integer or string instead of a dictionary, the
call will fail with an `AttributeError`. This will cause runtime crashes in the
`get_trace` and `export_trace_items` RPC endpoints whenever they process items
containing array attributes.
Did we get this right? 👍 / 👎 to inform future reviews.
Update process_arrays to handle the new untagged serialization format where array values are plain JSON primitives instead of tagged dicts. Remove the now-unused transform_array_value function. Co-Authored-By: Claude <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/Ko_bDPVDo8oChJuLF2L67oOcIuDOkjf1V1B2SIwwHU4
json.loads already returns the correct dict[str, list[Any]] structure, no need to recreate it. Co-Authored-By: Claude <noreply@anthropic.com> Agent transcript: https://claudescope.sentry.dev/share/w74I_wpBjzGAzg8-hafg0nG_ylOEdcbuQbFiod1BtZ0
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
| for key, values in parsed.items(): | ||
| arrays[key] = [transform_array_value(v) for v in values] | ||
| return arrays | ||
| return json.loads(raw) or {} |
There was a problem hiding this comment.
Backward incompatibility with existing tagged array data in storage
High Severity
The simplified process_arrays can no longer handle old tagged data already persisted in ClickHouse. Existing rows have array values in tagged format (e.g., {"Int": 123}), but the removed transform_array_value was the only code that unwrapped those dicts into plain values. Now, queries hitting old data will pass dict objects to _to_any_value or _value_to_attribute, which both raise BadSnubaRPCRequestException on the unhandled dict type. This will break reads until all old data expires per retention policy.
Additional Locations (1)
noahsmartin
left a comment
There was a problem hiding this comment.
As long as no-one other than sessions is already using arrays, this looks great, thanks!
onewland
left a comment
There was a problem hiding this comment.
fine with this change but you should check that this doesn't break expectations for @thetruecpaul / @shashjar who I believe use arrays in occurrences today
toJSONString converts the JSON column through intermediate ClickHouse types, which coerces mixed-type array elements to strings. CAST to String preserves the original JSON representation with proper types. Co-Authored-By: Claude <noreply@anthropic.com>
077d4a1 to
ff1efae
Compare


Add
#[serde(untagged)]to theEAPValueenum so that array attributevalues in
attributes_arrayare serialized as plain values (e.g.123,"hello",3.14,true) instead of tagged objects (e.g.{"Int": 123},{"String": "hello"}).The tagged representation included the enum variant name as a key, which
is not expected by downstream consumers that only need the raw values.
Agent transcript: https://claudescope.sentry.dev/share/Q6CFyxDoBsOrL1Z2w3da6gOTLnKW426exXCS65Bhk-Q