Add data_base64 support and simplify CE JSON marshaling#38
Add data_base64 support and simplify CE JSON marshaling#38
Conversation
9c80958 to
9eeefb6
Compare
There was a problem hiding this comment.
Pull request overview
This PR extends the library’s CloudEvents JSON support to handle data_base64 (including round-trip preservation) and simplifies JSON marshaling by removing the tidwall/sjson dependency in favor of a single json.Marshal() over a header map representation.
Changes:
- Add
RawEventstruct withData+DataBase64, custom JSON (un)marshaling, andBytesForSignature()to preserve wire-form bytes for signature verification. - Refactor
CloudEvent/CloudEventHeadermarshaling to use a sharedheaderToMap()and removegithub.com/tidwall/sjson. - Remove
sjson(and some transitive deps) from module requirements and add tests covering the newRawEventbehavior.
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
cloudevent.go |
Introduces RawEvent as a real struct, adds data_base64 support, wire-form signature bytes helper, and MIME-driven marshaling decisions. |
cloudevent_json.go |
Refactors JSON marshaling/unmarshaling to use a shared header map and a single-pass raw-field decode path. |
cloudevent_test.go |
Adds tests for RawEvent unmarshaling/marshaling behavior and signature bytes selection. |
go.mod |
Removes github.com/tidwall/sjson and transitive indirect requirements. |
go.sum |
Drops most tidwall checksums related to the removed dependency chain. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
cloudevent_json.go
Outdated
There was a problem hiding this comment.
unmarshalCloudEventWithPayload always calls payloadFunc even when the input omits the "data" field. For CloudEvent[A], this results in json.Unmarshal(nil, &c.Data) in setDataField, which returns an error and makes CloudEvent unmarshaling reject valid CloudEvents that have no data. Only invoke the data/payload callback when the corresponding field is present (or explicitly treat missing data as null).
| if err := payloadFunc(dataRaw, dataBase64); err != nil { | |
| return c, err | |
| if dataRaw != nil || dataBase64 != "" { | |
| if err := payloadFunc(dataRaw, dataBase64); err != nil { | |
| return c, err | |
| } |
| func (r *RawEvent) UnmarshalJSON(data []byte) error { | ||
| var dataRaw json.RawMessage | ||
| var dataBase64 string | ||
| header, err := unmarshalCloudEventWithPayload(data, func(d json.RawMessage, b64 string) error { | ||
| dataRaw = d | ||
| dataBase64 = b64 | ||
| return nil | ||
| }) | ||
| if err != nil { | ||
| return err | ||
| } | ||
| r.CloudEventHeader = header | ||
| if dataBase64 != "" { | ||
| decoded, err := base64.StdEncoding.DecodeString(dataBase64) | ||
| if err != nil { | ||
| return err | ||
| } | ||
| r.Data = decoded | ||
| r.DataBase64 = dataBase64 | ||
| } else { | ||
| r.Data = dataRaw | ||
| r.DataBase64 = "" | ||
| } |
There was a problem hiding this comment.
RawEvent.UnmarshalJSON does not detect the invalid case where both "data" and "data_base64" are present. CloudEvents JSON requires that at most one of these fields is set; consider returning an error when both are provided to avoid ambiguous precedence and signature verification mismatches.
cloudevent.go
Outdated
There was a problem hiding this comment.
IsJSONDataContentType only treats the media type returned by mime.TypeByExtension(".json") (typically "application/json") as JSON. CloudEvents commonly uses structured-syntax suffixes like "application/cloudevents+json" or vendor types ending in "+json"; those should also be treated as JSON to pick the correct wire form.
| return err == nil && parsed == mime.TypeByExtension(".json") | |
| if err != nil { | |
| return false | |
| } | |
| // Treat standard application/json and any structured-syntax suffix "+json" as JSON. | |
| return parsed == mime.TypeByExtension(".json") || strings.HasSuffix(parsed, "+json") |
cloudevent.go
Outdated
There was a problem hiding this comment.
RawEvent.MarshalJSON is defined on a pointer receiver. If callers pass a non-pointer RawEvent value to json.Marshal, the custom logic will not run (because the value in an interface isn't addressable), and the output may incorrectly include both fields or the wrong wire form. Use a value receiver for MarshalJSON (and BytesForSignature if desired) to make marshaling consistent for both values and pointers.
| func (r *RawEvent) MarshalJSON() ([]byte, error) { | |
| func (r RawEvent) MarshalJSON() ([]byte, error) { |
cloudevent.go
Outdated
There was a problem hiding this comment.
When emitting "data" for JSON payloads, MarshalJSON unmarshals r.Data into interface{} and re-marshals it. This can lose numeric precision (numbers become float64), change object key order, and otherwise alter the original JSON bytes. Prefer setting out["data"] to json.RawMessage(r.Data) so json.Marshal embeds the raw JSON without reparsing.
| var dataVal any | |
| _ = json.Unmarshal(r.Data, &dataVal) | |
| out["data"] = dataVal | |
| out["data"] = json.RawMessage(r.Data) |
cloudevent_json.go
Outdated
There was a problem hiding this comment.
unmarshalCloudEventWithPayload ignores json.Unmarshal errors for core header fields (id/source/producer/subject/time/type/etc). This can silently accept invalid JSON types or invalid time formats and leave zero values, which changes behavior vs struct-unmarshal and can break downstream logic (e.g., Key()). Capture and return the first unmarshal error instead of discarding it.
| _ = json.Unmarshal(raw, &c.Subject) | |
| } | |
| if raw, ok := rawFields["time"]; ok && len(raw) > 0 { | |
| _ = json.Unmarshal(raw, &c.Time) | |
| } | |
| if raw, ok := rawFields["type"]; ok && len(raw) > 0 { | |
| _ = json.Unmarshal(raw, &c.Type) | |
| } | |
| if raw, ok := rawFields["datacontenttype"]; ok && len(raw) > 0 { | |
| _ = json.Unmarshal(raw, &c.DataContentType) | |
| } | |
| if raw, ok := rawFields["dataschema"]; ok && len(raw) > 0 { | |
| _ = json.Unmarshal(raw, &c.DataSchema) | |
| } | |
| if raw, ok := rawFields["dataversion"]; ok && len(raw) > 0 { | |
| _ = json.Unmarshal(raw, &c.DataVersion) | |
| } | |
| if raw, ok := rawFields["signature"]; ok && len(raw) > 0 { | |
| _ = json.Unmarshal(raw, &c.Signature) | |
| } | |
| if raw, ok := rawFields["tags"]; ok && len(raw) > 0 { | |
| _ = json.Unmarshal(raw, &c.Tags) | |
| } | |
| var dataRaw json.RawMessage | |
| var dataBase64 string | |
| if raw, ok := rawFields["data_base64"]; ok && len(raw) > 0 { | |
| _ = json.Unmarshal(raw, &dataBase64) | |
| if err := json.Unmarshal(raw, &c.Subject); err != nil { | |
| return c, err | |
| } | |
| } | |
| if raw, ok := rawFields["time"]; ok && len(raw) > 0 { | |
| if err := json.Unmarshal(raw, &c.Time); err != nil { | |
| return c, err | |
| } | |
| } | |
| if raw, ok := rawFields["type"]; ok && len(raw) > 0 { | |
| if err := json.Unmarshal(raw, &c.Type); err != nil { | |
| return c, err | |
| } | |
| } | |
| if raw, ok := rawFields["datacontenttype"]; ok && len(raw) > 0 { | |
| if err := json.Unmarshal(raw, &c.DataContentType); err != nil { | |
| return c, err | |
| } | |
| } | |
| if raw, ok := rawFields["dataschema"]; ok && len(raw) > 0 { | |
| if err := json.Unmarshal(raw, &c.DataSchema); err != nil { | |
| return c, err | |
| } | |
| } | |
| if raw, ok := rawFields["dataversion"]; ok && len(raw) > 0 { | |
| if err := json.Unmarshal(raw, &c.DataVersion); err != nil { | |
| return c, err | |
| } | |
| } | |
| if raw, ok := rawFields["signature"]; ok && len(raw) > 0 { | |
| if err := json.Unmarshal(raw, &c.Signature); err != nil { | |
| return c, err | |
| } | |
| } | |
| if raw, ok := rawFields["tags"]; ok && len(raw) > 0 { | |
| if err := json.Unmarshal(raw, &c.Tags); err != nil { | |
| return c, err | |
| } | |
| } | |
| var dataRaw json.RawMessage | |
| var dataBase64 string | |
| if raw, ok := rawFields["data_base64"]; ok && len(raw) > 0 { | |
| if err := json.Unmarshal(raw, &dataBase64); err != nil { | |
| return c, err | |
| } |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 5 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if len(r.Data) > 0 || r.DataBase64 != "" { | ||
| if r.DataBase64 != "" { | ||
| out["data_base64"] = r.DataBase64 | ||
| } else { |
There was a problem hiding this comment.
The condition on line 166 checks two scenarios: IsJSONDataContentType(r.DataContentType) || (r.DataContentType == "" && json.Valid(r.Data)). The second part (r.DataContentType == "" && json.Valid(r.Data)) means that when DataContentType is empty and the data happens to be valid JSON, it will emit "data" instead of "data_base64".
However, this creates an ambiguity: if someone has binary data that happens to be valid JSON (e.g., the bytes [1,2,3] which is both valid binary and valid JSON), the marshaling behavior changes based on whether DataContentType is set. This might surprise users and could break round-trip guarantees for edge cases.
Consider documenting this behavior clearly, or requiring explicit DataContentType to avoid ambiguity.
| } else { | |
| } else { | |
| // NOTE: When DataContentType is empty and r.Data happens to be valid JSON, | |
| // we emit a structured JSON "data" field instead of "data_base64". This means | |
| // that for byte slices which are both valid JSON and valid binary data | |
| // (for example: []byte("[1,2,3]")), the wire representation changes based on | |
| // whether DataContentType is set: | |
| // - DataContentType == "" -> "data" | |
| // - DataContentType is non-JSON -> "data_base64" | |
| // | |
| // This is intentional for convenience, but can be surprising and may affect | |
| // strict round-trip guarantees for edge cases. Callers that care about | |
| // treating such payloads as opaque binary data should set a non-JSON | |
| // DataContentType explicitly to force "data_base64". |
| Data json.RawMessage `json:"data,omitempty"` | ||
|
|
||
| // DataBase64 is the raw "data_base64" string when the event was received with | ||
| // data_base64 (CloudEvents spec). When set, MarshalJSON emits data_base64 for | ||
| // round-trip; otherwise wire form is chosen from DataContentType and Data. | ||
| DataBase64 string `json:"data_base64,omitempty"` |
There was a problem hiding this comment.
The RawEvent struct has JSON struct tags on both Data and DataBase64 fields (lines 107 and 112), but it also implements custom MarshalJSON/UnmarshalJSON methods. The struct tags will be ignored during marshaling/unmarshaling because of the custom methods.
While this isn't technically a bug (custom marshalers take precedence), having the tags could be misleading to developers who might expect them to be used. Consider removing the tags or adding a comment indicating they're documentation-only since custom marshalers are used.
| // application/json -> "data"; otherwise -> "data_base64" (CloudEvents spec). | ||
| func (r *RawEvent) MarshalJSON() ([]byte, error) { | ||
| out := headerToMap(r.CloudEventHeader) | ||
| if len(r.Data) > 0 || r.DataBase64 != "" { |
There was a problem hiding this comment.
When both Data is empty (len(r.Data) == 0) and DataBase64 is empty (""), the entire if block is skipped and neither "data" nor "data_base64" is added to the output. This is correct per CloudEvents spec (both fields are optional).
However, there's a subtle case: if DataBase64 is set to "" explicitly (empty string) and len(r.Data) > 0, it will enter the block but the DataBase64 != "" check on line 163 will be false, proceeding to the else clause. This is correct behavior, but the condition on line 162 could be simplified to just check len(r.Data) > 0 since an empty DataBase64 string is equivalent to it not being set.
| // RawEvent is a cloudevent with a json.RawMessage data field. | ||
| type RawEvent = CloudEvent[json.RawMessage] | ||
| // It supports both "data" and "data_base64" (CloudEvents JSON spec). | ||
| type RawEvent struct { | ||
| CloudEventHeader | ||
| Data json.RawMessage `json:"data,omitempty"` | ||
|
|
||
| // DataBase64 is the raw "data_base64" string when the event was received with | ||
| // data_base64 (CloudEvents spec). When set, MarshalJSON emits data_base64 for | ||
| // round-trip; otherwise wire form is chosen from DataContentType and Data. | ||
| DataBase64 string `json:"data_base64,omitempty"` | ||
| } |
There was a problem hiding this comment.
The change from type RawEvent = CloudEvent[json.RawMessage] to a full struct is a breaking API change. Code that previously constructed RawEvent using the CloudEvent[json.RawMessage] type will need to be updated.
For example, code like var re RawEvent = CloudEvent[json.RawMessage]{...} or direct field access patterns that relied on RawEvent being a type alias will break. While there are no internal usages in this repository, external consumers of this library will be affected.
Consider:
- Documenting this breaking change in a CHANGELOG or migration guide
- Bumping the major version if using semantic versioning
- Providing clear upgrade instructions for library consumers
cloudevent.go
Outdated
There was a problem hiding this comment.
There's redundant validation: line 166 checks json.Valid(r.Data) in the condition, and then line 167 checks it again before unmarshaling. The second check on line 167 is redundant because if useData is true and we're in this branch, json.Valid(r.Data) must have already been true from line 166.
Consider simplifying by removing the redundant check on line 167, or restructure the logic to avoid double-checking.
| if useData && json.Valid(r.Data) { | |
| if useData { |
| if len(r.Data) > 0 || r.DataBase64 != "" { | ||
| if r.DataBase64 != "" { | ||
| out["data_base64"] = r.DataBase64 | ||
| } else { | ||
| useData := IsJSONDataContentType(r.DataContentType) || (r.DataContentType == "" && json.Valid(r.Data)) | ||
| if useData && json.Valid(r.Data) { | ||
| var dataVal any | ||
| _ = json.Unmarshal(r.Data, &dataVal) | ||
| out["data"] = dataVal | ||
| } else { | ||
| out["data_base64"] = base64.StdEncoding.EncodeToString(r.Data) | ||
| } | ||
| } |
There was a problem hiding this comment.
According to the CloudEvents JSON format specification, "data" and "data_base64" are mutually exclusive attributes - only one should be present in a CloudEvent JSON representation. However, the MarshalJSON implementation here could theoretically allow both to be emitted if the headerToMap is modified externally or if there's a bug.
The current implementation chooses which field to emit based on conditions (lines 163-174), but it doesn't actively prevent both from being set in the output map. While this is unlikely given the current code flow, consider adding a safeguard or assertion to ensure mutual exclusivity is maintained.
cloudevent_test.go
Outdated
There was a problem hiding this comment.
The new RawEvent tests only cover happy path scenarios. There's no test coverage for error cases such as:
- Invalid base64 in data_base64 field
- Malformed JSON in data field
- Both data and data_base64 present simultaneously
- Invalid DataContentType MIME parsing
Consider adding negative test cases to ensure proper error handling and validation, especially given that RawEvent.UnmarshalJSON returns errors for invalid base64 (line 139-142 in cloudevent.go).
| func TestRawEvent_UnmarshalJSON_InvalidBase64(t *testing.T) { | |
| t.Parallel() | |
| // data_base64 contains characters that are not valid in base64 encoding. | |
| jsonStr := `{"data_base64": "$$not-base64$$"}` | |
| var ev cloudevent.RawEvent | |
| err := json.Unmarshal([]byte(jsonStr), &ev) | |
| require.Error(t, err, "expected error for invalid base64 in data_base64") | |
| } | |
| func TestRawEvent_UnmarshalJSON_MalformedDataJSON(t *testing.T) { | |
| t.Parallel() | |
| // The data field itself is malformed JSON (invalid object syntax). | |
| jsonStr := `{"data": {"invalid": ] }` | |
| var ev cloudevent.RawEvent | |
| err := json.Unmarshal([]byte(jsonStr), &ev) | |
| require.Error(t, err, "expected error for malformed JSON in data field") | |
| } | |
| func TestRawEvent_UnmarshalJSON_BothDataAndDataBase64(t *testing.T) { | |
| t.Parallel() | |
| // Both data and data_base64 are present; this should be rejected. | |
| jsonStr := `{"data": "plain-data", "data_base64": "Zm9vYmFy"}` | |
| var ev cloudevent.RawEvent | |
| err := json.Unmarshal([]byte(jsonStr), &ev) | |
| require.Error(t, err, "expected error when both data and data_base64 are present") | |
| } | |
| func TestRawEvent_UnmarshalJSON_InvalidDataContentType(t *testing.T) { | |
| t.Parallel() | |
| // datacontenttype is not a valid MIME type (missing required slash). | |
| jsonStr := `{"datacontenttype": "invalid-mime", "data": "test"}` | |
| var ev cloudevent.RawEvent | |
| err := json.Unmarshal([]byte(jsonStr), &ev) | |
| require.Error(t, err, "expected error for invalid datacontenttype MIME value") | |
| } |
9eeefb6 to
f9ba8f7
Compare
- Replace headerToMap with sjson for CE JSON marshaling - Return unmarshal errors instead of silently discarding them - Guard payloadFunc call when no data field is present - Use value receivers for RawEvent MarshalJSON/BytesForSignature - Reject events with both "data" and "data_base64" per CE spec - Add negative test cases for error paths Co-authored-by: Cursor <cursoragent@cursor.com>
f9ba8f7 to
49da210
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (1)
cloudevent_json.go:28
- The PR description says sjson was removed, but CloudEvent[A].MarshalJSON still uses sjson to inject the "data" field. Either complete the refactor to eliminate sjson usage here or update the PR description/scope so it matches the implementation.
data, err := json.Marshal(c.CloudEventHeader)
if err != nil {
return nil, err
}
data, err = sjson.SetBytes(data, "data", c.Data)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "strings" | ||
| "time" | ||
|
|
||
| "github.com/tidwall/sjson" | ||
| ) |
There was a problem hiding this comment.
sjson is newly imported/used here, which conflicts with the PR goal of removing the tidwall/sjson dependency. If the intent is dependency removal, RawEvent.MarshalJSON needs to be rewritten without sjson (e.g., marshal header to map and set data/data_base64 before json.Marshal).
| // BytesForSignature returns the bytes that were signed (wire form of data or data_base64). | ||
| // Use for signature verification; not the same as Data when the CE used data_base64. | ||
| func (r RawEvent) BytesForSignature() []byte { | ||
| if r.DataBase64 != "" { | ||
| return []byte(r.DataBase64) |
There was a problem hiding this comment.
BytesForSignature is new behavior but there are no tests asserting its output for both wire forms (data vs data_base64). Adding tests here would lock in the intended semantics for signature verification and prevent regressions (especially around base64 vs decoded bytes).
| // IsJSONDataContentType returns true if the MIME type indicates a JSON payload. | ||
| // Matches "application/json" and any "+json" suffix type (e.g. "application/cloudevents+json"). | ||
| func IsJSONDataContentType(ct string) bool { | ||
| parsed, _, err := mime.ParseMediaType(strings.TrimSpace(ct)) | ||
| return err == nil && (parsed == "application/json" || strings.HasSuffix(parsed, "+json")) |
There was a problem hiding this comment.
RawEvent wire-form selection (IsJSONDataContentType + MarshalJSON choosing "data" vs "data_base64", including handling application/json; charset=utf-8 and application/*+json) is new behavior but isn’t covered by tests in cloudevent_test.go. Add marshal/unmarshal tests that assert the chosen field name and round-trip preservation.
|
|
||
| // unmarshalCloudEvent unmarshals the CloudEventHeader and data field. | ||
| func unmarshalCloudEvent(data []byte, dataFunc func(json.RawMessage) error) (CloudEventHeader, error) { | ||
| return unmarshalCloudEventWithPayload(data, func(dataRaw json.RawMessage, _ string) error { |
There was a problem hiding this comment.
In unmarshalCloudEvent (used by CloudEvent[A]), payloadFunc is invoked even when the input only contains "data_base64". That results in dataFunc being called with a nil json.RawMessage, which makes CloudEvent[A].setDataField attempt json.Unmarshal(nil, &c.Data) and fail. Consider changing the wrapper to only call dataFunc when dataRaw is non-nil (and ignore data_base64 for typed CloudEvent), or explicitly decode data_base64 into a "data" RawMessage when you want CloudEvent[A] to support data_base64.
| return unmarshalCloudEventWithPayload(data, func(dataRaw json.RawMessage, _ string) error { | |
| return unmarshalCloudEventWithPayload(data, func(dataRaw json.RawMessage, _ string) error { | |
| if dataRaw == nil { | |
| // No "data" field present (possibly only "data_base64"); for typed CloudEvent[A], | |
| // ignore this and do not call dataFunc to avoid unmarshalling nil. | |
| return nil | |
| } |
| if dataRaw != nil && dataBase64 != "" { | ||
| return fmt.Errorf("cloudevent: both \"data\" and \"data_base64\" present; only one allowed") | ||
| } | ||
| if dataBase64 != "" { | ||
| decoded, err := base64.StdEncoding.DecodeString(dataBase64) |
There was a problem hiding this comment.
RawEvent treats presence of "data_base64" as dataBase64 != "", so a JSON payload with "data_base64":"" (present-but-empty) is treated as if the field were absent. That loses round-trip fidelity and affects signature bytes. Consider tracking presence separately (e.g., *string or a boolean) so empty base64 payloads are still recognized and preserved.
Summary
data_base64support for CloudEvents JSON (spec):RawEventunmarshals and marshals bothdataanddata_base64, with round-trip preservation when the wire form wasdata_base64.github.com/tidwall/sjson.CloudEventandCloudEventHeadermarshal via a sharedheaderToMap()and a singlejson.Marshal()instead of marshal-then-sjson.SetBytes.RawEventgets customMarshalJSONthat choosesdatavsdata_base64fromDataContentType(and preservesdata_base64when set).map[string]json.RawMessage, then fill header and handle payload; newunmarshalCloudEventWithPayloadsupports bothdataanddata_base64forRawEvent.RawEventis now a real struct (not a type alias) withDataandDataBase64; addedBytesForSignature()for signature verification (returns wire form:data_base64string as bytes orData).IsJSONDataContentType(ct string) boolfor MIME-based wire-form choice.github.com/tidwall/sjsonand its transitive deps (gjson, match, pretty) from go.mod/go.sum.RawEvent(data-only, data_base64-only, round-trip,BytesForSignature, DataContentType-driven wire form).