fix(avro): error on complex (non-nullable) unions instead of silently dropping (#777)#808
Conversation
… dropping (apache#777) Signed-off-by: SAY-5 <saiasish.cnp@gmail.com>
zeroshade
left a comment
There was a problem hiding this comment.
LGTM, thanks for this! Just had one question
| arrowSchemafromAvro(n) | ||
| } | ||
| } else { | ||
| panic(fmt.Errorf("complex (non-nullable) avro union at '%v' is not supported", n.schemaPath())) |
There was a problem hiding this comment.
Should we add support and use the arrow union type? We could also do this as a follow up instead.
|
I'd lean toward doing the Arrow union type support as a follow-up. This PR is intentionally narrow: it just turns the silent data-loss case (non-nullable unions being dropped) into a clear error, which is a safe incremental step. Mapping Avro unions onto |
|
Sounds good, please add a tracking issue for the union support and we can do that as a follow-up. I'll merge this after the CI passes |
Rationale for this change
Fixes #777.
ArrowSchemaFromAvropreviously silently dropped non-nullable Avro unions (e.g.["int","string"]) in botharrowSchemafromAvroanditerateFields. Output schemas were missing fields with no warning to the caller.What changes are included in this PR?
Panic with a clear error message at the two
case "union"/case *avro.UnionSchemaarms when the union is not a simple nullable. The package's existingrecover()inArrowSchemaFromAvroconverts this into a returned error.Are these changes tested?
Yes,
TestComplexUnionReportsErrorexercises a record with["int","string"]and asserts a non-nil error mentioningunion.Are there any user-facing changes?
Callers that previously received an incomplete schema for a complex-union field now get an explicit error.