Merged
24 changes: 20 additions & 4 deletions docs/cookbook/matrix-avro.md
@@ -53,6 +53,15 @@ Matrix projected = MatrixAvroReader.read(
)
```

## Inspect a file schema without reading rows

```groovy
import org.apache.avro.Schema
import se.alipsa.matrix.avro.MatrixAvroReader

Schema writerSchema = MatrixAvroReader.schema(new File('people.avro'))
```

## Write with the Matrix name as the default Avro schema name

```groovy
@@ -82,6 +91,13 @@ MatrixAvroWriter.write(orders, new File('orders-decimal.avro'), new AvroWriteOpt

Without `inferPrecisionAndScale(true)`, `BigDecimal` columns fall back to Avro `double`.

The same decimal-safe behavior is available through shortcuts:

```groovy
MatrixAvroWriter.write(orders, new File('orders-decimal.avro'), AvroWriteOptions.exactDecimals())
MatrixAvroWriter.writeExactDecimals(orders, new File('orders-decimal.avro'))
```

## Force a fixed decimal schema for one column

```groovy
@@ -90,7 +106,7 @@ import se.alipsa.matrix.avro.AvroWriteOptions
import se.alipsa.matrix.avro.MatrixAvroWriter

MatrixAvroWriter.write(orders, new File('orders-fixed-decimal.avro'), new AvroWriteOptions()
-  .columnSchema('total', AvroSchemaDecl.decimal(12, 2))
+  .columnSchema('total', AvroSchemaDecl.decimalColumn(12, 2))
)
```

@@ -120,7 +136,7 @@ Matrix nested = Matrix.builder('Nested')
.build()

MatrixAvroWriter.write(nested, new File('nested-map.avro'), new AvroWriteOptions()
-  .columnSchema('props', AvroSchemaDecl.map(AvroSchemaDecl.type(Integer)))
+  .columnSchema('props', AvroSchemaDecl.mapOf(Integer))
)
```

@@ -161,7 +177,7 @@ Matrix data = Matrix.builder('TagData')
.build()

MatrixAvroWriter.write(data, new File('tags.avro'), new AvroWriteOptions()
-  .columnSchema('tags', AvroSchemaDecl.array(AvroSchemaDecl.type(Long)))
+  .columnSchema('tags', AvroSchemaDecl.arrayOf(Long))
)
```

@@ -189,7 +205,7 @@ source.write([
## Common troubleshooting

- UUID reads back as `String`: expected; Avro `uuid` is imported as `String`
- - `BigDecimal` reads back as `Double`: expected when `inferPrecisionAndScale` is left at its default `false`
+ - `BigDecimal` reads back as `Double`: expected when `inferPrecisionAndScale` is left at its default `false`; use `AvroWriteOptions.exactDecimals()` or `writeExactDecimals(...)`
- Nested type looks wrong: the default heuristic uses the first non-null sample for lists and map values
- Map unexpectedly became a record: all non-null rows shared the same key set, so the writer treated it as record-like

11 changes: 7 additions & 4 deletions docs/tutorial/11b-matrix-avro.md
@@ -7,6 +7,7 @@ This page walks through the Avro module with the typed options APIs first, then
- use `AvroReadOptions` for naming and schema evolution
- use `AvroWriteOptions` for schema naming, decimal behavior, compression, and explicit nested schema control
- use `AvroSchemaDecl` when list or map sampling heuristics are not enough
- inspect schemas with `MatrixAvroReader.schema(...)` without reading all rows
- use `Matrix.listReadOptions('avro')`, `Matrix.listWriteOptions('avro')`, `AvroReadOptions.describe()`, and `AvroWriteOptions.describe()` to inspect the current option surface at runtime

## Discover the Available Options
@@ -48,6 +49,7 @@ AvroReadOptions options = new AvroReadOptions()
.readerSchema(projection)

Matrix people = MatrixAvroReader.read(new File('people.avro'), options)
Schema effectiveSchema = MatrixAvroReader.schema(new File('people.avro'), options)
```

Read naming precedence is:
@@ -124,6 +126,7 @@ The convenience overloads are still available when you want defaults without con
MatrixAvroWriter.write(orders, new File('orders.avro'))
MatrixAvroWriter.write(orders, new File('orders.avro'), true)
byte[] bytes = MatrixAvroWriter.writeBytes(orders)
MatrixAvroWriter.writeExactDecimals(orders, new File('orders-decimal.avro'))
```

## Schema Evolution with `readerSchema(...)`
@@ -170,9 +173,9 @@ Matrix nested = Matrix.builder("Nested")
.build()

MatrixAvroWriter.write(nested, new File("nested.avro"), new AvroWriteOptions()
-    .columnSchema('amount', AvroSchemaDecl.decimal(12, 3))
-    .columnSchema('tags', AvroSchemaDecl.array(AvroSchemaDecl.type(Long)))
-    .columnSchema('props', AvroSchemaDecl.map(AvroSchemaDecl.type(Integer)))
+    .columnSchema('amount', AvroSchemaDecl.decimalColumn(12, 3))
+    .columnSchema('tags', AvroSchemaDecl.arrayOf(Long))
+    .columnSchema('props', AvroSchemaDecl.mapOf(Integer))
.columnSchema('person', AvroSchemaDecl.record('PersonRecord', [
name: AvroSchemaDecl.type(String),
age : AvroSchemaDecl.type(Integer)
@@ -218,7 +221,7 @@ The SPI maps are useful when you want format-agnostic entry points, but the type
## Troubleshooting

- A UUID column reads back as `String`: this is the intended read behavior for Avro `uuid`
- - A `BigDecimal` column reads back as `Double`: enable `inferPrecisionAndScale(true)` or declare the column with `AvroSchemaDecl.decimal(...)`
+ - A `BigDecimal` column reads back as `Double`: use `AvroWriteOptions.exactDecimals()`, `writeExactDecimals(...)`, or declare the column with `AvroSchemaDecl.decimalColumn(...)`
- A map column became a record: that happens when the non-null rows share one key set; force map encoding with `AvroSchemaDecl.map(...)`
- A list or map used the wrong nested type: the default inference uses the first non-null sample; use `columnSchema(...)` when the sample is misleading
- Invalid `compressionLevel` or `syncInterval`: the writer validates these and fails fast when options are built or parsed from SPI maps
24 changes: 20 additions & 4 deletions matrix-avro/README.md
@@ -8,6 +8,7 @@ This module reads and writes Avro Object Container Files (`.avro`) with support
## At A Glance

- read Avro from `File`, `Path`, `URL`, `InputStream`, and byte arrays
- inspect Avro schemas from `File`, `Path`, `URL`, `InputStream`, and byte arrays without reading all rows
- write Avro to `File`, `Path`, `OutputStream`, and byte arrays
- control reads with `AvroReadOptions`
- control writes with `AvroWriteOptions`
@@ -31,8 +32,10 @@ dependencies {
Direct API entry points:

- `MatrixAvroReader.read(...)` for `File`, `Path`, `URL`, `InputStream`, and `byte[]`
- `MatrixAvroReader.schema(...)` for inspecting the writer schema, or the effective reader schema when `readerSchema(...)` is set
- `MatrixAvroWriter.write(...)` for `File`, `Path`, and `OutputStream`
- `MatrixAvroWriter.writeBytes(...)` for in-memory export
- `MatrixAvroWriter.writeExactDecimals(...)` and `writeExactDecimalBytes(...)` for decimal-safe write shortcuts
- `AvroReadOptions` for naming and schema evolution
- `AvroWriteOptions` for schema naming, decimal behavior, compression, and explicit nested schema control
- `AvroSchemaDecl` for per-column decimal, array, map, record, and scalar overrides
@@ -81,12 +84,16 @@ AvroReadOptions readOptions = new AvroReadOptions()
.readerSchema(projection)

Matrix users = MatrixAvroReader.read(new File('users.avro'), readOptions)

Schema writerSchema = MatrixAvroReader.schema(new File('users.avro'))
Schema effectiveSchema = MatrixAvroReader.schema(new File('users.avro'), readOptions)
```

Useful read options:

- `matrixName(...)` overrides the resulting Matrix name
- `readerSchema(...)` supplies an Avro reader schema for schema evolution or projection
- `AvroReadOptions.defaults()` and `AvroReadOptions.named(...)` are available when a factory reads better at the call site
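
The factories can replace a builder chain at simple call sites. A minimal sketch, assuming `named(...)` takes the Matrix name and mirrors `matrixName(...)`:

```groovy
import se.alipsa.matrix.avro.AvroReadOptions
import se.alipsa.matrix.avro.MatrixAvroReader

// assumption: AvroReadOptions.named('Users') is equivalent to
// new AvroReadOptions().matrixName('Users')
Matrix users = MatrixAvroReader.read(new File('users.avro'), AvroReadOptions.named('Users'))
```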

### Convenience Shortcuts

@@ -133,6 +140,7 @@ Useful write options:
- `schemaName(...)` overrides the generated record name
- `compression(...)`, `compressionLevel(...)`, and `syncInterval(...)` tune the container file
- `columnSchema(...)` and `columnSchemas(...)` override nested schema inference per column
- `AvroWriteOptions.defaults()` and `AvroWriteOptions.exactDecimals()` are available as typed factories
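
These options compose on one builder. A sketch combining naming and decimal behavior (the values shown are illustrative, not recommendations):

```groovy
import se.alipsa.matrix.avro.AvroWriteOptions
import se.alipsa.matrix.avro.MatrixAvroWriter

AvroWriteOptions options = new AvroWriteOptions()
    .schemaName('Orders')           // overrides the generated record name
    .inferPrecisionAndScale(true)   // keep BigDecimal columns as Avro decimal

MatrixAvroWriter.write(orders, new File('orders.avro'), options)
```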

### Convenience Shortcuts

@@ -141,6 +149,7 @@ Convenience overloads still exist for default behavior:
```groovy
MatrixAvroWriter.write(matrix, new File('data.avro'))
MatrixAvroWriter.write(matrix, new File('data.avro'), true)
MatrixAvroWriter.writeExactDecimals(matrix, new File('decimal-data.avro'))
byte[] bytes = MatrixAvroWriter.writeBytes(matrix)
```

@@ -155,9 +164,9 @@ import se.alipsa.matrix.avro.AvroSchemaDecl
import se.alipsa.matrix.avro.AvroWriteOptions

AvroWriteOptions options = new AvroWriteOptions()
-    .columnSchema('amount', AvroSchemaDecl.decimal(12, 2))
-    .columnSchema('tags', AvroSchemaDecl.array(AvroSchemaDecl.type(Long)))
-    .columnSchema('props', AvroSchemaDecl.map(AvroSchemaDecl.type(Integer)))
+    .columnSchema('amount', AvroSchemaDecl.decimalColumn(12, 2))
+    .columnSchema('tags', AvroSchemaDecl.arrayOf(Long))
+    .columnSchema('props', AvroSchemaDecl.mapOf(Integer))
.columnSchema('person', AvroSchemaDecl.record('PersonRecord', [
name: AvroSchemaDecl.type(String),
age : AvroSchemaDecl.type(Integer)
@@ -167,8 +176,11 @@ AvroWriteOptions options = new AvroWriteOptions()
Supported declaration kinds:

- `decimal(precision, scale)` for fixed decimal metadata
- `decimalColumn(precision, scale)` as a column-oriented alias for fixed decimal metadata
- `array(...)` for explicit array element types
- `arrayOf(Class<?>)` and `arrayOf(AvroScalarTypeDecl)` as scalar array shortcuts
- `map(...)` for explicit map value types
- `mapOf(Class<?>)` and `mapOf(AvroScalarTypeDecl)` as scalar map shortcuts
- `record(...)` for explicit nested record fields
- `type(...)` or `scalar(...)` for direct scalar overrides

Expand Down Expand Up @@ -212,13 +224,15 @@ Read defaults:
- Avro `uuid` values are read as `String`, not `UUID`
- logical types such as `date`, `time-millis`, `timestamp-millis`, `local-timestamp-micros`, and `decimal` are converted to Java values during import
- nested arrays read as `List<?>`, maps as `Map<String, ?>`, and records as `Map<String, Object>`
- `InputStream` read and schema-inspection overloads leave the caller-owned stream open

Write defaults:

- schema naming precedence is `AvroWriteOptions.schemaName(...)`, then `matrix.matrixName`, then `MatrixSchema`
- `inferPrecisionAndScale` defaults to `false`, so `BigDecimal` columns fall back to Avro `double`
- `namespace` defaults to `se.alipsa.matrix.avro`
- `compression` defaults to `NULL`, `compressionLevel` to `-1`, and `syncInterval` to `0`
- `OutputStream` write overloads leave the caller-owned stream open
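
Because the overloads leave caller-owned streams open, the caller manages the stream lifecycle. A minimal sketch using Groovy's `withCloseable`:

```groovy
import se.alipsa.matrix.avro.MatrixAvroWriter

new File('data.avro').newOutputStream().withCloseable { out ->
  MatrixAvroWriter.write(matrix, out)  // leaves `out` open; withCloseable closes it
}
```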

Nested-type heuristics:

@@ -257,10 +271,12 @@ Matrix projected = MatrixAvroReader.read(
MatrixAvroWriter.write(
orders,
new File('orders.avro'),
-  new AvroWriteOptions().inferPrecisionAndScale(true)
+  AvroWriteOptions.exactDecimals()
)
```

`MatrixAvroWriter.writeExactDecimals(...)` and `writeExactDecimalBytes(...)` are equivalent shortcuts.

### Custom Schema Naming

```groovy
2 changes: 1 addition & 1 deletion matrix-avro/build.gradle
@@ -10,7 +10,7 @@ plugins {
}

group = 'se.alipsa.matrix'
-version = '0.2.1'
+version = '0.3.0'
description = 'Matrix Avro import/export with schema evolution and logical type support'

JavaCompile javaCompile = compileJava {
17 changes: 17 additions & 0 deletions matrix-avro/release.md
@@ -1,5 +1,22 @@
# Matrix-avro release history

## v0.3.0 in progress

- fixed pre-epoch `local-timestamp-millis` reads by using floor modulo for nanosecond remainders
- made explicit `timestamp-millis` writes of `LocalDateTime` timezone-stable by interpreting them at UTC
- aligned stream ownership with the public docs
- `InputStream` read/schema overloads leave caller-owned streams open
- `OutputStream` write overloads leave caller-owned streams open
- removed the writer schema cache so mutated matrices cannot reuse stale schemas
- added public schema inspection APIs through `MatrixAvroReader.schema(...)`
- added convenience factories and shortcuts
- `AvroReadOptions.defaults()` and `AvroReadOptions.named(...)`
- `AvroWriteOptions.defaults()` and `AvroWriteOptions.exactDecimals()`
- `MatrixAvroWriter.writeExactDecimals(...)` and `writeExactDecimalBytes(...)`
- `AvroSchemaDecl.decimalColumn(...)`, `arrayOf(...)`, and `mapOf(...)`
- tightened public validation for schema building and null options
- refreshed README, tutorial, and cookbook examples for schema inspection and decimal-safe writes

## v0.2.0 2026-03-19

- clarified reader option semantics and naming precedence