Merged
7 changes: 6 additions & 1 deletion matrix-avro/build.gradle
@@ -10,7 +10,7 @@ plugins {
 }
 
 group = 'se.alipsa.matrix'
-version = '0.2.1-SNAPSHOT'
+version = '0.2.1'
 description = 'Matrix Avro import/export with schema evolution and logical type support'
 
 JavaCompile javaCompile = compileJava {
@@ -21,6 +21,11 @@ JavaCompile javaCompile = compileJava {
 
 compileGroovy {
   options.deprecation = true
+  groovyOptions.configurationScript = rootProject.file('config/groovy/compileStatic.groovy')
 }
+
+codenarc {
+  ignoreFailures = false
+}
 
 repositories {
123 changes: 123 additions & 0 deletions matrix-avro/req/v0.3.0.md
@@ -0,0 +1,123 @@
# Matrix-AVRO v0.3.0 Plan

## Scope

This plan addresses the module review findings and adds usability improvements for schema inspection, option ergonomics, decimal-safe writing, explicit schema declarations, and documentation. Each numbered section is intended to be implementable and reviewable as a separate pull request.

A task is only complete when its tests have passed and the exact test command has been recorded in the section or PR description.

## 1. Correct Stream Ownership Semantics

1.1 [ ] Decide and document the stream ownership contract for `MatrixAvroReader.read(InputStream, ...)` and `MatrixAvroWriter.write(Matrix, OutputStream, ...)`: either caller-owned streams remain open, or the API explicitly states that Avro closes them.

1.2 [ ] If caller-owned streams should remain open, add close-shield wrappers in `MatrixAvroReader` and `MatrixAvroWriter` so closing `DataFileStream` or `DataFileWriter` does not close the caller's stream.

1.3 [ ] If streams are intentionally closed, update GroovyDoc, README, tutorial, and cookbook examples to state this clearly and show caller code that does not reuse the stream afterward.

1.4 [ ] Add tests in `matrix-avro/src/test/groovy/test/alipsa/matrix/avro/MatrixAvroReaderTest.groovy` and `MatrixAvroWriterTest.groovy` proving the final stream ownership behavior for both direct and options-based overloads.

1.5 [ ] Run tests and record the commands: `./gradlew :matrix-avro:test` and `./gradlew :matrix-avro:codenarcMain :matrix-avro:codenarcTest`.
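If the caller-owned contract is chosen, the close-shield wrapper in 1.2 can be as small as a `FilterOutputStream` whose `close()` flushes without closing the delegate. A minimal Java sketch of the idea (class and method names here are illustrative, not the module's actual API):

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class CloseShieldDemo {

    /** Wrapper whose close() flushes but leaves the caller-owned stream open. */
    static class CloseShieldOutputStream extends FilterOutputStream {
        CloseShieldOutputStream(OutputStream out) { super(out); }

        @Override
        public void close() throws IOException {
            flush(); // propagate buffered bytes, but do NOT close the delegate
        }
    }

    static String demo() {
        ByteArrayOutputStream callerOwned = new ByteArrayOutputStream();
        try (OutputStream shielded = new CloseShieldOutputStream(callerOwned)) {
            // In the real writer, Avro's DataFileWriter would wrap 'shielded'
            shielded.write("avro".getBytes());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        callerOwned.write('!'); // caller can keep using the stream afterward
        return callerOwned.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // avro!
    }
}
```

An equivalent `FilterInputStream` shield covers the reader side; the tests in 1.4 then assert that the caller's stream is still writable or readable after the Avro wrapper is closed.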

## 2. Remove or Harden Schema Caching

2.1 [ ] Reevaluate the schema cache in `MatrixAvroWriter`: remove it unless benchmarks show it is needed, because schema inference depends on mutable Matrix data values beyond the current cache key.

2.2 [ ] If the cache is retained, extend `SchemaCacheKey` to include the inferred data profile used by schema generation: decimal precision/scale, list element class, map value class, record-vs-map classification, and record field metadata.

2.3 [ ] Add tests proving repeated writes of the same mutated `Matrix` instance produce fresh schemas when decimal precision/scale, list element type, map value type, or record-like keys change.

2.4 [ ] Add or update benchmarks only if caching is retained, so the performance reason for the additional complexity is documented and reproducible.

2.5 [ ] Run tests and record the commands: `./gradlew :matrix-avro:test` and `./gradlew :matrix-avro:codenarcMain :matrix-avro:codenarcTest`.
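For 2.2, the essential property is that every profile dimension feeding schema generation participates in key equality, so a mutated Matrix cannot hit a stale entry. A record-based sketch (the field set is illustrative and smaller than what the real `SchemaCacheKey` would need):

```java
import java.util.List;

public class SchemaCacheKeyDemo {

    /**
     * Hypothetical cache key that folds the inferred data profile into
     * equality, not just column names and declared types.
     */
    record SchemaCacheKey(List<String> columnNames, List<String> columnTypes,
                          int decimalPrecision, int decimalScale) {
    }

    /** A scale change alone must invalidate the cached schema. */
    static boolean scaleChangeInvalidates() {
        var k1 = new SchemaCacheKey(List.of("amount"), List.of("BigDecimal"), 10, 2);
        var k2 = new SchemaCacheKey(List.of("amount"), List.of("BigDecimal"), 10, 4);
        return !k1.equals(k2); // value-based record equality: different scale, cache miss
    }

    public static void main(String[] args) {
        System.out.println(scaleChangeInvalidates()); // true
    }
}
```

The catch, and the argument for removal in 2.1, is that computing this profile requires scanning the data — which is most of the cost of inferring the schema in the first place.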

## 3. Align Public API Validation

3.1 [ ] Update `MatrixAvroWriter.buildSchema(Matrix, boolean)` to call the same Matrix validation used by write paths, including null, empty matrix, and column-size checks.

3.2 [ ] Update `MatrixAvroWriter.buildSchema(Matrix, AvroWriteOptions)` to fail with `IllegalArgumentException('Options cannot be null')` or the module's established validation style when options are null.

3.3 [ ] Audit public reader and writer methods for inconsistent null handling or NPE-prone paths, and align them with the module's explicit validation exceptions.

3.4 [ ] Add tests covering public `buildSchema` validation and any newly aligned validation paths.

3.5 [ ] Run tests and record the commands: `./gradlew :matrix-avro:test` and `./gradlew :matrix-avro:codenarcMain :matrix-avro:codenarcTest`.
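The alignment in 3.1–3.3 amounts to centralizing the null, empty, and ragged-column checks so every public entry point fails the same way. A stdlib-only sketch of the shape (the real module validates a `Matrix`, not raw lists, and its exact messages may differ):

```java
import java.util.List;

public class ValidationDemo {

    /** Shared validation sketch mirroring the write-path checks: null, empty, ragged columns. */
    static void validate(List<? extends List<?>> columns) {
        if (columns == null) {
            throw new IllegalArgumentException("Matrix cannot be null");
        }
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("Matrix cannot be empty");
        }
        int expected = columns.get(0).size();
        for (List<?> column : columns) {
            if (column.size() != expected) {
                throw new IllegalArgumentException("All columns must have the same size");
            }
        }
    }

    /** Returns the failure message, or null when the input validates. */
    static String failureMessage(List<? extends List<?>> columns) {
        try {
            validate(columns);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(failureMessage(null));                // Matrix cannot be null
        System.out.println(failureMessage(List.of(List.of(1)))); // null
    }
}
```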

## 4. Fix Negative Local Timestamp Reads

4.1 [ ] Update `MatrixAvroReader.toLocalDateTimeMillis` to use `Math.floorMod(ms, MILLIS_PER_SECOND)` when calculating nanoseconds, matching the micros path and supporting values before the Unix epoch.

4.2 [ ] Add tests with `local-timestamp-millis` and `local-timestamp-micros` values before `1970-01-01T00:00:00` to verify both paths decode correctly.

4.3 [ ] Run tests and record the commands: `./gradlew :matrix-avro:test` and `./gradlew :matrix-avro:codenarcMain :matrix-avro:codenarcTest`.
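The fix in 4.1 works because `Math.floorDiv`/`Math.floorMod` round toward negative infinity, keeping nano-of-second non-negative for pre-epoch values, whereas the `%` operator yields a negative remainder. A sketch of the corrected conversion (standalone, not the module's actual method):

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class NegativeTimestampDemo {

    private static final long MILLIS_PER_SECOND = 1000L;

    /** floorDiv/floorMod keep nano-of-second in [0, 999_999_999] even before the epoch. */
    static LocalDateTime fromEpochMillis(long ms) {
        long seconds = Math.floorDiv(ms, MILLIS_PER_SECOND); // rounds toward -infinity
        int nanos = (int) Math.floorMod(ms, MILLIS_PER_SECOND) * 1_000_000;
        return LocalDateTime.ofEpochSecond(seconds, nanos, ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        // With the naive ms % 1000, ms = -1 yields remainder -1, a negative
        // nanoOfSecond, and a DateTimeException from ofEpochSecond.
        System.out.println(fromEpochMillis(-1L)); // 1969-12-31T23:59:59.999
        System.out.println(fromEpochMillis(1L));  // 1970-01-01T00:00:00.001
    }
}
```

The micros path is identical with a 1_000_000 divisor and a `* 1_000` nano multiplier, which is why 4.2 exercises both.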

## 5. Add Schema Inspection APIs

5.1 [ ] Add `MatrixAvroReader.schema(File)`, `schema(Path)`, `schema(URL)`, `schema(byte[])`, and `schema(InputStream)` methods that return the Avro writer schema without reading all rows.

5.2 [ ] Add options-aware schema inspection overloads only where they are meaningful, for example to expose the effective reader schema or projected schema when `AvroReadOptions.readerSchema(...)` is supplied.

5.3 [ ] Ensure schema inspection methods follow the same validation, naming, and stream ownership contract chosen in section 1.

5.4 [ ] Add tests for schema inspection from file, path, byte array, and stream sources, including invalid or empty Avro content.

5.5 [ ] Document schema inspection in `matrix-avro/README.md`, the Avro tutorial, and the Avro cookbook.

5.6 [ ] Run tests and record the commands: `./gradlew :matrix-avro:test` and `./gradlew :matrix-avro:codenarcMain :matrix-avro:codenarcTest`.

## 6. Improve Options Ergonomics

6.1 [ ] Add static factory methods to `AvroReadOptions` and `AvroWriteOptions` for common starting points, such as `AvroWriteOptions.defaults()`, `AvroWriteOptions.exactDecimals()`, and `AvroReadOptions.named(String)`.

6.2 [ ] Evaluate a closure-based Groovy configuration API for options, for example `AvroWriteOptions.configure { inferPrecisionAndScale true; schemaName 'Orders' }`, while keeping the typed fluent API as the primary Java-friendly surface.

6.3 [ ] Add convenience methods for common schema declarations, such as `AvroSchemaDecl.arrayOf(Class<?>)`, `mapOf(Class<?>)`, `decimalColumn(int, int)` naming alternatives, or similar helpers that reduce nested calls without introducing `Object`-typed APIs.

6.4 [ ] Add tests proving the new factories and convenience helpers produce the same option maps and Avro schemas as the existing explicit fluent API.

6.5 [ ] Document the new ergonomic API in README, tutorial, and cookbook examples without removing the existing direct and SPI examples.

6.6 [ ] Run tests and record the commands: `./gradlew :matrix-avro:test` and `./gradlew :matrix-avro:codenarcMain :matrix-avro:codenarcTest`.
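The factories in 6.1 are plain static constructors that pre-set common option combinations on top of the existing fluent API. A Java sketch of the shape (field names and defaults here are assumptions about the proposed API, not the current one):

```java
public class AvroWriteOptionsSketch {

    private boolean inferPrecisionAndScale = false;
    private String schemaName = "AvroMatrix";

    /** Baseline options, equivalent to the no-arg constructor. */
    static AvroWriteOptionsSketch defaults() {
        return new AvroWriteOptionsSketch();
    }

    /** Pre-configured starting point for decimal-safe writes. */
    static AvroWriteOptionsSketch exactDecimals() {
        AvroWriteOptionsSketch options = new AvroWriteOptionsSketch();
        options.inferPrecisionAndScale = true;
        return options;
    }

    /** Fluent setter, as in the existing typed API. */
    AvroWriteOptionsSketch schemaName(String name) {
        this.schemaName = name;
        return this;
    }

    @Override
    public String toString() {
        return "inferPrecisionAndScale=" + inferPrecisionAndScale
            + ", schemaName=" + schemaName;
    }

    public static void main(String[] args) {
        // Factory plus fluent chaining, Java-friendly and closure-free
        System.out.println(exactDecimals().schemaName("Orders"));
    }
}
```

The tests in 6.4 then only need to compare the option maps produced by each factory against the equivalent explicit fluent chain.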

## 7. Make Decimal-Safe Writes Easier

7.1 [ ] Add a clearly named shortcut for decimal-safe writes, such as `MatrixAvroWriter.writeExactDecimals(...)`, `MatrixAvroWriter.writeDecimalSafe(...)`, or an `AvroWriteOptions.exactDecimals()` factory, while preserving existing overloads for compatibility.

7.2 [ ] Decide whether v0.3.0 should change the default BigDecimal behavior from double fallback to decimal inference. If changing the default would break too many existing users, document the compatibility reason and keep the shortcut prominent.

7.3 [ ] Add tests showing BigDecimal columns round-trip as `BigDecimal` through the new decimal-safe shortcut and still follow the documented compatibility path through existing defaults.

7.4 [ ] Update docs to explain the precision tradeoff in one place, with examples for default, decimal-safe, and explicit precision/scale writes.

7.5 [ ] Run tests and record the commands: `./gradlew :matrix-avro:test` and `./gradlew :matrix-avro:codenarcMain :matrix-avro:codenarcTest`.
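The precision tradeoff in 7.2 and 7.4 is easy to demonstrate: routing a `BigDecimal` through a `double` replaces it with the nearest binary fraction, while Avro's decimal logical type stores the unscaled value plus scale exactly. A self-contained illustration:

```java
import java.math.BigDecimal;

public class DecimalSafetyDemo {

    /** What the double-fallback path preserves: the nearest binary double, not the decimal. */
    static BigDecimal viaDoubleFallback(String decimal) {
        return new BigDecimal(new BigDecimal(decimal).doubleValue());
    }

    public static void main(String[] args) {
        System.out.println(new BigDecimal("0.1")); // 0.1
        System.out.println(viaDoubleFallback("0.1"));
        // 0.1000000000000000055511151231257827021181583404541015625
        // A decimal logical type instead stores unscaled value 1 with scale 1,
        // so the value round-trips exactly.
    }
}
```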

## 8. Strengthen Explicit Schema Declaration Usability

8.1 [ ] Add ergonomic aliases to `AvroSchemaDecl` for common nested declarations while keeping the existing typed declaration classes package-scoped.

8.2 [ ] Expand explicit schema declaration tests to cover nested arrays, nested maps, mixed record fields, invalid nested Avro field names, and SPI map parsing of the new aliases if aliases are serializable.

8.3 [ ] Review error messages from `AvroSchemaDecl.fromMap(...)` and `AvroWriteOptions.columnSchema(...)` so common mistakes explain the expected typed alternative.

8.4 [ ] Document map-vs-record inference prominently, including how to force map encoding and record encoding with `columnSchema`.

8.5 [ ] Run tests and record the commands: `./gradlew :matrix-avro:test` and `./gradlew :matrix-avro:codenarcMain :matrix-avro:codenarcTest`.

## 9. Documentation and Release Notes

9.1 [ ] Update `matrix-avro/README.md` with a v0.3.0 section covering stream ownership, schema inspection, decimal-safe writes, schema cache behavior, and explicit schema declaration shortcuts.

9.2 [ ] Update the root Avro tutorial page with end-to-end examples for schema inspection, decimal-safe writes, schema evolution, and forcing map-vs-record encoding.

9.3 [ ] Update the root Avro cookbook page with focused recipes for each usability enhancement.

9.4 [ ] Update `matrix-avro/release.md` with user-visible v0.3.0 changes and migration notes for any behavior changes.

9.5 [ ] Run documentation-related checks if available, and record the commands. At minimum, run `./gradlew :matrix-avro:test` after documentation examples are updated.

## 10. Final Verification

10.1 [ ] Run module verification and record the commands: `./gradlew :matrix-avro:compileGroovy :matrix-avro:compileTestGroovy`, `./gradlew :matrix-avro:codenarcMain :matrix-avro:codenarcTest --rerun-tasks`, and `./gradlew :matrix-avro:test`.

10.2 [ ] Run full repository verification and record the command: `./gradlew test`.

10.3 [ ] Confirm `matrix-avro` CodeNarc remains fail-on-warning and the final PR description lists the exact verification commands and outcomes.
@@ -1,32 +1,32 @@
package se.alipsa.matrix.avro

import groovy.transform.CompileStatic
import groovy.transform.EqualsAndHashCode
import groovy.transform.PackageScope
import groovy.transform.ToString

import org.apache.avro.Schema

/**
* Avro schema declaration for array columns.
*/
@PackageScope
@CompileStatic
@EqualsAndHashCode
@ToString(includeNames = true)
class ArrayAvroSchemaDecl extends AvroSchemaDecl {
final AvroSchemaDecl elementType

final AvroSchemaDecl elementType
ArrayAvroSchemaDecl(AvroSchemaDecl elementType) {
this.elementType = elementType
}

@Override
Map<String, ?> toMap() {
[kind: 'array', elementType: elementType.toMap()]
}

@Override
@PackageScope
Schema toAvroSchema(String defaultName, String namespace) {
Schema elementSchema = elementType.toAvroSchema("${defaultName}_item", namespace)
Schema.createArray(AvroSchemaUtil.nullableSchema(elementSchema))
}

}
@@ -1,66 +1,54 @@
package se.alipsa.matrix.avro

import groovy.transform.CompileStatic

import se.alipsa.matrix.core.Matrix
import se.alipsa.matrix.core.spi.AbstractFormatProvider
import se.alipsa.matrix.core.spi.OptionDescriptor

/**
* SPI format provider for Avro files.
*/
@CompileStatic
class AvroFormatProvider extends AbstractFormatProvider {

private static final Set<String> EXTENSIONS = ['avro'] as Set<String>

@Override
Set<String> supportedExtensions() {
EXTENSIONS
}

@Override
String formatName() {
'Avro'
}

@Override
boolean canRead() {
true
}

@Override
boolean canWrite() {
true
}

@Override
Matrix read(File file, Map<String, ?> options) {
MatrixAvroReader.read(file, AvroReadOptions.fromMap(options))
}

@Override
Matrix read(URL url, Map<String, ?> options) {
MatrixAvroReader.read(url, AvroReadOptions.fromMap(options))
}

@Override
Matrix read(InputStream is, Map<String, ?> options) {
MatrixAvroReader.read(is, AvroReadOptions.fromMap(options))
}

@Override
void write(Matrix matrix, File file, Map<String, ?> options) {
MatrixAvroWriter.write(matrix, file, AvroWriteOptions.fromMap(options))
}

@Override
List<OptionDescriptor> readOptionDescriptors() {
AvroReadOptions.descriptors()
}

@Override
List<OptionDescriptor> writeOptionDescriptors() {
AvroWriteOptions.descriptors()
}

}
@@ -1,7 +1,5 @@
package se.alipsa.matrix.avro

import groovy.transform.CompileStatic

import org.apache.avro.Schema

import se.alipsa.matrix.core.spi.OptionDescriptor
@@ -20,7 +18,7 @@ import se.alipsa.matrix.core.spi.OptionMaps
*
* // With custom matrix name override
* def options = new AvroReadOptions()
* .matrixName("MyData")
* .matrixName('MyData')
* Matrix m = MatrixAvroReader.read(file, options)
*
* // With schema evolution (reader schema)
@@ -32,23 +30,15 @@ import se.alipsa.matrix.core.spi.OptionMaps
*
* @see MatrixAvroReader
*/
@CompileStatic
class AvroReadOptions {

private String matrixName = null
private Schema readerSchema = null

/**
* Creates a new AvroReadOptions with default settings.
*/
AvroReadOptions() {
}

/**
* Sets the name for the resulting Matrix.
*
* <p>If not set, the reader falls back to the Avro record name from the file schema and then
* to a source-derived fallback such as the file name or "AvroMatrix".
* to a source-derived fallback such as the file name or 'AvroMatrix'.
*
* @param name the Matrix name
* @return this options instance for method chaining
@@ -57,7 +47,6 @@ class AvroReadOptions {
this.matrixName = name
return this
}

/**
* Sets a reader schema for schema evolution.
*
@@ -77,23 +66,19 @@ class AvroReadOptions {
this.readerSchema = schema
return this
}

// Getters

/**
* @return the Matrix name, or null if not set
*/
String getMatrixName() {
return matrixName
}

/**
* @return the reader schema, or null if not set
*/
Schema getReaderSchema() {
return readerSchema
}

/**
* Converts this options object to an SPI-friendly map.
*
@@ -109,7 +94,6 @@ class AvroReadOptions {
}
options
}

/**
* Creates {@link AvroReadOptions} from a generic SPI options map.
*
@@ -127,17 +111,16 @@ class AvroReadOptions {
}
if (normalized.containsKey('readerschema')) {
def value = normalized.readerschema
if (value instanceof Schema) {
if (Schema.isInstance(value)) {
result.readerSchema(value as Schema)
} else if (value instanceof CharSequence) {
} else if (CharSequence.isInstance(value)) {
result.readerSchema(new Schema.Parser().parse(String.valueOf(value)))
} else {
throw new IllegalArgumentException("readerSchema must be a Schema or schema JSON string but was ${value?.class}")
}
}
result
}

/**
* Returns a human-readable description of all supported read options.
*
@@ -146,7 +129,6 @@ class AvroReadOptions {
static String describe() {
OptionDescriptor.describe(descriptors())
}

/**
* Returns descriptors for all supported read options.
*
@@ -158,4 +140,5 @@ class AvroReadOptions {
new OptionDescriptor('readerSchema', Schema, null, 'Reader schema or schema JSON for schema evolution')
]
}

}
@@ -4,6 +4,7 @@ package se.alipsa.matrix.avro
* Scalar Avro schema declarations supported by explicit per-column overrides.
*/
enum AvroScalarTypeDecl {

STRING,
BOOLEAN,
INT,
@@ -16,4 +17,5 @@ enum AvroScalarTypeDecl {
TIMESTAMP_MILLIS,
LOCAL_TIMESTAMP_MICROS,
UUID

}