Fix/debug apr2026 #9

Open
opoudjis wants to merge 5 commits into main from fix/debug-apr2026

@opoudjis opoudjis commented Apr 2, 2026

No description provided.

opoudjis added 5 commits April 2, 2026 15:33
…ession tests

Root causes of the intermittent empty-file bug:
1. digisan/csv-tool QueryFile opened output with O_CREATE (no O_TRUNC) — if a
   prior run wrote a larger file, stale tail bytes remained; if writing failed,
   the output was left empty.
2. On QueryFile error there was no fallback — the empty/partial file persisted.
3. A split error on one file aborted the entire filepath.Walk immediately.

Fixes in split.go:
- Pre-truncate output before QueryFile (defence-in-depth against stale content).
- On QueryFile error: log warning, fall back to copying the untrimmed source so
  the output is never left empty.
- On split error: log warning and continue to the next file (return nil).
- trim-after-split: write to a .trim_tmp path and rename only on success, so a
  trim error cannot empty the split file that was just produced.

Dependency upgrades:
- csv-tool: digisan/csv-tool v0.2.7 -> nsip/csv-tool v0.4.1
  Fork adds streaming CsvReader (FieldsPerRecord=-1, skips malformed rows with
  warning) and safe QueryFile (O_TRUNC + temp-file rename-on-success).
- go-config: v0.2.7 -> v0.3.6 (drops fileflatter, compatible with go-generics
  v0.5.4 root […]

[…]
- […]unWithSmallerSource: […] regression scenario).
- TestTrimNoColumnMatchSkipsFile: file without trim columns is no[…]
- […]MultipleFiles[…]
- TestOverwrite: in-place trim (OutFolder==InFolder) works correctly.
v0.4.2 fixes the IsNil-per-row freeze: go-generics IsNil falls back to
fmt.Sprint(v) which serialises the entire *bufio.Writer struct (incl. its
1 MiB buffer) via reflection on every single CsvReader row call.  For the
2.3 M-row itemResults.csv that is ~2.3 M × 1 MiB of reflect work, which in
practice froze the trim stage completely.

Caching wNotNil := !IsNil(w) once before the hot loop eliminates the
problem entirely.  The same release also adds a 1 MiB bufio.Writer in
QueryFile (coalesces per-row syscalls) and fixes the fd.MustCreateDir
race condition in concurrent QueryFile calls.
csv-tool v0.4.3 eliminates the ~2 GB peak memory that was causing the
apparent freeze on itemResults.csv (531 MB / 2.3 M rows):

  - Skip allRows accumulation when streaming to writer (-531 MB per pass)
  - Short-circuit Query for nil CGrp (no intermediate bytes.Buffer, -531 MB)
  - Cache hdrRow in Select callback (avoids 115 M+ CellEsc calls)
  - Add QueryFile: start/done timing logs for observability

Benchmark on 500K rows: 731ms/892MB → 292ms/335MB (2.5× faster, -62% RAM)
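
To illustrate the streaming idea behind these savings, here is a minimal sketch built on the standard library's `encoding/csv` rather than csv-tool itself. `streamSelect` and `selectString` are hypothetical names. Column indices are resolved once from the header, and each record is written as soon as it is read, so no slice of all rows is ever built.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"errors"
	"io"
	"log"
	"strings"
)

// streamSelect (hypothetical) copies the columns named in keep from r to w,
// one record at a time. It returns the number of data rows written.
func streamSelect(r io.Reader, w io.Writer, keep []string) (int, error) {
	cr := csv.NewReader(r)
	cr.FieldsPerRecord = -1 // accept rows with a varying number of fields
	cw := csv.NewWriter(w)

	header, err := cr.Read()
	if err != nil {
		return 0, err
	}
	// Resolve column positions once instead of on every row.
	var idx []int
	var outHeader []string
	for _, name := range keep {
		for i, h := range header {
			if h == name {
				idx = append(idx, i)
				outHeader = append(outHeader, name)
				break
			}
		}
	}
	if err := cw.Write(outHeader); err != nil {
		return 0, err
	}

	rows := 0
	out := make([]string, len(idx)) // reused for every row
	for {
		rec, err := cr.Read()
		if errors.Is(err, io.EOF) {
			break
		}
		if err != nil {
			var pe *csv.ParseError
			if errors.As(err, &pe) {
				log.Printf("warning: skipping malformed row: %v", err)
				continue
			}
			return rows, err
		}
		short := false
		for j, i := range idx {
			if i >= len(rec) {
				short = true
				break
			}
			out[j] = rec[i]
		}
		if short {
			log.Printf("warning: skipping short row with %d fields", len(rec))
			continue
		}
		if err := cw.Write(out); err != nil {
			return rows, err
		}
		rows++
	}
	cw.Flush()
	return rows, cw.Error()
}

// selectString is a small in-memory wrapper for trying streamSelect.
func selectString(in string, keep []string) (string, int, error) {
	var buf bytes.Buffer
	n, err := streamSelect(strings.NewReader(in), &buf, keep)
	return buf.String(), n, err
}
```

Memory use in this sketch is bounded by one record plus the writer's buffer, independent of the input size.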

cmd/build.sh: mac target now builds GOARCH=arm64 (native M1) instead of
amd64/Rosetta.  linux64 and win64 targets are unchanged.
…essing

- Add currentFile and spinnerIdx package vars to config.go
- Set currentFile at the start of each walk callback entry
- AppendFunc shows a rotating spinner (| / - \) + filename so the bar
  appears active even while a single large file blocks the percentage
- Spinner goroutine ticks spinnerIdx every 150ms for the bar lifetime
- Remove redundant atomic counter + bar.Set; use only bar.Incr()