Commit 299a4df

jahooma and claude committed
Ship tree-sitter.wasm as a sibling file next to the CLI binary
Five attempts to embed the wasm into the bun --compile binary all failed on Windows in different ways. Each one's bytes ended up in the binary (we verified this directly), but every JS-level retrieval mechanism we tried got stripped by the time the runtime ran:

1. `with { type: 'file' }` of `web-tree-sitter/tree-sitter.wasm` subpath: bytes embedded, import variable bound to undefined.
2. `with { type: 'file' }` of a copied-in relative .wasm: same as #1.
3. Single 274KB base64 string literal: got dropped by the minifier.
4. ~268 chunked base64 string literals: same fate.
5. Function-export wrapping the chunked array, with eager file write verification on disk: chunks confirmed on disk after embed, still not present in the compiled output.

The bun-compile-on-Windows code path is doing something destructive to JS-source-level wasm asset references that we cannot reliably work around from the source. So bypass the bundler entirely: ship tree-sitter.wasm as a *sibling file* next to the binary.

- cli/scripts/build-binary.ts: copies the wasm from node_modules to cli/bin/tree-sitter.wasm after `bun build --compile`, alongside the binary. Drops all the embed/verify machinery from previous rounds.
- cli/src/pre-init/tree-sitter-wasm.ts: at runtime, looks for `dirname(process.execPath)/tree-sitter.wasm`, sets the env var that init-node.ts reads, and (best-effort) reads the bytes synchronously to publish on globalThis for the wasmBinary fast path. Both channels feed the same SDK init.
- cli/src/pre-init/tree-sitter-wasm-bytes.ts: deleted. No more generated module.
- .github/workflows/cli-release-build.yml: tarball includes `tree-sitter.wasm` next to the binary (both matrix and Windows-specific job).
- cli/release/index.js + freebuff/cli/release/index.js: the npm postinstall downloader now also moves tree-sitter.wasm out of the temp extraction dir to live next to the installed binary.

Verified locally: build copies the wasm into bin/, --smoke-tree-sitter exits 0 with "tree-sitter smoke ok (wasmBinary, 205488 bytes)", full boot smoke passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
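
The smoke output quoted in the commit message reports only a byte count. One cheap extra sanity check, sketched here as an assumption rather than as what `--smoke-tree-sitter` actually does, is to confirm that the sibling file begins with the WebAssembly magic number and version before handing it to the parser. `looksLikeWasm` is a hypothetical helper name that does not appear in this commit.

```typescript
// Hypothetical helper, not part of this commit: checks the 8-byte
// WebAssembly preamble ("\0asm" magic followed by version 1, little-endian).
const WASM_PREAMBLE = [0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]

function looksLikeWasm(bytes: Uint8Array): boolean {
  // Anything shorter than the preamble cannot be a wasm module.
  if (bytes.length < WASM_PREAMBLE.length) return false
  return WASM_PREAMBLE.every((expected, i) => bytes[i] === expected)
}
```

A check like this would catch a truncated download or a wrong file placed next to the binary, which a pure size check cannot distinguish from a valid module.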
1 parent 24346bc commit 299a4df

6 files changed

Lines changed: 115 additions & 198 deletions

.github/workflows/cli-release-build.yml

Lines changed: 7 additions & 2 deletions
@@ -197,7 +197,10 @@ jobs:
           if [[ "${{ runner.os }}" == "Windows" ]]; then
             BINARY_FILE="${{ inputs.binary-name }}.exe"
           fi
-          tar -czf ${{ inputs.binary-name }}-${{ matrix.target }}.tar.gz -C cli/bin "$BINARY_FILE"
+          # Bundle the binary alongside tree-sitter.wasm — the CLI loads
+          # the wasm as a sibling file at runtime since bun --compile
+          # asset embedding wasn't reliable on Windows.
+          tar -czf ${{ inputs.binary-name }}-${{ matrix.target }}.tar.gz -C cli/bin "$BINARY_FILE" tree-sitter.wasm
 
       - name: Upload binary artifact
         uses: actions/upload-artifact@v7
@@ -340,7 +343,9 @@ jobs:
         shell: bash
         run: |
           BINARY_FILE="${{ inputs.binary-name }}.exe"
-          tar -czf ${{ inputs.binary-name }}-win32-x64.tar.gz -C cli/bin "$BINARY_FILE"
+          # Bundle tree-sitter.wasm next to the binary; see the
+          # equivalent matrix-job tar step for context.
+          tar -czf ${{ inputs.binary-name }}-win32-x64.tar.gz -C cli/bin "$BINARY_FILE" tree-sitter.wasm
 
       - name: Upload binary artifact
         uses: actions/upload-artifact@v7

cli/release/index.js

Lines changed: 21 additions & 0 deletions
@@ -383,6 +383,27 @@ async function downloadBinary(version) {
   }
   fs.renameSync(tempBinaryPath, CONFIG.binaryPath)
 
+  // Move tree-sitter.wasm next to the binary if the tarball included
+  // it. The CLI binary loads this at startup; embedding it inside the
+  // binary itself was unreliable on Windows (bun --compile asset
+  // bundling silently dropped or unbound it across several attempts),
+  // so we ship it as a sibling file instead. Older artifacts that
+  // pre-date this change won't have the wasm and will still install —
+  // they'll just hit the same crash they had before, which is fine.
+  const tempWasmPath = path.join(CONFIG.tempDownloadDir, 'tree-sitter.wasm')
+  if (fs.existsSync(tempWasmPath)) {
+    const targetWasmPath = path.join(
+      path.dirname(CONFIG.binaryPath),
+      'tree-sitter.wasm',
+    )
+    try {
+      if (fs.existsSync(targetWasmPath)) fs.unlinkSync(targetWasmPath)
+    } catch {
+      // best effort; rename below will surface the real error if it matters
+    }
+    fs.renameSync(tempWasmPath, targetWasmPath)
+  }
+
   // Save version metadata for fast version checking
   fs.writeFileSync(
     CONFIG.metadataPath,
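
The move step in this hunk can be read as a small standalone operation. The following is a minimal sketch of it, not the code the downloader ships: `installSiblingFile` is a hypothetical helper name, and the `EXDEV` copy-and-delete fallback is an added assumption (that the temp directory and the install directory could sit on different filesystems), which the commit itself does not handle.

```typescript
import * as fs from 'node:fs'
import * as path from 'node:path'

// Hypothetical helper (the commit inlines this logic in downloadBinary).
// Moves `name` from the extraction dir to sit next to the installed binary.
// Returns false when the tarball did not contain the file.
function installSiblingFile(
  tempDir: string,
  binaryPath: string,
  name: string,
): boolean {
  const source = path.join(tempDir, name)
  if (!fs.existsSync(source)) return false

  const target = path.join(path.dirname(binaryPath), name)
  // Remove a stale copy first; force: true makes a missing target a no-op.
  fs.rmSync(target, { force: true })

  try {
    fs.renameSync(source, target)
  } catch (error) {
    // Assumption: rename fails with EXDEV when source and target are on
    // different filesystems. Fall back to copy + delete in that case only.
    if ((error as NodeJS.ErrnoException).code !== 'EXDEV') throw error
    fs.copyFileSync(source, target)
    fs.unlinkSync(source)
  }
  return true
}
```

Returning `false` for a missing source mirrors the hunk's behavior for older artifacts: the install proceeds without the wasm rather than failing.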

cli/scripts/build-binary.ts

Lines changed: 12 additions & 134 deletions
@@ -145,11 +145,6 @@ async function main() {
   patchOpenTuiAssetPaths()
   await ensureOpenTuiNativeBundle(targetInfo)
 
-  const treeSitterEmbed = embedTreeSitterWasmAsChunks()
-  // Even on a build-script crash, restore the empty stub so a developer's
-  // working tree doesn't end up with a multi-MB diff.
-  process.on('exit', treeSitterEmbed.restore)
-
   const outputFilename =
     targetInfo.platform === 'win32' ? `${binaryName}.exe` : binaryName
   const outputFile = join(binDir, outputFilename)
@@ -191,16 +186,18 @@ async function main() {
 
   runCommand('bun', buildArgs, { cwd: cliRoot })
 
-  // Restore the empty stub now that the build read the chunks. Eager
-  // cleanup keeps a successful build clean; the exit handler is a
-  // backstop for crashes between embed and now.
-  treeSitterEmbed.restore()
-
-  // Fail the build if the chunks didn't actually make it into the
-  // compiled binary. Catches silent regressions (tree-shaking, minifier
-  // dropping literals, file-write timing) before we upload an artifact
-  // that would crash for users.
-  verifyTreeSitterWasmEmbedded(outputFile, treeSitterEmbed.sampleChunks)
+  // Ship tree-sitter.wasm as a sibling file next to the binary. Bun
+  // --compile asset embedding is unreliable on Windows (every JS-level
+  // retrieval mechanism we tried — `with { type: 'file' }`, base64 string
+  // literals, chunked base64, function-wrapped chunked base64 — got
+  // tree-shaken, minified away, or returned an undefined binding even
+  // when the bytes were in the binary). The pre-init reads it from
+  // `dirname(process.execPath)`, which works the same on every platform
+  // because it's a normal disk read, not a bunfs lookup.
+  const sourceWasm = findWebTreeSitterWasm()
+  const siblingWasm = join(binDir, 'tree-sitter.wasm')
+  writeFileSync(siblingWasm, readFileSync(sourceWasm))
+  logAlways(`Copied tree-sitter.wasm sibling: ${sourceWasm}${siblingWasm}`)
 
   if (targetInfo.platform !== 'win32') {
     chmodSync(outputFile, 0o755)
@@ -246,125 +243,6 @@ function findWebTreeSitterWasm(): string {
   }
 }
 
-/**
- * Inline `tree-sitter.wasm` into the binary as base64-encoded string
- * literals — but split into many small chunks. A single 274KB string
- * literal got dropped/transformed by bun's Windows minifier in an
- * earlier attempt; small chunks are individually unremarkable to the
- * minifier and survive intact. The pre-init joins them at runtime and
- * decodes back to the wasm bytes.
- *
- * Returns a `restore` function (resets the stub) and a small set of
- * `sampleChunks` for the post-build verification step to look for in
- * the compiled binary. Always invoke `restore` (eagerly + on exit) so
- * a developer's working tree doesn't end up with a multi-MB diff after
- * a build.
- */
-function embedTreeSitterWasmAsChunks(): {
-  restore: () => void
-  sampleChunks: string[]
-} {
-  const stubPath = join(cliRoot, 'src', 'pre-init', 'tree-sitter-wasm-bytes.ts')
-  const originalStub = readFileSync(stubPath, 'utf8')
-  let restored = false
-  const restore = (): void => {
-    if (restored) return
-    restored = true
-    try {
-      writeFileSync(stubPath, originalStub)
-    } catch (error) {
-      console.error('Failed to restore tree-sitter-wasm-bytes stub:', error)
-    }
-  }
-
-  const sourceWasm = findWebTreeSitterWasm()
-  const wasmBytes = readFileSync(sourceWasm)
-  const fullBase64 = wasmBytes.toString('base64')
-
-  // ~1KB per chunk: well under any plausible minifier-dropped-literal
-  // threshold, and small enough that even a heavy-handed inliner would
-  // emit them as runtime references rather than evaluating the whole
-  // .join() at compile time. Keeps total chunk count manageable too
-  // (~270 chunks for a 205KB wasm).
-  const CHUNK_SIZE = 1024
-  const chunks: string[] = []
-  for (let i = 0; i < fullBase64.length; i += CHUNK_SIZE) {
-    chunks.push(fullBase64.slice(i, i + CHUNK_SIZE))
-  }
-
-  const generated =
-    `// AUTO-GENERATED by cli/scripts/build-binary.ts during \`bun build --compile\`.\n` +
-    `// Restored to an empty function after the build finishes — do not commit a\n` +
-    `// non-empty body here.\n` +
-    `export function getTreeSitterWasmChunks(): string[] {\n` +
-    ` return [\n` +
-    chunks.map((c) => ` ${JSON.stringify(c)},`).join('\n') +
-    `\n ]\n` +
-    `}\n`
-
-  writeFileSync(stubPath, generated)
-  // Re-read what we just wrote so we can fail loudly if the OS buffered
-  // the write. On Windows, NTFS writes can lag, and bun --compile would
-  // then read the stale stub. Verifying here means the build fails
-  // *during embed* instead of producing a broken binary that surprises
-  // us later.
-  const onDisk = readFileSync(stubPath, 'utf8')
-  if (!onDisk.includes(chunks[0]!)) {
-    throw new Error(
-      `Embed wrote ${chunks.length} chunks but re-read of ${stubPath} ` +
-        `does not contain chunk[0]. File on disk: ${onDisk.slice(0, 200)}…`,
-    )
-  }
-  logAlways(
-    `Embedded tree-sitter.wasm from ${sourceWasm} (${wasmBytes.length} bytes → ${chunks.length} chunks of ~${CHUNK_SIZE} chars).`,
-  )
-
-  // Pull a few sample chunks from the start, middle, and end for the
-  // post-build verification scan. If any one is missing in the compiled
-  // binary, something dropped or transformed the literals.
-  const samples = [
-    chunks[0],
-    chunks[Math.floor(chunks.length / 2)],
-    chunks[chunks.length - 1],
-  ].filter((c): c is string => Boolean(c))
-
-  return { restore, sampleChunks: samples }
-}
-
-/**
- * Sanity-check the compiled binary actually contains all the chunked
- * base64 we just embedded. We pass in a few sample chunks from the
- * start / middle / end of the array; each must appear in the binary.
- * If any one is missing, the bundler dropped or inlined-away part of
- * the literal table, and the runtime decode would produce garbage.
- */
-function verifyTreeSitterWasmEmbedded(
-  outputFile: string,
-  sampleChunks: string[],
-): void {
-  if (sampleChunks.length === 0) {
-    throw new Error('verifyTreeSitterWasmEmbedded called with no sample chunks')
-  }
-  const binary = readFileSync(outputFile)
-  for (const chunk of sampleChunks) {
-    const needle = Buffer.from(chunk, 'utf8')
-    const idx = binary.indexOf(needle)
-    if (idx === -1) {
-      throw new Error(
-        `Embedded tree-sitter wasm chunk not found in ${outputFile}.\n` +
-          `Missing chunk (first 80 chars): ${chunk.slice(0, 80)}…\n` +
-          `Either the \`tree-sitter-wasm-bytes.ts\` literals were tree-shaken,\n` +
-          `the minifier transformed them away, or the pre-init's import wasn't\n` +
-          `actually consumed. The runtime tree-sitter init would fail with\n` +
-          `"Internal error: tree-sitter.wasm not found".`,
-      )
-    }
-  }
-  logAlways(
-    `Verified ${sampleChunks.length} embedded base64 chunks in compiled binary.`,
-  )
-}
-
 function patchOpenTuiAssetPaths() {
   const coreDir = join(cliRoot, 'node_modules', '@opentui', 'core')
   if (!existsSync(coreDir)) {
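
With the embed verification removed, the build no longer checks that the wasm reached `cli/bin/`. A minimal sketch of a copy step that fails loudly on an empty or truncated copy might look like the following. `copySiblingAsset` is a hypothetical helper, and using `copyFileSync` plus a size comparison is an assumption swapped in for illustration; the commit itself uses `writeFileSync(siblingWasm, readFileSync(sourceWasm))` with no check.

```typescript
import * as fs from 'node:fs'
import * as path from 'node:path'

// Hypothetical helper, not part of this commit: copies an asset next to the
// compiled binary and throws if the copy is empty or differs in size.
function copySiblingAsset(source: string, binDir: string, name: string): string {
  const target = path.join(binDir, name)
  fs.copyFileSync(source, target)

  const sourceSize = fs.statSync(source).size
  const targetSize = fs.statSync(target).size
  if (targetSize === 0 || targetSize !== sourceSize) {
    throw new Error(
      `Sibling copy of ${name} looks wrong: source ${sourceSize} bytes, target ${targetSize} bytes`,
    )
  }
  return target
}
```

In the build script this would be called as `copySiblingAsset(findWebTreeSitterWasm(), binDir, 'tree-sitter.wasm')`, using the names that already appear in the diff.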

cli/src/pre-init/tree-sitter-wasm-bytes.ts

Lines changed: 0 additions & 19 deletions
This file was deleted.

cli/src/pre-init/tree-sitter-wasm.ts

Lines changed: 54 additions & 43 deletions
@@ -1,51 +1,62 @@
-// Embed tree-sitter.wasm into the bun-compile binary so the SDK's tree-sitter
-// parser singleton can find it at runtime. Must be the very first import in
-// `index.tsx`: subsequent imports (the SDK / code-map) eagerly construct the
-// parser, and its init reads what we publish here on `globalThis`.
+// Find tree-sitter.wasm so the SDK's tree-sitter parser singleton can load
+// it at runtime. Must be the very first import in `index.tsx`: subsequent
+// imports (the SDK / code-map) eagerly construct the parser, and its init
+// reads what we publish here on `globalThis` and via the env var.
 //
-// History of failed approaches before this one (all worked on macOS/Linux,
-// failed on Windows in different ways):
+// Final approach after several attempts to embed the wasm into the bun
+// --compile binary all failed on Windows (the bytes ended up in the
+// binary, but every JS-level retrieval mechanism — `with { type: 'file' }`
+// import binding, base64 string literals, chunked base64 in a generated
+// module, function-export wrappers — was either tree-shaken, transformed
+// by the minifier, or otherwise stripped):
 //
-// 1. `with { type: 'file' }` of `web-tree-sitter/tree-sitter.wasm` (node_
-// modules subpath) — bytes ended up in the binary but the import
-// variable was undefined at runtime. Bun/Windows bug with the import-
-// attribute binding.
-// 2. `with { type: 'file' }` of a copied-in relative .wasm — same as #1,
-// so it's not subpath-vs-relative.
-// 3. Single 274KB base64 string literal in a generated TS module — the
-// literal didn't appear in the compiled binary at all. Probably the
-// minifier transforming "huge constant" literals.
-// 4. ~268 chunked base64 string literals — same fate; the bundler
-// appeared to evaluate the imported array as the empty stub at
-// static-analysis time and DCE'd the conditional that consumed it.
+// ship tree-sitter.wasm as a sibling file next to the binary.
 //
-// What this version does: import a *function* whose body returns the
-// chunks. Function return values aren't statically inlinable the way
-// `export const` values are, so the bundler can't substitute the empty
-// stub for the call site. Reference the result unconditionally so DCE
-// can't kick in even if some inliner does fold the function.
+// It's 200KB, the npm tarball already contains the binary; adding one
+// more file is trivial. The build script copies the wasm into `cli/bin/`
+// after compile, the release workflow tarballs both, and the freebuff /
+// codebuff downloader extracts both into the same directory. At runtime,
+// `process.execPath` plus a relative file lookup gets us the wasm with
+// zero bundler involvement.
 
-import { getTreeSitterWasmChunks } from './tree-sitter-wasm-bytes'
+import { existsSync, readFileSync } from 'fs'
+import { dirname, join } from 'path'
 
-const chunks = getTreeSitterWasmChunks()
-if (chunks.length > 0) {
-  const buf = Buffer.from(chunks.join(''), 'base64')
-  // globalThis is the only cross-bundle channel: the SDK pre-built bundle
-  // inlines its own copy of `init-node.ts`, so a module-level variable
-  // here isn't visible to the singleton initialized via the SDK. Slice
-  // into a fresh Uint8Array view rather than handing over Buffer's shared
-  // underlying ArrayBuffer.
-  ;(
-    globalThis as { __CODEBUFF_TREE_SITTER_WASM_BINARY__?: Uint8Array }
-  ).__CODEBUFF_TREE_SITTER_WASM_BINARY__ = new Uint8Array(
-    buf.buffer,
-    buf.byteOffset,
-    buf.byteLength,
-  )
+// Sibling path: same directory as the running binary. Works for both
+// production binaries (where the downloader places tree-sitter.wasm
+// next to the executable) and dev runs (path won't exist, falls
+// through to init-node.ts's path-based resolution which finds the
+// node_modules copy).
+const siblingPath = join(dirname(process.execPath), 'tree-sitter.wasm')
+
+if (existsSync(siblingPath)) {
+  // Tell init-node.ts (in code-map / the SDK bundle) where the wasm
+  // is. The locateFile callback there will hand this path to
+  // emscripten, which fs.readFile's it.
+  process.env.CODEBUFF_TREE_SITTER_WASM_PATH = siblingPath
+
+  // Also try the synchronous-bytes path: hand the bytes straight to
+  // Parser.init({ wasmBinary }) so the SDK doesn't need to round-trip
+  // through emscripten's path resolution. Both channels feed the same
+  // tree-sitter init; whichever one trips first wins.
+  try {
+    const buf = readFileSync(siblingPath)
+    ;(
+      globalThis as { __CODEBUFF_TREE_SITTER_WASM_BINARY__?: Uint8Array }
+    ).__CODEBUFF_TREE_SITTER_WASM_BINARY__ = new Uint8Array(
+      buf.buffer,
+      buf.byteOffset,
+      buf.byteLength,
+    )
+  } catch (err) {
+    console.error(
+      '[tree-sitter pre-init] readFileSync failed for sibling wasm at',
+      siblingPath,
+      '—',
+      err instanceof Error ? err.message : String(err),
+    )
+  }
 }
 
 // `--smoke-tree-sitter` is the deterministic CI gate. The handler lives at
-// the top of main() in cli/src/index.tsx (before parseArgs), not here —
-// top-level await in this module didn't actually pause subsequent module
-// evaluation under bun --compile on Windows. See the comment over the
-// handler in index.tsx for the full reasoning.
+// the top of main() in cli/src/index.tsx (before parseArgs).
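
The pre-init publishes the wasm over two channels: bytes on `globalThis` and a path in `CODEBUFF_TREE_SITTER_WASM_PATH`. The consumer lives in `init-node.ts`, which is not part of this diff, so the following is only a sketch of how such a consumer could choose between them. `resolveTreeSitterWasm` and `WasmSource` are hypothetical names, and preferring bytes over the path is an assumption; the comment in the diff says only that "whichever one trips first wins".

```typescript
// Hypothetical consumer-side sketch; the real logic lives in init-node.ts,
// which this commit does not show.
type WasmSource =
  | { kind: 'bytes'; wasmBinary: Uint8Array }
  | { kind: 'path'; wasmPath: string }
  | { kind: 'default' }

function resolveTreeSitterWasm(
  globals: { __CODEBUFF_TREE_SITTER_WASM_BINARY__?: Uint8Array },
  env: Record<string, string | undefined>,
): WasmSource {
  // Assumed preference: in-memory bytes first, since no further disk
  // access is needed.
  const bytes = globals.__CODEBUFF_TREE_SITTER_WASM_BINARY__
  if (bytes && bytes.byteLength > 0) return { kind: 'bytes', wasmBinary: bytes }

  // Otherwise fall back to the path published through the env var.
  const wasmPath = env.CODEBUFF_TREE_SITTER_WASM_PATH
  if (wasmPath) return { kind: 'path', wasmPath }

  // Neither channel is set (for example a dev run): let the library
  // locate the wasm on its own.
  return { kind: 'default' }
}
```

Passing `globals` and `env` as parameters instead of reading `globalThis` and `process.env` directly keeps the decision testable without mutating process-wide state.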

freebuff/cli/release/index.js

Lines changed: 21 additions & 0 deletions
@@ -373,6 +373,27 @@ async function downloadBinary(version) {
   }
   fs.renameSync(tempBinaryPath, CONFIG.binaryPath)
 
+  // Move tree-sitter.wasm next to the binary if the tarball included
+  // it. The CLI binary loads this at startup; embedding it inside the
+  // binary itself was unreliable on Windows (bun --compile asset
+  // bundling silently dropped or unbound it across several attempts),
+  // so we ship it as a sibling file instead. Older artifacts that
+  // pre-date this change won't have the wasm and will still install —
+  // they'll just hit the same crash they had before, which is fine.
+  const tempWasmPath = path.join(CONFIG.tempDownloadDir, 'tree-sitter.wasm')
+  if (fs.existsSync(tempWasmPath)) {
+    const targetWasmPath = path.join(
+      path.dirname(CONFIG.binaryPath),
+      'tree-sitter.wasm',
+    )
+    try {
+      if (fs.existsSync(targetWasmPath)) fs.unlinkSync(targetWasmPath)
+    } catch {
+      // best effort; rename below will surface the real error if it matters
+    }
+    fs.renameSync(tempWasmPath, targetWasmPath)
+  }
+
   fs.writeFileSync(
     CONFIG.metadataPath,
     JSON.stringify({ version }, null, 2),
