Paper revision: dataset-size analysis, embedding comparison, response letter #52
Merged
jeremymanning merged 45 commits into main (Mar 28, 2026)
Conversation
…er updates

New analyses for Computational Linguistics revision:
- Sigmoid fit to accuracy vs log10(tokens): R² = 0.979, ≥95% threshold ≈ 51K tokens
- Embedding comparison pipeline (3 MTEB models: nomic, bge-m3, Qwen3-4B)
- Per-book checkpoint/resume support for embedding runs

Paper updates:
- New methods subsections (data requirements, embedding comparison)
- New results subsections with figure references (placeholders for embedding results)
- Expanded Discussion: Huang et al. (2025) comparison, benchmark feasibility argument
- MTEB citation added to bibliography
- Response letter draft (paper/admin/response_letter.tex)

Code changes:
- Converted model_results_ntokens.pkl.gz → Parquet (format-stable)
- Removed brittle pd.__version__ assertions from 3 files
- Extracted __main__ from visualization library to standalone script
- New figures: accuracy_vs_tokens_sigmoid.pdf, t_test_ntokens.pdf, n_tokens.pdf
- Replaced old grid/avg ntokens figures with single-panel designs

Tests: 14 new tests (7 sigmoid, 7 embedding), all passing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
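The sigmoid fit above can be sketched with `scipy.optimize.curve_fit`; the functional form, parameter names, and synthetic data below are assumptions for illustration, not the repository's actual code or the paper's values.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(log_tokens, lower, upper, midpoint, slope):
    """Accuracy as a logistic function of log10(training tokens)."""
    return lower + (upper - lower) / (1 + np.exp(-slope * (log_tokens - midpoint)))

# Synthetic accuracy-vs-tokens data (illustrative only).
rng = np.random.default_rng(0)
x = np.linspace(2, 6, 20)  # log10(tokens): 100 .. 1M
y = sigmoid(x, 0.1, 1.0, 4.0, 3.0) + rng.normal(0, 0.01, x.size)

params, _ = curve_fit(sigmoid, x, y, p0=[0.0, 1.0, 4.0, 1.0])
lower, upper, midpoint, slope = params

# Invert the fitted curve to find the token count where accuracy reaches 95%.
target = 0.95
log_thresh = midpoint - np.log((upper - lower) / (target - lower) - 1) / slope
print(f"~95% accuracy at ~{10**log_thresh:,.0f} tokens")
```

Inverting the fitted curve, rather than reading the threshold off a plot, is what makes a statement like "≥95% threshold ≈ 51K tokens" reproducible from the fit parameters alone.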
Three new scripts (existing scripts unchanged):
- remote_train_ntokens.sh: launch sweep on GPU cluster
- check_ntokens_status.sh: check training progress per token level
- sync_ntokens.sh: download configs/loss logs (not weights) from cluster

Tested on tensor02.dartmouth.edu (8×A6000): connection OK, status check works. Credentials files created for tensor01 and tensor02 (gitignored).
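A per-level status check like the one described might look like the sketch below; the `runs/ntokens_<level>` directory layout and completion marker are invented here, not taken from the actual check_ntokens_status.sh.

```shell
# Hypothetical sketch of a per-token-level status check; the directory
# layout and the use of config.json as a completion marker are assumptions.
for level in 1000 10000 100000; do
  # a run counts as finished once its config.json has been written
  n=$(find "runs/ntokens_${level}" -name config.json 2>/dev/null | wc -l | tr -d ' ')
  echo "tokens=${level}: ${n} models finished"
done
```

Counting marker files rather than parsing logs keeps the check cheap enough to run over ssh against a busy cluster.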
- Session notes with detailed progress tracking
- Constitution with scientific rigor principles
- Spec, plan, tasks for paper revision (67 tasks across 8 phases)
- tensor02 tested: ntokens scripts working, all 1520 models present
- bge-m3 embedding results: 76.2% accuracy (completed)
- Qwen3-4B running locally
- run_llm_stylometry.sh: figure flags 6/7, sentence-transformers/pyarrow deps, auto-run
- generate_figures.py: dispatch for figures 6 (sigmoid) and 7 (t-test ntokens)
- run_stats.sh: ntokens stats, sigmoid fit results, embedding comparison summaries
- README.md: documents the dataset-size experiments, sigmoid fit, embedding comparison, and remote ntokens scripts, with usage examples
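The figure dispatch described for generate_figures.py can be sketched as a lookup table mapping figure numbers to plotting functions; the function names and return values here are hypothetical stand-ins, not the repo's actual API.

```python
# Hypothetical dispatch-table sketch; the plotting functions below are
# placeholders that just return the output filename.
def plot_sigmoid_fit():
    return "accuracy_vs_tokens_sigmoid.pdf"

def plot_ttest_ntokens():
    return "t_test_ntokens.pdf"

FIGURE_DISPATCH = {
    6: plot_sigmoid_fit,     # dataset-size sigmoid fit
    7: plot_ttest_ntokens,   # t-tests across token levels
}

def generate_figure(number: int) -> str:
    """Run the plotting function registered for a figure number."""
    try:
        return FIGURE_DISPATCH[number]()
    except KeyError:
        raise ValueError(f"unknown figure number: {number}") from None
```

A dispatch table like this lets a `--figure 6` style flag in run_llm_stylometry.sh map directly onto one plotting function without a chain of if/elif branches.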
- Results section: filled in nomic and bge-m3 accuracy; Qwen placeholder remains
- Discussion: substantive text on why embeddings underperform (content vs. style)
- Supplement: added embedding appendix with table, purity/confusion figures
- Response letter: filled in embedding summary, fixed section references

The only remaining PLACEHOLDERs are for Qwen3-4B accuracy (running locally).
- Response letter: full verbatim reviewer comments with point-by-point responses (8 pages)
- Paper results: filled in nomic 81% and bge-m3 76.2% accuracy; Qwen placeholder remains
- Paper discussion: substantive embedding interpretation (content vs. style conflation)
- Supplement: embedding appendix with table, purity/confusion figures, interpretation
All claims depending on Qwen3-4B results are now either:
- Highlighted yellow (\colorbox{yellow}{...}) in response letter
- Marked with % NOTE/TODO/VERIFY comments in main.tex and supplement.tex
- Explicit [PLACEHOLDER] or TBD markers
Inventory: 3 numbers to fill, ~7 interpretive claims to verify, 14 page refs.
- Response letter: \parskip set to \baselineskip (single blank line between paragraphs)
- requirements-dev.txt: added sentence-transformers and pyarrow
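The \parskip change amounts to a two-line preamble tweak; pairing it with a zero \parindent (an assumption here, but the usual companion setting) avoids indented paragraphs that also carry blank-line spacing.

```latex
% Separate paragraphs by one blank line's worth of vertical space,
% and drop the first-line indent that normally accompanies \parskip = 0.
\setlength{\parskip}{\baselineskip}
\setlength{\parindent}{0pt}
```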
Reformatted 80 Python files with black (line length 88). Applied ruff auto-fixes where applicable. No functional changes — formatting only.
Qwen3-Embedding-4B complete: 70.2% (59/84), the worst of the 3 models. Inverse size-accuracy relationship: 81.0% (137M) > 76.2% (568M) > 70.2% (4B).
- Filled all PLACEHOLDERs in main.tex, supplement.tex, response_letter.tex
- Removed all yellow highlights (all claims now verified)
- Updated supplement interpretation with full 3-model findings
- Regenerated all 3 embedding figures with complete data
- All paper documents compile cleanly (31 + 10 + 11 pages)
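The reported numbers are internally consistent and can be sanity-checked in a few lines; only the 59/84 count is given in the log above, so the snippet checks that figure plus the monotone ordering of the three reported percentages.

```python
# Reported embedding-model accuracies from the commit log (%, by model size).
results = {
    "nomic (137M)": 81.0,
    "bge-m3 (568M)": 76.2,
    "Qwen3-Embedding-4B (4B)": 70.2,
}

# Qwen's 70.2% corresponds to 59 of 84 books classified correctly.
assert round(59 / 84 * 100, 1) == 70.2

# Accuracy decreases monotonically as embedding-model size increases.
accuracies = list(results.values())
assert all(a > b for a, b in zip(accuracies, accuracies[1:]))
```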
- Results section: expanded with the inverse size-accuracy pattern (81% > 76% > 70%), per-author findings (Qwen fails on Fitzgerald 0/8, Thompson 38.5%), and a cross-reference to the supplementary table.
- Discussion: added interpretation of the inverse pattern (larger models may amplify content similarity at the expense of stylistic distinction), the Dickens magnet observation, and a contrast with author-specific training.
- Supplement: updated interpretation with full 3-model findings.
- Removed all stale TODO/VERIFIED comments.
- Fixed cross-document table reference (tab:embedding-comparison -> Supp Table 1).
- All documents compile with zero undefined references.
…nt bug
- Added embedding comparison figure (Fig. embedding-comparison) to main paper
- Fixed supplement Supp. Fig. 6: wrong file (content_only -> pos)
- Tightened results paragraph: removed redundant methodology recap
- Tightened discussion paragraph: removed repeated accuracy numbers
- All documents compile cleanly (31 + 10 + 11 pages)
- Zero actual undefined references
- Line 36: split long email addresses across 3 lines with \small
- Line 877: added \small to Austen/Twain corpus table
- Line 897: added \small to Fitzgerald/Wells corpus table
- Supplement: fixed POS t-stats figure (was using content_only.pdf)

Zero overfull hbox warnings after the fix.
Previously 19KB with empty bottom panels (Contested, Non-Oz Baum, Non-Oz Thompson). Now 192KB with all 6 panels populated correctly.
- Fixed \author{} missing closing brace + added \date{}
- Added embedding figure macros (\embeddingpurity, \embeddingconfusion)
- Main text references Supp. Figs 9 and 10 for embedding details
- Supplement: moved embedding figures before table
- Added old.tex (from main branch) for latexdiff
- Updated compile.sh with diff target
- All 4 documents compile (main, supplement, response, diff)
- Removed all narrative text from the supplement embedding section (methods description, notable-patterns discussion)
- Added MTEB ranking point to the main-text results paragraph
- Supplement now contains only figures, tables, and captions
- Added \clearpage before the embedding table to enforce ordering
- Figures appear before tables in all supplement sections
Used \captionof{figure} instead of figure* floats to force exact placement.
Added caption package. Embedding purity (Fig 9) and confusion (Fig 10) now
appear on page 8, before the embedding table (Table 4) on page 9.
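The \captionof trick above can be sketched as follows; the filename, caption text, and label are hypothetical, and only the `caption` package usage reflects what the commit describes.

```latex
\usepackage{caption}  % in the preamble; provides \captionof

% In the body, at the exact point where the figure must appear
% (no float, so LaTeX cannot move it past the table):
\begin{center}
  \includegraphics[width=\linewidth]{embedding_purity.pdf}
  \captionof{figure}{Embedding purity across the three models.}
  \label{fig:embedding-purity}
\end{center}
```

Unlike a `figure*` float, this places the graphic inline, which is what guarantees the figures land on page 8 ahead of the table on page 9.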
- Added old_supplement.tex from main branch for latexdiff
- compile.sh now generates both main and supplement diffs
- Fixed cd issue: cd back to SCRIPT_DIR before compile_diff in 'all' case
- All 5 documents compile: main, supplement, response, diff, diff_supplement
Added 3 new citations (verified via web search):
- Stamatatos (2009): JASIST 60(3), 538-556 — survey of attribution methods
- Stamatatos (2018): JASIST 69(3), 461-473 — topic masking for cross-topic AA
- Fincke & Boschee (2024): arXiv:2408.05192 — cross-genre data selection

Expanded Discussion cross-domain paragraph with:
- Attribution degrades across genres/topics (Stam09, BarlStam20)
- Topic masking helps cross-topic (Stam18)
- Multi-genre pooling alone doesn't help (FincBosc24)
- Oz analysis is preliminary evidence, not a controlled evaluation
- Cross-genre/register evaluation identified as future work

Updated response letter to accurately describe the new discussion.
- Fixed quote 2: added full sentence ending ("because the only text...")
- Updated section references to match renamed sections:
"Training data requirements" -> "Corpus size analysis"
"Comparison with text embedding methods" -> "Comparisons between
predictive comparison and text embedding matching"
- All 3 inline paper quotes verified verbatim against main.tex
Grammar:
- "GPT-2 implicitly learn" -> "learns" (subject-verb agreement)

Redundancy:
- Removed duplicate "characterize the data requirements" phrase

Also added .specify/ and specs/ to .gitignore.
- Line 641: fixed comma splice with semicolons (Blogs50; CCAT50; Guardian)
- Line 652: "These would not" -> "These analyses would not" (clarifies the antecedent)
Page numbers remain yellow-highlighted for easy updating if pages shift.
The Parquet file was 140MB (over GitHub's 100MB limit). Re-serialized the pkl.gz from Parquet using current numpy 1.x (98MB, under the limit). The original pkl.gz required numpy 2.x; the new one is compatible with numpy 1.x. Updated all references across 15 files (Python, shell, README, gitignore). All 8 tests pass with the new pkl.gz.
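The re-serialization step reduces to a pandas round trip; the toy DataFrame below is a stand-in for the real results table, and pandas infers gzip compression from the `.pkl.gz` suffix.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for the real table; the actual model_results_ntokens
# data is far larger and has different columns.
df = pd.DataFrame(
    {"model": ["gpt2", "gpt2"], "n_tokens": [1_000, 51_000], "accuracy": [0.31, 0.95]}
)

path = os.path.join(tempfile.mkdtemp(), "model_results_ntokens.pkl.gz")
df.to_pickle(path)                # gzip inferred from the .gz suffix
roundtrip = pd.read_pickle(path)  # readable on numpy 1.x, since it was written there
pd.testing.assert_frame_equal(df, roundtrip)
```

Pickles embed the numpy version that wrote them, which is why writing the file from a numpy 1.x environment (after reading the version-agnostic Parquet) is what restores 1.x compatibility.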
Summary
Addresses reviewer comments for Computational Linguistics resubmission (Issue #50). Adds two new analyses, updates the paper text, and drafts a point-by-point response letter.
New analyses
Paper updates
Response letter
Infrastructure
- run_llm_stylometry.sh: new figure flags 6/7, sentence-transformers dep
- compile.sh: builds main + supplement + response + latexdiff

Test plan
- pytest tests/test_sigmoid_fit.py — 7 tests pass
- pytest tests/test_embedding_comparison.py — 7 tests pass
- pytest tests/test_dataset_size_support.py — 1 test passes
- All paper documents compile (./paper/compile.sh all)

Closes #50