PAWN: Playstyle-Agnostic World-model Network for Chess

A small causal transformer that learns legal moves, board-state representations, and game dynamics purely from sequences of random legal moves, with no strategic play of any kind in its training data.

I've found PAWN to be a viable testbed for finetuning and augmentation methods at small scales. Since it is entirely unopinionated, it's a blank slate ready to be adapted, augmented, and finetuned into arbitrary player models with unique playstyles.

Finetuning PAWN has proven significantly more parameter-efficient than training new models from scratch and requires minimal compute resources.

Feel free to use PAWN in your own experiments. PAWN is developed as a personal project by a single developer and his imaginary friend (Claude) and has not been published or audited. If you spot a bug or inaccuracy, please help out by creating an issue or PR.

Model Variants

The model comes in three sizes, all trained from scratch on random chess games generated on-the-fly by a Rust-based chess backend. The v1.0.0 weights were co-trained for 200K steps at batch size 256 on a single B200: all three variants see the same random-game batches each step, with one forward/backward pass per variant run in sequence on the same GPU (see the cotrain config). The numbers below come from the best checkpoint by validation loss among those saved every 5K steps, which is step 195,000 (about 49.9M sequences) for all three variants:

Variant	d_model	Layers	Heads	Params	Top-1	Legal rate	Game completion
PAWN-Small	256	8	4	8.94M	8.54%	99.7451%	52.34%
PAWN (Base)	512	8	8	34.65M	8.57%	99.9962%	98.97%
PAWN-Large	640	10	8	66.91M	8.63%	99.9990%	99.76%

Metrics are measured on a 2,048-game validation set of random games. Game completion is the fraction of games in which the model chooses a legal move at every position from start to finish, and it is the primary signal separating the sizes by capacity. The figures above are measured non-autoregressively; see docs/ARCHITECTURE.md for details.
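The gap between a 99.7% legal rate and a 52% game completion for PAWN-Small is easier to see with a small sketch of how the two metrics relate. The function names and the toy data below are illustrative assumptions, not the project's evaluation code: each game is represented as a list of booleans, one per prediction, marking whether the predicted move was legal.

```python
def legal_rate(games):
    """Fraction of all predictions that were legal, pooled across games."""
    total = sum(len(g) for g in games)
    legal = sum(sum(g) for g in games)
    return legal / total


def game_completion(games):
    """Fraction of games in which every single prediction was legal."""
    return sum(all(g) for g in games) / len(games)


# Toy data (hypothetical): four 100-ply games, one with a single illegal prediction.
games = [
    [True] * 100,
    [True] * 99 + [False],
    [True] * 100,
    [True] * 100,
]

print(legal_rate(games))       # 0.9975
print(game_completion(games))  # 0.75

# Rough intuition only: if errors were independent, a 99.7451% per-move rate
# sustained over 250 predictions would complete about 53% of games.
print(round(0.997451 ** 250, 2))
```

A single illegal prediction fails the whole game, so small per-move error rates compound over long games. That is why game completion separates the variants far more sharply than the legal rate does.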

All variants share the same architecture: RMSNorm, SwiGLU FFN, RoPE, factored move embeddings, and a vocabulary covering:

1,968 move actions (the searchless_chess vocabulary, one entry per legally-reachable (src, dst[, promotion]) tuple),
11 game-outcome tokens (pretraining outcomes: WHITE_CHECKMATES, BLACK_CHECKMATES, STALEMATE, DRAW_BY_RULE, PLY_LIMIT; Lichess-specific outcomes: WHITE_RESIGNS, BLACK_RESIGNS, DRAW_BY_AGREEMENT, WHITE_WINS_ON_TIME, BLACK_WINS_ON_TIME, DRAW_BY_TIME),
and a single PAD token — 1,980 tokens total.

Tokens are coordinate pairs (UCI notation) with no piece type or side-to-move information: e2e4 is the same token whether it's a pawn double-push or a rook move. The model learns to track piece placement, movement rules, and game state entirely from observation, and these learned representations can be read out with linear probes.
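As a sanity check on the vocabulary arithmetic, the 1,968 move actions can be reproduced by enumerating every queen-like or knight-like (src, dst) pair on an empty board and adding the four promotion variants of each pawn move onto the last rank. This is a minimal sketch under that assumption; the function name is hypothetical, and the ordering of entries in searchless_chess_vocabulary.json is not reproduced here.

```python
from itertools import product

FILES = "abcdefgh"
RANKS = "12345678"


def square(file_idx, rank_idx):
    return FILES[file_idx] + RANKS[rank_idx]


def build_move_vocab():
    """Enumerate UCI strings for every geometrically possible move."""
    moves = []
    for (f, r), (tf, tr) in product(product(range(8), repeat=2), repeat=2):
        df, dr = abs(tf - f), abs(tr - r)
        if df == 0 and dr == 0:
            continue
        queen_like = df == 0 or dr == 0 or df == dr
        knight_like = {df, dr} == {1, 2}
        if queen_like or knight_like:
            moves.append(square(f, r) + square(tf, tr))
    # Promotions: a pawn stepping or capturing onto the last rank, for both colors.
    for src_rank, dst_rank in ((6, 7), (1, 0)):
        for f in range(8):
            for tf in (f - 1, f, f + 1):
                if 0 <= tf < 8:
                    for piece in "qrbn":
                        moves.append(square(f, src_rank) + square(tf, dst_rank) + piece)
    return moves


vocab = build_move_vocab()
n_outcomes, n_pad = 11, 1
print(len(vocab))                       # 1968
print(len(vocab) + n_outcomes + n_pad)  # 1980
```

Note that e2e4 appears exactly once: nothing in the token says which piece made the move.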

Quickstart

# Clone and build
git clone https://github.com/thomas-schweich/PAWN.git && cd PAWN

# Build the Rust chess engine (required -- handles all game logic)
cd engine && uv run --with maturin maturin develop --release && cd ..

# Install Python dependencies
uv sync --extra cu128   # NVIDIA GPU (or --extra rocm for AMD)

Train an adapter

Weights and data can be loaded directly from HuggingFace:

uv run python scripts/train.py --run-type adapter --strategy bottleneck \
    --checkpoint thomas-schweich/pawn-base \
    --pgn thomas-schweich/pawn-lichess-full \
    --bottleneck-dim 32 --lr 1e-4 --local-checkpoints

Pretrain from scratch

Random games are generated on-the-fly; no dataset required:

uv run python scripts/train.py --variant base --local-checkpoints

# Or train all three variants simultaneously on shared data
uv run python scripts/train.py --config configs/cotrain_three_variants.json

Run probes and diagnostics

uv run python scripts/eval_probes.py --log-dir logs --device cuda
uv run python -m pawn.dashboard --log-dir logs  # real-time monitoring

Architecture

More info: docs/ARCHITECTURE.md

Standard decoder-only transformer with next-token prediction. Each training example is a move sequence padded to 512 tokens. Factored embeddings decompose each move into source square + destination square + promotion piece. Predictions are not masked to legal moves — the model must infer legality from the move history alone. There is no board representation like AlphaZero's 8x8xN planes; all state tracking is learned internally.
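The factored embedding idea can be illustrated with a short standalone sketch. Everything here is an assumption made for illustration: the square indexing (rank * 8 + file), the promotion ids, the tiny embedding width, and the choice to sum the three component vectors. The actual scheme is described in docs/ARCHITECTURE.md.

```python
import random

# Hypothetical promotion ids; 0 means "no promotion".
PROMO = {"": 0, "q": 1, "r": 2, "b": 3, "n": 4}


def square_index(sq):
    """Map 'a1'..'h8' to 0..63 (rank-major, an assumed convention)."""
    return (int(sq[1]) - 1) * 8 + (ord(sq[0]) - ord("a"))


def factor_move(uci):
    """Split a UCI move into (source, destination, promotion) indices."""
    return square_index(uci[0:2]), square_index(uci[2:4]), PROMO[uci[4:]]


def make_table(rows, dim, rng):
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(rows)]


def embed_move(uci, src_table, dst_table, promo_table):
    """Sum the three component embeddings into one move vector."""
    s, d, p = factor_move(uci)
    return [a + b + c for a, b, c in zip(src_table[s], dst_table[d], promo_table[p])]


rng = random.Random(0)
dim = 8  # toy width; the real models use d_model of 256 to 640
src_table = make_table(64, dim, rng)
dst_table = make_table(64, dim, rng)
promo_table = make_table(5, dim, rng)

print(factor_move("e2e4"))   # (12, 28, 0)
print(factor_move("e7e8q"))  # (52, 60, 1)
print(len(embed_move("e7e8q", src_table, dst_table, promo_table)))  # 8
```

With this factoring, moves that share a source or destination square share parameters, and the three tables hold 64 + 64 + 5 rows instead of one row per move.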

What the Model Learns

Despite training exclusively on random games, PAWN develops rich internal representations. Linear probes on frozen hidden states decode chess concepts the model is never explicitly told about:

Probe	Small	Base	Large
Side to move	100.0%	100.0%	100.0%
En passant square	99.9%	99.9%	99.9%
Castling rights	98.3%	99.3%	99.3%
Is check	95.2%	95.0%	94.9%
Game phase	94.8%	95.8%	95.9%
Piece type	88.8%	91.8%	92.2%
Material count (R²)	0.80	0.84	0.83

Full probe results including diagnostics are on each variant's HuggingFace model card.
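For readers unfamiliar with linear probing, here is a minimal standard-library sketch of the technique: fit a logistic-regression classifier on fixed feature vectors and measure held-out accuracy. The "hidden states" below are synthetic random vectors with a linearly encoded binary label, standing in for frozen activations. This does not use PAWN itself and is not the project's probe code (see eval_suite and scripts/eval_probes.py for that).

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_probe(states, labels, epochs=50, lr=0.1):
    """Fit a linear probe (logistic regression) with plain SGD."""
    w, b = [0.0] * len(states[0]), 0.0
    for _ in range(epochs):
        for h, y in zip(states, labels):
            p = sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)
            g = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * g * hi for wi, hi in zip(w, h)]
            b -= lr * g
    return w, b


def probe_accuracy(w, b, states, labels):
    hits = 0
    for h, y in zip(states, labels):
        pred = 1 if sum(wi * hi for wi, hi in zip(w, h)) + b > 0 else 0
        hits += pred == y
    return hits / len(labels)


# Synthetic stand-in for frozen hidden states: the label is a linear
# function of the features, so a linear probe should recover it.
rng = random.Random(0)
direction = [1.0, -1.0, 0.5, 0.0]
states = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(400)]
labels = [1 if sum(d * h for d, h in zip(direction, s)) > 0 else 0 for s in states]

w, b = train_probe(states[:300], labels[:300])
acc = probe_accuracy(w, b, states[300:], labels[300:])
print(f"held-out probe accuracy: {acc:.3f}")
```

The backbone is never updated during probing; only the probe's weights are trained. High probe accuracy therefore indicates the concept is linearly readable from the representation rather than learned by the probe.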

Adapter Methods

More info: docs/ADAPTERS.md

PAWN ships with six adapter implementations for finetuning the frozen backbone on human game data. Numbers below are for adapters trained on the frozen v1.0.0 pawn-base backbone using 2M Lichess games at 1800-1900 Elo, in a single pass.

Method	Params	Val top-1 (1800-1900 Elo)	Description
Bottleneck dim=512	8.4M	46.14%	Houlsby-style residual MLP adapters
Bottleneck dim=64	1.05M	41.56%
Bottleneck dim=32	524K	39.82%
LoRA rank=16 qkvo	524K	35.62%	Low-rank attention projection adapters
RoSA retro-sparse d=0.01	84K	30.45%	Gradient-informed sparse mask from LoRA warmup
Sparse density=0.015 qkvo	126K	29.18%	Random binary mask on frozen weights

Hybrid (LoRA+FiLM) and FiLM remain in the codebase but aren't in the current sweep. See docs/ADAPTERS.md for full methodology and Pareto discussion.
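The parameter counts in the table can be reproduced with back-of-the-envelope arithmetic for the base model (d_model=512, 8 layers). The formulas below are assumptions that happen to match the reported figures: two bias-free bottleneck adapters per layer, LoRA and sparse updates applied to the four d_model x d_model attention projections (q, k, v, o), and for RoSA only the sparse component counted. docs/ADAPTERS.md remains the authoritative description.

```python
def bottleneck_params(d_model, n_layers, bottleneck_dim, adapters_per_layer=2):
    """Down-projection plus up-projection per adapter, no biases (assumed)."""
    return n_layers * adapters_per_layer * 2 * d_model * bottleneck_dim


def lora_params(d_model, n_layers, rank, n_proj=4):
    """One (d_model x rank) and one (rank x d_model) matrix per projection."""
    return n_layers * n_proj * 2 * d_model * rank


def sparse_params(d_model, n_layers, density, n_proj=4):
    """Trainable entries when a fraction `density` of each projection is unmasked."""
    return round(n_layers * n_proj * d_model * d_model * density)


print(bottleneck_params(512, 8, 512))  # 8388608  (table: 8.4M)
print(bottleneck_params(512, 8, 64))   # 1048576  (table: 1.05M)
print(bottleneck_params(512, 8, 32))   # 524288   (table: 524K)
print(lora_params(512, 8, 16))         # 524288   (table: 524K)
print(sparse_params(512, 8, 0.015))    # 125829   (table: 126K)
print(sparse_params(512, 8, 0.01))     # 83886    (table: 84K)
```

Under these assumptions, Bottleneck dim=32 and LoRA rank=16 have identical budgets, so the roughly four-point gap in top-1 between them is a like-for-like comparison.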

Datasets

The adapter experiments use a filtered slice of the "Lichess Full" dataset below: games between players rated 1800-1900, truncated to 1-10 million games depending on adapter type and size (some smaller adapters saturate too quickly to benefit from more games). I nonetheless parsed all ~300 million games and converted them to UCI and to PAWN's training format, both to make future experiments easier and because others may find the pre-converted raw UCI useful for their own projects. The Rust engine is fast enough that doing so cost little. I also kept the SAN notation and metadata from the original PGNs.

Dataset	Games	Description	Link
Lichess Full	286M train + 9.3M val + 9.0M test	Rated games from Q1 2025 (all Elos), holdout from Jan 2026	pawn-lichess-full
Stockfish nodes=1	900K train + 50K val + 50K test	NNUE self-play, 1 node/move	stockfish-nodes1

All datasets store pre-tokenized list[int16] move sequences in a tokens column. The Lichess dataset also includes raw san/uci strings, clock annotations, Elo ratings, and full game metadata. Datasets load directly from HuggingFace via a Polars lazy scan; thanks to predicate pushdown, only the subset of data you select is actually downloaded.

Repository Structure

pawn/
├── pawn/                 # Core Python package
│   ├── config.py         # Model configs (small/base/large)
│   ├── model.py          # PAWN transformer
│   ├── data.py           # Random game data pipeline
│   ├── lichess_data.py   # Lichess/Parquet data pipeline
│   ├── trainer.py        # Pretraining loop
│   ├── gpu.py            # GPU auto-detection
│   ├── adapters/         # Bottleneck, LoRA, FiLM, sparse, hybrid, RoSA
│   ├── eval_suite/       # Probes, generation tests, diagnostics
│   └── dashboard/        # Solara training dashboard
├── engine/               # Rust chess engine (PyO3 bindings via shakmaty)
├── scripts/              # Training, evaluation, and data extraction
├── deploy/               # Docker, RunPod deployment, serverless handler
├── tests/                # Unit tests
└── docs/                 # Architecture, training, adapter docs

Chess Engine

PAWN includes a bundled Rust chess engine (engine/) that handles all game simulation, move generation, legal move computation, tokenization, and PGN parsing. The engine uses shakmaty under the hood, with PyO3 bindings to Python. No Python chess libraries are used.

The engine generates training data on-the-fly via chess_engine.generate_random_games(), producing well over 100 million random games per hour. It also includes enriched PGN parsing (extracting clock annotations, Stockfish evals, and headers in a single pass) and UCI engine self-play generation.

More info

Architecture -- model design, embeddings, training objective, game completion analysis
Training -- pretraining, adapter training, deployment
Adapters -- adapter methods, results, quick start
Accuracy Ceiling -- theoretical limits for random game prediction
Legacy Architecture -- the v0.x backbones, why they were retired, and how to load them

Acknowledgments

PAWN builds on ideas and tools from the following projects and publications:

Component	Reference
Transformer	Vaswani et al., "Attention Is All You Need", NeurIPS 2017
RMSNorm	Zhang & Sennrich, "Root Mean Square Layer Normalization", NeurIPS 2019
RoPE	Su et al., "RoFormer: Enhanced Transformer with Rotary Position Embedding", 2021
SwiGLU	Shazeer, "GLU Variants Improve Transformer", 2020
AdamW	Loshchilov & Hutter, "Decoupled Weight Decay Regularization", ICLR 2019
Cosine schedule	Loshchilov & Hutter, "SGDR: Stochastic Gradient Descent with Warm Restarts", ICLR 2017
Mixed precision	Micikevicius et al., "Mixed Precision Training", ICLR 2018
Bottleneck adapters	Houlsby et al., "Parameter-Efficient Transfer Learning for NLP", ICML 2019
LoRA	Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models", ICLR 2022
FiLM	Perez et al., "FiLM: Visual Reasoning with a General Conditioning Layer", AAAI 2018
RoSA	Nikdan et al., "RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation", 2024
Linear probes	Alain & Bengio, "Understanding Intermediate Layers Using Linear Classifier Probes", ICLR Workshop 2017
MAIA	McIlroy-Young et al., "Aligning Superhuman AI with Human Behavior: Chess as a Model System", KDD 2020
AlphaZero	Silver et al., "A General Reinforcement Learning Algorithm that Masters Chess, Shogi, and Go through Self-Play", Science 2018
Searchless Chess	Ruoss et al., "Amortized Planning with Large-Scale Transformers: A Case Study on Chess", NeurIPS 2024
Leela Chess Zero	github.com/LeelaChessZero/lc0
shakmaty	github.com/niklasf/shakmaty
PyO3	github.com/PyO3/pyo3
Lichess	lichess.org / database.lichess.org

Citation

@software{schweich2026pawn,
  author = {Schweich, Thomas},
  title = {{PAWN}: Playstyle-Agnostic World-model Network for Chess},
  year = {2026},
  url = {https://github.com/thomas-schweich/PAWN},
  license = {Apache-2.0}
}

License

Apache 2.0. See LICENSE.
