An automation-first 2048 benchmark and experimentation platform for comparing search and reinforcement learning algorithms with reproducible evaluation, reporting, and developer-friendly workflows.
pw2048 uses the 2048 game as a compact benchmark environment, but its real value goes beyond “building a game bot”.
It is designed as a small engineering system for:
- algorithm comparison
- experiment automation
- reproducible evaluation
- report generation
- training / benchmark workflow design
- multi-interface developer experience (CLI / TUI / Web)
| 2048 Game | Web Launcher |
|---|---|
| ![]() | ![]() |
A lot of AI side projects stop at one of these stages:
- “the algorithm works”
- “the demo looks cool”
- “the model trained once”
That is not enough if the goal is to build something reusable, comparable, and explainable.
pw2048 was created to treat experimentation as an engineering problem:
- How do you benchmark multiple strategies fairly?
- How do you keep evaluation reproducible?
- How do you compare search algorithms and RL agents in one workflow?
- How do you generate outputs that are useful beyond a one-off run?
This makes pw2048 closer to an experiment infrastructure / evaluation tooling project than a simple game automation demo.
Although the benchmark environment is 2048, the project demonstrates a broader set of engineering capabilities:
- browser automation with Playwright
- benchmark workflow design
- reproducible metric collection
- RL training / evaluation loop design
- HTML report generation and visualization
- experiment ergonomics through CLI / TUI / Web interfaces
- structured project architecture for iterative algorithm work
A better way to read this repository is:
- Benchmark Platform
- Experiment Automation Tooling
- Reproducible Evaluation Workflow
- AI-Enhanced Engineering Tooling
not simply:
- a 2048 bot
- a hobby RL experiment
- a game-only side project
| Field | Value |
|---|---|
| Current best algorithm | Expectimax |
| Highest avg score | ~33 000 (Expectimax, 73.4% win rate) |
| Best learning algorithm | DQN-v3 / PPO-v3 (behavioural-cloning pre-training) |
| Benchmark protocol | 5 runs × 500 games (auto-parallel) |
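For reference, the leaderboard metrics used throughout (Avg Score, P90, Max Score, Best Tile, Win Rate) can be derived from per-game results along these lines. This is an illustrative sketch, including the assumption that a "win" means reaching the 2048 tile; it is not the repository's reporting code:

```python
import numpy as np

def summarize(scores: list[int], best_tiles: list[int], win_tile: int = 2048) -> dict:
    """Collapse per-game results into leaderboard-style metrics."""
    s = np.asarray(scores)
    return {
        "avg_score": float(s.mean()),
        "p90": float(np.percentile(s, 90)),  # 90th-percentile game score
        "max_score": int(s.max()),
        "best_tile": max(best_tiles),
        "win_rate": sum(t >= win_tile for t in best_tiles) / len(best_tiles),
    }
```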
pw2048 currently supports the following algorithms:
- Random
- Greedy
- Heuristic
- Expectimax
- MCTS
- DQN (multiple versions)
- PPO (multiple versions)
- behavioural-cloning pre-training variants

and the following workflow features:

- benchmark execution
- training/evaluation split
- visualization
- leaderboard generation
- CLI/TUI/Web launch paths
- TensorBoard-compatible logging
The most important part of this repository is not the game itself.
What matters is the structure:
- a controlled environment
- multiple comparable strategies
- repeatable evaluation loops
- automated result reporting
- an interface layer that makes the system easier to use
That structure is reusable far beyond 2048.
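Concretely, the reusable piece is a single strategy contract: every algorithm implements the same `choose_move()` method (the layer table below shows the trainer driving it), so one evaluation loop serves search and RL strategies alike. A minimal sketch with illustrative names; the repository's actual interface lives in `src/algorithms/base.py`:

```python
from abc import ABC, abstractmethod

class Algorithm(ABC):
    """Contract shared by every strategy, from Random to PPO."""

    @abstractmethod
    def choose_move(self, board: list[list[int]]) -> str:
        """Return 'up', 'down', 'left', or 'right' for the given 4x4 board."""

def run_episode(env, algo: Algorithm) -> float:
    """One benchmark episode; identical regardless of strategy.

    Assumes a gym-style env with reset() -> board and
    step(move) -> (board, reward, done).
    """
    board, done, score = env.reset(), False, 0.0
    while not done:
        board, reward, done = env.step(algo.choose_move(board))
        score += reward
    return score
```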
The same engineering pattern can be adapted to:
- agent benchmarking
- browser-based task evaluation
- policy comparison experiments
- lightweight AI test harnesses
- internal experimentation platforms for model or strategy iteration
Run `python main.py --mode benchmark --report` to populate this table.
| Rank | Algorithm | Version | Avg Score | P90 | Max Score | Best Tile | Win Rate |
|---|---|---|---|---|---|---|---|
| 1 | Expectimax | v1 | 33 030 | 60 266 | 132 412 | 8192 | 73.4% |
| 2 | Heuristic | v1 | 16 061 | 28 185 | 61 064 | 4096 | 21.6% |
| 3 | MCTS | v2 | 7 821 | 12 649 | 15 416 | 1024 | 0.0% |
| 4 | Greedy | v1 | 3 050 | 5 416 | 13 820 | 1024 | 0.0% |
| 5 | Random | v1 | 1 102 | 1 720 | 3 324 | 256 | 0.0% |
| 6 | DQN-v3* | v3 | — | — | — | — | — |
| 7 | PPO-v3* | v3 | — | — | — | — | — |
* DQN-v3 / PPO-v3 include behavioural-cloning pre-training. See the RL Training Guide → for how to push their scores higher.
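For context on the asterisked rows: behavioural-cloning (BC) pre-training warm-starts the policy network by imitating moves from a stronger teacher before RL fine-tuning begins. A minimal PyTorch-style sketch of the general technique; all names here are illustrative, and the repository's own implementation (which stores checkpoints as `.npz`) may differ:

```python
import torch
import torch.nn as nn

def bc_pretrain(policy: nn.Module, dataset, epochs: int = 5, lr: float = 1e-3):
    """Supervised warm-start: imitate (state, teacher_move) pairs.

    `dataset` yields batches of encoded boards plus the teacher's move
    indices (0-3); RL fine-tuning then starts from this policy.
    """
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for states, teacher_moves in dataset:
            opt.zero_grad()
            loss = loss_fn(policy(states), teacher_moves)
            loss.backward()
            opt.step()
```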
```bash
# Install dependencies
pip install -r requirements.txt
python -m playwright install chromium

# Run 20 games with the random algorithm (default)
python main.py

# Run with the TUI wizard (arrow-key menus)
python main.py --tui

# Run with the web UI wizard (browser form)
python main.py --web

# Full benchmark with HTML leaderboard
python main.py --mode benchmark --report

# Fast in-process RL training (no browser, 10–50× faster)
python main.py --algorithm dqn \
    --train-games 5000 \
    --checkpoint-dir checkpoints \
    --tensorboard-dir tb_logs

# Auto-training: train until performance plateaus (no need to guess game count)
python main.py --algorithm dqn \
    --early-stopping-patience 10 \
    --eval-freq 50 \
    --checkpoint-dir checkpoints \
    --games 0

# Maximum speed: GPU + 4 parallel workers + early stopping
python main.py --algorithm dqn \
    --train-workers 4 \
    --early-stopping-patience 15 \
    --eval-freq 100 \
    --checkpoint-dir checkpoints \
    --tensorboard-dir tb_logs \
    --games 0

# Inspect model status after training
python main.py --inspect-checkpoint checkpoints/DQN-v3/best_checkpoint.npz
python main.py --training-status tb_logs/DQN-v3
```

The system exposes a structured Env / Train / Eval / Play stack for sustained high-score improvement:
| Layer | Module | Description |
|---|---|---|
| Env | `src/rl_env.py` → `Game2048Env` | Pure-Python gym-style env, no browser, 10–50× faster |
| Train | `src/rl_trainer.py` → `RLTrainer` | In-process training loop; drives `choose_move()` |
| Eval | `src/rl_trainer.py` → `EvalCallback` | Greedy eval every N games; saves `best_checkpoint.npz` |
| Play | `src/runner.py` | Playwright browser benchmark / demo |
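The practical consequence of the Env layer is that a training loop needs no browser at all. A sketch of the kind of loop the table implies; the exact `Game2048Env` return shapes and action encoding are assumptions here, not the repository's documented API:

```python
import random
from src.rl_env import Game2048Env  # Env layer: pure Python, no Playwright

env = Game2048Env()
obs = env.reset()
done, total_reward = False, 0.0
while not done:
    move = random.choice(["up", "down", "left", "right"])  # stand-in policy
    obs, reward, done = env.step(move)  # assumed gym-style (obs, reward, done)
    total_reward += reward
print("episode score:", total_reward)
```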
```bash
# Train, then benchmark
python main.py --algorithm dqn \
    --train-games 5000 --checkpoint-dir checkpoints \
    --tensorboard-dir tb_logs --eval-freq 50 --n-eval-games 20
python main.py --algorithm dqn --games 50 --checkpoint-dir checkpoints --report
tensorboard --logdir tb_logs  # view training curves

# Auto-training (early stopping) — stops when score plateaus
python main.py --algorithm dqn \
    --early-stopping-patience 10 --eval-freq 50 \
    --checkpoint-dir checkpoints --games 0
```

→ Full RL Training Guide
→ Efficient Training Playbook — GPU / Parallel / Status
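The auto-training mode follows the standard patience pattern: run a greedy evaluation every `--eval-freq` games and stop once the best evaluation score has not improved for `--early-stopping-patience` consecutive evaluations. A minimal sketch of that logic, assuming the standard pattern rather than quoting the repository's `EvalCallback`:

```python
class PlateauStopper:
    """Stop training once the eval score stops improving for `patience` evals."""

    def __init__(self, patience: int):
        self.patience = patience
        self.best_score = float("-inf")
        self.evals_without_improvement = 0

    def update(self, eval_score: float) -> bool:
        """Record one evaluation result; return True when training should stop."""
        if eval_score > self.best_score:
            self.best_score = eval_score  # new best: a checkpoint would be saved here
            self.evals_without_improvement = 0
        else:
            self.evals_without_improvement += 1
        return self.evals_without_improvement >= self.patience
```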
- Random — pick a random direction each turn
- Greedy — pick the move that maximises immediate score gain
- Heuristic — hand-crafted heuristics (corner strategy, monotonicity, empty tiles, merge potential)
- Expectimax — game-tree search with chance nodes for tile spawns (see the sketch after this feature list)
- MCTS — Monte Carlo Tree Search (v1: random rollout, v2: greedy rollout)
- DQN v1/v2/v3 — v3 adds BC pre-training, Adam, one-hot encoding, score reward
- PPO v1/v2/v3 — v3 adds BC pre-training, Adam, one-hot encoding, score reward
- 4-layer Env/Train/Eval/Play — in-process training, EvalCallback, TensorBoard
- Early stopping — auto-stop when score plateaus (`--early-stopping-patience`)
- GPU acceleration — optional PyTorch backend (Apple MPS / CUDA)
- Parallel training — N independent workers, best model selected (`--train-workers`)
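Here is the Expectimax sketch referenced above: the player (max) node maximises over legal moves, the chance node takes the expectation over tile spawns (a 2 with probability 0.9, a 4 with 0.1, uniformly over empty cells), and a deliberately simple empty-cell count scores the leaves. This is a self-contained illustration of the technique, not the code in `src/algorithms/expectimax_algo.py`:

```python
SPAWNS = [(2, 0.9), (4, 0.1)]  # standard 2048 tile-spawn distribution

def slide_row(row):
    """Slide and merge one row to the left, returning the new row."""
    tiles = [t for t in row if t]
    out, i = [], 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            out.append(tiles[i] * 2)  # merge a pair
            i += 2
        else:
            out.append(tiles[i])
            i += 1
    return out + [0] * (len(row) - len(out))

def apply_move(board, move):
    """Return the board after sliding in `move` ('up'/'down'/'left'/'right')."""
    if move in ("up", "down"):
        board = [list(col) for col in zip(*board)]  # transpose: columns become rows
    rows = [r[::-1] for r in board] if move in ("right", "down") else board
    rows = [slide_row(r) for r in rows]
    if move in ("right", "down"):
        rows = [r[::-1] for r in rows]
    if move in ("up", "down"):
        rows = [list(col) for col in zip(*rows)]  # transpose back
    return rows

def empty_cells(board):
    return [(r, c) for r in range(4) for c in range(4) if board[r][c] == 0]

def expectimax(board, depth, is_max):
    if depth == 0:
        return len(empty_cells(board))  # leaf heuristic: prefer open boards
    if is_max:  # player node: best legal move (a move is legal if it changes the board)
        children = [apply_move(board, m) for m in ("up", "down", "left", "right")]
        children = [c for c in children if c != board]
        return max((expectimax(c, depth, False) for c in children), default=0)
    # chance node: expectation over every empty cell and spawn value
    cells = empty_cells(board)
    total = 0.0
    for (r, c) in cells:
        for value, prob in SPAWNS:
            child = [row[:] for row in board]
            child[r][c] = value
            total += (prob / len(cells)) * expectimax(child, depth - 1, True)
    return total
```

Even at `depth=2` this plays far better than Greedy; the real implementation pairs the same tree with stronger leaf heuristics.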
```
pw2048/
├── game.html              # Self-contained 2048 game (served locally)
├── main.py                # CLI entry-point
├── requirements.txt
├── docs/                  # Detailed documentation
│   ├── rl-training.md     # RL guide: checkpoints, 4-layer pipeline, TensorBoard
│   ├── cli-reference.md   # All CLI flags, modes, result layout
│   └── ui-wizards.md      # TUI / GUI / Web UI usage
├── src/
│   ├── game.py            # Playwright wrapper
│   ├── runner.py          # Play layer: browser runner (sequential / parallel)
│   ├── rl_env.py          # Env layer: Game2048Env
│   ├── rl_trainer.py      # Train + Eval layers: RLTrainer, EvalCallback, TrainingLogger
│   ├── visualize.py       # Matplotlib charts
│   ├── report.py          # HTML dashboard generator
│   ├── storage.py         # S3 upload / prune helpers
│   ├── tui.py             # TUI wizard (questionary + rich)
│   ├── gui.py             # Desktop GUI wizard (tkinter)
│   ├── webui.py           # Web UI launcher (http.server)
│   └── algorithms/
│       ├── base.py
│       ├── random_algo.py
│       ├── greedy_algo.py
│       ├── heuristic_algo.py
│       ├── expectimax_algo.py
│       ├── mcts_algo.py
│       ├── dqn_algo.py    # DQN v1/v2/v3
│       └── ppo_algo.py    # PPO v1/v2/v3
└── tests/
    ├── test_game_and_algorithms.py
    ├── test_rl_env_and_trainer.py
    ├── test_storage_and_report.py
    ├── test_tui.py
    ├── test_gui.py
    └── test_webui.py
```
```bash
python -m pytest tests/ -v
```

This repository is useful as a technical project because it shows more than raw algorithm experimentation. It also shows:
- how to build an evaluation workflow around algorithms
- how to separate training, evaluation, and play layers
- how to automate result collection and reporting
- how to make an experimentation system easier to operate
- how to turn a benchmark setup into a reusable engineering asset
That is the main reason pw2048 belongs in a broader professional narrative around:
- automation
- tooling
- experiment infrastructure
- AI-enhanced engineering systems
If you are using this repository in a portfolio, README, or career narrative, the most useful framing is:
> Built an automation-first benchmark platform for comparing search and RL algorithms, with reproducible evaluation, structured reporting, and multiple user interaction layers.
That framing is stronger than:
> I made an AI project that plays 2048.
The first framing draws attention to the engineering system, not just the toy surface.
| Topic | Doc |
|---|---|
| Getting high scores with RL, checkpoints, TensorBoard | docs/rl-training.md |
| GPU acceleration, parallel workers, early stopping, model status | docs/efficient-training.md |
| All CLI flags, parallel mode, result layout | docs/cli-reference.md |
| TUI / GUI / Web UI wizards | docs/ui-wizards.md |
pw2048 is an experiment platform built around 2048, but its real contribution is in the surrounding engineering system:
benchmark design, experiment automation, evaluation workflows, and report generation.
That is what makes it more than a game project — and closer to a reusable AI engineering asset.