A font classification system that identifies 394 font variants across 32 families from rendered text images, using LoRA fine-tuning of DINOv2. Achieves 98.9% top-1 validation accuracy with only ~1% of parameters trainable.
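As a sanity check on the "~1% trainable" figure, here is a hypothetical back-of-envelope count. It assumes rank-8 LoRA adapters on the query and value projections of every DINOv2-base block (12 blocks, hidden size 768) and a 394-way linear head over a 1536-dim feature (concatenated CLS + mean-pooled patches); the exact adapter placement is an assumption, not read from the code.

```python
# Back-of-envelope trainable-parameter count (adapter placement is assumed).
HIDDEN, LAYERS, RANK, CLASSES = 768, 12, 8, 394

# LoRA adds an A (rank x hidden) and B (hidden x rank) matrix per adapted
# projection; assume q and v in every transformer block.
lora_params = LAYERS * 2 * 2 * HIDDEN * RANK
# Linear head on concatenated CLS + mean-pooled patch features (1536-dim).
head_params = 2 * HIDDEN * CLASSES + CLASSES
total_params = 87_200_000  # ~87.2M parameters in the full model

trainable = lora_params + head_params
print(f"trainable: {trainable:,} ({100 * trainable / total_params:.2f}% of total)")
# → trainable: 900,490 (1.03% of total)
```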
```
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

```
git clone --filter=blob:none --depth 1 https://github.com/google/fonts.git
python dataset_generator.py \
    --font_dir <path to google fonts> \
    --out_dir <output folder> \
    --img_size 224 \
    --font_size 1024 \
    --padding 128
```

Uses all CPU cores by default (`--workers N` to override). Generates ~575 training images and 40 test images per font variant with randomized colors, alignment, line wrapping, and Gaussian noise.
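A minimal sketch of how one such training image could be produced with Pillow and NumPy: random foreground/background colors plus additive Gaussian noise. `render_sample` and its defaults are illustrative, not `dataset_generator.py`'s actual API; a real run would load the target variant's .ttf rather than Pillow's default font.

```python
# Illustrative sketch, not the script's API: render text on a random
# background, then add Gaussian pixel noise.
import random
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def render_sample(text: str, img_size: int = 224, noise_sigma: float = 8.0) -> Image.Image:
    bg = tuple(random.randint(0, 255) for _ in range(3))
    fg = tuple(random.randint(0, 255) for _ in range(3))
    img = Image.new("RGB", (img_size, img_size), bg)
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()  # a real run would load the variant's .ttf
    draw.text((random.randint(8, 32), random.randint(8, 32)), text, fill=fg, font=font)
    arr = np.asarray(img).astype(np.float32)
    arr += np.random.normal(0.0, noise_sigma, arr.shape)  # additive Gaussian noise
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))

sample = render_sample("Quick brown fox")
```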
```
python dataset_cleaner.py <dataset folder>
```

Prints any corrupted image paths for manual inspection.
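A sketch of the kind of check such a cleaner can perform: walk the dataset and report files Pillow fails to decode. The function name and the `.png` glob are assumptions, not `dataset_cleaner.py`'s actual implementation.

```python
# Illustrative corrupted-image scan (names assumed, not the script's API).
from pathlib import Path
from PIL import Image

def find_corrupted(dataset_dir: str) -> list:
    bad = []
    for path in Path(dataset_dir).rglob("*.png"):
        try:
            with Image.open(path) as img:
                img.load()  # fully decode; verify() alone misses some truncation
        except Exception:
            bad.append(path)
    return bad
```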
```
pip install -U "huggingface_hub[cli]"
huggingface-cli upload-large-folder <user>/<repo> <dataset folder> --repo-type=dataset
```

For large datasets (200k+ files), tar the train/test folders first to avoid API rate limits:

```
tar cf train.tar -C <dataset folder> train/
tar cf test.tar -C <dataset folder> test/
HF_HUB_DISABLE_XET=1 huggingface-cli upload <user>/<repo> train.tar train.tar --repo-type=dataset
HF_HUB_DISABLE_XET=1 huggingface-cli upload <user>/<repo> test.tar test.tar --repo-type=dataset
```

LoRA (default, recommended):
```
python train_model.py \
    --data_dir <dataset folder> \
    --output_dir <output folder> \
    --batch_size 64 \
    --epochs 100 \
    --learning_rate 1e-4 \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0.1
```

Baseline comparisons:
```
# Full fine-tuning (all 87.2M params)
python train_model.py --full_finetune --data_dir <data> --output_dir <out> --epochs 100

# Linear probe (classifier head only, 606K params)
python train_model.py --linear_probe --data_dir <data> --output_dir <out> --epochs 20

# CNN baseline (ResNet-50)
python train_model.py --resnet_baseline --data_dir <data> --output_dir <out> --epochs 100
```

Resume training from a saved checkpoint:

```
python train_model.py \
    --checkpoint <output folder>/checkpoint-2752 \
    --data_dir <dataset folder> \
    --output_dir <output folder> \
    --epochs 100
```

To upload a trained checkpoint to HuggingFace without further training, pass `--epochs 0`:

```
python train_model.py \
    --epochs 0 \
    --data_dir <dataset folder> \
    --checkpoint <output folder>/checkpoint-2752 \
    --huggingface_model_name <user>/<repo>
```

Run inference on a single image:

```
python serve_model.py <model name or path> <image path>
```

`cloud_train.sh` runs training end-to-end on Vast.ai GPU instances: finds a machine, uploads the code, trains, uploads results to HuggingFace, and destroys the instance automatically. Includes auto-retry (up to 5 instances), health checks, and crash log upload.
Setup:

```
pip install vastai
vastai set api-key <your key>
vastai create ssh-key "$(cat ~/.ssh/id_ed25519.pub)"
huggingface-cli login
```

Usage:

```
# Run all baselines on separate instances in parallel
bash cloud_train.sh --hf_dataset dchen0/font_crops_v5 --hf_results dchen0/font-model-results --mode all --gpu RTX_3090 --parallel

# Run a single mode
bash cloud_train.sh --hf_dataset dchen0/font_crops_v5 --hf_results dchen0/font-model-results --mode lora --gpu RTX_3090

# Dry run (tiny test dataset, validates full pipeline in ~5 min)
bash cloud_train.sh --dry_run --gpu RTX_3090
```

Options:
| Flag | Default | Description |
|---|---|---|
| `--hf_dataset` | (required) | HuggingFace dataset to train on |
| `--hf_results` | (required) | HuggingFace repo for results upload |
| `--mode` | `lora` | Training mode: `lora`, `lora4`, `lora16`, `full`, `linear`, `resnet`, or `all` |
| `--gpu` | `RTX_4090` | GPU type (e.g., `RTX_3090`, `A100`) |
| `--max_price` | `2.00` | Max hourly price in USD |
| `--batch_size` | `64` | Training batch size |
| `--epochs` | `100` | Number of training epochs |
| `--num_gpus` | `1` | GPUs per instance (multi-GPU via `accelerate`) |
| `--parallel` | off | Launch each mode on a separate instance |
| `--dry_run` | off | Use tiny test dataset, 1 epoch, defaults to all modes |
| `--ssh_key` | `~/.ssh/vastai` | SSH key for Vast.ai instances |
Features:
- Auto-retry with up to 5 different instances per mode
- Health check after launch (connectivity, CUDA, pip)
- Checkpoints synced to HuggingFace every 10 minutes (resumable on preemption)
- Training logs uploaded on any exit (crash, signal, or success)
- Instance auto-destroys after uploading results
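The periodic checkpoint sync could look roughly like the sketch below: pick the newest `checkpoint-<step>` directory and push it with `huggingface_hub.upload_folder` every ten minutes. The helper names and the loop are illustrative assumptions, not `cloud_train.sh`'s actual mechanism.

```python
# Illustrative checkpoint-sync sketch (helper names assumed).
import time
from pathlib import Path

def latest_checkpoint(output_dir: str):
    # Trainer writes checkpoint-<step> directories; pick the highest step.
    ckpts = sorted(
        Path(output_dir).glob("checkpoint-*"),
        key=lambda p: int(p.name.split("-")[-1]),
    )
    return ckpts[-1] if ckpts else None

def sync_forever(output_dir: str, repo_id: str, period_s: int = 600) -> None:
    from huggingface_hub import upload_folder
    while True:
        ckpt = latest_checkpoint(output_dir)
        if ckpt is not None:
            upload_folder(repo_id=repo_id, folder_path=str(ckpt),
                          path_in_repo=ckpt.name, repo_type="model")
        time.sleep(period_s)
```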
Dry run:

Always dry run before a full training run to catch issues early:

```
# Test all modes (default)
bash cloud_train.sh --dry_run --gpu RTX_3090

# Test a specific mode
bash cloud_train.sh --dry_run --mode resnet --gpu RTX_3090
```

This uses a tiny test dataset (`dchen0/font_crops_test`, 3 classes, 39 images) to validate the entire pipeline in ~5 minutes.

To regenerate the test dataset:

```
python create_test_dataset.py --synthetic --upload
```

To evaluate a trained model and generate figures:

```
python confusion_matrix.py \
    --data_dir <dataset folder> \
    --model <HuggingFace model name or local path>
```

The model's label set must match the dataset's class folders. The script checks label overlap and aborts on a mismatch.
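The overlap guard amounts to a set comparison between the model's labels and the dataset's class folder names. A minimal sketch, with illustrative names rather than `confusion_matrix.py`'s actual code:

```python
# Illustrative label-overlap guard (function name assumed).
def check_label_overlap(model_labels: set, dataset_labels: set) -> None:
    missing = dataset_labels - model_labels  # classes the model cannot predict
    extra = model_labels - dataset_labels    # labels with no dataset folder
    if missing or extra:
        raise SystemExit(
            f"Label mismatch: {len(missing)} dataset classes unknown to the model, "
            f"{len(extra)} model labels absent from the dataset"
        )
```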
Produces:

- `figures/confusion_matrix.pdf` — Row-normalized heatmap grouped by font family
- `figures/top_confused_pairs.pdf` — Bar chart of most frequent misclassifications
- `figures/per_family_accuracy.pdf` — Per-family accuracy breakdown
- `figures/tsne_embeddings.pdf` — t-SNE of [CLS] embeddings
- `figures/font_dendrogram.pdf` — UPGMA clustering of font families
- `figures/metrics.tex` — LaTeX macros for the paper (including SWER with typographic metadata distance)
- `confusion_matrix.json` — Raw counts
- `bad_images.json` — All misclassified images
```
# Full build (evaluation + LaTeX)
bash build_paper.sh --data_dir <dataset folder> --model <model>

# LaTeX only (skip evaluation)
bash build_paper.sh --skip-matrix
```

`handler.py` implements the preprocessing pipeline (pad-to-square + resize + normalize) used at both training and inference time. It is bundled with the model on HuggingFace for Inference Endpoints.
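A sketch of that pipeline, assuming white padding and ImageNet normalization statistics (both assumptions, not read from `handler.py`):

```python
# Illustrative pad-to-square + resize + normalize (constants assumed).
import numpy as np
from PIL import Image

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(img: Image.Image, size: int = 224) -> np.ndarray:
    side = max(img.size)
    canvas = Image.new("RGB", (side, side), (255, 255, 255))  # pad to square
    canvas.paste(img, ((side - img.width) // 2, (side - img.height) // 2))
    canvas = canvas.resize((size, size), Image.BILINEAR)
    arr = np.asarray(canvas).astype(np.float32) / 255.0
    arr = (arr - IMAGENET_MEAN) / IMAGENET_STD  # per-channel normalize
    return arr.transpose(2, 0, 1)               # HWC -> CHW for the model
```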