PIGEN (Physics-Informed Generation) is a framework for generating novel crystal structures by integrating physics-informed sampling, chemically guided control, and structural evaluation into a denoising diffusion model.
Building on DiffCSP, PIGEN introduces two conditioning metrics, compactness and local-environment diversity (MLED), which guide the generative process toward physically plausible yet structurally diverse configurations. Conditioning on these descriptors consistently increases the fraction of novel crystal frameworks across diffusion architectures, including DiffCSP and MatterGen, indicating that the benefit generalises beyond a single model.
The model enables generation beyond known chemical spaces and supports out-of-distribution extrapolation, yielding a higher proportion of stable, unique structures per batch compared to previous approaches.
This repository accompanies the preprint:
A. Vasylenko et al., "Physics-informed diffusion models for extrapolating crystal structures beyond known motifs", arXiv:2510.23181 (2025).
Create and activate the conda environment:

```bash
conda env create -f environment.yml
conda activate pigen
```

All dependencies are managed via conda; `setup.py` is only used for local package registration.
PIGEN has been tested on:
- Linux (x86_64): recommended for full reproducibility and GPU training
- macOS (ARM, Apple Silicon): supported for CPU inference and development
- CUDA ≥ 12.1: for GPU acceleration
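As a quick sanity check after creating the environment, the sketch below reports which of the packages named in this README are importable and whether a CUDA version string meets the 12.1 minimum. It is a hypothetical helper, not part of PIGEN (the repository ships its own `verify_environment_installs.py`); it uses only the standard library and imports `torch` lazily if it is present.

```python
import importlib.util
from importlib import metadata

# Hypothetical table: distribution name -> import name for packages mentioned in this README.
PACKAGES = {
    "torch": "torch",
    "torch_geometric": "torch_geometric",
    "torch-scatter": "torch_scatter",
    "torch-sparse": "torch_sparse",
    "torch-cluster": "torch_cluster",
    "pyg-lib": "pyg_lib",
}


def installed_versions(packages=PACKAGES):
    """Map each distribution to its version string, or None if it cannot be imported."""
    report = {}
    for dist, module in packages.items():
        if importlib.util.find_spec(module) is None:
            report[dist] = None
            continue
        try:
            report[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            # Importable, but no distribution metadata found under this name.
            report[dist] = "unknown"
    return report


def cuda_meets_minimum(cuda_version, minimum=(12, 1)):
    """True if a CUDA version string such as '12.4' is at least `minimum`."""
    if not cuda_version:
        return False  # CPU-only torch builds report no CUDA version
    major, minor = (int(part) for part in cuda_version.split(".")[:2])
    return (major, minor) >= minimum


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:16s} {version or 'not installed'}")
    if importlib.util.find_spec("torch") is not None:
        import torch

        cuda = torch.version.cuda  # None for CPU-only builds
        print("CUDA build:", cuda, "| meets >= 12.1:", cuda_meets_minimum(cuda))
        print("GPU visible:", torch.cuda.is_available())
```

On macOS (ARM) the CUDA line is expected to report `None`, since only CPU inference is supported there.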
Note for macOS (ARM):
Some PyTorch Geometric packages (torch-scatter, torch-sparse, etc.) are not available through conda.
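Create and activate the macOS-specific environment, then install them manually:

```bash
conda env create -f environment.osx-arm64.yml
conda activate pigen
# Print the installed torch version to substitute for <your_torch_version> below
python -c "import torch; print('torch version:', torch.__version__)"
pip install torch_geometric
pip install pyg-lib torch-scatter torch-sparse torch-cluster -f https://data.pyg.org/whl/torch-<your_torch_version>+cpu.html
```

Then register the package locally:

```bash
pip install -e .
```

The default training dataset, Alex_MP_20_M_LED, is available at https://huggingface.co/datasets/UoLiverpool/Alex_MP_20_M_LED/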
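For the macOS installation step, the `<your_torch_version>` placeholder is the version printed by `torch.__version__`, without any local build suffix such as `+cpu`. The hypothetical helper below builds the wheel index URL from that string, assuming the `https://data.pyg.org/whl/torch-<version>+<variant>.html` pattern used in the install command. PyG publishes wheels only for specific PyTorch releases, so it is worth checking that the resulting page exists before installing.

```python
def pyg_wheel_index(torch_version, variant="cpu"):
    """Build the PyG wheel index URL passed to `pip install ... -f <url>`.

    Assumes the data.pyg.org naming pattern from the macOS command; `variant`
    is 'cpu' on macOS and would be a CUDA tag such as 'cu121' on Linux GPUs.
    """
    base = torch_version.split("+", 1)[0]  # drop a local suffix like '+cu121'
    return f"https://data.pyg.org/whl/torch-{base}+{variant}.html"


def pyg_install_command(torch_version, variant="cpu"):
    """Assemble the full pip command for the PyG extension packages."""
    packages = "pyg-lib torch-scatter torch-sparse torch-cluster"
    return f"pip install {packages} -f {pyg_wheel_index(torch_version, variant)}"
```

Inside the activated environment, `pyg_install_command(torch.__version__)` yields a command that can be pasted into the shell.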
To retrain the model on the default dataset, Alex_MP_20_M_LED, run:

```bash
python pigen/train.py
```

This uses the default data and conditioning properties and is equivalent to:

```bash
python pigen/train.py --data_name Alex_MP_20_M_LED --prop ['entropy_sum', 'target_energy']
```

Here `entropy_sum` is the internal name for MLED and `target_energy` denotes compactness.
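Because the conditioning properties use internal names, a small lookup table can make analysis scripts easier to read. The mapping below only restates the correspondence given above (`entropy_sum` for MLED, `target_energy` for compactness); the helper functions themselves are hypothetical and not part of `pigen`.

```python
# Readable descriptor name -> internal property key (as passed via --prop).
PROP_ALIASES = {
    "MLED": "entropy_sum",
    "compactness": "target_energy",
}


def to_internal(names):
    """Translate readable descriptor names to internal keys, preserving order."""
    unknown = [n for n in names if n not in PROP_ALIASES]
    if unknown:
        raise KeyError(f"unknown descriptor(s): {unknown}; expected one of {sorted(PROP_ALIASES)}")
    return [PROP_ALIASES[n] for n in names]


def to_readable(keys):
    """Inverse lookup, e.g. for labelling logged properties; unknown keys pass through."""
    inverse = {internal: readable for readable, internal in PROP_ALIASES.items()}
    return [inverse.get(k, k) for k in keys]
```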
You can use your own trained model or download the pretrained checkpoint from huggingface.co/DeepDrew/PIGEN/. After downloading, place the checkpoint file in the `checkpoints/` directory so that `pigen/generate.py` can locate it.
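The download and placement can also be scripted. This sketch assumes the `huggingface_hub` package (not listed in this README) and that `DeepDrew/PIGEN` is a model repository whose files can be mirrored directly into `checkpoints/`. The exact checkpoint filename is not specified here, so the second helper simply lists what is present; both function names are hypothetical.

```python
from pathlib import Path


def download_checkpoint(local_dir="checkpoints", repo_id="DeepDrew/PIGEN"):
    """Mirror the Hugging Face repository into `local_dir` and return its path.

    Assumes `pip install huggingface_hub`; imported lazily so the listing
    helper below works without it.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


def list_checkpoint_files(directory="checkpoints"):
    """Return non-hidden files under `directory`, sorted; [] if it does not exist."""
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(
        p
        for p in root.rglob("*")
        # Skip hidden paths such as download metadata caches.
        if p.is_file() and not any(part.startswith(".") for part in p.relative_to(root).parts)
    )
```

Run `download_checkpoint()` from the repository root, then `list_checkpoint_files()` to confirm that the file landed in `checkpoints/`.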
To generate structures, run:

```bash
cd pigen
python generate.py
```

This writes example structures to the `examples/` folder and should take less than 10 minutes.
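To take a first look at the generated structures without extra dependencies, the sketch below reads the unit-cell parameters from a CIF file and computes the cell volume with the standard triclinic formula. It assumes that the files written to `examples/` are CIFs with the usual `_cell_length_*` and `_cell_angle_*` tags, which this README does not state; a full parser such as pymatgen's would be more robust. All function names are hypothetical.

```python
import math
import re
from pathlib import Path

CELL_TAGS = ("_cell_length_a", "_cell_length_b", "_cell_length_c",
             "_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma")


def read_cell(cif_text):
    """Extract (a, b, c, alpha, beta, gamma) from CIF text; angles in degrees."""
    values = {}
    for tag in CELL_TAGS:
        # A value may carry an uncertainty in parentheses, e.g. 5.431(2); keep the number.
        match = re.search(rf"^\s*{tag}\s+([-+0-9.eE]+)", cif_text, re.MULTILINE)
        if match is None:
            raise ValueError(f"missing {tag}")
        values[tag] = float(match.group(1))
    return tuple(values[tag] for tag in CELL_TAGS)


def cell_volume(a, b, c, alpha, beta, gamma):
    """V = abc * sqrt(1 - cos²α - cos²β - cos²γ + 2 cosα cosβ cosγ)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


def summarise(folder="examples"):
    """Print the cell volume of every *.cif file in `folder` (lengths assumed in Å)."""
    for path in sorted(Path(folder).glob("*.cif")):
        a, b, c, al, be, ga = read_cell(path.read_text())
        print(f"{path.name}: V = {cell_volume(a, b, c, al, be, ga):.2f} Å^3")
```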
This repository builds on DiffCSP, an open-source implementation of denoising diffusion probabilistic models for crystal structure prediction, and extends it with:
- Physics-informed logic integrated into the sampling process
- Conditional generation with target-guided control via classifier-free guidance
- Featurised dataset with local chemical and structural environment features, enabling out-of-distribution extrapolation
- Chemistry-informed structure evaluation tools
- Modular refactoring for better reproducibility and configuration management
- Support for PyTorch Distributed Data Parallel to accelerate large-scale training across multiple GPUs or nodes
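Classifier-free guidance, listed above, is a general diffusion technique rather than something specific to PIGEN. In its common form, the conditioning signal is randomly dropped during training so that one network learns both conditional and unconditional predictions; at sampling time the two predictions are blended with a guidance weight `w`. The sketch below shows only that generic recipe on plain lists. How PIGEN applies it to lattices, coordinates and atom types is not described here, and all names are hypothetical.

```python
import random


def maybe_drop_condition(condition, p_uncond, rng):
    """Training step: replace the condition with None (the 'null' condition)
    with probability p_uncond, so the model also learns unconditional predictions."""
    return None if rng.random() < p_uncond else condition


def guided_prediction(pred_uncond, pred_cond, w):
    """Sampling step: pred = pred_uncond + w * (pred_cond - pred_uncond).

    w = 0 gives the unconditional prediction, w = 1 the purely conditional one,
    and w > 1 pushes samples further towards the conditioning targets.
    """
    if len(pred_uncond) != len(pred_cond):
        raise ValueError("predictions must have the same length")
    return [u + w * (c - u) for u, c in zip(pred_uncond, pred_cond)]
```

Conventions differ between implementations (some write the weight as `1 + w`), so a guidance value should only be compared across codebases after checking which form is used.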
To run the test suite:

```bash
pytest tests
```

Note: the Docker image intentionally uses a flexible entry point (`/bin/bash`), so that once inside the container you can either train or generate as needed. This design supports both CPU and GPU environments. Build and run the image with:
```bash
docker build -t pigen .
# CPU; -it keeps the interactive bash session open
docker run --rm -it pigen
# GPU
docker run --rm -it --gpus all pigen
```

Repository structure:

```text
├── checkpoints
├── data
│ └── Alex_MP_20_M_LED/
├── environment.yml
├── log
├── pigen
│ ├── __init__.py
│ ├── assets/
│ ├── common/
│ ├── eval/
│ ├── generate.py
│ ├── normalization
│ ├── partial_sample.py
│ ├── settings.py
│ └── train.py
├── README.md
├── setup.py
├── tests
│ ├── dummy_data/
│ ├── dummy_logs/
│ ├── fixtures/
│ ├── conftest.py
│ ├── test_dependecies.py
│ ├── test_dummy_training.py
│ ├── test_pd_structure_parsing.py
│ └── test_torch_installation.py
└── verify_environment_installs.py
```
If you use this code or metrics, please consider citing: A. Vasylenko et al., "Physics-informed diffusion models for extrapolating crystal structures beyond known motifs", arXiv:2510.23181 (2025).
We gratefully acknowledge the authors of DiffCSP for their contribution to the research and open-source community.
This project is licensed under the MIT License, consistent with DiffCSP.
All modifications are © 2025 Andrij Vasylenko.