The CLIMB Project

[Cosmological LambdaCDM Simulations for Inference with Machine Learning and Bayesian statistics]

The project aims to build a simulation-based inference pipeline that predicts the four cosmological parameters {$\Omega_{\Lambda}$, $\Omega_m$, $\Omega_b$, $h$} from Lyman-$\alpha$ forest spectra. For this purpose, 50 simulations with varied cosmological parameters are run using the TNG-Arepo code (Pillepich et al. 2017), evolving the simulated boxes from $z=127$ until today. Each box has a volume of $(25\,\mathrm{cMpc}/h)^3$ with $256$ particles per dimension. Initial conditions are created with the MUSIC code (Hahn and Abel 2011), and the random initial seed is the same for all 50 boxes. A total of 500,000 spectra are created from snapshots at $z=2$ using the TEMET package (Nelson et al. 2025). A Transformer neural network is trained on these spectra and then applied to observed spectra from the SDSS survey (DR9) (Lee et al. 2013) to test the quality of the pipeline.

A selection of plots from the project is shown below.

Simulations

Projection of the mean density of neutral hydrogen at $z=2$ for 20 of the CLIMB boxes ordered by $\Omega_b$. The filamentary cosmic web structure is clearly visible, with variations in the neutral hydrogen distribution reflecting differences in baryon content between the cosmologies:

Gas and dark matter for the most massive halo in 10 of the boxes from the CLIMB suite at $z=0$. The gray circles mark the regions within which the mean density exceeds 500 times the critical density of the universe.

Halo mass function for all 50 simulated boxes of the CLIMB suite. All lines have a similar slope, but the number of halos shifts between cosmologies. Three reference lines from the TNG50, TNG100 and TNG300 simulations (Nelson et al. 2021) are shown in black.

Comparison of the CLIMB suite to other works with varying cosmologies. All plots shown here are made from CLIMB high.

Spectra generation

Example gallery of single spectra created from the CLIMB high simulations using TEMET. All spectra shown were made from the same box along different lines of sight.

To increase the amount of information per input spectrum and allow the network to use longer sections of real spectra in the inference mode, the spectra are augmented. Different short spectra (upper panel) are randomly shuffled and patched together to make one long spectrum (lower panel).
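The shuffle-and-patch augmentation described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `patch_spectra` and the parameter `n_per_long` are hypothetical, and it assumes that all short spectra passed in come from the same box, so that each patched spectrum keeps a single set of cosmological parameters.

```python
import numpy as np

def patch_spectra(short_spectra, n_per_long, rng):
    """Shuffle short spectra and stitch groups of them into long spectra.

    short_spectra : array of shape (n_short, n_pixels), assumed to come
                    from one box (one cosmology).
    n_per_long    : number of short spectra per long spectrum
                    (hypothetical parameter, not taken from the project).
    """
    n_short, n_pixels = short_spectra.shape
    n_long = n_short // n_per_long
    # Random order; drop the leftovers that do not fill a full long spectrum.
    order = rng.permutation(n_short)[: n_long * n_per_long]
    # Consecutive rows of the shuffled array are concatenated end to end.
    return short_spectra[order].reshape(n_long, n_per_long * n_pixels)

rng = np.random.default_rng(0)
short = rng.uniform(0.0, 1.0, size=(10, 4))   # 10 toy spectra, 4 pixels each
long_spectra = patch_spectra(short, n_per_long=3, rng=rng)
print(long_spectra.shape)  # (3, 12)
```

Each short spectrum is used at most once here; drawing with replacement would be an equally plausible variant.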

Comparison of different noise models. In the upper plot, no noise is added to the synthetic spectrum. In the middle plot, Gaussian random noise with a constant signal-to-noise ratio (SNR) of 5 is added. In the lower plot, the mean SNR per pixel of the SDSS catalog spectra with median SNR > 5 is assumed.
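As a rough illustration of the two noisy cases, the sketch below adds Gaussian noise to a continuum-normalized flux. It assumes that the SNR is defined relative to the continuum, so the noise standard deviation is $1/\mathrm{SNR}$; the actual definition used in the project may differ. The function names and the toy SNR profile are hypothetical.

```python
import numpy as np

def add_constant_snr_noise(flux, snr, rng):
    """Add Gaussian noise with one standard deviation for all pixels.

    Assumes a continuum-normalized flux, so sigma = 1 / snr.
    """
    sigma = 1.0 / snr
    return flux + rng.normal(0.0, sigma, size=flux.shape)

def add_per_pixel_snr_noise(flux, snr_per_pixel, rng):
    """Add Gaussian noise whose standard deviation varies from pixel to pixel.

    snr_per_pixel would be, for example, a mean SNR profile measured from
    observed spectra (not provided here).
    """
    sigma = 1.0 / np.asarray(snr_per_pixel, dtype=float)
    return flux + rng.normal(0.0, 1.0, size=flux.shape) * sigma

rng = np.random.default_rng(0)
flux = np.ones(100_000)                        # toy spectrum with no absorption
noisy_const = add_constant_snr_noise(flux, snr=5.0, rng=rng)
snr_profile = np.linspace(2.0, 8.0, flux.size) # made-up SNR profile
noisy_pixel = add_per_pixel_snr_noise(flux, snr_profile, rng)
```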

For reference, two observed spectra from the SDSS DR9 Lyman-$\alpha$ catalog are shown.

Transformer Model

Flow chart of the Transformer Network used in this work. The final model has about 4 million trainable parameters.

Of the 500,000 available spectra, 70% are used as the training set, 15% as the validation set, and 15% as the test set. The Transformer is trained for 6 epochs on the training dataset. An example training curve can be seen here.
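A 70/15/15 split of this kind could look like the sketch below. It shuffles spectrum indices at random, which is an assumption: the README does not state whether the split is done per spectrum or per box. The function name `split_indices` and its parameters are hypothetical.

```python
import numpy as np

def split_indices(n_samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Return shuffled index arrays for the training, validation and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(round(train_frac * n_samples))
    n_val = int(round(val_frac * n_samples))
    # Whatever remains after training and validation becomes the test set.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_indices(500_000)
print(len(train_idx), len(val_idx), len(test_idx))  # 350000 75000 75000
```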

Inference Results

To judge the performance of the Transformer, it is first applied to spectra from a reference box. This box has the cosmological parameters found by the Planck 2015 study and was never seen during training. $\Omega_m$ and $\Omega_\Lambda$ are predicted accurately with sharp peaks, while $\Omega_b$ and $h$ have wider distributions centered approximately on the correct values.

Finally, the Transformer is also applied to observed spectra from the SDSS survey. The predictions for $\Omega_m$ and $\Omega_\Lambda$ agree with the Planck measurements. The prediction for $h$ is also consistent with the Planck value, although our models seem to favor higher values. $\Omega_b$ is significantly underestimated by the models. For a discussion of this behaviour, see the written thesis.
