Conversation

@quaquel (Member) commented Nov 18, 2025

This PR addresses #2884.

It does several things:

  1. Adds a decorator for annotating methods that currently take seed or random as a kwarg. The decorator emits a FutureWarning when seed or random is used instead of the new preferred rng kwarg. I use FutureWarning because the warning is intended for end users (see https://docs.python.org/3/library/warnings.html). A minimal sketch of such a decorator follows this list.
  2. Adds this decorator to all methods that have seed or random as a kwarg.
  3. Starts adding rng to all decorated classes and updates their inner workings to use numpy generators instead of stdlib random generators.
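
For illustration, a deprecation decorator along these lines might look as follows. This is a minimal sketch: the name deprecate_rng_kwargs and the exact message are hypothetical, not the PR's actual code.

import functools
import warnings

def deprecate_rng_kwargs(func):  # hypothetical name, not the PR's actual decorator
    """Emit a FutureWarning when the deprecated seed/random kwargs are used."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for old_kwarg in ("seed", "random"):
            if old_kwarg in kwargs:
                warnings.warn(
                    f"The '{old_kwarg}' kwarg is deprecated; use 'rng' instead.",
                    FutureWarning,
                    stacklevel=2,  # attribute the warning to the caller's code
                )
        return func(*args, **kwargs)
    return wrapper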

Reminder: we need to fix the set_seed widget on the Solara side as well!


@quaquel added the enhancement, trigger-benchmarks, and deprecation labels on Nov 18, 2025
@quaquel linked an issue on Nov 18, 2025 that may be closed by this pull request
@quaquel (Member, Author) commented Nov 20, 2025

Given how common it is to draw a single random number (i.e., random.random()), I was wondering about adding a shorthand for this and seeing how fast it would be.

import numpy as np

rng = np.random.default_rng()

def test_rand(rng, initial_size=100):
    # Pre-generate a batch of random numbers and hand them out one at a time.
    initial_values = rng.random(size=initial_size)

    while True:
        for entry in initial_values:
            yield entry
        # Refill the batch once it is exhausted.
        initial_values = rng.random(size=initial_size)

def draw():
    return next(a)

a = test_rand(rng, 200)

I then timed these as shown below:

%%timeit
[random.random() for _ in range(250)]
8.88 μs ± 72.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

%%timeit
[draw() for _ in range(250)]
20.7 μs ± 130 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%%timeit
[next(a) for _ in range(250)]
16.4 μs ± 121 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

%%timeit
[rng.random() for _ in range(250)]
67.9 μs ± 273 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

So, a naive shift from random.random to rng.random is almost an order of magnitude slower. We can effectively halve this by using the generator and advancing it directly via next; wrapping that next call in another function adds some overhead, but it is still a lot faster than a naive replacement. So I am considering adding a method to the model, model.rand, that performs this task (sketched below). The other option would be to figure out how to subclass numpy.random.Generator and add this rand method to it, because then you could just do self.rng.rand as a much faster alternative to rng.random().
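
As a concrete sketch of the model.rand idea, assuming a Model that owns a numpy Generator; the attribute names here are illustrative, not the PR's actual implementation:

import numpy as np

class Model:
    def __init__(self, rng=None, buffer_size=200):
        self.rng = np.random.default_rng(rng)
        self._rand_buffer = self._batched_random(buffer_size)

    def _batched_random(self, size):
        # Draw `size` numbers per batch; a fresh batch is generated
        # automatically once the previous one is exhausted.
        while True:
            yield from self.rng.random(size=size)

    def rand(self):
        """Return a single random float on the unit interval."""
        return next(self._rand_buffer)

model = Model(rng=42)
print(model.rand())  # single draw, served from the pre-generated batch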

@github-actions

Performance benchmarks:

| Model | Size | Init time [95% CI] | Run time [95% CI] |
| --- | --- | --- | --- |
| BoltzmannWealth | small | 🔴 +17.0% [+15.3%, +18.7%] | 🔴 +50.4% [+50.0%, +50.8%] |
| BoltzmannWealth | large | 🔴 +9.1% [+7.3%, +10.5%] | 🔴 +60.8% [+55.1%, +65.5%] |
| Schelling | small | 🔴 +16.5% [+16.1%, +16.9%] | 🔵 +3.6% [+2.9%, +4.3%] |
| Schelling | large | 🔴 +10.6% [+9.0%, +12.3%] | 🔴 +80.4% [+70.8%, +91.3%] |
| WolfSheep | small | 🔴 +6.0% [+5.4%, +6.5%] | 🔴 +39.3% [+32.2%, +47.1%] |
| WolfSheep | large | 🔴 +6.6% [+5.5%, +7.9%] | 🔴 +42.3% [+38.9%, +45.7%] |
| BoidFlockers | small | 🔵 +3.2% [+2.7%, +3.7%] | 🔵 +1.9% [+1.6%, +2.3%] |
| BoidFlockers | large | 🔴 +3.6% [+3.2%, +3.9%] | 🔵 +1.9% [+1.4%, +2.4%] |

@EwoutH (Member) commented Nov 21, 2025

More broadly, numpy.random.default_rng() is designed for generating and operating on numpy arrays.

I don't know if it's a good idea, but: can you pre-generate an array (of, say, 100 or 1,000 numbers), use those numbers, and generate a new array every time the current one is used up? Basically, generate random numbers in batches?

@quaquel (Member, Author) commented Nov 21, 2025

I don't know if it's a good idea, but: can you pre-generate an array (of, say, 100 or 1,000 numbers), use those numbers, and generate a new array every time the current one is used up? Basically, generate random numbers in batches?

Yes, that is exactly what I show here: #2888 (comment)

@tpike3 (Member) commented Dec 9, 2025

@quaquel Your stuff is always great, and this is no exception. I don't think I have anything substantive to add. From my perspective, it is better to improve the RNG even if it comes with a performance hit.

The next question then becomes: are there ways to mitigate the performance hit while ensuring optimal randomness? Two options are on the table:

  1. wrapper
  2. subclass

I think option 1 is the best bet. Subclassing a C-extension generator from Python may create unexpected challenges that could have unintended impacts on user simulations.

At this point I would recommend option 1, but I'm always open to discussion!

@EwoutH (Member) commented Dec 9, 2025

Can we add a model-level keyword argument that defaults to the NumPy RNG but allows you to use random if you need the performance? A possible shape for this is sketched below.
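
A hypothetical shape for such a lever; the use_numpy kwarg is made up for illustration and nothing like it exists in this PR:

import random

import numpy as np

class Model:
    def __init__(self, seed=None, use_numpy=True):
        if use_numpy:
            self.rng = np.random.default_rng(seed)  # numpy Generator (default)
        else:
            self.random = random.Random(seed)  # stdlib fallback for speed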

@quaquel (Member, Author) commented Dec 9, 2025

I'll try to take another look at this sometime this week. As a first step, I'll add the wrapper idea to the model class, so you can just do model.rand if you need a single number. Next, I'll rerun the benchmarks to determine the performance difference.

@EwoutH, nothing prevents you from using stdlib random in your own models in the future. If we decide to make model.rng the only option in Mesa 4, you can still have MyCustomModel.random as well, but you would have to add it to the model yourself, as in the sketch below.
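
A minimal sketch, assuming a hypothetical Mesa 4 in which Model no longer provides self.random:

import random

from mesa import Model

class MyCustomModel(Model):
    def __init__(self, seed=None, **kwargs):
        super().__init__(**kwargs)
        # Re-add a stdlib generator alongside the model's numpy-based rng.
        self.random = random.Random(seed)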

@quaquel (Member, Author) commented Dec 10, 2025

The benchmarks are a bit funky via GitHub Actions, so I ran them locally; the results are shown below. The TL;DR from my perspective is that the performance loss is now acceptable, and the benefits of using numpy's random number generation outweigh it.

This is the benchmark with a new model.rand method for generating individual random numbers on the unit interval. Internally, it wraps a numpy array of a prespecified length that is replaced once exhausted; for further details, see my comment with the code above. The model.rand method is only used by the WolfSheep benchmark, but there it significantly reduces the performance loss compared to naive use of rng.random().

As can be seen, there is still some performance loss, but it is more manageable. Moreover, some of it is likely due to the overhead of the deprecation decorator I added as part of this PR. I'll run a local benchmark later with this decorator removed to get a clearer sense of the performance loss.

/opt/anaconda3/bin/python /Users/jhkwakkel/Documents/GitHub/mesa/benchmarks/compare_timings.py

| Model | Size | Init time [95% CI] | Run time [95% CI] |
| --- | --- | --- | --- |
| BoltzmannWealth | small | 🔴 +10.1% [+9.0%, +11.0%] | 🔴 +33.9% [+33.4%, +34.2%] |
| BoltzmannWealth | large | 🔴 +9.6% [+9.2%, +10.0%] | 🔴 +27.2% [+24.8%, +29.0%] |
| Schelling | small | 🔴 +15.5% [+14.9%, +16.0%] | 🔵 -1.0% [-2.0%, -0.1%] |
| Schelling | large | 🔴 +12.1% [+11.5%, +12.5%] | 🔴 +62.7% [+59.2%, +65.4%] |
| WolfSheep | small | 🔴 +16.5% [+15.8%, +17.3%] | 🔴 +30.4% [+23.3%, +37.7%] |
| WolfSheep | large | 🔴 +15.1% [+14.9%, +15.4%] | 🔴 +25.0% [+22.6%, +26.6%] |
| BoidFlockers | small | 🔵 +0.5% [-0.1%, +1.1%] | 🔵 -0.3% [-0.5%, -0.1%] |
| BoidFlockers | large | 🔵 -2.0% [-4.3%, -0.1%] | 🔵 -1.1% [-2.5%, -0.1%] |

@EwoutH (Member) commented Dec 10, 2025

I'll run a local benchmark later with this decorator removed to get a clearer sense of the performance loss.

Curious about this, because 25 to 60% extra runtime is not trivial.

@quaquel (Member, Author) commented Dec 10, 2025

Curious about this, because 25 to 60% extra runtime is not trivial.

It's a bit tricky to directly compare the runtime of the old and new cases for both WolfSheep and Schelling. In the case of Schelling, once everyone is happy, no further moves take place, and the model runs much faster. In the case of WolfSheep, if all the sheep die quickly, the model again runs much faster afterwards. Since the random numbers are different, the fraction of runs that reach this fast "state" might very well differ.

Based on my testing of individual operations (e.g., random.shuffle vs. rng.shuffle), there is a performance loss of about 30-50% on these individual operations. How that adds up in a bigger model is harder to assess, because these individual operations are just a small part of the overall model. In many ways, Boltzmann might be the best one to look at, because that model has virtually no overhead from agent logic. A quick way to reproduce such a per-operation comparison is sketched below.
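
A minimal sketch of such a per-operation comparison (not the exact benchmark; numbers will vary by machine):

import random
import timeit

import numpy as np

rng = np.random.default_rng(42)
data = list(range(250))

# Generator.shuffle accepts mutable sequences, so both calls shuffle the same list.
t_std = timeit.timeit(lambda: random.shuffle(data), number=10_000)
t_np = timeit.timeit(lambda: rng.shuffle(data), number=10_000)
print(f"random.shuffle: {t_std:.3f}s  rng.shuffle: {t_np:.3f}s")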

Additionally, I would like to delve a bit deeper by conducting some line profiling to identify exactly where performance is being lost.

Incidentally, I just ran a quick benchmark with all the decorators removed, and the overhead of the deprecation decorator is just a few percent.

@quaquel changed the title from "Deprecate seed in favor of random" to "Deprecate seed in favor of rng" on Dec 10, 2025