Deprecate seed in favor of rng #2888
Conversation
This is orders of magnitude faster if you want to sample only a single item from a list.
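To make the claim above concrete, here is a hedged micro-benchmark sketch comparing single-item sampling via the stdlib `random` module against a numpy `Generator`. The list size and repetition counts are illustrative, and exact timings will vary by machine:

```python
# Micro-benchmark sketch: single-item draws, stdlib vs. numpy Generator.
import random
import timeit

import numpy as np

items = list(range(1000))
rng = np.random.default_rng(42)

# Time 10,000 single-item draws with each API.
stdlib_t = timeit.timeit(lambda: random.choice(items), number=10_000)
numpy_t = timeit.timeit(lambda: rng.choice(items), number=10_000)

print(f"stdlib random.choice: {stdlib_t:.4f}s")
print(f"numpy rng.choice:     {numpy_t:.4f}s")
```

The gap comes from numpy's per-call overhead (argument checking, array handling), which dominates when you only want one value at a time.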
Given how common it is to draw a single random number, I tried pre-generating a batch of random numbers and serving individual draws from a generator:

```python
def test_rand(rng, initial_size=100):
    initial_values = rng.random(size=initial_size)
    while True:
        for entry in initial_values:
            yield entry
        # refill the batch once it is exhausted
        initial_values = rng.random(size=initial_size)

def draw():
    return next(a)

a = test_rand(rng, 200)
```

And then I timed these as shown below. So, a naive shift from
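As a self-contained sketch of the timing comparison described above (the generator name and batch size are illustrative; actual numbers depend on the machine):

```python
# Sketch: per-call rng.random() vs. draws served from a pre-generated batch.
import timeit

import numpy as np

rng = np.random.default_rng(42)

def batched(rng, size=100):
    # Generator that refills a pre-drawn batch when it is exhausted.
    values = rng.random(size=size)
    while True:
        for entry in values:
            yield entry
        values = rng.random(size=size)

gen = batched(rng, 200)

per_call = timeit.timeit(lambda: rng.random(), number=100_000)
from_batch = timeit.timeit(lambda: next(gen), number=100_000)

print(f"rng.random() per call: {per_call:.4f}s")
print(f"next(gen) from batch:  {from_batch:.4f}s")
```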
Performance benchmarks:
I don't know if it's a good idea, but: can you pre-generate an array (say, 100 or 1000 numbers), use those, and generate a new array every time the current one is used up? Basically, generate random numbers in batches?
Yes, that is exactly what I show here: #2888 (comment)
@quaquel Your stuff is always great and this is no exception. I don't think I have anything substantive to add. From my perspective, it is better to improve the rng even if it comes with a performance hit. The next question then becomes: are there ways to mitigate the performance hit while ensuring optimal randomness? Two options are on the table:
I think 1 is the best bet. Subclassing a generator backed by C extensions from Python may create unexpected challenges with unintended impacts on user simulations. At this point I would recommend 1, but I'm always open to discussion!
Can we add a model-level keyword argument that defaults to using the NumPy RNG but allows you to use `random` if you need the performance?
I'll try to take another look at this somewhere this week. As a first step, I'll add the wrapper idea to the model class. @EwoutH, nothing prevents you from using stdlib `random` in your own models in the future.
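A minimal sketch of what such a wrapper on the model class could look like. This is a hypothetical illustration, not mesa's actual implementation: the class and attribute names are assumptions, and the stdlib PRNG is seeded from the numpy Generator so both stay reproducible from one `rng` argument:

```python
# Hypothetical model-level wrapper: one `rng` kwarg, two PRNGs.
import random

import numpy as np

class Model:
    def __init__(self, rng=None):
        # default_rng accepts None, an int seed, or an existing Generator.
        self.rng = np.random.default_rng(rng)
        # A stdlib Random seeded from the same entropy stream, for users
        # who want stdlib-level speed on single draws.
        self.random = random.Random(int(self.rng.integers(2**32)))

m = Model(rng=42)
print(m.rng.random(), m.random.random())
```

With this shape, passing the same `rng` value reproduces both the numpy and the stdlib streams, so existing seed-based workflows keep working.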
The benchmarks are a bit funky via GitHub Actions, so I ran them locally. The results are shown below. The TL;DR from my perspective is that the performance loss is now acceptable, and the benefits of using numpy's random number generation outweigh the loss of performance.

There is still some performance loss, as can be seen, but it is more manageable. Moreover, some of it is likely due to the overhead of the deprecation decorator I added as part of this PR. I'll run a local benchmark later, removing this decorator, to get a clearer sense of the performance loss.

```
/opt/anaconda3/bin/python /Users/jhkwakkel/Documents/GitHub/mesa/benchmarks/compare_timings.py
```
Curious about this, because a 25 to 60% runtime increase is not trivial.
It's a bit tricky to directly compare the runtime of the old and new cases for both Wolf-Sheep and Schelling. In the case of Schelling, if everyone is happy, no further moves take place, and thus the model runs much faster. In the case of Wolf-Sheep, if all the sheep die quickly, the model again runs much faster afterwards. Since the random numbers are different, the fraction of runs that reach this fast "state" might very well differ.

Based on my testing of individual operations (e.g., random.shuffle vs. rng.shuffle), there is a performance loss of about 30%-50% on these individual operations. How that adds up in a bigger model is more difficult to assess, because these individual operations are just a small part of the overall model. In many ways, Boltzmann might be the best one to look at, because this model has virtually no overhead from agent logic.

Additionally, I would like to delve a bit deeper by conducting some line profiling to identify exactly where performance is being lost. Incidentally, I just ran a quick benchmark with all the decorators removed, and the overhead of the deprecation decorator is just a few percent.
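The individual-operation comparison mentioned above can be reproduced with a small sketch like the following. The data sizes and repetition counts are illustrative assumptions, not the exact setup used for the 30%-50% figure:

```python
# Sketch: stdlib random.shuffle vs. numpy Generator.shuffle.
import random
import timeit

import numpy as np

rng = np.random.default_rng(42)
data = list(range(100))
arr = np.arange(100)

# Shuffle repeatedly; both calls operate in place.
stdlib_t = timeit.timeit(lambda: random.shuffle(data), number=10_000)
numpy_t = timeit.timeit(lambda: rng.shuffle(arr), number=10_000)

print(f"random.shuffle: {stdlib_t:.4f}s")
print(f"rng.shuffle:    {numpy_t:.4f}s")
```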
This PR addresses #2884.
It does several things:
- Raises a FutureWarning when seed or random is used instead of the new preferred rng kwarg. I use FutureWarning because it is intended for end users (see https://docs.python.org/3/library/warnings.html).

Reminder: we need to fix the set_seed widget on the Solara side as well!
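For illustration, a hedged sketch of how a deprecated `seed` kwarg can emit a FutureWarning while forwarding to the new `rng` kwarg. The names here are assumptions for the sketch, not the decorator actually used in this PR:

```python
# Sketch: deprecating `seed` in favor of `rng` with a FutureWarning.
import warnings

import numpy as np

class Model:
    def __init__(self, rng=None, seed=None):
        if seed is not None:
            warnings.warn(
                "the `seed` kwarg is deprecated, use `rng` instead",
                FutureWarning,
                stacklevel=2,  # point the warning at the caller's line
            )
            rng = seed
        self.rng = np.random.default_rng(rng)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    Model(seed=42)
print(caught[0].category.__name__)  # FutureWarning
```

FutureWarning is shown by default even in non-developer contexts, which is why it fits deprecations aimed at end users rather than library authors.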