-
Notifications
You must be signed in to change notification settings - Fork 3
Refactor db_handler and model_parameter modules; add possibility to change parameters from command line #1850
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This pull request refactors the database handling and model parameter modules, primarily separating MongoDB-specific code into a dedicated module and adding functionality to overwrite model parameters from the command line. The changes enable users to modify simulation model parameters at runtime using YAML files following the model repository format, while maintaining backward compatibility with existing code.
Key Changes:
- Separated MongoDB-specific database operations into a new
simtools.db.mongo_dbmodule - Renamed
mongo_db_configtodb_configthroughout the codebase to decouple from specific database technology - Added
overwrite_model_parametersparameter to model classes and command-line interface - Refactored parameter modification methods (renamed
change_parametertooverwrite_model_parameter, etc.) - Added functionality to validate sim_telarray metadata in integration tests
Reviewed Changes
Copilot reviewed 71 out of 71 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
tests/unit_tests/model/test_model_parameter.py |
Updated tests for renamed parameter modification methods and removed obsolete tests |
tests/unit_tests/db/test_db_handler.py |
Refactored tests to use new MongoDB handler delegation pattern |
tests/unit_tests/db/test_mongo_db.py |
New comprehensive test suite for MongoDB-specific operations |
tests/conftest.py |
Updated fixtures to use db_config parameter and changed default model version |
src/simtools/db/mongo_db.py |
New module containing MongoDB-specific database operations |
src/simtools/db/db_handler.py |
Refactored to delegate MongoDB operations to the new handler |
src/simtools/model/model_parameter.py |
Added parameter overwriting functionality and renamed methods |
src/simtools/testing/assertions.py |
Added sim_telarray metadata validation support |
src/simtools/configuration/commandline_parser.py |
Added --overwrite_model_parameters command-line argument |
| Multiple application files | Updated to use db_config instead of mongo_db_config |
tobiaskleiner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added few minor comments, thanks @GernotMaier for the PR.
|
|
||
| db_client: MongoClient | None = None | ||
| _lock = Lock() | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make senser to add a close connection here (close_connection()) to close the client?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The MongoClient is thread-safe and maintains its own connection pool . For normal process exit, sockets and threads are cleaned up; no need to call close() just to avoid leaks. So I think no close() is need.
src/simtools/db/mongo_db.py
Outdated
| def __init__(self, db_config=None): | ||
| """Initialize the MongoDBHandler class.""" | ||
| self._logger = logging.getLogger(__name__) | ||
| self.db_config = self._validate_db_config(db_config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could call self.validate_db_config(db_config) directly here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
| datetime.datetime | ||
| The generation time of the document's ObjectId. | ||
| """ | ||
| return ObjectId(document["_id"]).generation_time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a check here that id exists?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and no. If _id doesn’t exist, than the DB is messed up and we have a larger problem. I think the KeyError issued in this cases will be clear enough.
| # _id is a public attribute in GridFS GridOut objects | ||
| # pylint: disable=protected-access | ||
| return file_system.find_one({"filename": kwargs["filename"]})._id | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a check that the file exists here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For filename: all DB entries are extensively validated against their schema at the filling stage. I think we can safely assume that this key exists.
| """ | ||
| if self.db_config: | ||
| db_server = self.db_config["db_server"] | ||
| domain_pattern = r"^([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}$" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also ipaddress.ip_address() to check for IP adress. Not sure regex would work for IPv4 adress?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are testing domain names like ‘"cta-simpipe-protodb.zeuthen.desy.de”’ vs localhost, not IP addresses. Is this still an issue?
| ------- | ||
| list | ||
| List of collection names | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that the cache is not renewed doesnt pose a problem here. Or do we need a refresh or discard method for any case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I get this one - the scenario would be that we add collections between the first call and a later one? I don't think we do this at all.
|
|
@tobiaskleiner - thanks for the careful review. Please have a look and let me know if this is ok. |
|
Thanks @GernotMaier, yes all good now. |




Adds functionality and addresses a couple of issues regarding model parameters and database handling. A bit of a big pull request, but things are somehow interwoven and sometimes difficult to separate.
Change model parameter from command line
Closes #1795
Allow to change model parameter values from the command line using the same format as used in the 'info.yml' files in the simulation models, e.g.:
changes the values for LSTN-01.
See the new test file
tests/resources/info_test_model_parameter_changes.ymlfor a complete example.Simplified changed from simtools modules
The original way of changing parameters (
"par_name": value) is kept for backwards compatibility with existing code (applied only to TelescopeModel or SiteModel)Refactoring of db_handler class
db_handler is too big and this PR separates code related specifically to the mongoDB into
simtools.db.mongo_db.Closes #1769
Add checks for changes in metadata
Add the functionality for the integration tests to test the metdata of the sim_telarray output files. E.g. given in the test configuration file the following block
Will query the metadata for the given values and lead to a failure if the comparison fails.
Note that the numbers correspond to the sim_telarray numbering of telescopes used in the metadata. I know this is not ideal, but probably fine as a start.
General cleanup
Parsed carefully through the code and improved / simplified anything obvious. Probably the most annoying (for reviewer) changes throughout the code is
mongo_db_configtodb_config(there is no need to embed the actual database technology into the code).Main integration test
Modifying parameters and testing the modifications has been added to this integration test:
The modification file is in
tests/resources/info_test_model_parameter_changes.ymlPossible issues / notes
Parameter values might be changed now in the
simtools.models.model_parametermodule. This means we cannot rely on the (quite efficient) structure from db_handler where parameters for a given telescope types are not duplicated (if all or some of them are identical). There is now adeepcopyinsimtools.model.model_parameters._load_parameters_from_db()- otherwise very hard to change parameter for a given telescope, but not for all. I am pretty sure this introduces not a performance or memory issue (unless we start simulate a huge number of telescopes).deprecatedin the file given to overwrite parameters (needed?)