Dart lowres cmeps#657
Open
kdraeder wants to merge 3 commits into
Changes are needed in multiple components: cesm, cime, cmeps, mom/MOM6.
The branches are labeled with DART_lowres_{component}.
cesm/driver/ensemble_driver.F90
Remove PETcount versus NINST test to let middle-sized tests work.
if(modulo(PetCount-pio_asyncio_ntasks*number_of_members, &
number_of_members) .ne. 0) then
This issue stems from cime #4933, which is about developing a large ensemble test
motivated by DART applications.
Because of the large ensemble, the testing will be more manageable
if it uses a coarse resolution grid. An ne3 grid is available for CAM and CTSM,
and now a ~10 degree resolution is available in MOM6 (MOM_interface #311).
These have been combined into a new CESM grid and used in ERI and MCC tests,
which also use a new testmod tailored to DART needs.
I'm open to suggestions for a shorter testmod name,
but @billsacks and I feel that it will be helpful to have DART in it.
This grid (especially the MOM6 grid) limits the tasks/instance to 12
(6 for MOM, 6 for the other components).
An MCC test for a small ensemble passes all test stages
(/glade/work/raeder/Exp/CESM+DART_testing/MCC_cG.ne3pg3_10deg.B_DART.lowres)
but ensembles which require more than 1 node mostly fail
with an error in cmeps/cesm/driver/ensemble_driver.F90.
This seems to arise from smaller ensembles fitting into a single (develop queue) node,
where the exact number of processors needed is assigned to them,
while larger ensembles need multiple (cpu/main) nodes
and more processors are assigned to the job than are requested.
For example, 40 instances request 12 x 40 = 480 processors.
This requires 4 nodes, so 4 x 128 = 512 processors are assigned.
This difference causes an error:
PetCount ( 512) - Async IOtasks ( 0) must be evenly divisable by number of members ( 40).
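The arithmetic behind this error can be reproduced with a short Python sketch. It is only an illustration of the numbers quoted above, not the driver's code: the function name `allocation_check` is hypothetical, and it assumes that whole nodes are assigned and that every core on those nodes is counted in PetCount.

```python
import math


def allocation_check(n_instances, tasks_per_instance, cores_per_node, async_io_tasks=0):
    """Mirror the divisibility test for a whole-node allocation.

    Assumption (not taken from the driver): the scheduler assigns whole
    nodes and all of their cores are counted in PetCount.
    """
    requested = n_instances * tasks_per_instance
    nodes = math.ceil(requested / cores_per_node)
    pet_count = nodes * cores_per_node
    # Same expression as the modulo test in ensemble_driver.F90.
    remainder = (pet_count - async_io_tasks * n_instances) % n_instances
    return requested, nodes, pet_count, remainder


requested, nodes, pet_count, remainder = allocation_check(40, 12, 128)
print(requested, nodes, pet_count, remainder)
```

With 40 instances, 12 tasks per instance, and 128 cores per node, the remainder is nonzero (512 is not a multiple of 40), which is the condition that triggers the error message.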
When the check for this error is removed, the job goes farther,
but hangs just before the time stepping in CAM. This can be prevented by choosing MAX_TASKS_PER_NODE in a way that prevents any instance from being laid out across 2 nodes.
The changes required to do this are beyond the scope of this PR,
and are handled in CESM #398.
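As a rough illustration of the node-boundary constraint described above, the following Python sketch finds the largest tasks-per-node value that keeps every instance on a single node. It is not the method used in CESM #398: the function names are hypothetical, and the sketch assumes that instances receive contiguous blocks of tasks and that nodes are filled in order.

```python
def max_tasks_per_node_without_straddling(cores_per_node, tasks_per_instance):
    """Largest multiple of tasks_per_instance that fits on one node."""
    instances_per_node = cores_per_node // tasks_per_instance
    if instances_per_node == 0:
        raise ValueError("one instance does not fit on a single node")
    return instances_per_node * tasks_per_instance


def straddling_instances(n_instances, tasks_per_instance, tasks_per_node):
    """Instances whose first and last task land on different nodes.

    Assumption: instance i owns the contiguous task range
    [i * tasks_per_instance, (i + 1) * tasks_per_instance - 1].
    """
    straddlers = []
    for i in range(n_instances):
        first = i * tasks_per_instance
        last = first + tasks_per_instance - 1
        if first // tasks_per_node != last // tasks_per_node:
            straddlers.append(i)
    return straddlers


limit = max_tasks_per_node_without_straddling(128, 12)
print(limit)
print(straddling_instances(40, 12, 128))
print(straddling_instances(40, 12, limit))
```

Under these assumptions, using all 128 cores per node splits some instances across two nodes, while limiting each node to a multiple of 12 tasks leaves none split.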
Description of changes
Commenting out the consistency check between PetCount and number_of_members,
if(modulo(PetCount-pio_asyncio_ntasks*number_of_members, number_of_members) .ne. 0) then
allows the test to proceed.
I could not trace the variables back through ESMF to figure out an if-test
which would handle this situation, and developers I talked to weren't certain that it's essential,
so my temporary solution is to comment out the test, without removing it.
Specific notes
Contributors other than yourself, if any:
@billsacks @jedwards4b
CMEPS Issues Fixed (include github issue #): #461
This is also essential for issues in other components:
ESMCI/cime #4933 (overview issue)
CESM PR #398
ESMCI/ccs_config PR #285
NCAR/MOM6 #413
Are changes expected to change answers? (specify if bfb, different at roundoff, more substantial)
This is not expected to change answers in tests which ran successfully before this change.
Some tests which would not run before will now run. It's possible that some of those should not run,
but I have not looked into those.
Any User Interface Changes (namelist or namelist defaults changes)?
Users who want to run ERI or MCC tests with an ensemble which can fit some,
but not all, instances on 1 node, will need to include the test_mods developed in CESM #398
and follow the instructions for setting MAX_TASKS_PER_NODE.
Testing performed
Please describe the tests along with the target model and machine(s)
If possible, please also add hashes that were used in the testing.
Extensive testing (development) of ERI and MCC tests was conducted in a version of cesm3_0_alpha08d,
modified to enable the 10-degree MOM6 grid, using a BHIST compset, on derecho.
The relevant changes (multiple components) were imported to the cesm3_0_alpha09a tag
and tested in cases in /glade/work/raeder/Exp/CESM+DART_testing: