Currently, to run a set of $N$ single-node tests on $M$ nodes using the `--distribute` option, ReFrame generates $N\times M$ test jobs. For large-scale runs (many tests, many nodes), this is inefficient for a number of reasons:
- ReFrame has to instantiate and submit a very large number of tests.
- ReFrame has to generate multiple stage directories at once.
- Since the jobs are independent, the overall throughput will be low, because every job will wait its turn in the scheduler.
Ideally, such a scenario would be handled by submitting a single job per node to be tested and having ReFrame run the whole set of tests inside that job allocation.
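
To make the difference in scale concrete, here is a small back-of-the-envelope sketch comparing the number of scheduler jobs under the current behaviour and under the one-job-per-node approach described above. The function name `job_counts` and the sample figures (50 tests, 1000 nodes) are purely illustrative assumptions, not ReFrame code or measured data.

```python
from typing import Tuple


def job_counts(num_tests: int, num_nodes: int) -> Tuple[int, int]:
    """Return (current, proposed) scheduler job counts.

    current:  one job per (test, node) pair, i.e. N x M jobs.
    proposed: one job per node, with all N tests run inside it.
    """
    current = num_tests * num_nodes
    proposed = num_nodes
    return current, proposed


# Hypothetical example: 50 single-node tests distributed over 1000 nodes.
current, proposed = job_counts(num_tests=50, num_nodes=1000)
print(f"current: {current} jobs, proposed: {proposed} jobs")
```

With these assumed numbers, the scheduler would see 50000 jobs today versus 1000 with a single job per node, a reduction by a factor of $N$.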