@calvinp0
Member

Updates the scheduler to handle inconsistent transition state output upon restart, ensuring convergence is correctly assessed.

Also, deduplicates transition state guess methods in the report and addresses potential issues arising from missing output paths after a restart.

Ensures the output dictionary is internally consistent after a restart, particularly regarding convergence status and output paths. Specifically, it introduces a `_missing_required_paths` method that checks whether a species has all required output paths, and a `_sanitize_restart_output` method that resets the convergence status if required paths are missing, preventing misleading convergence flags in subsequent calculations.
Also resets the output status when switching TS guesses or deleting all species jobs.
Addresses an issue where inconsistent transition state (TS) output during a restart could lead to an incorrect convergence status, by sanitizing the output when a project is restarted.
Avoids repeating methods in the TS guess report by ensuring uniqueness of successful and unsuccessful methods. This improves readability and prevents redundant information.
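
The restart sanitization described above can be sketched in isolation. This is a minimal illustration, not ARC's actual implementation: `PATH_MAP`, `missing_required_paths`, and `sanitize_restart_output` are hypothetical standalone stand-ins for the scheduler methods, and the path keys and file names are assumed for the example.

```python
from typing import Dict, List

# Assumed mapping from a job type to its key under output[label]['paths'].
PATH_MAP: Dict[str, str] = {'opt': 'geo', 'freq': 'freq', 'sp': 'sp'}


def missing_required_paths(entry: dict, required: List[str]) -> List[str]:
    """Return the required job types whose output path is empty or absent."""
    paths = entry.get('paths', {})
    return [job_type for job_type in required if not paths.get(PATH_MAP[job_type], '')]


def sanitize_restart_output(output: Dict[str, dict], required: List[str]) -> None:
    """Clear the convergence flag of every entry that lacks a required path."""
    for entry in output.values():
        if entry.get('convergence') and missing_required_paths(entry, required):
            entry['convergence'] = False


# Illustrative restart state: TS0 claims convergence but lacks two paths.
output = {
    'TS0': {'convergence': True, 'paths': {'geo': '', 'freq': 'calcs/TS0/freq.out', 'sp': ''}},
    'H2O': {'convergence': True, 'paths': {'geo': 'a.out', 'freq': 'b.out', 'sp': 'c.out'}},
}
sanitize_restart_output(output, required=['opt', 'freq', 'sp'])
```

After the call, only the entry with missing paths loses its convergence flag; the complete entry is left untouched.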

Copilot AI left a comment

Pull request overview

This PR improves restart and transition state (TS) guess handling in ARC by ensuring convergence is correctly assessed when restarting from inconsistent output states, deduplicating TS method names in reports, and preventing incorrect convergence status when required output paths are missing.

Changes:

  • Adds output sanitization on restart to detect and fix inconsistent convergence/path states
  • Deduplicates TS guess method names in the TS report to improve readability
  • Enhances convergence checking to verify required output paths exist before marking species as converged

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| `arc/scheduler.py` | Adds `_sanitize_restart_output()` and `_missing_required_paths()` methods; updates `switch_ts()` and `delete_all_species_jobs()` to properly reset output state including convergence, job_types, and paths |
| `arc/species/species.py` | Updates `make_ts_report()` to deduplicate successful and unsuccessful TS method lists using `dict.fromkeys()` before joining |
| `arc/scheduler_test.py` | Adds unit tests for path validation, restart sanitization, output reset on job deletion, and TS switching |
| `functional/restart_test.py` | Adds functional test to verify restart sanitization correctly resets convergence when paths are missing |

arc/scheduler.py Outdated
Comment on lines 3145 to 3155
for job_type, path_key in path_map.items():
    if job_type == 'composite':
        required = self.composite_method is not None
    else:
        required = self.job_types.get(job_type, False)
    if not required:
        continue
    if self.species_dict[label].number_of_atoms == 1 and job_type in ['opt', 'freq']:
        continue
    if not self.output[label]['paths'].get(path_key, ''):
        self.output[label]['job_types'][job_type] = False

Copilot AI Jan 26, 2026

The _sanitize_restart_output method duplicates logic from _missing_required_paths. Lines 3145-3155 repeat the same path validation logic that's already in _missing_required_paths (lines 3117-3128). Consider refactoring to reuse the existing helper method or extracting the path validation into a shared utility function to improve maintainability and reduce duplication.
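
One way to act on this suggestion is to have both methods consume a single generator of missing job types. The sketch below uses plain functions and a hypothetical `iter_missing_job_types` helper. The `path_map`, `job_types`, `composite_method`, and `number_of_atoms` arguments mirror the names in the snippet above, but the signatures, the boolean return of the check, and the path keys are assumptions.

```python
from typing import Dict, Iterator, Optional


def iter_missing_job_types(entry: dict,
                           path_map: Dict[str, str],
                           job_types: Dict[str, bool],
                           composite_method: Optional[str],
                           number_of_atoms: int,
                           ) -> Iterator[str]:
    """Yield each required job type whose output path is empty or absent."""
    for job_type, path_key in path_map.items():
        if job_type == 'composite':
            required = composite_method is not None
        else:
            required = job_types.get(job_type, False)
        if not required:
            continue
        # Monoatomic species are exempt from the opt and freq requirements.
        if number_of_atoms == 1 and job_type in ('opt', 'freq'):
            continue
        if not entry.get('paths', {}).get(path_key, ''):
            yield job_type


def missing_required_paths(entry: dict, **kwargs) -> bool:
    """Check-only variant: True if any required path is missing."""
    return any(iter_missing_job_types(entry, **kwargs))


def sanitize_entry(entry: dict, **kwargs) -> None:
    """Mutating variant: reset the flags of job types with missing paths."""
    missing = list(iter_missing_job_types(entry, **kwargs))
    for job_type in missing:
        # Assumes entry['job_types'] exists, as in the reviewed snippet.
        entry['job_types'][job_type] = False
    if missing:
        entry['convergence'] = False


kwargs = dict(path_map={'opt': 'geo', 'freq': 'freq', 'sp': 'sp', 'composite': 'composite'},
              job_types={'opt': True, 'freq': True, 'sp': True},
              composite_method=None, number_of_atoms=3)
entry = {'convergence': True,
         'job_types': {'opt': True, 'freq': True, 'sp': True},
         'paths': {'geo': 'opt.log', 'freq': '', 'sp': 'sp.log'}}
sanitize_entry(entry, **kwargs)
```

With the validation rules in one place, a change to what counts as "required" cannot drift between the check and the sanitizer.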

if self.species_dict[label].number_of_atoms == 1 and job_type in ['opt', 'freq']:
    continue
if not self.output[label]['paths'].get(path_key, ''):
    self.output[label]['job_types'][job_type] = False

Copilot AI Jan 26, 2026

In the _sanitize_restart_output method, there's no check for the existence of 'job_types' key before accessing it at line 3155. While _missing_required_paths at line 3126 uses .get() with a default value to safely access job_types, this line directly accesses self.output[label]['job_types'][job_type]. If 'job_types' doesn't exist in self.output[label], this will raise a KeyError. Consider using safe dictionary access or add a check for the existence of 'job_types' key.

Suggested change
- self.output[label]['job_types'][job_type] = False
+ if 'job_types' in self.output[label]:
+     self.output[label]['job_types'][job_type] = False
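
The guarded form silently skips the update when `'job_types'` is absent. If recording the reset is preferable in that case, `dict.setdefault` is an alternative. Which behavior is right for ARC is a judgment call this sketch does not settle; the function names and `entry` dictionaries are illustrative only.

```python
def reset_guarded(entry: dict, job_type: str) -> None:
    """Suggested form: do nothing if 'job_types' is missing."""
    if 'job_types' in entry:
        entry['job_types'][job_type] = False


def reset_with_default(entry: dict, job_type: str) -> None:
    """Alternative: create 'job_types' on demand, then record the reset."""
    entry.setdefault('job_types', {})[job_type] = False


guarded, defaulted = {'paths': {}}, {'paths': {}}
reset_guarded(guarded, 'freq')
reset_with_default(defaulted, 'freq')
```

Neither variant raises a `KeyError`; they differ only in whether the entry ends up with a `'job_types'` key.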

unique_successful_methods = list(dict.fromkeys(self.successful_methods))
self.ts_report += ','.join(unique_successful_methods)
if self.unsuccessful_methods:
    self.ts_report += '\nMethods that were unsuccessfully in generating a TS guess:\n'

Copilot AI Jan 26, 2026

Grammatical error: "were unsuccessfully in generating" should be "were unsuccessful in generating" (remove the 'ly').

Suggested change
- self.ts_report += '\nMethods that were unsuccessfully in generating a TS guess:\n'
+ self.ts_report += '\nMethods that were unsuccessful in generating a TS guess:\n'

arc/scheduler.py Outdated
Comment on lines 2624 to 2631
if label in self.output:
    self.output[label]['convergence'] = False
    for key in ['opt', 'freq', 'sp', 'composite', 'fine']:
        if key in self.output[label]['job_types']:
            self.output[label]['job_types'][key] = False
    if 'paths' in self.output[label]:
        for key in self.output[label]['paths']:
            self.output[label]['paths'][key] = '' if key != 'irc' else list()

Copilot AI Jan 26, 2026

There is duplicated logic for resetting output between switch_ts (lines 2624-2631) and delete_all_species_jobs (lines 3620-3626). Both methods reset convergence, job_types, and paths in nearly identical ways. Since switch_ts calls delete_all_species_jobs at line 2623, consider removing the duplicate resetting logic from switch_ts and relying solely on delete_all_species_jobs to handle the output reset. This would improve maintainability and reduce the risk of the two implementations diverging.

Suggested change
- if label in self.output:
-     self.output[label]['convergence'] = False
-     for key in ['opt', 'freq', 'sp', 'composite', 'fine']:
-         if key in self.output[label]['job_types']:
-             self.output[label]['job_types'][key] = False
-     if 'paths' in self.output[label]:
-         for key in self.output[label]['paths']:
-             self.output[label]['paths'][key] = '' if key != 'irc' else list()
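
If the reset is consolidated as suggested, it could live in one helper that `delete_all_species_jobs` calls, with `switch_ts` relying on that call. The sketch below uses free functions operating on a plain `output` dictionary. `reset_output_entry` is a hypothetical name, the example entry is made up, and the real methods do considerably more than shown.

```python
from typing import Dict

RESETTABLE_JOB_TYPES = ('opt', 'freq', 'sp', 'composite', 'fine')


def reset_output_entry(entry: dict) -> None:
    """Reset convergence, job-type flags, and paths of one output entry in place."""
    entry['convergence'] = False
    job_types = entry.get('job_types', {})
    for key in RESETTABLE_JOB_TYPES:
        if key in job_types:
            job_types[key] = False
    paths = entry.get('paths', {})
    for key in paths:
        # 'irc' holds a list of paths; every other key holds a single string.
        paths[key] = [] if key == 'irc' else ''


def delete_all_species_jobs(output: Dict[str, dict], label: str) -> None:
    """Stand-in for the scheduler method: job deletion is omitted, only the reset is shown."""
    if label in output:
        reset_output_entry(output[label])


def switch_ts(output: Dict[str, dict], label: str) -> None:
    """Stand-in for the scheduler method: delegates the reset instead of repeating it."""
    delete_all_species_jobs(output, label)
    # Selecting the next TS guess would follow here.


output = {'TS0': {'convergence': True,
                  'job_types': {'opt': True, 'freq': True, 'rotors': True},
                  'paths': {'geo': 'opt.log', 'freq': 'freq.log', 'irc': ['f.log', 'r.log']}}}
switch_ts(output, 'TS0')
```

Job types outside the resettable tuple (here the illustrative `'rotors'` flag) are left as they were, matching the key list in the reviewed snippet.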

Comment on lines +1539 to +1544
unique_successful_methods = list(dict.fromkeys(self.successful_methods))
self.ts_report += ','.join(unique_successful_methods)
if self.unsuccessful_methods:
    self.ts_report += '\nMethods that were unsuccessfully in generating a TS guess:\n'
    for unsuccessful_method in self.unsuccessful_methods:
        self.ts_report += unsuccessful_method + ','
    unique_unsuccessful_methods = list(dict.fromkeys(self.unsuccessful_methods))
    self.ts_report += ','.join(unique_unsuccessful_methods)

Copilot AI Jan 26, 2026

The deduplication of TS methods in the report will cause an existing test to fail. The test in arc/species/species_test.py at lines 1202-1206 expects the ts_report to contain duplicates (e.g., 'autotst,autotst,autotst,autotst,gcn,gcn,...'), but after this change the report will show 'autotst,gcn,kinbot' instead. The test should be updated to match the new expected behavior.
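
The ordering behavior behind the new expectation can be checked directly: `dict.fromkeys` keeps the first occurrence of each key in insertion order (guaranteed since Python 3.7). The method list below is an illustrative input, not the test's actual data.

```python
successful_methods = ['autotst', 'autotst', 'gcn', 'autotst', 'kinbot', 'gcn']

# dict.fromkeys drops repeats while preserving first-seen order.
unique_successful_methods = list(dict.fromkeys(successful_methods))
report_line = ','.join(unique_successful_methods)
print(report_line)  # autotst,gcn,kinbot
```

Note that `','.join` also omits the trailing comma that the previous `+= unsuccessful_method + ','` loop appended, so the updated assertion may need to account for that as well, unless the surrounding code already stripped it.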

Ensures restart output state is internally consistent,
particularly addressing cases where convergence is indicated
without corresponding output paths for TS calculations.

This prevents issues arising from inconsistent data by resetting
the convergence status and job types when required output paths are missing.
The test suite adds a set to track created projects and adds a run_opt_job
mock in the "sanitize TS output" test.
Removes the mock for `delete_all_species_jobs` within the `switch_ts` test.
This mock is no longer required, simplifying the test setup.