Add control commands to track jobs in CLI #72

Merged
Vismayak merged 4 commits into main from 51-add-job-monitoring-commands
Mar 27, 2026

@Vismayak Vismayak commented Mar 2, 2026

  • Updated README.md to include new job control commands for listing, checking status, viewing logs, and canceling jobs.
  • Enhanced CLI functionality to track jobs in a local registry, integrating with Slurm for real-time job state updates.
  • Introduced job name generation for SLURM jobs to improve identification and tracking.
  • Updated batch script creation to use dynamic job names based on model identifiers.
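
The job-name generation described above could look roughly like the following. This is a minimal sketch, not the PR's actual code: the function name `make_job_name`, the `llmflux` prefix, and the 40-character cap are all assumptions chosen for illustration.

```python
import re


def make_job_name(model: str, prefix: str = "llmflux", max_len: int = 40) -> str:
    """Build a scheduler-friendly job name from a model identifier.

    Hypothetical helper: the name, prefix, and length cap are illustrative
    assumptions, not taken from the PR.
    """
    # Replace every run of characters other than letters, digits, dot,
    # underscore, or hyphen with a single hyphen.
    slug = re.sub(r"[^A-Za-z0-9._-]+", "-", model).strip("-")
    name = f"{prefix}-{slug}" if slug else prefix
    # Truncate, then drop a trailing hyphen left by the cut.
    return name[:max_len].rstrip("-")
```

A name built this way could then be written into the batch script's `#SBATCH --job-name=` directive, which would make jobs easier to pick out in the queue.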

@Vismayak Vismayak linked an issue Mar 2, 2026 that may be closed by this pull request
10 tasks

Vismayak commented Mar 2, 2026

I'm having trouble running more extensive tests because of the long queue on CampusCluster, but initial tests are promising. I think LLMFlux status shows too much data, so I will need to remove some of the unnecessary fields. We also need to handle the UNKNOWN job status.

Vismayak commented Mar 4, 2026

Note: now that we can get a more accurate elapsed time from the SLURM job details, we should remove the inefficient `_wait_for_slurm_elapsed_seconds` function.

I will create a separate issue for this. We could possibly run the benchmark task as a background process that polls the job's status and writes the benchmark file with the accurate elapsed time once the job completes.
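
For reference, reading the elapsed time from Slurm accounting could be sketched as below. It assumes `sacct` is on the PATH and that accounting is enabled on the cluster; `parse_elapsed_raw` and `get_elapsed_seconds` are hypothetical names, not functions from this PR.

```python
import subprocess
from typing import Optional


def parse_elapsed_raw(output: str) -> Optional[int]:
    """Return the first integer line of sacct ElapsedRaw output, or None."""
    for line in output.splitlines():
        line = line.strip()
        if line.isdigit():
            return int(line)
    return None


def get_elapsed_seconds(job_id: str) -> Optional[int]:
    """Ask sacct for a job's elapsed seconds (hypothetical helper)."""
    # -X restricts output to the job allocation itself, skipping job steps.
    result = subprocess.run(
        ["sacct", "-j", job_id, "-X", "--noheader", "--parsable2",
         "--format=ElapsedRaw"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return parse_elapsed_raw(result.stdout)
```

A background poller could call `get_elapsed_seconds` once the job reaches a terminal state and only then write the benchmark file.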

Vismayak added 2 commits March 4, 2026 12:43
- Updated CLI to use new job detail retrieval methods, replacing deprecated functions for active and historical jobs.
- Enhanced job state extraction logic to accommodate different Slurm JSON schemas.
- Added helper functions for formatting timestamps and job time limits.
- Improved test coverage for job commands and state extraction.
- Updated documentation strings for clarity and consistency.
- Changed the CLI to use job details from the `jobs` dictionary instead of `slurm_data`.
- Modified `get_active_job_details` to utilize `sacct` for fetching job states, improving accuracy for running and pending jobs.
- Simplified the `get_job_details` function to directly query accounting history, removing the fallback to the queue.
- Enhanced unit tests for job details retrieval, ensuring only relevant job IDs are returned.
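
The schema-tolerant state extraction mentioned in these commits might be sketched as follows. The JSON shapes handled here (a `state.current` field or a top-level `job_state` field, each holding either a string or a list of strings) are assumptions about how Slurm's JSON output varies between versions, and `extract_job_state` is a hypothetical name rather than the PR's function.

```python
from typing import Any, Mapping


def extract_job_state(job: Mapping[str, Any]) -> str:
    """Return a job's state as one upper-case string, or "UNKNOWN"."""
    # Assumed accounting-style shape: {"state": {"current": ...}}.
    state = job.get("state")
    if isinstance(state, Mapping):
        state = state.get("current")
    # Assumed queue-style shape: {"job_state": ...}.
    if state is None:
        state = job.get("job_state")
    # Some versions wrap the state in a list, e.g. ["RUNNING"].
    if isinstance(state, (list, tuple)):
        state = state[0] if state else None
    if isinstance(state, str) and state.strip():
        return state.strip().upper()
    # Anything unrecognized falls back to an explicit sentinel.
    return "UNKNOWN"
```

Returning a single sentinel for unrecognized input keeps the UNKNOWN case in one place, so the CLI can decide how to display it.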
@Vismayak Vismayak marked this pull request as ready for review March 4, 2026 18:56
@Vismayak Vismayak requested a review from joshfactorial March 4, 2026 18:56
@joshfactorial
Collaborator

Interesting idea. How do I use this? Could you share an example command?

Vismayak commented Mar 5, 2026

Hi @joshfactorial, you can run it by following the instructions in the README 😄

@Ralhazmy1 Ralhazmy1 self-requested a review March 5, 2026 21:37

@Ralhazmy1 Ralhazmy1 left a comment

Tested the jobs commands; they worked as expected.

@joshfactorial
Collaborator

Everything worked for me!

@Vismayak Vismayak merged commit a2da0f4 into main Mar 27, 2026
1 check passed
@Vismayak Vismayak deleted the 51-add-job-monitoring-commands branch March 27, 2026 20:40

Development

Successfully merging this pull request may close these issues.

Add job monitoring commands (jobs, status, logs, cancel)

3 participants