@sujik18
Member

@sujik18 sujik18 commented Nov 9, 2025

✅ PR Checklist

  • Target branch is dev

📌 Note: PRs must be raised against dev. Do not commit directly to main.

✅ Testing & CI

  • Have tested the changes in my local environment, else have properly conveyed in the PR description
  • The change includes a GitHub Action to test the script (if one can be added).
  • No existing GitHub Actions are failing because of this change.

📚 Documentation

  • README or help docs are updated for new features or changes.
  • CLI help messages are meaningful and complete.

πŸ“ File Hygiene & Output Handling

  • No unintended files (e.g., logs, cache, temp files, pycache, output folders) are committed.

πŸ›‘οΈ Safety & Security

  • No secrets or credentials are committed.
  • Paths, shell commands, and environment handling are safe and portable.

🙌 Contribution Hygiene

  • PR title and description are concise and clearly state the purpose of the change.
  • Related issues (if any) are properly referenced using Fixes # or Closes #.
  • All reviewer feedback has been addressed.

Fixes #37
Added a feature that maintains a modified_times.json file recording the last modified time of each script. Whenever an MLC command is executed, it checks whether any file inside the script folders has changed.
If a change is detected, only that script's entry (identified by its UID) is updated in both modified_times.json and index_script.json.

Before making changes to the parse-gate-question script:
index_script.json
modified_times.json

After making changes to script files
index_script.json
modified_times.json

Logs:

(mlcflow) sujith@ideapad-g3:~$ mlc show repos
[2025-11-09 12:52:15,404 repo_action.py:467 INFO] - Listing all repositories.

Repositories:
-------------
- Alias: local
  Path:  /home/sujith/MLC/repos/local

- Alias: llm-gate-exam-evaluation
  Path:  /home/sujith/MLC/repos/llm-gate-exam-evaluation

- Alias: mlcommons@mlperf-automations
  Path:  /home/sujith/MLC/repos/mlcommons@mlperf-automations

-------------
[2025-11-09 12:52:15,405 repo_action.py:474 INFO] - Repository listing ended
(mlcflow) sujith@ideapad-g3:~$ mlc search script --tags=gate 
[2025-11-09 12:53:26,533 index.py:199 INFO] - Script is modified, index getting updated
[2025-11-09 12:53:26,536 index.py:229 INFO] - Deleting and updating index entry for the script parse-gate-question with UID 8fe2944512654e81
[2025-11-09 12:53:26,552 index.py:61 INFO] - Saving modified times to /home/sujith/MLC/repos/modified_times.json
[2025-11-09 12:53:26,563 index.py:220 INFO] - Index updated (changes detected).
[2025-11-09 12:53:26,582 main.py:109 INFO] - Item path: /home/sujith/MLC/repos/llm-gate-exam-evaluation/script/app-llm-evaluation
[2025-11-09 12:53:26,582 main.py:109 INFO] - Item path: /home/sujith/MLC/repos/llm-gate-exam-evaluation/script/parse-gate-question
(mlcflow) sujith@ideapad-g3:~$ mlc search script --tags=gate 
[2025-11-09 12:53:44,330 main.py:109 INFO] - Item path: /home/sujith/MLC/repos/llm-gate-exam-evaluation/script/app-llm-evaluation
[2025-11-09 12:53:44,330 main.py:109 INFO] - Item path: /home/sujith/MLC/repos/llm-gate-exam-evaluation/script/parse-gate-question
(mlcflow) sujith@ideapad-g3:~$ 

@sujik18 sujik18 requested a review from a team as a code owner November 9, 2025 08:11
@github-actions

github-actions bot commented Nov 9, 2025

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@anandhu-eng
Contributor

Hi @sujik18 , Thanks for working on the pull request.

I just had a couple of questions:

1. With the new changes, what happens if a new user downloads the MLCFlow framework and simply runs a script?
2. Regarding the persistent index: does the check occur only once when the mlc command starts, or each time an individual script is executed?

[2025-11-10 06:45:06,707 index.py:190 INFO] - Script is modified, index getting updated
[2025-11-10 06:45:06,713 index.py:228 INFO] - Deleting and updating index entry for the script get-ml-model-deeplabv3-plus with UID cfb2d53b9dbc4dc0
[2025-11-10 06:45:06,714 index.py:190 INFO] - Script is modified, index getting updated
[2025-11-10 06:45:06,716 index.py:228 INFO] - Deleting and updating index entry for the script run-mlperf-power-client with UID bf6a6d0cc97b48ae
[2025-11-10 06:45:06,716 index.py:190 INFO] - Script is modified, index getting updated

Also, I think it would be better to log the indexing messages at DEBUG level rather than INFO level, so that they are printed only when the user runs the command in verbose mode.

@sujik18
Member Author

sujik18 commented Nov 10, 2025

Hi @anandhu-eng ,

  1. When a new user downloads the MLCFlow framework and runs any script for the first time, both modified_times.json and index_script.json will be missing. In this case, the indexing logic treats all scripts as new, builds the index from scratch, and creates both files. This happens only on the first run.
  2. The check occurs every time an mlc command starts. During this check, only scripts that have been modified are re-indexed.
  3. Currently, INFO logs are printed only when MLCFlow is run for the first time (index creation) or when a script has actually been modified since the last recorded timestamp. In all other cases, only normal command output appears. That is why I kept them at INFO level: the user gets to know that the updated script was taken into account when building the index. If required, I can change them to DEBUG level.
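A minimal sketch of the per-run check described in points 1 and 2, assuming timestamps are kept in a flat dictionary keyed by meta file path. The name `find_changed_items` and the directory layout are hypothetical and are not taken from mlc/index.py; an empty dictionary stands in for a missing modified_times.json on the first run.

```python
import os
import tempfile


def find_changed_items(meta_paths, stored_mtimes):
    """Return the meta files whose mtime differs from the stored value.

    stored_mtimes maps path -> last recorded mtime and is updated in place.
    An empty dict models the first run, so every item is reported as changed.
    """
    changed = []
    for path in meta_paths:
        current = os.path.getmtime(path)
        if stored_mtimes.get(path) != current:
            changed.append(path)
            stored_mtimes[path] = current  # record the new timestamp
    return changed


with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name in ("script-a", "script-b"):  # placeholder script folders
        os.makedirs(os.path.join(tmp, name))
        meta = os.path.join(tmp, name, "meta.yaml")
        with open(meta, "w") as f:
            f.write("uid: example\n")
        paths.append(meta)

    mtimes = {}
    print(len(find_changed_items(paths, mtimes)))  # first run: both are new
    print(len(find_changed_items(paths, mtimes)))  # second run: nothing changed

    # Simulate an edit by moving the mtime of one meta file forward.
    os.utime(paths[0], (0, os.path.getmtime(paths[0]) + 10))
    print(find_changed_items(paths, mtimes) == [paths[0]])
```

Only the paths returned by the function would need to be re-indexed; everything else can be served from the stored index.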

Example Logs:
when a script was modified before mlc command is executed:

(mlcflow) sujith@ideapad-g3:~$ mlc search script --tags=gate
[2025-11-10 18:32:01,537 index.py:190 INFO] - Script is modified, index getting updated
[2025-11-10 18:32:01,540 index.py:228 INFO] - Deleting and updating index entry for the script parse-gate-question with UID 8fe2944512654e81
[2025-11-10 18:32:01,557 index.py:61 INFO] - Saving modified times to /home/sujith/MLC/repos/modified_times.json
[2025-11-10 18:32:01,566 index.py:212 INFO] - Index updated (changes detected).
[2025-11-10 18:32:01,584 main.py:109 INFO] - Item path: /home/sujith/MLC/repos/llm-gate-exam-evaluation/script/app-llm-evaluation
[2025-11-10 18:32:01,584 main.py:109 INFO] - Item path: /home/sujith/MLC/repos/llm-gate-exam-evaluation/script/parse-gate-question
(mlcflow) sujith@ideapad-g3:~$ 

when no script was modified

(mlcflow) sujith@ideapad-g3:~$ mlc search script --tags=gate
[2025-11-10 18:42:56,748 main.py:109 INFO] - Item path: /home/sujith/MLC/repos/llm-gate-exam-evaluation/script/app-llm-evaluation
[2025-11-10 18:42:56,748 main.py:109 INFO] - Item path: /home/sujith/MLC/repos/llm-gate-exam-evaluation/script/parse-gate-question
(mlcflow) sujith@ideapad-g3:~$ 

Contributor

@amd-arsuresh amd-arsuresh left a comment

Do we have an estimation of the speedup due to this change?

mlc/index.py Outdated
config_file = p
break

if not config_file:
Contributor

When can this case occur?

Member Author

When there is no meta.yaml or meta.json file in the script directory.

Contributor

I don't think we are using meta.json anymore. Also, we should return an error if the config file is not found, as it is mandatory. I think it's handled in line 283.

Member Author

The meta.json check was already part of the current implementation, here at #L117. Yes, if the config file is not found, the script won't be taken into account for indexing; this is handled in line #145. Line 283 only checks whether the uid field is present in the script's meta file.

Contributor

meta.json is used for cache entries.

@sujik18
Member Author

sujik18 commented Nov 26, 2025

In my local setup with two repos (llm-gate-exam-evaluation and mlcommons@mlperf-automations), I timed mlc show repo script.
With the current implementation, the time observed when no script had changed was:

real    0m3.456s
user    0m3.412s
sys     0m0.042s

With the persistent MLC index, the time observed was:

real    0m0.182s
user    0m0.143s
sys     0m0.039s

So the speedup in this case is approximately 18.99x.

And in the scenario where one of the scripts has changed:
With the current implementation, the time observed was:

real    0m3.520s
user    0m3.406s
sys     0m0.048s

With the persistent MLC index, the time observed was:

real    0m0.190s
user    0m0.140s
sys     0m0.050s

So the speedup in this case is approximately 18.53x.
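For reference, the speedup figures can be recomputed from the `real` times reported above. This is only the arithmetic, taking speedup as baseline wall-clock time divided by the new wall-clock time; the helper name `speedup` is illustrative.

```python
def speedup(baseline_seconds, new_seconds):
    """Ratio of the baseline wall-clock time to the new wall-clock time."""
    return baseline_seconds / new_seconds


# "real" times from the two scenarios above, in seconds
no_change = speedup(3.456, 0.182)
one_script_changed = speedup(3.520, 0.190)

print(f"no change:          {no_change:.2f}x")
print(f"one script changed: {one_script_changed:.2f}x")
```

These are single measurements on one machine, so the ratios are indicative rather than a benchmark.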


# Validate and add to indices
if unique_id:
    self._delete_by_uid(folder_type, unique_id, alias)
Contributor

Won't this add extra overhead? How about we use a set to avoid duplication?

Member Author

Currently, the function that deletes the older index entry is called only when a particular script has been modified. While deleting, it just needs to loop through the n uid entries, which probably takes less than a millisecond, so I don't think it adds any noticeable overhead.
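To make the trade-off in this thread concrete, here is a rough sketch of both options: a single linear scan over a list of entries, and a dictionary keyed by uid that makes replacement constant-time. The function names and the entry fields are assumptions for illustration and do not reproduce `_delete_by_uid` from the PR; the second uid is a placeholder.

```python
def delete_by_uid(entries, uid):
    """Drop every entry with the given uid by scanning the list once (O(n))."""
    return [e for e in entries if e.get("uid") != uid]


def upsert_by_uid(entries_by_uid, entry):
    """Alternative layout: a dict keyed by uid replaces an entry without a scan."""
    entries_by_uid[entry["uid"]] = entry


index = [
    {"uid": "8fe2944512654e81", "alias": "parse-gate-question", "tags": ["gate"]},
    {"uid": "0000000000000000", "alias": "placeholder-script", "tags": []},
]

# Re-index one modified script: remove the stale entry, then append the new one.
index = delete_by_uid(index, "8fe2944512654e81")
index.append(
    {"uid": "8fe2944512654e81", "alias": "parse-gate-question", "tags": ["gate", "parse"]}
)

# The same update with a uid-keyed dict needs no explicit delete step.
by_uid = {}
for entry in index:
    upsert_by_uid(by_uid, entry)
```

With a few hundred entries the scan is cheap, so the list version is likely fine; the dict only matters if deletes become frequent.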

"experiment": os.path.join(repos_path, "index_experiment.json")
}
self.indices = {key: [] for key in self.index_files.keys()}
self.modified_times_file = os.path.join(repos_path, "modified_times.json")
Contributor

Hi @sujik18, how about we also store the modified time in the corresponding target index file, such as index_script.json, rather than creating a separate file? Do you think the size of those files would cause a significant performance loss? If not, we could avoid keeping an additional file for this.

Member Author

Currently, modified_times.json stores a simple dictionary, whereas index_script.json is a list of objects, which will be slower to parse. So I think loading the mtimes this way is simpler and cheaper, as it can be completed even before index processing begins.
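A small sketch of reading and writing such a flat dictionary, assuming the file holds a JSON object that maps a key to a float mtime. The helper names are hypothetical; the fallback to an empty dict and the write-then-rename step are design choices made for this example, not behaviour confirmed by the PR.

```python
import json
import os
import tempfile


def load_modified_times(path):
    """Read the flat {key: mtime} dict; fall back to {} if missing or corrupt."""
    try:
        with open(path) as f:
            data = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}
    return data if isinstance(data, dict) else {}


def save_modified_times(path, mtimes):
    """Write to a temporary file first, then rename it over the target."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(mtimes, f)
    os.replace(tmp_path, path)  # readers never observe a half-written file


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "modified_times.json")
    print(load_modified_times(target))  # file does not exist yet
    save_modified_times(target, {"repo/script/example": 1.5})
    print(load_modified_times(target))
```

Returning an empty dict on a missing or unreadable file means the caller simply treats every item as new and rebuilds.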

mlc/index.py Outdated
if os.path.isfile(config_path):
    key = f"{repo_path}/{folder_type}/{automation_dir}"
    current_script_keys.add(key)
    mtime = self.get_script_mtime(automation_path)
Contributor

This is executed for all targets (script, cache, and experiment), right? If so, get_item_mtime may be a better name.

mlc/index.py Outdated
continue

# update mtime
logger.debug("Script is modified, index getting updated")
Contributor

Not only scripts: cache and experiment entries are also possible here.

Contributor

@arjunsuresh arjunsuresh left a comment

Some tests are failing now - I believe the problem is that we are not resetting the index on a repo update. When we register and unregister repos, we need to force reindexing - probably the easiest solution is to check the modified time of MLC/repos/repos.json and clear the index.
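One way the suggestion above could look, as a sketch only: track the mtime of repos.json alongside the other timestamps and empty the index whenever it changes. The function names, the `"repos.json"` key, and the file contents are assumptions; the real format of repos.json is not shown in this thread.

```python
import json
import os
import tempfile


def repos_changed(repos_json, mtimes):
    """True when the mtime of repos.json differs from the stored value."""
    current = os.path.getmtime(repos_json)
    if mtimes.get("repos.json") == current:
        return False
    mtimes["repos.json"] = current  # remember the new timestamp
    return True


def index_after_repo_check(repos_json, mtimes, index):
    """Return an emptied index when the repo list changed, else the same index."""
    if repos_changed(repos_json, mtimes):
        return {key: [] for key in index}  # caller must then rebuild everything
    return index


with tempfile.TemporaryDirectory() as tmp:
    repos_json = os.path.join(tmp, "repos.json")
    with open(repos_json, "w") as f:
        json.dump(["/path/to/repo-a"], f)  # placeholder content

    mtimes = {}
    index = {"script": [{"uid": "example"}], "cache": [], "experiment": []}
    index = index_after_repo_check(repos_json, mtimes, index)
    print(index["script"])  # emptied: repos.json is seen for the first time

    index["script"].append({"uid": "example"})
    index = index_after_repo_check(repos_json, mtimes, index)
    print(len(index["script"]))  # repos.json unchanged, entry is kept
```

Registering or unregistering a repo rewrites repos.json, so its mtime moves and the next command starts from an empty index.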

@arjunsuresh
Contributor

Currently in my local setup with two repos llm-gate-exam-evaluation and mlcommons@mlperf-automations For mlc show repo script With current implementation time observed when there was no change in any scripts:

real    0m3.456s
user    0m3.412s
sys     0m0.042s

With persistent MLC index time observed was:

real    0m0.182s
user    0m0.143s
sys     0m0.039s

So speedup for this process is approx 18.99

And in the scenario when there is a change in one of the script; With the current implementation time observed was:

real    0m3.520s
user    0m3.406s
sys     0m0.048s

With persistent MLC index time observed was:

real    0m0.190s
user    0m0.140s
sys     0m0.050s

So speedup for this process is approx 18.53

That's great. But when I tried the same mlc show repo on my system I see "0.7s" and with the changes it becomes "3s". Is something missing?

@sujik18
Member Author

sujik18 commented Nov 27, 2025

That's great. But when I tried the same mlc show repo on my system I see "0.7s" and with the changes it becomes "3s". Is something missing?

Was this happening only on the first run? If so, that's expected, as the first run with the changes recreates modified_times.json and index.json. Or did you observe the slower time on the second run as well?

@sujik18
Member Author

sujik18 commented Nov 27, 2025

Also, I think it would be better to log the indexing messages at DEBUG level rather than INFO level, so that they are printed only when the user runs the command in verbose mode.

Hi @anandhu-eng, I'm facing difficulty seeing the debug logs. I thought using the -v or --verbose flag at the end of the command was the correct way to set the log level to DEBUG. Is this correct, or is there a known issue preventing the logs from being displayed?

@arjunsuresh
Contributor

That's great. But when I tried the same mlc show repo on my system I see "0.7s" and with the changes it becomes "3s". Is something missing?

Was this happening only on the first run? If so, that's expected, as the first run with the changes recreates modified_times.json and index.json. Or did you observe the slower time on the second run as well?

arjun@intel-spr-i9:~/mlcflow$ time mlc show repo
[2025-11-27 16:33:41,813 repo_action.py:479 INFO] - Listing all repositories.

Repositories:
-------------
- Alias: local
  Path:  /home/arjun/MLC/repos/local

- Alias: arjun
  Path:  /home/arjun/mlcflow/arjun

- Alias: gateoverflow@go-pdfs
  Path:  /home/arjun/MLC/repos/gateoverflow@go-pdfs

- Alias: mlcommons@mlperf-automations
  Path:  /home/arjun/MLC/repos/gateoverflow@mlperf-automations

- Alias: ll
  Path:  /home/arjun/ll

-------------
[2025-11-27 16:33:41,813 repo_action.py:486 INFO] - Repository listing ended

real    0m0.807s
user    0m0.792s
sys     0m0.015s

With the new changes (repeated runs, first run took around 10s)

time mlc show repo
Traceback (most recent call last):
  File "/home/arjun/.local/bin/mlc", line 5, in <module>
    from mlc.main import main
  File "/home/arjun/.local/lib/python3.12/site-packages/mlc/__init__.py", line 3, in <module>
    from .action import access
  File "/home/arjun/.local/lib/python3.12/site-packages/mlc/action.py", line 740, in <module>
    default_parent = Action()
                     ^^^^^^^^
  File "/home/arjun/.local/lib/python3.12/site-packages/mlc/action.py", line 204, in __init__
    self.index = Index(self.repos_path, self.repos)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/arjun/.local/lib/python3.12/site-packages/mlc/index.py", line 42, in __init__
    self.build_index()
  File "/home/arjun/.local/lib/python3.12/site-packages/mlc/index.py", line 208, in build_index
    mtime = self.get_item_mtime(automation_path)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/arjun/.local/lib/python3.12/site-packages/mlc/index.py", line 139, in get_item_mtime
    t = os.path.getmtime(fp)
        ^^^^^^^^^^^^^^^^^^^^
  File "<frozen genericpath>", line 67, in getmtime
FileNotFoundError: [Errno 2] No such file or directory: '/home/arjun/MLC/repos/local/cache/get-mlperf-inference-nvidia-scratch-space_ca1f6015/data/imagenet'

real    0m2.063s
user    0m1.069s
sys     0m0.994s

I think the overhead is coming from more cache entries - here I have 695 cache folders.

@sujik18
Member Author

sujik18 commented Nov 27, 2025

With the new changes (repeated runs, first run took around 10s)

@arjunsuresh I think we should skip the data folder while checking mtime. Or will it not make much difference in the total time taken?

Also, I'm not sure why the detect,cpu script is failing. Earlier I thought it might be due to waiting for input to choose a cached script, but even after skipping that check locally, the test succeeds yet on GitHub Actions it still fails.

Should we add another Action run that tests the same script without the -s flag so we can see the full logs?

@arjunsuresh
Contributor

With the new changes (repeated runs, first run took around 10s)

@arjunsuresh I think we should skip the data folder while checking mtime. Or will it not make much difference in the total time taken?

Also, I'm not sure why the detect,cpu script is failing. Earlier I thought it might be due to waiting for input to choose a cached script, but even after skipping that check locally, the test succeeds yet on GitHub Actions it still fails.

Should we add another Action run that tests the same script without the -s flag so we can see the full logs?

Actually we only need to check meta.yaml and meta.json files for indexing right? Because the index is affected only by a change of tags.

Regarding the test failure, I could replicate it once but not again. When we use -s no log output with "INFO" should be there but some were coming from index.py. We can see this at end.

@sujik18
Member Author

sujik18 commented Nov 27, 2025

When we use -s no log output with "INFO" should be there but some were coming from index.py. We can see this at end.

Run mlcr detect,cpu -j -s --quiet
[2025-11-27 13:48:18,707 index.py:61 INFO] - Saving modified times to /home/runner/MLC/repos/modified_times.json 

[2025-11-27 13:48:18,715 index.py:240 INFO] - Index updated (changes detected). 

I shall change these to DEBUG, since they shouldn't appear when -s is used.
Also, was this the reason why mlc show repo -v wasn't showing debug logs earlier, since those flags weren't being considered properly?

@arjunsuresh
Contributor

When we use -s no log output with "INFO" should be there but some were coming from index.py. We can see this at end.

Run mlcr detect,cpu -j -s --quiet
[2025-11-27 13:48:18,707 index.py:61 INFO] - Saving modified times to /home/runner/MLC/repos/modified_times.json 

[2025-11-27 13:48:18,715 index.py:240 INFO] - Index updated (changes detected). 

I shall change these to DEBUG, since they shouldn't appear when -s is used. Also, was this the reason why mlc show repo -v wasn't showing debug logs earlier, since those flags weren't being considered properly?

When we use "-s" mlcflow shouldn't output info logs. That's how it's configured in main.py. If this is happening it's a bug and that needs to be fixed.

But the main problem is performance - I tried restricting the modified time check to just meta.yaml and meta.json, and then there is no performance drop but no performance gain either.
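As a generic illustration of the logging behaviour discussed here, the sketch below maps silent and verbose switches onto standard `logging` levels and shows which indexing messages survive at each level. `make_logger` and `emit` are hypothetical helpers; how main.py actually wires `-s` and `-v` is not shown in this thread, so this is an assumption about the intent rather than the real configuration.

```python
import io
import logging


def make_logger(silent=False, verbose=False):
    """Hypothetical helper: map -s / -v style switches onto a log level."""
    stream = io.StringIO()  # capture output so the result can be inspected
    logger = logging.getLogger(f"index-demo-{silent}-{verbose}")
    logger.handlers.clear()
    logger.propagate = False
    logger.addHandler(logging.StreamHandler(stream))
    if silent:
        logger.setLevel(logging.WARNING)  # INFO and DEBUG are dropped
    elif verbose:
        logger.setLevel(logging.DEBUG)    # everything is shown
    else:
        logger.setLevel(logging.INFO)     # DEBUG is dropped
    return logger, stream


def emit(logger):
    """Emit one INFO and one DEBUG message, like the indexing code might."""
    logger.info("Index updated (changes detected).")
    logger.debug("Saving modified times")


for flags in ({"silent": True}, {}, {"verbose": True}):
    logger, stream = make_logger(**flags)
    emit(logger)
    print(flags, repr(stream.getvalue()))
```

A message emitted before the level is configured uses whatever level is active at that moment, which may be one reason output can appear despite a silent flag.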

@sujik18
Member Author

sujik18 commented Nov 28, 2025

File "<frozen genericpath>", line 67, in getmtime
FileNotFoundError: [Errno 2] No such file or directory: '/home/arjun/MLC/repos/local/cache/get-mlperf-inference-nvidia-scratch-space_ca1f6015/data/imagenet'

But the main problem is performance - I tried restricting the modified time check to just meta.yaml and meta.json, and then there is no performance drop but no performance gain either.

@arjunsuresh I think we can further optimize the get_item_mtime function to skip folders like data and outputs. I believe this might reduce the performance drop, as I didn't observe the slowdown locally because I don't have any dataset files in my cache.

- Optimized the check for the meta.json file
- Fixed a bug where changes in multiple scripts were not taken into account
mlc/index.py Outdated
def get_item_mtime(self, folder):
    # logger.info(f"Getting latest modified time for folder: {folder}")
    latest = 0
    for root, _, files in os.walk(folder):
Contributor

Hi @sujik18, how about we use os.listdir instead of os.walk to get the modified time of folders/files? os.walk might add recursive overhead when assets such as the inference repo are cloned in a cache.

I think it's safe, as changes in subfolders make the parent folder appear modified even when its own contents haven't changed.

Member Author

Hi @anandhu-eng , thanks for the suggestion. Actually, I have modified the code, and the function get_item_mtime will now only be called for the meta file of each script. Therefore, we don't need to use either os.listdir or os.walk.
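A sketch of what a meta-file-only timestamp lookup might look like, assuming each item directory contains either meta.yaml or meta.json. The name `meta_mtime` and the return shape are invented for this example and are not the PR's `get_item_mtime`. Catching `FileNotFoundError` also covers the case from the earlier traceback, where a path vanished between discovery and the `os.path.getmtime` call.

```python
import os
import tempfile

META_NAMES = ("meta.yaml", "meta.json")  # file names mentioned in this thread


def meta_mtime(item_dir):
    """Return (meta_path, mtime) for the first meta file found, else (None, None)."""
    for name in META_NAMES:
        path = os.path.join(item_dir, name)
        try:
            return path, os.path.getmtime(path)
        except FileNotFoundError:
            continue  # try the next candidate name
    return None, None


with tempfile.TemporaryDirectory() as tmp:
    print(meta_mtime(tmp))  # no meta file yet

    with open(os.path.join(tmp, "meta.json"), "w") as f:
        f.write("{}")
    path, mtime = meta_mtime(tmp)
    print(os.path.basename(path), mtime is not None)
```

This costs at most two stat calls per item regardless of how much data sits under the directory, which matches the reason given above for dropping the recursive walk.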

Contributor

@sujik18 that change works great and we do see greater than 10X improvement in mlc performance with the change. But there's some bug if the modified times are absent in the index - most likely the tests are failing due to this.

Member Author

@arjunsuresh Glad to know the performance improved!
I'll debug and reproduce the error to fix the missing-mtime bug and ensure all tests pass.

Contributor

Thanks @sujik18 LGTM now. The startup performance is easily improved by 10X. Will do further testing too as this is a core change. We can merge this by the weekend.

@anandhu-eng please test the corner cases like manually removing the script/cache entries and updating the script, cache tags in the meta.

Contributor

Got this error when I manually removed the corresponding cache folder.

 No meta file found in /home/arsuresh/MLC/repos/local/cache/detect-os_356490ac for None
Traceback (most recent call last):
  File "/home/arsuresh/mlcflow/bin/mlc", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/arsuresh/mlcflow/lib/python3.12/site-packages/mlc/main.py", line 342, in main
    res = method(run_args)
          ^^^^^^^^^^^^^^^^
  File "/home/arsuresh/mlcflow/lib/python3.12/site-packages/mlc/cache_action.py", line 125, in show
    res = self.search(run_args)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/arsuresh/mlcflow/lib/python3.12/site-packages/mlc/cache_action.py", line 62, in search
    expiration_time = item_meta.get('expiration_time')
                      ^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'get'

Member Author

@sujik18 sujik18 Dec 3, 2025

Hi @amd-arsuresh, this should be fixed in the latest commit. Earlier, when deleting an index entry, the code looked for the path of the meta file instead of the name of the directory containing it, so index entries for deleted scripts were not removed.
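To illustrate the distinction described above, here is a sketch that prunes entries by item directory while the timestamps are keyed by meta file path. The entry shape (`"path"` holding the directory) and the name `prune_missing` are assumptions for the example, not the structure used in the PR.

```python
import os
import tempfile


def prune_missing(entries, mtimes):
    """Remove index entries, and their mtime records, whose directory is gone."""
    kept = []
    for entry in entries:
        if os.path.isdir(entry["path"]):
            kept.append(entry)
            continue
        # mtimes is keyed by meta file path, so match on the parent directory.
        stale = [key for key in mtimes if os.path.dirname(key) == entry["path"]]
        for key in stale:
            del mtimes[key]
    return kept


with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, "present-item")
    removed = os.path.join(tmp, "removed-item")  # never created: models a deleted folder
    os.makedirs(present)

    entries = [{"uid": "a", "path": present}, {"uid": "b", "path": removed}]
    mtimes = {
        os.path.join(present, "meta.yaml"): 1.0,
        os.path.join(removed, "meta.yaml"): 2.0,
    }
    entries = prune_missing(entries, mtimes)
    print([e["uid"] for e in entries], len(mtimes))
```

Comparing against the directory rather than the meta file path is the point: a lookup by meta path never matches an entry that stores only its folder.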

Contributor

@arjunsuresh arjunsuresh Dec 4, 2025

Thanks @sujik18. Safety-wise, we are now exactly the same as without the persistent index, right?

Member Author

@arjunsuresh the persistent index only changes whether the index is recreated every time an mlc command is run, so I think the safety aspects remain unchanged.

Regarding testing, I have tested various cases locally, such as removing or modifying the meta file, the repos.json file, and modified_times.json, and none of them caused an issue.

However, there is one issue that I am not sure requires a fix. Since we are not monitoring the mtime of index_script.json, if the index file is manually removed or modified without deleting the modified times, the index won't be rebuilt, because the modified-times check currently cannot detect it.
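One possible guard for the removal half of this case, sketched under the assumption that the index files are known by path at startup (as in the `index_files` dictionary shown in the diff context earlier). The function name `needs_full_rebuild` is hypothetical. It only detects a missing file; a manually edited index file would still go unnoticed unless its own mtime or a checksum were tracked as well.

```python
import os
import tempfile


def needs_full_rebuild(index_files, modified_times_file):
    """True when the mtime record or any index file is absent on disk."""
    if not os.path.isfile(modified_times_file):
        return True
    return any(not os.path.isfile(path) for path in index_files.values())


with tempfile.TemporaryDirectory() as repos_path:
    index_files = {
        "script": os.path.join(repos_path, "index_script.json"),
        "cache": os.path.join(repos_path, "index_cache.json"),
        "experiment": os.path.join(repos_path, "index_experiment.json"),
    }
    modified_times_file = os.path.join(repos_path, "modified_times.json")

    for path in list(index_files.values()) + [modified_times_file]:
        with open(path, "w") as f:
            f.write("{}")
    print(needs_full_rebuild(index_files, modified_times_file))  # all files present

    os.remove(index_files["script"])  # simulate a manual deletion
    print(needs_full_rebuild(index_files, modified_times_file))
```

When the check returns true, clearing the stored timestamps before indexing would make every item look new and trigger a normal rebuild.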

@anandhu-eng anandhu-eng linked an issue Nov 30, 2025 that may be closed by this pull request

Development

Successfully merging this pull request may close these issues.

Check the possibility of Persistent MLC index

4 participants