Preface: This happens in a GUI that uses sd-scripts as its backend. The GUI's maintainer told me the issue is related to sd-scripts, so I'm asking here whether that is actually the case, whether anyone has faced something similar, and whether it can be fixed.
Describe the bug
I've been running longer training sessions to test multiple style LoRAs. I noticed that if I resume from a saved state more than once, I don't get the correct number of steps and epochs.
I'm not sure whether this can cause problems for the scheduler or warmup.
Example:
I want to train for a total of 30 epochs.
I train 1000 steps (10 epochs).
I stop the training; the epoch 10 state is saved.
I resume the training from epoch 10, and everything seems correct.
I train another 1500 steps (15 epochs) and keep the last saved state.
I resume from that state, but somehow the training restarts at 1500 steps, epoch 15, instead of epoch 25.
Expected behavior
Steps and epochs should accumulate across all sequential training sessions. They don't seem to.
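To make the expected bookkeeping concrete, here is a minimal sketch of how I would expect the counters to behave. It is not sd-scripts code: `Progress` and `resume_and_train` are hypothetical names I made up for illustration, and the numbers are the ones from my example above.

```python
from dataclasses import dataclass


@dataclass
class Progress:
    """Hypothetical cumulative counters stored alongside a saved state."""
    epoch: int = 0
    step: int = 0


def resume_and_train(saved: Progress, epochs: int, steps: int) -> Progress:
    """Return the counters after training `epochs`/`steps` on top of `saved`.

    Assumption: a resumed session adds to the saved totals instead of
    counting again from zero.
    """
    return Progress(epoch=saved.epoch + epochs, step=saved.step + steps)


first = resume_and_train(Progress(), epochs=10, steps=1000)   # fresh run
second = resume_and_train(first, epochs=15, steps=1500)       # first resume
print(second)  # Progress(epoch=25, step=2500)
```

What I actually see after the second resume corresponds to `Progress(epoch=15, step=1500)`, as if the totals from the first session had been dropped.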
How to recreate it
Resume from two successive saved states, one after the other; their step and epoch counts won't add up. A few steps per session are enough to see the issue.
Thanks