Open
Description
o1js isn't exactly known for its determinism when it comes to completing proving work successfully.
Apparently, as nori has also reported, proving sometimes stops working after a certain number of proofs have been generated.
Apart from that, things can always go wrong, so we should have a robust system for dealing with errors in workers.
Currently, we catch and log errors but leave the worker running, without retrying the failed task or recovering the worker (which may be stuck in a faulty state through no fault of our own).
So a strategy to fix this:
- Crash a worker whenever its proving work fails
- Let Docker restart crashed workers automatically
- Retry failed tasks when using bullmq (we already do this for the localqueue). The restarted worker picks the task up as soon as it is back
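
The strategy above could be sketched roughly as follows. This is only an illustration, not the project's actual worker code: `ProveTask`, `runOrCrash` and `retryOptions` are hypothetical names, and the exit function is injected so the crash path can be exercised without killing the process. The `attempts` and `backoff` fields mirror BullMQ's job options; the values are placeholders.

```typescript
// Hypothetical types for illustration; not taken from the codebase.
type ProveTask<T> = () => Promise<T>;
type Exit = (code: number) => void;

// Runs a single proving task. On failure it logs the error and calls
// `exit` with a non-zero code instead of keeping a possibly broken
// worker alive. In a real worker one would pass
// `(code) => process.exit(code)` so that the container's restart
// policy (assumed to be configured in Docker) brings up a fresh one.
async function runOrCrash<T>(
  task: ProveTask<T>,
  exit: Exit
): Promise<T | undefined> {
  try {
    return await task();
  } catch (err) {
    console.error("Proving failed, crashing worker:", err);
    exit(1);
    // Only reached when `exit` is a stub that does not terminate.
    return undefined;
  }
}

// Retry settings in the shape of BullMQ job options (placeholder values).
const retryOptions = {
  attempts: 3,
  backoff: { type: "exponential", delay: 1000 },
};
```

With BullMQ, options like these would be passed when the job is added to the queue; how many attempts and what delay make sense depends on how long a worker takes to restart.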
Status
In Progress