Summary
On a clean published-artifact polyglot stack, a resumed Python-authored workflow task can be created and leased after a Python activity completes, but the polling Python worker never receives a usable poll response, and the workflow stays in waiting until the client's result call times out.
Reproduction
Environment used:
- durableworkflow/server:0.2.109
- durable-workflow==0.4.33
- shared queue polyglot-shared
- one Python worker registered for both workflows and activities
Minimal failing case:
- Start a Python worker with the default DURABLE_WORKFLOW_POLL_TIMEOUT_SECONDS=30.
- Start workflow polyglot.python.calls_python with a simple dict payload.
- The Python worker completes the initial workflow task and schedules polyglot.activity.python.echo.
- The Python worker completes that activity successfully.
- The server creates a follow-up workflow task for the same run, but the workflow remains stuck in waiting and handle.result(timeout=240) times out.
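To keep the repro independent of whatever default the installed SDK ships, the poll timeout can be pinned explicitly before the worker starts. The following is a minimal sketch, assuming the worker reads DURABLE_WORKFLOW_POLL_TIMEOUT_SECONDS from its process environment at startup; the helper name worker_env and the worker.py launch command are hypothetical and not part of the SDK.

```python
import os

# Variable name taken from this report; that the worker reads it from the
# process environment at startup is an assumption.
POLL_TIMEOUT_VAR = "DURABLE_WORKFLOW_POLL_TIMEOUT_SECONDS"


def worker_env(poll_timeout_seconds, base=None):
    """Return a copy of the environment with the poll timeout pinned.

    `worker_env` is a hypothetical helper for this repro, not an SDK function.
    """
    if poll_timeout_seconds <= 0:
        raise ValueError("poll timeout must be positive")
    # Copy so the caller's (or the current process's) environment is untouched.
    env = dict(os.environ if base is None else base)
    env[POLL_TIMEOUT_VAR] = str(poll_timeout_seconds)
    return env


# 30 is the value used in the failing run.
failing_env = worker_env(30, base={})

# Hypothetical launch; substitute however the worker is actually started:
# import subprocess
# subprocess.run(["python", "worker.py"], env=worker_env(30), check=True)
```

Passing 60 instead of 30 gives the environment for the workaround described at the end of this report.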
Clean-stack evidence from the failing run:
- workflow run python_calls_python-177b48f7 stayed waiting
- initial workflow task 01krmce8ndt6yw1n0ryzc2yr4g completed successfully
- activity execution 01krmce92e36datpbtfdvp2zcn completed successfully
- follow-up workflow task 01krmcf7xaa4efkd2cj00gys90 was created and then leased to the Python worker
- the worker never logged a successful workflow-tasks/poll response or workflow re-entry for that follow-up task
Relevant durable state from MySQL for the failing run:
- run status: waiting
- follow-up task row: 01krmcf7xaa4efkd2cj00gys90, status=leased, lease_owner=py-worker-4e478e0100d9-13
- task payload was small metadata only: {"open_wait_id":"activity:01krmce92e36datpbtfdvp2zcn","activity_type":"polyglot.activity.python.echo",...}
Expected
Once the Python activity completes, the follow-up workflow task should be returned cleanly through worker poll and the workflow should complete.
Actual
The follow-up task is created and leased, but the worker never gets a usable poll response for it, and the workflow remains stuck.
Workaround
Raising the Python worker poll timeout from 30 to 60 made both a targeted python_calls_python repro and the full four-corner polyglot smoke pass on the same published artifacts.
That suggests either:
- the server is leasing the resumed task before the worker poll response is fully deliverable, or
- the resumed-task poll response path is slow enough that the default Python poll timeout is too tight.
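Both hypotheses describe the same observable outcome: the lease is recorded server-side, but the response reaches the worker only after the worker has stopped waiting. A toy model of a single long poll illustrates this. It is a sketch under stated assumptions, not the server's actual logic: the function name, the outcome type, and the timing values are all hypothetical and were not measured in the failing run.

```python
from dataclasses import dataclass


@dataclass
class PollOutcome:
    task_status: str       # "ready" or "leased", as the server would record it
    worker_got_task: bool  # whether the response arrived before the client gave up


def simulate_long_poll(poll_timeout, task_ready_at, delivery_delay):
    """Toy model of one long poll; all times are seconds since poll start.

    Assumption: the server marks the task leased the moment it matches the
    task to an open poll, and the response needs `delivery_delay` more
    seconds to reach the worker.
    """
    if task_ready_at >= poll_timeout:
        # The poll ended before the task existed; nothing was leased.
        return PollOutcome("ready", False)
    delivered_at = task_ready_at + delivery_delay
    # The lease is already recorded even if delivery finishes too late.
    return PollOutcome("leased", delivered_at < poll_timeout)


# Illustrative numbers only: a task that becomes ready late in the poll window.
short_poll = simulate_long_poll(poll_timeout=30, task_ready_at=29.0, delivery_delay=2.0)
long_poll = simulate_long_poll(poll_timeout=60, task_ready_at=29.0, delivery_delay=2.0)
```

With these inputs the 30-second poll ends in the leased-but-undelivered state seen in the MySQL row above, while the 60-second poll receives the task. The model cannot say which of the two hypotheses is the real cause; it only shows why a longer poll timeout would mask either one.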