Activity task can be leased and marked running before Python worker receives usable poll response #5

@rmcdaniel

Summary

On a clean published-artifact polyglot stack, a PHP-authored workflow can schedule a Python activity, the server can durably lease that activity task and mark the execution/attempt running, but the Python worker never surfaces a usable activity poll response and never reaches user activity code.

This looks related to, but distinct from, #4.

Reproduction

Environment used:

  • durableworkflow/server:0.2.109
  • durable-workflow/workflow 2.0.0-alpha.143
  • durable-workflow==0.4.33
  • shared task queue: polyglot-shared
  • one PHP workflow worker, one PHP activity worker, one Python worker

Failing case:

  1. Start the published PHP workflow worker and the published Python worker on the same task queue.
  2. Start workflow polyglot.php.calls_python.
  3. The PHP workflow worker handles the initial workflow task and schedules polyglot.activity.python.echo.
  4. The client waits for completion with handle.result(timeout=240).
  5. The run stays in status=waiting until the client's wait times out.

Worker evidence

  • PHP workflow worker log:
    • php workflow worker handled polyglot.php.calls_python with 1 command(s)
  • Python worker log:
    • registered successfully
    • kept heartbeating
    • never logged python activity echo received payload type=dict
    • never logged a successful activity-task completion for the run

Durable state from MySQL

Stalled run php_calls_python-584c62aa:

  • workflow run 01krmhz6vcc0s4chwc7ad8ktac: status=waiting
  • initial workflow task 01krmhz6w49k76v51n01m4cbp3: task_type=workflow, status=completed, lease_owner=php-workflow-alpha143-20260515
  • follow-up task 01krmhz73var2n92zm535200kd: task_type=activity, status=leased, lease_owner=py-worker-alpha143-20260515
  • activity execution 01krmhz73qww598x54geyyy910: activity_type=polyglot.activity.python.echo, status=running
  • activity attempt 01KRMHZS3RB9H4AGR4RT1ZZHAT: status=running, lease_owner=py-worker-alpha143-20260515

So the server has already advanced the durable activity state to running, but the Python worker never surfaces a usable activity execution on its side.
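The inconsistency in the snapshot above can be expressed as a simple check. This is a minimal sketch, assuming the MySQL rows are loaded as plain dicts with the field names shown in the snapshot; the function name stranded_activity_tasks and the workers_that_started set (lease owners whose worker log shows an activity actually starting) are hypothetical and not part of durable-workflow's schema or API. Because the issue does not show how a task row links to its attempt row, the lease owner is used as a coarse join key; a real check would correlate by attempt or execution ID.

```python
def stranded_activity_tasks(tasks, attempts, workers_that_started):
    """Hypothetical diagnostic: return activity tasks that the server holds as
    leased, whose lease owner also holds a running attempt, but whose worker
    never reported starting an activity."""
    # Lease owners that the server believes are actively running an attempt.
    running_owners = {a["lease_owner"] for a in attempts if a["status"] == "running"}
    return [
        t
        for t in tasks
        if t["task_type"] == "activity"
        and t["status"] == "leased"
        and t["lease_owner"] in running_owners
        # Coarse proxy: the owner never logged an activity start.
        and t["lease_owner"] not in workers_that_started
    ]


# Values taken from the stalled run php_calls_python-584c62aa.
tasks = [
    {"id": "01krmhz6w49k76v51n01m4cbp3", "task_type": "workflow",
     "status": "completed", "lease_owner": "php-workflow-alpha143-20260515"},
    {"id": "01krmhz73var2n92zm535200kd", "task_type": "activity",
     "status": "leased", "lease_owner": "py-worker-alpha143-20260515"},
]
attempts = [
    {"id": "01KRMHZS3RB9H4AGR4RT1ZZHAT", "status": "running",
     "lease_owner": "py-worker-alpha143-20260515"},
]
stranded = stranded_activity_tasks(tasks, attempts, workers_that_started=set())
```

With the Python worker logging no activity start, the activity task 01krmhz73var2n92zm535200kd is the only row flagged.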

Expected

If the activity poll response is not durably usable by the worker, the server should not strand the activity in a leased/running state with no replay path.

Actual

The activity task can become leased and the execution/attempt can become running before the Python worker ever logs activity start or completion.
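One common way to get the expected behavior is a visibility-timeout style lease: the server grants the lease with a short deadline, and it does not treat the task as started until the worker explicitly confirms receipt of that specific task. The worker-level heartbeat is not enough here, because the Python worker kept heartbeating while never starting the activity. If no confirmation arrives in time, the task goes back to the pending pool and becomes eligible for redelivery. The sketch below is a hypothetical in-memory model of that pattern, not the durable-workflow server's design. LeaseTable, confirm, reclaim_expired, and ack_timeout are illustrative names only.

```python
from __future__ import annotations

import time
from dataclasses import dataclass


@dataclass
class Lease:
    task_id: str
    owner: str
    deadline: float
    confirmed: bool = False


class LeaseTable:
    """Hypothetical model: a lease must be confirmed by the worker before its
    deadline, otherwise the task returns to the pending queue."""

    def __init__(self, ack_timeout: float, clock=time.monotonic):
        self.ack_timeout = ack_timeout
        self.clock = clock
        self.pending: list[str] = []
        self.leases: dict[str, Lease] = {}

    def enqueue(self, task_id: str) -> None:
        self.pending.append(task_id)

    def lease(self, owner: str) -> str | None:
        # Hand out the oldest pending task with a confirmation deadline.
        if not self.pending:
            return None
        task_id = self.pending.pop(0)
        self.leases[task_id] = Lease(task_id, owner, self.clock() + self.ack_timeout)
        return task_id

    def confirm(self, task_id: str, owner: str) -> bool:
        # A server following this pattern would only mark the activity
        # execution/attempt as running once this confirmation succeeds.
        lease = self.leases.get(task_id)
        if lease is None or lease.owner != owner or self.clock() >= lease.deadline:
            return False
        lease.confirmed = True
        return True

    def reclaim_expired(self) -> list[str]:
        # Unconfirmed leases past their deadline are requeued, not stranded.
        now = self.clock()
        expired = [t for t, l in self.leases.items() if not l.confirmed and l.deadline <= now]
        for task_id in expired:
            del self.leases[task_id]
            self.pending.append(task_id)
        return expired
```

For example, with an injected fake clock, a task leased to py-worker-alpha143-20260515 that is never confirmed within ack_timeout is returned by reclaim_expired() and can be leased again. A confirmed lease is left alone.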
