Fix integration test worker crashes in Azure Functions on Py3.13#4260
Three changes to prevent pytest-xdist workers from crashing during Azure Functions integration tests:

1. Add `start_new_session=True` to subprocess on Linux so signals (e.g. from test-timeout) cannot propagate between the func host and the xdist worker process.
2. Add an overall 100-second budget to the fixture setup loop so the retry logic never exceeds the 120-second test timeout. When pytest-timeout's thread method fires during fixture setup and the thread doesn't respond, it calls `os._exit()`, which kills the xdist worker; this is the root cause of the "Not properly terminated" crashes.
3. Remove the `UV_PYTHON: "3.10"` workaround from both workflow files so integration tests actually run on Python 3.13.

Co-authored-by: larohra <41490930+larohra@users.noreply.github.com>
Pull request overview
Fixes pytest-xdist worker crashes in Azure Functions integration tests on Python 3.13 by preventing fixture startup from exceeding the pytest-timeout budget and isolating the func start subprocess from the worker’s process group. Also removes a stale workflow-level Python 3.10 override so CI actually runs these tests on 3.13.
Changes:
- Isolate the Azure Functions host subprocess on POSIX using `start_new_session=True` to prevent signal cross-contamination.
- Add an overall startup time budget in the `function_app_for_test` fixture to ensure clean failure (`pytest.fail`) before pytest-timeout can hard-exit the worker.
- Remove `UV_PYTHON: "3.10"` overrides from the Functions integration test jobs so they inherit the workflow’s Python version (3.13).
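The subprocess isolation in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's actual `conftest.py` code: `launch_isolated` and `demo` are hypothetical names, and a sleeping Python child stands in for the real `func start` command.

```python
import os
import subprocess
import sys


def launch_isolated(cmd):
    """Start cmd in its own session (and process group) on POSIX.

    Hypothetical helper: the real fixture launches `func start`.
    """
    kwargs = {}
    if os.name == "posix":
        # setsid() runs in the child before exec, so the child no longer
        # shares the parent's process group.
        kwargs["start_new_session"] = True
    return subprocess.Popen(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        **kwargs,
    )


def demo():
    """Return True if the child runs outside the parent's process group."""
    # Placeholder child process (assumption): sleeps instead of hosting functions.
    proc = launch_isolated([sys.executable, "-c", "import time; time.sleep(30)"])
    try:
        if os.name == "posix":
            return os.getpgid(proc.pid) != os.getpgrp()
        return True  # nothing to compare on non-POSIX platforms
    finally:
        proc.kill()
        proc.wait()
```

With the child in a separate process group, a signal delivered to the worker's group does not reach the child, and vice versa.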
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `python/packages/azurefunctions/tests/integration_tests/conftest.py` | Adds POSIX process-group isolation for `func start` and enforces an overall fixture startup budget to avoid pytest-timeout killing xdist workers. |
| `.github/workflows/python-merge-tests.yml` | Removes the job-level `UV_PYTHON` pin so merge tests run Functions integration on the workflow’s Python version. |
| `.github/workflows/python-integration-tests.yml` | Removes the job-level `UV_PYTHON` pin so integration tests run Functions integration on the workflow’s Python version. |
    overall_start = time.time()
    attempts_made = 0

    for _ in range(max_attempts):
        remaining = overall_budget - (time.time() - overall_start)
        if remaining < 10:
The new startup budget/elapsed-time logic uses time.time(). For timeout budgeting, prefer time.monotonic() (and time.monotonic() - start) so NTP/clock adjustments can’t make remaining jump forwards/backwards and skew the budget enforcement.
Switched all three budget timing calls to time.monotonic() in 3c46364.
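For illustration, a budget-capped retry loop on a monotonic clock might look like the sketch below. It is not the fixture's actual code: `run_with_budget`, `try_start`, and the injectable `clock` parameter are hypothetical, and stopping the loop when fewer than 10 seconds remain is an assumption, since the reviewed snippet is cut off after that check. The 100-second budget, the 60-second per-attempt wait, and the three attempts come from the PR description.

```python
import time


def run_with_budget(
    try_start,
    max_attempts=3,
    overall_budget=100.0,
    per_attempt_wait=60.0,
    min_remaining=10.0,
    clock=time.monotonic,
):
    """Retry try_start(max_wait) without exceeding overall_budget seconds.

    Returns (succeeded, attempts_made).
    """
    overall_start = clock()
    attempts_made = 0
    for _ in range(max_attempts):
        remaining = overall_budget - (clock() - overall_start)
        if remaining < min_remaining:
            # Assumption: too little time left for a useful attempt, so stop.
            break
        attempts_made += 1
        # Cap this attempt's wait so the loop cannot overrun the budget.
        max_wait = min(per_attempt_wait, remaining)
        if try_start(max_wait):
            return True, attempts_made
    return False, attempts_made


def simulate_all_failures():
    """Drive the loop with a fake clock where every attempt uses its full wait."""
    now = [0.0]
    waits = []

    def fake_clock():
        return now[0]

    def failing_start(max_wait):
        waits.append(max_wait)
        now[0] += max_wait  # pretend the whole wait elapsed
        return False

    result = run_with_budget(failing_start, clock=fake_clock)
    return result, waits
```

A caller would turn a `False` result into a clean test failure (the PR uses `pytest.fail()`), which is reached while time still remains before the 120-second pytest-timeout.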
@copilot code review[agent] fix the comments and failing jobs |
Addresses review feedback: monotonic clock is immune to NTP/clock adjustments that could skew the budget enforcement. Co-authored-by: larohra <41490930+larohra@users.noreply.github.com>
Motivation and Context
All 20 pytest-xdist workers crash with `[gwN] node down: Not properly terminated` during Azure Functions integration tests on Python 3.13. The `python-tests-functions` job had a `UV_PYTHON: "3.10"` override masking this, but the parent workflow sets `UV_PYTHON: "3.13"`, so the override was silently ignored.

Root cause: the `function_app_for_test` fixture retry loop can spend up to ~184s (3 × 60s wait + cleanup), exceeding the 120s `--timeout`. When `pytest-timeout`'s thread method fires mid-fixture and the thread is blocked, it calls `os._exit()`, killing the xdist worker outright. Compounding this, the `func` subprocess shares the worker's process group, so signals propagate bidirectionally.

Description

`conftest.py`: subprocess isolation
Added `start_new_session=True` on Linux so the `func start` process runs in its own process group. Prevents signal cross-contamination between pytest-timeout and the function host.

`conftest.py`: fixture timeout budget
Added a 100s overall budget (under the 120s test timeout) that caps each retry's `max_wait` to the remaining time. The fixture now always exits cleanly via `pytest.fail()` instead of being killed by `os._exit()`. Uses `time.monotonic()` for budget tracking so NTP/clock adjustments cannot skew the enforcement.

Workflow files: remove stale Python 3.10 pin
Removed `UV_PYTHON: "3.10"` from `python-tests-functions` in both `python-merge-tests.yml` and `python-integration-tests.yml`. The job now inherits the workflow-level `UV_PYTHON: "3.13"`.