Add test suite for IPC:ParallelFinish hang reproduction #1
Open
Adds comprehensive test suite to reproduce Theory 1 (shared memory queue saturation) for the recurring IPC:ParallelFinish hang issue.

Test files:
- test_parallel_queue_saturation.sql: Main reproduction test with 250K dead tuples and flood_error_queue() function to saturate 16KB error queues
- monitor_parallel_hang.sql: Monitoring script to observe wait events
- test_parallel_hang_alternative.sql: 7 alternative test approaches
- run_reproduction_test.sh: Automated setup and execution script
- test_parallel_hang_README.md: Complete documentation

Theory being tested: Workers block indefinitely when error queues fill up, creating circular dependency where workers need leader to drain queue but leader only drains when ParallelMessagePending flag is set, which requires successful worker message send (impossible when queue full).

Note: Tests target PostgreSQL 16.3 specifically per production environment.
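The circular dependency in Theory 1 can be illustrated with a toy state machine. This is only a sketch of the hypothesis as worded in the commit message, not a model of PostgreSQL's actual shared-memory queue code: the `simulate` function, the `Model` class, the rule that a send counts as successful only once the whole message is in the queue, and the use of an oversized message to reach the "queue full, flag clear" state are all assumptions made for illustration.

```python
from dataclasses import dataclass

QUEUE_CAPACITY = 16 * 1024  # bytes; the 16KB error-queue size cited above


@dataclass
class Model:
    capacity: int = QUEUE_CAPACITY
    used: int = 0                  # bytes currently sitting in the queue
    message_pending: bool = False  # stand-in for the ParallelMessagePending flag


def simulate(message_sizes, capacity=QUEUE_CAPACITY, max_steps=10_000):
    """Return "finished" if every message is delivered, else "stuck".

    Hypothetical rules, taken from the theory as stated:
      * the worker sets the flag only after a complete (successful) send;
      * the leader drains the queue only when the flag is set.
    """
    state = Model(capacity=capacity)
    pending = list(message_sizes)             # messages the worker still has to send
    remaining = pending[0] if pending else 0  # unsent bytes of the current message

    for _ in range(max_steps):
        progressed = False

        # Worker: copy as many bytes as currently fit into the queue.
        if pending:
            chunk = min(state.capacity - state.used, remaining)
            if chunk > 0:
                state.used += chunk
                remaining -= chunk
                progressed = True
            if remaining == 0:
                # Send completed: only now is the leader notified.
                state.message_pending = True
                pending.pop(0)
                remaining = pending[0] if pending else 0
                progressed = True

        # Leader: drains only when the flag is set.
        if state.message_pending:
            state.used = 0
            state.message_pending = False
            progressed = True

        if not pending and state.used == 0 and not state.message_pending:
            return "finished"
        if not progressed:
            # Queue is full, flag is clear: neither side can move.
            return "stuck"
    return "stuck"
```

Under these assumed rules, `simulate([100, 200])` finishes, while `simulate([100, 20000])` fills the queue partway through the second message and then stops making progress, which is the stuck state the theory describes.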
Analyzes two critical commits merged after PostgreSQL 16.3:
- 6f6521d (16.4): Don't enter parallel mode when holding interrupts
- 06424e9 (16.5): Improved fix for interrupt handling

These commits directly address the IPC:ParallelFinish hang issue by preventing parallel worker launch when leader cannot process interrupts, which eliminates the deadlock scenario where workers block on full error queues while leader cannot drain them.

Recommendation: Upgrade to PostgreSQL 16.5+ to resolve production issue.
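To act on the upgrade recommendation, a deployment check could compare the server's `server_version_num` setting (for example, 160003 for 16.3) against the 16.5 threshold named above. The helper below is a minimal sketch; `has_interrupt_fix` is a hypothetical name, and because the analysis only covers the 16.x series, the function deliberately reports "unknown" for any other major version.

```python
from typing import Optional

FIXED_VERSION_NUM = 160005  # PostgreSQL 16.5 expressed as server_version_num


def has_interrupt_fix(server_version_num: int) -> Optional[bool]:
    """Return True/False for 16.x servers, None when the major version is not 16.

    server_version_num is the integer reported by `SHOW server_version_num`,
    encoded as major * 10000 + minor on PostgreSQL 10 and later.
    """
    major = server_version_num // 10000
    if major != 16:
        return None  # the analysis above says nothing about other branches
    return server_version_num >= FIXED_VERSION_NUM
```

For the production environment mentioned in the PR (16.3), `has_interrupt_fix(160003)` returns `False`.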
NikolayS pushed a commit that referenced this pull request on Dec 23, 2025
truncate_useless_pathkeys() seems to have neglected to account for PathKeys that might be useful for WindowClause evaluation. Modify it so that it properly accounts for that.

Making this work required adjusting two things:

1. Change from checking query_pathkeys to check sort_pathkeys instead.
2. Add explicit check for window_pathkeys

For #1, query_pathkeys gets set in standard_qp_callback() according to the sort order requirements for the first operation to be applied after the join planner is finished, so this changes depending on which upper planner operations a particular query needs. If the query has window functions and no GROUP BY, then query_pathkeys gets set to window_pathkeys. Before this change, this meant PathKeys useful for the ORDER BY were not accounted for in queries with window functions.

Because of #1, #2 is now required so that we explicitly check to ensure we don't truncate away PathKeys useful for window functions.

Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAApHDvrj3HTKmXoLMbUjTO=_MNMxM=cnuCSyBKidAVibmYPnrg@mail.gmail.com
NikolayS pushed a commit that referenced this pull request on Feb 6, 2026
cost_tidrangescan() was setting the disabled_nodes value correctly, and then immediately resetting it to zero, due to poor code editing on my part.

materialized_finished_plan correctly set matpath.parent to zero, but forgot to also set matpath.parallel_workers = 0, causing an access to uninitialized memory in cost_material. (This shouldn't result in any real problem, but it makes valgrind unhappy.)

reparameterize_path was dereferencing a variable before verifying that it was not NULL.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us> (issue #1)
Reported-by: Michael Paquier <michael@paquier.xyz> (issue #1)
Diagnosed-by: Lukas Fittl <lukas@fittl.com> (issue #1)
Reported-by: Zsolt Parragi <zsolt.parragi@percona.com> (issue #2)
Reported-by: Richard Guo <guofenglinux@gmail.com> (issue #3)
Discussion: http://postgr.es/m/CAN4CZFPvwjNJEZ_JT9Y67yR7C=KMNa=LNefOB8ZY7TKDcmAXOA@mail.gmail.com
Discussion: http://postgr.es/m/aXrnPgrq6Gggb5TG@paquier.xyz