Skip to content

Commit 101e8c0

Browse files
leitaosmb49
authored andcommitted
memcg: always call cond_resched() after fn()
BugLink: https://bugs.launchpad.net/bugs/2115678 commit 06717a7b6c86514dbd6ab322e8083ffaa4db5712 upstream. I am seeing soft lockup on certain machine types when a cgroup OOMs. This is happening because killing the process in certain machine might be very slow, which causes the soft lockup and RCU stalls. This happens usually when the cgroup has MANY processes and memory.oom.group is set. Example I am seeing in real production: [462012.244552] Memory cgroup out of memory: Killed process 3370438 (crosvm) .... .... [462037.318059] Memory cgroup out of memory: Killed process 4171372 (adb) .... [462037.348314] watchdog: BUG: soft lockup - CPU#64 stuck for 26s! [stat_manager-ag:1618982] .... Quick look at why this is so slow, it seems to be related to serial flush for certain machine types. For all the crashes I saw, the target CPU was at console_flush_all(). In the case above, there are thousands of processes in the cgroup, and it is soft locking up before it reaches the 1024 limit in the code (which would call the cond_resched()). So, cond_resched() in 1024 blocks is not sufficient. Remove the counter-based conditional rescheduling logic and call cond_resched() unconditionally after each task iteration, after fn() is called. This avoids the lockup independently of how slow fn() is. Link: https://lkml.kernel.org/r/20250523-memcg_fix-v1-1-ad3eafb60477@debian.org Fixes: ade8147 ("memcg: fix soft lockup in the OOM process") Signed-off-by: Breno Leitao <leitao@debian.org> Suggested-by: Rik van Riel <riel@surriel.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Michael van der Westhuizen <rmikey@meta.com> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: Chen Ridong <chenridong@huawei.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Mehmet Basaran <mehmet.basaran@canonical.com>
1 parent a5e234c commit 101e8c0

File tree

1 file changed

+2
-4
lines changed

1 file changed

+2
-4
lines changed

mm/memcontrol.c

Lines changed: 2 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1161,7 +1161,6 @@ void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
11611161
{
11621162
struct mem_cgroup *iter;
11631163
int ret = 0;
1164-
int i = 0;
11651164

11661165
BUG_ON(mem_cgroup_is_root(memcg));
11671166

@@ -1171,10 +1170,9 @@ void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
11711170

11721171
css_task_iter_start(&iter->css, CSS_TASK_ITER_PROCS, &it);
11731172
while (!ret && (task = css_task_iter_next(&it))) {
1174-
/* Avoid potential softlockup warning */
1175-
if ((++i & 1023) == 0)
1176-
cond_resched();
11771173
ret = fn(task, arg);
1174+
/* Avoid potential softlockup warning */
1175+
cond_resched();
11781176
}
11791177
css_task_iter_end(&it);
11801178
if (ret) {

0 commit comments

Comments
 (0)