[PATCH] cgroup/cpuset: Optimize update_tasks_nodemask()

* [PATCH] cgroup/cpuset: Optimize update_tasks_nodemask()
@ 2022-11-23  8:21 haifeng.xu
  2022-11-23 17:05 ` Tejun Heo
  2022-11-23 20:23 ` Waiman Long
  0 siblings, 2 replies; 13+ messages in thread
From: haifeng.xu @ 2022-11-23  8:21 UTC
  To: longman; +Cc: lizefan.x, tj, hannes, cgroups, linux-kernel, haifeng.xu

When 'cpuset.mems' is changed for a cgroup, the system can hang
for a long time. dmesg shows many processes or threads stuck in
fork/exit. The reason is as follows.

thread A:
cpuset_write_resmask /* takes cpuset_rwsem */
  ...
    update_tasks_nodemask
      mpol_rebind_mm /* waits mmap_lock */

thread B:
worker_thread
  ...
    cpuset_migrate_mm_workfn
      do_migrate_pages /* takes mmap_lock */

thread C:
cgroup_procs_write /* takes cgroup_mutex and cgroup_threadgroup_rwsem */
  ...
    cpuset_can_attach
      percpu_down_write /* waits cpuset_rwsem */

After updating the nodemasks of the cpuset, thread A wakes up thread B
to migrate the mm. But while thread A iterates through all tasks,
including child threads and the group leader, it has to wait for the
mmap_lock, which has been taken by thread B. Unfortunately, thread C
wants to migrate tasks into the cgroup at this moment, so it must wait
for thread A to release cpuset_rwsem. If thread B takes a long time to
migrate the mm, the fork/exit paths that acquire
cgroup_threadgroup_rwsem also have to wait for a long time.

There is no need to migrate the mm of child threads, because it is
shared with the group leader. Handle the mm only for the group
leader.

Signed-off-by: haifeng.xu <haifeng.xu@shopee.com>
---
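Note (not part of the commit message): the effect of the change can be
illustrated with a small userspace sketch. It is not kernel code. The
names toy_mm, toy_task, toy_is_group_leader and toy_update_tasks are
hypothetical stand-ins, and the sketch assumes that each thread group's
leader is present in the iterated list and still has an mm.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for mm_struct: counts how often the shared
 * address space is processed (rebind + migrate in the real code). */
struct toy_mm {
	int mm_passes;
};

/* Hypothetical stand-in for task_struct. Threads of one group share
 * the same tgid and point at the same toy_mm. */
struct toy_task {
	int pid;
	int tgid;
	struct toy_mm *mm;
};

/* A task is treated as the group leader when pid == tgid. */
static bool toy_is_group_leader(const struct toy_task *t)
{
	return t->pid == t->tgid;
}

/* Walk all tasks. The per-task step (nodemask update) would run for
 * every thread; the per-mm step runs either for every thread or, with
 * leaders_only set, once per thread group. Returns the number of
 * per-mm passes performed. */
static int toy_update_tasks(struct toy_task *tasks, size_t n,
			    bool leaders_only)
{
	int passes = 0;

	for (size_t i = 0; i < n; i++) {
		/* per-task work would happen here for every thread */

		if (leaders_only && !toy_is_group_leader(&tasks[i]))
			continue;

		if (!tasks[i].mm)
			continue;

		tasks[i].mm->mm_passes++;
		passes++;
	}
	return passes;
}
```

For a group of three threads sharing one toy_mm, calling
toy_update_tasks() with leaders_only false processes the mm three
times, and with leaders_only true only once. In this model each
skipped pass corresponds to one fewer wait on mmap_lock while
cpuset_rwsem is held.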
 kernel/cgroup/cpuset.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 589827ccda8b..43cbd09546d0 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1968,6 +1968,9 @@ static void update_tasks_nodemask(struct cpuset *cs)
 
 		cpuset_change_task_nodemask(task, &newmems);
 
+		if (!thread_group_leader(task))
+			continue;
+
 		mm = get_task_mm(task);
 		if (!mm)
 			continue;
-- 
2.25.1

