From: Waiman Long <longman@redhat.com>
To: Tejun Heo <tj@kernel.org>, "haifeng.xu" <haifeng.xu@shopee.com>
Cc: lizefan.x@bytedance.com, hannes@cmpxchg.org,
	cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] cgroup/cpuset: Optimize update_tasks_nodemask()
Date: Wed, 23 Nov 2022 13:48:46 -0500
Message-ID: <5fccf438-fdbe-1bc8-6460-b3911cc51566@redhat.com>
In-Reply-To: <Y35Swdpq+rJe+Tu3@slm.duckdns.org>


On 11/23/22 12:05, Tejun Heo wrote:
> On Wed, Nov 23, 2022 at 08:21:57AM +0000, haifeng.xu wrote:
>> When changing 'cpuset.mems' under some cgroup, the system will hang
>> for a long time. From the dmesg, many processes or threads are
>> stuck in fork/exit. The reason is shown as follows.
>>
>> thread A:
>> cpuset_write_resmask /* takes cpuset_rwsem */
>>    ...
>>      update_tasks_nodemask
>>        mpol_rebind_mm /* waits mmap_lock */
>>
>> thread B:
>> worker_thread
>>    ...
>>      cpuset_migrate_mm_workfn
>>        do_migrate_pages /* takes mmap_lock */
>>
>> thread C:
>> cgroup_procs_write /* takes cgroup_mutex and cgroup_threadgroup_rwsem */
>>    ...
>>      cpuset_can_attach
>>        percpu_down_write /* waits cpuset_rwsem */
>>
>> Once the nodemasks of the cpuset are updated, thread A wakes up thread B
>> to migrate the mm. But when thread A iterates through all tasks,
>> including child threads and the group leader, it has to wait for the
>> mmap_lock, which has been taken by thread B. Unfortunately, thread C
>> wants to migrate tasks into the cgroup at this moment, so it must wait
>> for thread A to release cpuset_rwsem. If thread B spends a long time
>> migrating the mm, the fork/exit paths that acquire
>> cgroup_threadgroup_rwsem also need to wait for a long time.
>>
>> There is no need to migrate the mm of child threads, which is
>> shared with the group leader.
> This is only a problem in cgroup1 and cgroup1 doesn't require the threads of
> a given task to be in the same cgroup. I don't think you can optimize it
> this way.

I think it is an issue anyway if different threads of a process are in
different cpusets with different node masks. It is not a configuration
that should be used at all.

This patch makes update_tasks_nodemask() somewhat similar to
cpuset_attach(), where all tasks are iterated to update the node mask but
only the group leaders are required to update the mm. For a non-group
leader task, maybe we can check if the group leader is in the same
cpuset. If so, we can skip the mm update. Do we need a similar change in
cpuset_attach()?

I do think the "migrate = is_memory_migrate(cs);" line can be moved 
outside of the loop, though. Of course, that won't help much in this case.
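
To make the two suggestions above concrete, here is a rough userspace
sketch, not kernel code. The struct definitions, the helper
leader_covers_mm() and the function update_tasks_nodemask_sketch() are all
hypothetical stand-ins invented for illustration; the real
update_tasks_nodemask() uses the css task iterator, mpol_rebind_mm() and
cpuset_migrate_mm(), none of which are modelled here. The counters only
record how often each step would run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel structures. */
struct mm { int rebinds; int migrations; };
struct cpuset { int id; bool memory_migrate; };
struct task {
	struct task *group_leader;	/* points to itself for a leader */
	struct cpuset *cs;		/* cpuset the task belongs to */
	struct mm *mm;			/* shared by all threads of a process */
	int nodemask_updates;
};

/*
 * True when the task is a non-leader thread whose group leader sits in
 * the same cpuset, so the leader's iteration already covers the shared mm.
 */
static bool leader_covers_mm(const struct task *t, const struct cpuset *cs)
{
	return t->group_leader != t && t->group_leader->cs == cs;
}

/* Returns the number of mm updates performed for tasks in @cs. */
static int update_tasks_nodemask_sketch(struct cpuset *cs,
					struct task *tasks, size_t n)
{
	/* Loop-invariant flag, read once before the loop. */
	bool migrate;
	int mm_updates = 0;
	size_t i;

	assert(cs != NULL);
	migrate = cs->memory_migrate;

	for (i = 0; i < n; i++) {
		struct task *t = &tasks[i];

		if (t->cs != cs)
			continue;

		/* The per-task node mask is updated for every task. */
		t->nodemask_updates++;

		/* Skip the mm work if the group leader will do it. */
		if (leader_covers_mm(t, cs))
			continue;

		t->mm->rebinds++;		/* stands in for the mm rebind */
		if (migrate)
			t->mm->migrations++;	/* stands in for mm migration */
		mm_updates++;
	}
	return mm_updates;
}
```

Note that this sketch still updates the mm once per thread when several
non-leader threads share a cpuset that their leader is not in; avoiding
that duplicate work would need extra tracking that is not shown here.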

Cheers,
Longman


