From: Mukesh Ojha <quic_mojha@quicinc.com>
To: Imran Khan <imran.f.khan@oracle.com>, <tj@kernel.org>,
	<lizefan.x@bytedance.com>, <hannes@cmpxchg.org>,
	<tglx@linutronix.de>, <steven.price@arm.com>,
	<peterz@infradead.org>
Cc: <cgroups@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: Re: Query regarding deadlock involving cgroup_threadgroup_rwsem and cpu_hotplug_lock
Date: Wed, 20 Jul 2022 17:31:51 +0530
Message-ID: <224b19f3-912d-b858-7af4-185b8e55bc66@quicinc.com>
In-Reply-To: <ba48eac5-8ef7-251b-11fe-8163bb7a2d54@quicinc.com>

Looks like these patches are the fixes.

https://lore.kernel.org/all/YtDvN0wJ6CKaEPN8@slm.duckdns.org/#r

I will let Tejun confirm this.

-Mukesh

On 7/20/2022 4:36 PM, Mukesh Ojha wrote:
> Hi,
> 
> On 7/20/2022 8:57 AM, Imran Khan wrote:
>> Hello everyone,
>>
>> I am seeing a deadlock between cgroup_threadgroup_rwsem and
>> cpu_hotplug_lock in the 5.4 kernel.
>>
>> Due to some missing drivers, I don't have this test setup for the latest
>> upstream kernel, but looking at the code, the issue seems to be present
>> in the latest kernel as well. If needed, I can provide stack traces and
>> other relevant info from the vmcore that I got from the 5.4 setup.
>>
>> The description of the problem is as follows (I am using 5.19-rc7 as the
>> reference below):
>>
>> __cgroup_procs_write acquires cgroup_threadgroup_rwsem via
>> cgroup_procs_write_start and then invokes cgroup_attach_task, which
>> can invoke the following call chain:
>>
>> cgroup_attach_task --> cgroup_migrate --> cgroup_migrate_execute -->
>> cpuset_attach
>>
>> Here cpuset_attach tries to take cpu_hotplug_lock.
>>
>> But if, by this time, some other context
>>
>> 1. is already in the middle of CPU hotplug and has acquired
>>    cpu_hotplug_lock in _cpu_up, but
>> 2. has not yet reached the CPUHP_ONLINE state, and
>> 3. one of the intermediate hotplug states (in my case
>>    CPUHP_AP_ONLINE_DYN) has a callback which involves creation of a
>>    thread (or invocation of copy_process via some other path),
>>
>> then the invoked copy_process will get blocked on
>> cgroup_threadgroup_rwsem in the following call chain:
>>
>>     copy_process --> cgroup_can_fork --> cgroup_css_set_fork -->
>>     cgroup_threadgroup_change_begin
> 
> A similar discussion is at [1]; I am not sure what the conclusion was.
> 
> [1]
> https://lore.kernel.org/lkml/20220705123705.764-1-xuewen.yan@unisoc.com/
> 
> -Mukesh
> 
>>
>>
>> I am looking for suggestions to fix this deadlock.
>>
>> Or, if I am missing something in the above analysis and the
>> above-mentioned scenario can't happen in the latest upstream kernel,
>> then please let me know, as that would help me in backporting the
>> relevant changes to the 5.4 kernel, because the issue definitely exists
>> in the 5.4 kernel.
>>
>> Thanks,
>> -- Imran
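
The lock ordering described in the quoted report can be sketched as a
small lock-order graph, in the spirit of what lockdep tracks. This is only
an illustrative userspace model, not kernel code: the helper names
(build_order_graph, find_cycle, CHAINS) are hypothetical, read/write lock
modes are ignored, and the hotplug side is simplified to "holds
cpu_hotplug_lock, then needs cgroup_threadgroup_rwsem via a fork".

```python
from collections import defaultdict

# Each context lists its locks in acquisition order (simplified from the report).
CHAINS = {
    # __cgroup_procs_write -> ... -> cpuset_attach
    "cgroup.procs write": ["cgroup_threadgroup_rwsem", "cpu_hotplug_lock"],
    # _cpu_up -> hotplug callback -> copy_process -> cgroup_can_fork
    "_cpu_up callback fork": ["cpu_hotplug_lock", "cgroup_threadgroup_rwsem"],
}


def build_order_graph(chains):
    """Add an edge held -> wanted for every consecutive pair in each chain."""
    graph = defaultdict(set)
    for locks in chains.values():
        for held, wanted in zip(locks, locks[1:]):
            graph[held].add(wanted)
    return graph


def find_cycle(graph):
    """Depth-first search; return one lock cycle as a list, or None."""
    visiting, done = [], set()

    def visit(node):
        if node in visiting:
            # Back edge: the cycle is the current path from `node` onward.
            return visiting[visiting.index(node):] + [node]
        if node in done:
            return None
        visiting.append(node)
        for nxt in sorted(graph.get(node, ())):
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in sorted(graph):
        cycle = visit(start)
        if cycle:
            return cycle
    return None


if __name__ == "__main__":
    # A non-None result means the two contexts can deadlock (ABBA ordering).
    print(find_cycle(build_order_graph(CHAINS)))
```

In this model, if both contexts take the two locks in the same order, the
graph has no cycle and find_cycle returns None.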
