* Query regarding deadlock involving cgroup_threadgroup_rwsem and cpu_hotplug_lock
       [not found] <8245b710-8acb-d8e6-7045-99a5f71dad4e@oracle.com>
@ 2022-07-20  2:38 ` Imran Khan
  2022-07-20  3:27     ` Imran Khan
  0 siblings, 1 reply; 28+ messages in thread
From: Imran Khan @ 2022-07-20  2:38 UTC (permalink / raw)
  To: tj@kernel.org >> Tejun Heo, lizefan.x,
	hannes@cmpxchg.org >> Johannes Weiner,
	tglx@linutronix.de >> Thomas Gleixner, steven.price,
	peterz@infradead.org >> peterz
  Cc: cgroups@vger.kernel.org >> cgroups, linux-kernel

Hello everyone,

I am seeing a deadlock between cgroup_threadgroup_rwsem and cpu_hotplug_lock in
the 5.4 kernel.

Due to some missing drivers I don't have this test setup for the latest
upstream kernel, but looking at the code the issue seems to be present in the
latest kernel as well. If needed, I can provide stack traces and other relevant
information from the vmcore that I obtained from the 5.4 setup.

The description of the problem is as follows (I am using 5.19-rc7 as the
reference below):

__cgroup_procs_write acquires cgroup_threadgroup_rwsem via
cgroup_procs_write_start and then invokes cgroup_attach_task. Now
cgroup_attach_task can invoke the following call chain:

cgroup_attach_task --> cgroup_migrate --> cgroup_migrate_execute --> cpuset_attach

Here cpuset_attach tries to take cpu_hotplug_lock.

But if, by this time, some other context

1. is already in the middle of CPU hotplug and has acquired cpu_hotplug_lock in
_cpu_up, but
2. has not yet reached the CPUHP_ONLINE state, and
3. is executing an intermediate hotplug state (in my case CPUHP_AP_ONLINE_DYN)
whose callback involves creating a thread (or invoking copy_process via some
other path),

then the invoked copy_process will get blocked on cgroup_threadgroup_rwsem in
the following call chain:

   copy_process --> cgroup_can_fork --> cgroup_css_set_fork -->
cgroup_threadgroup_change_begin
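
To make the lock inversion easier to see outside the kernel, here is a minimal
userspace sketch (my own illustration, not kernel code): two pthread mutexes
stand in for cgroup_threadgroup_rwsem and cpu_hotplug_lock, and two threads
take them in the opposite order, which reproduces the same ABBA hang:

/*
 * abba.c - userspace analogue of the inversion described above.
 * Build and run with: cc -pthread abba.c && ./a.out
 * The program prints both "waiting" lines and then hangs, i.e. deadlocks.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t threadgroup_rwsem = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t cpu_hotplug_lock  = PTHREAD_MUTEX_INITIALIZER;

/* models __cgroup_procs_write(): rwsem first, then the hotplug lock */
static void *attacher(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&threadgroup_rwsem);	/* cgroup_procs_write_start() */
	sleep(1);				/* let the other thread take its first lock */
	printf("attacher: waiting for cpu_hotplug_lock (cpuset_attach)\n");
	pthread_mutex_lock(&cpu_hotplug_lock);	/* never succeeds */
	pthread_mutex_unlock(&cpu_hotplug_lock);
	pthread_mutex_unlock(&threadgroup_rwsem);
	return NULL;
}

/* models _cpu_up(): hotplug lock first, then a fork needing the rwsem */
static void *hotplugger(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&cpu_hotplug_lock);	/* _cpu_up() */
	sleep(1);
	printf("hotplugger: waiting for threadgroup_rwsem (copy_process)\n");
	pthread_mutex_lock(&threadgroup_rwsem);	/* never succeeds */
	pthread_mutex_unlock(&threadgroup_rwsem);
	pthread_mutex_unlock(&cpu_hotplug_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, attacher, NULL);
	pthread_create(&b, NULL, hotplugger, NULL);
	pthread_join(a, NULL);	/* never completes: classic ABBA deadlock */
	pthread_join(b, NULL);
	return 0;
}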


I am looking for suggestions to fix this deadlock.

Or, if I am missing something in the above analysis and the above-mentioned
scenario can't happen in the latest upstream kernel, then please let me know,
as that would help me in backporting the relevant changes to the 5.4 kernel,
where the issue definitely exists.

Thanks,
-- Imran

* Re: Query regarding deadlock involving cgroup_threadgroup_rwsem and cpu_hotplug_lock
@ 2022-07-20 11:06       ` Mukesh Ojha
  0 siblings, 0 replies; 28+ messages in thread
From: Mukesh Ojha @ 2022-07-20 11:06 UTC (permalink / raw)
  To: Imran Khan, tj, lizefan.x, hannes, tglx, steven.price, peterz
  Cc: cgroups, linux-kernel

Hi,

On 7/20/2022 8:57 AM, Imran Khan wrote:
> Hello everyone,
> 
> I am seeing a deadlock between cgroup_threadgroup_rwsem and cpu_hotplug_lock in
> 5.4 kernel.
> 
> Due to some missing drivers I don't have this test setup for latest upstream
> kernel but looking at the code the issue seems to be present in the latest
> kernel as well. If needed I can provide stack traces and other relevant info
> from the vmcore that I have got from 5.4 setup.
> 
> The description of the problem is as follows (I am using 5.19-rc7 as reference
> below):
> 
> __cgroup_procs_write acquires cgroup_threadgroup_rwsem via
> cgroup_procs_write_start and then invokes cgroup_attach_task. Now
> cgroup_attach_task can invoke following call chain:
> 
> cgroup_attach_task --> cgroup_migrate --> cgroup_migrate_execute --> cpuset_attach
> 
> Here cpuset_attach tries to take cpu_hotplug_lock.
> 
> But by this time if some other context
> 
> 1. is already in the middle of cpu hotplug and has acquired cpu_hotplug_lock in
> _cpu_up but
> 2. has not yet reached CPUHP_ONLINE state and
> 3. one of the intermediate hotplug states (in my case CPUHP_AP_ONLINE_DYN ) has
> a callback which involves creation of a thread (or invocation of copy_process
> via some other path) the invoked copy_process will get blocked on
> cgroup_threadgroup_rwsem in following call chain:
> 
>     copy_process --> cgroup_can_fork --> cgroup_css_set_fork -->
> cgroup_threadgroup_change_begin

A similar discussion is at [1]; I am not sure about its conclusion.

[1]
https://lore.kernel.org/lkml/20220705123705.764-1-xuewen.yan@unisoc.com/

-Mukesh

> 
> 
> I am looking for suggestions to fix this deadlock.
> 
> Or if I am missing something in the above analysis and the above mention
> scenario can't happen in latest upstream kernel, then please let me know as that
> would help me in back porting relevant changes to 5.4 kernel because the issue
> definitely exists in 5.4 kernel.
> 
> Thanks,
> -- Imran

* Re: Query regarding deadlock involving cgroup_threadgroup_rwsem and cpu_hotplug_lock
@ 2022-07-20 12:01         ` Mukesh Ojha
  0 siblings, 0 replies; 28+ messages in thread
From: Mukesh Ojha @ 2022-07-20 12:01 UTC (permalink / raw)
  To: Imran Khan, tj, lizefan.x, hannes, tglx, steven.price, peterz
  Cc: cgroups, linux-kernel

It looks like these patches are the fixes:

https://lore.kernel.org/all/YtDvN0wJ6CKaEPN8@slm.duckdns.org/#r

I would let Tejun confirm this.

-Mukesh

On 7/20/2022 4:36 PM, Mukesh Ojha wrote:
> Hi,
> 
> On 7/20/2022 8:57 AM, Imran Khan wrote:
>> Hello everyone,
>>
>> I am seeing a deadlock between cgroup_threadgroup_rwsem and 
>> cpu_hotplug_lock in
>> 5.4 kernel.
>>
>> Due to some missing drivers I don't have this test setup for latest 
>> upstream
>> kernel but looking at the code the issue seems to be present in the 
>> latest
>> kernel as well. If needed I can provide stack traces and other 
>> relevant info
>> from the vmcore that I have got from 5.4 setup.
>>
>> The description of the problem is as follows (I am using 5.19-rc7 as 
>> reference
>> below):
>>
>> __cgroup_procs_write acquires cgroup_threadgroup_rwsem via
>> cgroup_procs_write_start and then invokes cgroup_attach_task. Now
>> cgroup_attach_task can invoke following call chain:
>>
>> cgroup_attach_task --> cgroup_migrate --> cgroup_migrate_execute --> 
>> cpuset_attach
>>
>> Here cpuset_attach tries to take cpu_hotplug_lock.
>>
>> But by this time if some other context
>>
>> 1. is already in the middle of cpu hotplug and has acquired 
>> cpu_hotplug_lock in
>> _cpu_up but
>> 2. has not yet reached CPUHP_ONLINE state and
>> 3. one of the intermediate hotplug states (in my case 
>> CPUHP_AP_ONLINE_DYN ) has
>> a callback which involves creation of a thread (or invocation of 
>> copy_process
>> via some other path) the invoked copy_process will get blocked on
>> cgroup_threadgroup_rwsem in following call chain:
>>
>>     copy_process --> cgroup_can_fork --> cgroup_css_set_fork -->
>> cgroup_threadgroup_change_begin
> 
> Similar discussion is at [1], not sure on the conclusion.
> 
> [1]
> https://lore.kernel.org/lkml/20220705123705.764-1-xuewen.yan@unisoc.com/
> 
> -Mukesh
> 
>>
>>
>> I am looking for suggestions to fix this deadlock.
>>
>> Or if I am missing something in the above analysis and the above mention
>> scenario can't happen in latest upstream kernel, then please let me 
>> know as that
>> would help me in back porting relevant changes to 5.4 kernel because 
>> the issue
>> definitely exists in 5.4 kernel.
>>
>> Thanks,
>> -- Imran

* Re: Query regarding deadlock involving cgroup_threadgroup_rwsem and cpu_hotplug_lock
@ 2022-07-20 18:05           ` Tejun Heo
  0 siblings, 0 replies; 28+ messages in thread
From: Tejun Heo @ 2022-07-20 18:05 UTC (permalink / raw)
  To: Mukesh Ojha
  Cc: Imran Khan, lizefan.x, hannes, tglx, steven.price, peterz,
	cgroups, linux-kernel

On Wed, Jul 20, 2022 at 05:31:51PM +0530, Mukesh Ojha wrote:
> Looks like these patches are the fixes.
> 
> https://lore.kernel.org/all/YtDvN0wJ6CKaEPN8@slm.duckdns.org/#r
> 
> Would let Tejun confirm this .

Yeah, looks like the same issue. I'll write up a patch later this week /
early next unless someone beats me to it.

Thanks.

-- 
tejun

* Re: Query regarding deadlock involving cgroup_threadgroup_rwsem and cpu_hotplug_lock
@ 2022-07-27 19:33             ` Tejun Heo
  0 siblings, 0 replies; 28+ messages in thread
From: Tejun Heo @ 2022-07-27 19:33 UTC (permalink / raw)
  Cc: Imran Khan, lizefan.x, hannes, tglx, steven.price, peterz,
	cgroups, linux-kernel

On Wed, Jul 20, 2022 at 08:05:03AM -1000, Tejun Heo wrote:
> On Wed, Jul 20, 2022 at 05:31:51PM +0530, Mukesh Ojha wrote:
> > Looks like these patches are the fixes.
> > 
> > https://lore.kernel.org/all/YtDvN0wJ6CKaEPN8@slm.duckdns.org/#r
> > 
> > Would let Tejun confirm this .
> 
> Yeah, looks like the same issue. I'll write up a patch later this week /
> early next unless someone beats me to it.

https://lore.kernel.org/lkml/20220705123705.764-1-xuewen.yan@unisoc.com/ is
the thread with the same issue. Let's follow up there.

Thanks.

-- 
tejun

* Re: Query regarding deadlock involving cgroup_threadgroup_rwsem and cpu_hotplug_lock
@ 2022-08-12 10:27               ` Mukesh Ojha
  0 siblings, 0 replies; 28+ messages in thread
From: Mukesh Ojha @ 2022-08-12 10:27 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Imran Khan, lizefan.x, hannes, tglx, steven.price, peterz,
	cgroups, linux-kernel

Hi Tejun,


On 7/28/2022 1:03 AM, Tejun Heo wrote:
> On Wed, Jul 20, 2022 at 08:05:03AM -1000, Tejun Heo wrote:
>> On Wed, Jul 20, 2022 at 05:31:51PM +0530, Mukesh Ojha wrote:
>>> Looks like these patches are the fixes.
>>>
>>> https://lore.kernel.org/all/YtDvN0wJ6CKaEPN8@slm.duckdns.org/#r
>>>
>>> Would let Tejun confirm this .
>>
>> Yeah, looks like the same issue. I'll write up a patch later this week /
>> early next unless someone beats me to it.
> 
> https://lore.kernel.org/lkml/20220705123705.764-1-xuewen.yan@unisoc.com/ is
> the thread with the same issue. Let's follow up there.

Since I am not part of the above thread, I am commenting here instead.

Your original patch [1] together with the revert of [2] fixes the issue, and
this is also confirmed here [3].
Can we get a proper fix merged into your tree?

[1] https://lore.kernel.org/lkml/YuGbYCfAG81mZBnN@slm.duckdns.org/

[2] 
https://lore.kernel.org/all/20220121101210.84926-1-zhangqiao22@huawei.com/

[3] 
https://lore.kernel.org/lkml/CAB8ipk-72V-bYRfL-VcSRSyXTeQqkBVj+1d5MHSVV5CTar9a0Q@mail.gmail.com/

-Mukesh

> 
> Thanks.
> 

* Re: Query regarding deadlock involving cgroup_threadgroup_rwsem and cpu_hotplug_lock
  2022-08-12 10:27               ` Mukesh Ojha
  (?)
@ 2022-08-15  9:05               ` Michal Koutný
  2022-08-15  9:25                   ` Xuewen Yan
  -1 siblings, 1 reply; 28+ messages in thread
From: Michal Koutný @ 2022-08-15  9:05 UTC (permalink / raw)
  To: Mukesh Ojha
  Cc: Tejun Heo, Imran Khan, lizefan.x, hannes, tglx, steven.price,
	peterz, cgroups, linux-kernel, Zhao Gongyi, Zhang Qiao

+Cc: Zhao Gongyi <zhaogongyi@huawei.com>, Zhang Qiao <zhangqiao22@huawei.com>

On Fri, Aug 12, 2022 at 03:57:00PM +0530, Mukesh Ojha <quic_mojha@quicinc.com> wrote:
> The original patch of yours [1]  and the revert of [2] is fixing the issue
> and it is also confirmed here [3].
> Can we get proper fix merge on your tree?
> 
> [1] https://lore.kernel.org/lkml/YuGbYCfAG81mZBnN@slm.duckdns.org/
> 
> [2]
> https://lore.kernel.org/all/20220121101210.84926-1-zhangqiao22@huawei.com/

The revert + Tejun's patch looks fine with respect to the problem of the
reverted patch (it just moves cpus_read_lock to the upper callers).

I'd just suggest a comment that explicitly documents the lock order that we
stick to; IIUC, it should be:

	cpu_hotplug_lock // cpus_read_lock
	cgroup_threadgroup_rwsem
	cpuset_rwsem
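
For illustration, such a comment might look roughly like the following (the
wording and the exact placement are only a sketch, not a concrete proposal):

/*
 * Locking order on the cgroup/cpuset attach paths (outermost first):
 *
 *   cpu_hotplug_lock		// cpus_read_lock()
 *     cgroup_threadgroup_rwsem
 *       cpuset_rwsem
 *
 * i.e. cpus_read_lock() is taken before write-locking
 * cgroup_threadgroup_rwsem, and cpuset_rwsem nests inside both.
 */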

Michal

> 
> [3] https://lore.kernel.org/lkml/CAB8ipk-72V-bYRfL-VcSRSyXTeQqkBVj+1d5MHSVV5CTar9a0Q@mail.gmail.com/
> 
> -Mukesh

* Re: Query regarding deadlock involving cgroup_threadgroup_rwsem and cpu_hotplug_lock
@ 2022-08-15  9:25                   ` Xuewen Yan
  0 siblings, 0 replies; 28+ messages in thread
From: Xuewen Yan @ 2022-08-15  9:25 UTC (permalink / raw)
  To: Michal Koutný
  Cc: Mukesh Ojha, Tejun Heo, Imran Khan, lizefan.x, hannes, tglx,
	steven.price, peterz, cgroups, linux-kernel, Zhao Gongyi,
	Zhang Qiao

Hi Michal

On Mon, Aug 15, 2022 at 5:06 PM Michal Koutný <mkoutny@suse.com> wrote:
>
> +Cc: Zhao Gongyi <zhaogongyi@huawei.com>, Zhang Qiao <zhangqiao22@huawei.com>
>
> On Fri, Aug 12, 2022 at 03:57:00PM +0530, Mukesh Ojha <quic_mojha@quicinc.com> wrote:
> > The original patch of yours [1]  and the revert of [2] is fixing the issue
> > and it is also confirmed here [3].
> > Can we get proper fix merge on your tree?
> >
> > [1] https://lore.kernel.org/lkml/YuGbYCfAG81mZBnN@slm.duckdns.org/
> >
> > [2]
> > https://lore.kernel.org/all/20220121101210.84926-1-zhangqiao22@huawei.com/
>
> The revert + Tejun's patch looks fine wrt the problem of the reverted
> patch (just moves cpus_read_lock to upper callers).

Do you mean that the problem should be fixed by [1] plus the revert of [2]?
So far I have only tested the case that reverts [2]. Do I need to test with
[1] and the revert of [2] together?

Thanks!

>
> I'd just suggest a comment that'd explicitly document also the lock
> order that we stick to, IIUC, it should be:
>
>         cpu_hotplug_lock // cpus_read_lock
>         cgroup_threadgroup_rwsem
>         cpuset_rwsem
>
> Michal
>
> >
> > [3] https://lore.kernel.org/lkml/CAB8ipk-72V-bYRfL-VcSRSyXTeQqkBVj+1d5MHSVV5CTar9a0Q@mail.gmail.com/
> >
> > -Mukesh

* Re: Query regarding deadlock involving cgroup_threadgroup_rwsem and cpu_hotplug_lock
@ 2022-08-15  9:39                     ` Michal Koutný
  0 siblings, 0 replies; 28+ messages in thread
From: Michal Koutný @ 2022-08-15  9:39 UTC (permalink / raw)
  To: Xuewen Yan
  Cc: Mukesh Ojha, Tejun Heo, Imran Khan, lizefan.x, hannes, tglx,
	steven.price, peterz, cgroups, linux-kernel, Zhao Gongyi,
	Zhang Qiao

On Mon, Aug 15, 2022 at 05:25:52PM +0800, Xuewen Yan <xuewen.yan94@gmail.com> wrote:
> Your means is that the problem should be fixed by [1]+[2]'s revert ?

I understood that was already the combination you had tested.
You write in [T] that [1] alone causes (another) deadlock and therefore
the revert of [2] was suggested.

> I just tested the case which reverted the [2]. Need I test with [1] and [2]?

It'd be better (unless you already have :-); my reasoning is for the
[1] + revert-of-[2] combo.

Thanks,
Michal

[T] https://lore.kernel.org/r/CAB8ipk_gCLtvEahsp2DvPJf4NxRsM8WCYmmH=yTd7zQE+81_Yg@mail.gmail.com/

* Re: Query regarding deadlock involving cgroup_threadgroup_rwsem and cpu_hotplug_lock
@ 2022-08-15 10:59                       ` Mukesh Ojha
  0 siblings, 0 replies; 28+ messages in thread
From: Mukesh Ojha @ 2022-08-15 10:59 UTC (permalink / raw)
  To: Michal Koutný, Xuewen Yan
  Cc: Tejun Heo, Imran Khan, lizefan.x, hannes, tglx, steven.price,
	peterz, cgroups, linux-kernel, Zhao Gongyi, Zhang Qiao



On 8/15/2022 3:09 PM, Michal Koutný wrote:
> On Mon, Aug 15, 2022 at 05:25:52PM +0800, Xuewen Yan <xuewen.yan94@gmail.com> wrote:
>> Your means is that the problem should be fixed by [1]+[2]'s revert ?
> 
> I understood that was already the combination you had tested.
> You write in [T] that [1] alone causes (another) deadlock and therefore
> the revert of [2] was suggested.
> 
>> I just tested the case which reverted the [2]. Need I test with [1] and [2]?
> 
> It'd be better (unless you haven't already :-), my reasoning is for the
> [1]+[2] combo.

Feel free to add my

Reported-and-tested-by: Mukesh Ojha <quic_mojha@quicinc.com>

-Mukesh
> 
> Thanks,
> Michal
> 
> [T] https://lore.kernel.org/r/CAB8ipk_gCLtvEahsp2DvPJf4NxRsM8WCYmmH=yTd7zQE+81_Yg@mail.gmail.com/

* [PATCH cgroup/for-6.0-fixes] cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock
@ 2022-08-15 23:27                         ` Tejun Heo
  0 siblings, 0 replies; 28+ messages in thread
From: Tejun Heo @ 2022-08-15 23:27 UTC (permalink / raw)
  To: Mukesh Ojha
  Cc: Michal Koutný,
	Xuewen Yan, Imran Khan, lizefan.x, hannes, tglx, steven.price,
	peterz, cgroups, linux-kernel, Zhao Gongyi, Zhang Qiao

Bringing up a CPU may involve creating new tasks which requires read-locking
threadgroup_rwsem, so threadgroup_rwsem nests inside cpus_read_lock().
However, cpuset's ->attach(), which may be called with threadgroup_rwsem
write-locked, also wants to disable CPU hotplug and acquires
cpus_read_lock(), leading to a deadlock.

Fix it by guaranteeing that ->attach() is always called with CPU hotplug
disabled and removing cpus_read_lock() call from cpuset_attach().

Signed-off-by: Tejun Heo <tj@kernel.org>
---
Hello, sorry about the delay.

So, the previous patch + the revert isn't quite correct, because we sometimes
elide both cpus_read_lock() and threadgroup_rwsem together, and cpuset_attach()
would then end up running without CPU hotplug disabled. Can you please test
whether this patch fixes the problem?

Thanks.

 kernel/cgroup/cgroup.c |   77 ++++++++++++++++++++++++++++++++++---------------
 kernel/cgroup/cpuset.c |    3 -
 2 files changed, 55 insertions(+), 25 deletions(-)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index ffaccd6373f1e..52502f34fae8c 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -2369,6 +2369,47 @@ int task_cgroup_path(struct task_struct *task, char *buf, size_t buflen)
 }
 EXPORT_SYMBOL_GPL(task_cgroup_path);
 
+/**
+ * cgroup_attach_lock - Lock for ->attach()
+ * @lock_threadgroup: whether to down_write cgroup_threadgroup_rwsem
+ *
+ * cgroup migration sometimes needs to stabilize threadgroups against forks and
+ * exits by write-locking cgroup_threadgroup_rwsem. However, some ->attach()
+ * implementations (e.g. cpuset), also need to disable CPU hotplug.
+ * Unfortunately, letting ->attach() operations acquire cpus_read_lock() can
+ * lead to deadlocks.
+ *
+ * Bringing up a CPU may involve creating new tasks which requires read-locking
+ * threadgroup_rwsem, so threadgroup_rwsem nests inside cpus_read_lock(). If we
+ * call an ->attach() which acquires the cpus lock while write-locking
+ * threadgroup_rwsem, the locking order is reversed and we end up waiting for an
+ * on-going CPU hotplug operation which in turn is waiting for the
+ * threadgroup_rwsem to be released to create new tasks. For more details:
+ *
+ *   http://lkml.kernel.org/r/20220711174629.uehfmqegcwn2lqzu@wubuntu
+ *
+ * Resolve the situation by always acquiring cpus_read_lock() before optionally
+ * write-locking cgroup_threadgroup_rwsem. This allows ->attach() to assume that
+ * CPU hotplug is disabled on entry.
+ */
+static void cgroup_attach_lock(bool lock_threadgroup)
+{
+	cpus_read_lock();
+	if (lock_threadgroup)
+		percpu_down_write(&cgroup_threadgroup_rwsem);
+}
+
+/**
+ * cgroup_attach_unlock - Undo cgroup_attach_lock()
+ * @lock_threadgroup: whether to up_write cgroup_threadgroup_rwsem
+ */
+static void cgroup_attach_unlock(bool lock_threadgroup)
+{
+	if (lock_threadgroup)
+		percpu_up_write(&cgroup_threadgroup_rwsem);
+	cpus_read_unlock();
+}
+
 /**
  * cgroup_migrate_add_task - add a migration target task to a migration context
  * @task: target task
@@ -2841,8 +2882,7 @@ int cgroup_attach_task(struct cgroup *dst_cgrp, struct task_struct *leader,
 }
 
 struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup,
-					     bool *locked)
-	__acquires(&cgroup_threadgroup_rwsem)
+					     bool *threadgroup_locked)
 {
 	struct task_struct *tsk;
 	pid_t pid;
@@ -2859,12 +2899,8 @@ struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup,
 	 * Therefore, we can skip the global lock.
 	 */
 	lockdep_assert_held(&cgroup_mutex);
-	if (pid || threadgroup) {
-		percpu_down_write(&cgroup_threadgroup_rwsem);
-		*locked = true;
-	} else {
-		*locked = false;
-	}
+	*threadgroup_locked = pid || threadgroup;
+	cgroup_attach_lock(*threadgroup_locked);
 
 	rcu_read_lock();
 	if (pid) {
@@ -2895,17 +2931,14 @@ struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup,
 	goto out_unlock_rcu;
 
 out_unlock_threadgroup:
-	if (*locked) {
-		percpu_up_write(&cgroup_threadgroup_rwsem);
-		*locked = false;
-	}
+	cgroup_attach_unlock(*threadgroup_locked);
+	*threadgroup_locked = false;
 out_unlock_rcu:
 	rcu_read_unlock();
 	return tsk;
 }
 
-void cgroup_procs_write_finish(struct task_struct *task, bool locked)
-	__releases(&cgroup_threadgroup_rwsem)
+void cgroup_procs_write_finish(struct task_struct *task, bool threadgroup_locked)
 {
 	struct cgroup_subsys *ss;
 	int ssid;
@@ -2913,8 +2946,8 @@ void cgroup_procs_write_finish(struct task_struct *task, bool locked)
 	/* release reference from cgroup_procs_write_start() */
 	put_task_struct(task);
 
-	if (locked)
-		percpu_up_write(&cgroup_threadgroup_rwsem);
+	cgroup_attach_unlock(threadgroup_locked);
+
 	for_each_subsys(ss, ssid)
 		if (ss->post_attach)
 			ss->post_attach();
@@ -3000,8 +3033,7 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp)
 	 * write-locking can be skipped safely.
 	 */
 	has_tasks = !list_empty(&mgctx.preloaded_src_csets);
-	if (has_tasks)
-		percpu_down_write(&cgroup_threadgroup_rwsem);
+	cgroup_attach_lock(has_tasks);
 
 	/* NULL dst indicates self on default hierarchy */
 	ret = cgroup_migrate_prepare_dst(&mgctx);
@@ -3022,8 +3054,7 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp)
 	ret = cgroup_migrate_execute(&mgctx);
 out_finish:
 	cgroup_migrate_finish(&mgctx);
-	if (has_tasks)
-		percpu_up_write(&cgroup_threadgroup_rwsem);
+	cgroup_attach_unlock(has_tasks);
 	return ret;
 }
 
@@ -4971,13 +5002,13 @@ static ssize_t __cgroup_procs_write(struct kernfs_open_file *of, char *buf,
 	struct task_struct *task;
 	const struct cred *saved_cred;
 	ssize_t ret;
-	bool locked;
+	bool threadgroup_locked;
 
 	dst_cgrp = cgroup_kn_lock_live(of->kn, false);
 	if (!dst_cgrp)
 		return -ENODEV;
 
-	task = cgroup_procs_write_start(buf, threadgroup, &locked);
+	task = cgroup_procs_write_start(buf, threadgroup, &threadgroup_locked);
 	ret = PTR_ERR_OR_ZERO(task);
 	if (ret)
 		goto out_unlock;
@@ -5003,7 +5034,7 @@ static ssize_t __cgroup_procs_write(struct kernfs_open_file *of, char *buf,
 	ret = cgroup_attach_task(dst_cgrp, task, threadgroup);
 
 out_finish:
-	cgroup_procs_write_finish(task, locked);
+	cgroup_procs_write_finish(task, threadgroup_locked);
 out_unlock:
 	cgroup_kn_unlock(of->kn);
 
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 58aadfda9b8b3..1f3a55297f39d 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2289,7 +2289,7 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 	cgroup_taskset_first(tset, &css);
 	cs = css_cs(css);
 
-	cpus_read_lock();
+	lockdep_assert_cpus_held();	/* see cgroup_attach_lock() */
 	percpu_down_write(&cpuset_rwsem);
 
 	guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
@@ -2343,7 +2343,6 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 		wake_up(&cpuset_attach_wq);
 
 	percpu_up_write(&cpuset_rwsem);
-	cpus_read_unlock();
 }
 
 /* The various types of files and directories in a cpuset file system */

* Re: [PATCH cgroup/for-6.0-fixes] cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock
@ 2022-08-16 20:20                           ` Imran Khan
  0 siblings, 0 replies; 28+ messages in thread
From: Imran Khan @ 2022-08-16 20:20 UTC (permalink / raw)
  To: Tejun Heo, Mukesh Ojha
  Cc: Michal Koutný,
	Xuewen Yan, lizefan.x, hannes, tglx, steven.price, peterz,
	cgroups, linux-kernel, Zhao Gongyi, Zhang Qiao

Hello Tejun,

On 16/8/22 9:27 am, Tejun Heo wrote:
> Bringing up a CPU may involve creating new tasks which requires read-locking
> threadgroup_rwsem, so threadgroup_rwsem nests inside cpus_read_lock().
> However, cpuset's ->attach(), which may be called with thredagroup_rwsem
> write-locked, also wants to disable CPU hotplug and acquires
> cpus_read_lock(), leading to a deadlock.
> 
> Fix it by guaranteeing that ->attach() is always called with CPU hotplug
> disabled and removing cpus_read_lock() call from cpuset_attach().
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
> Hello, sorry about the delay.
> 
> So, the previous patch + the revert isn't quite correct because we sometimes
> elide both cpus_read_lock() and threadgroup_rwsem together and
> cpuset_attach() woudl end up running without CPU hotplug enabled. Can you
> please test whether this patch fixes the problem?
> 

This fixes the issue seen in my setup. As my setup is 5.4-based, I used
cgroup_attach_lock/unlock(true) in the backported version of your patch.
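
For anyone doing a similar backport, the adjustment amounts to roughly the
following on a tree that still write-locks the rwsem unconditionally in the
attach paths (only a sketch; the exact call sites on 5.4 differ):

	/* e.g. in cgroup_procs_write_start() and cgroup_update_dfl_csses() */
-	percpu_down_write(&cgroup_threadgroup_rwsem);
+	cgroup_attach_lock(true);	/* cpus_read_lock() + write-lock the rwsem */
	...
-	percpu_up_write(&cgroup_threadgroup_rwsem);
+	cgroup_attach_unlock(true);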

Feel free to add my

Reviewed-and-tested-by: Imran Khan <imran.f.khan@oracle.com>

Thanks,
-- Imran

* Re: [PATCH cgroup/for-6.0-fixes] cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock
@ 2022-08-17  6:55                           ` Xuewen Yan
  0 siblings, 0 replies; 28+ messages in thread
From: Xuewen Yan @ 2022-08-17  6:55 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Mukesh Ojha, Michal Koutný,
	Imran Khan, lizefan.x, hannes, tglx, steven.price, peterz,
	cgroups, linux-kernel, Zhao Gongyi, Zhang Qiao,
	王科 (Ke Wang),
	orson.zhai, Xuewen Yan

Hi Tejun

On Tue, Aug 16, 2022 at 7:27 AM Tejun Heo <tj@kernel.org> wrote:
>
> Bringing up a CPU may involve creating new tasks which requires read-locking
> threadgroup_rwsem, so threadgroup_rwsem nests inside cpus_read_lock().

Indeed, it is creating new kthreads. And not only creating kthreads but also
destroying them; the backtrace is:

__switch_to
__schedule
schedule
percpu_rwsem_wait   <<< wait for cgroup_threadgroup_rwsem
__percpu_down_read
exit_signals
do_exit
kthread

> However, cpuset's ->attach(), which may be called with thredagroup_rwsem
> write-locked, also wants to disable CPU hotplug and acquires
> cpus_read_lock(), leading to a deadlock.
>
> Fix it by guaranteeing that ->attach() is always called with CPU hotplug
> disabled and removing cpus_read_lock() call from cpuset_attach().
>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
> Hello, sorry about the delay.
>
> So, the previous patch + the revert isn't quite correct because we sometimes
> elide both cpus_read_lock() and threadgroup_rwsem together and
> cpuset_attach() woudl end up running without CPU hotplug enabled. Can you
> please test whether this patch fixes the problem?
>
> Thanks.
>
>  kernel/cgroup/cgroup.c |   77 ++++++++++++++++++++++++++++++++++---------------
>  kernel/cgroup/cpuset.c |    3 -
>  2 files changed, 55 insertions(+), 25 deletions(-)
>
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index ffaccd6373f1e..52502f34fae8c 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -2369,6 +2369,47 @@ int task_cgroup_path(struct task_struct *task, char *buf, size_t buflen)
>  }
>  EXPORT_SYMBOL_GPL(task_cgroup_path);
>
> +/**
> + * cgroup_attach_lock - Lock for ->attach()
> + * @lock_threadgroup: whether to down_write cgroup_threadgroup_rwsem
> + *
> + * cgroup migration sometimes needs to stabilize threadgroups against forks and
> + * exits by write-locking cgroup_threadgroup_rwsem. However, some ->attach()
> + * implementations (e.g. cpuset), also need to disable CPU hotplug.
> + * Unfortunately, letting ->attach() operations acquire cpus_read_lock() can
> + * lead to deadlocks.
> + *
> + * Bringing up a CPU may involve creating new tasks which requires read-locking

Would it be better to change this to "creating new kthreads and destroying
kthreads"?

> + * threadgroup_rwsem, so threadgroup_rwsem nests inside cpus_read_lock(). If we
> + * call an ->attach() which acquires the cpus lock while write-locking
> + * threadgroup_rwsem, the locking order is reversed and we end up waiting for an
> + * on-going CPU hotplug operation which in turn is waiting for the
> + * threadgroup_rwsem to be released to create new tasks. For more details:
> + *
> + *   http://lkml.kernel.org/r/20220711174629.uehfmqegcwn2lqzu@wubuntu
> + *
> + * Resolve the situation by always acquiring cpus_read_lock() before optionally
> + * write-locking cgroup_threadgroup_rwsem. This allows ->attach() to assume that
> + * CPU hotplug is disabled on entry.
> + */
> +static void cgroup_attach_lock(bool lock_threadgroup)
> +{
> +       cpus_read_lock();
> +       if (lock_threadgroup)
> +               percpu_down_write(&cgroup_threadgroup_rwsem);
> +}
> +
> +/**
> + * cgroup_attach_unlock - Undo cgroup_attach_lock()
> + * @lock_threadgroup: whether to up_write cgroup_threadgroup_rwsem
> + */
> +static void cgroup_attach_unlock(bool lock_threadgroup)
> +{
> +       if (lock_threadgroup)
> +               percpu_up_write(&cgroup_threadgroup_rwsem);
> +       cpus_read_unlock();
> +}
> +
>  /**
>   * cgroup_migrate_add_task - add a migration target task to a migration context
>   * @task: target task
> @@ -2841,8 +2882,7 @@ int cgroup_attach_task(struct cgroup *dst_cgrp, struct task_struct *leader,
>  }
>
>  struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup,
> -                                            bool *locked)
> -       __acquires(&cgroup_threadgroup_rwsem)
> +                                            bool *threadgroup_locked)
>  {
>         struct task_struct *tsk;
>         pid_t pid;
> @@ -2859,12 +2899,8 @@ struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup,
>          * Therefore, we can skip the global lock.
>          */
>         lockdep_assert_held(&cgroup_mutex);
> -       if (pid || threadgroup) {
> -               percpu_down_write(&cgroup_threadgroup_rwsem);
> -               *locked = true;
> -       } else {
> -               *locked = false;
> -       }
> +       *threadgroup_locked = pid || threadgroup;
> +       cgroup_attach_lock(*threadgroup_locked);
>
>         rcu_read_lock();
>         if (pid) {
> @@ -2895,17 +2931,14 @@ struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup,
>         goto out_unlock_rcu;
>
>  out_unlock_threadgroup:
> -       if (*locked) {
> -               percpu_up_write(&cgroup_threadgroup_rwsem);
> -               *locked = false;
> -       }
> +       cgroup_attach_unlock(*threadgroup_locked);
> +       *threadgroup_locked = false;
>  out_unlock_rcu:
>         rcu_read_unlock();
>         return tsk;
>  }
>
> -void cgroup_procs_write_finish(struct task_struct *task, bool locked)
> -       __releases(&cgroup_threadgroup_rwsem)
> +void cgroup_procs_write_finish(struct task_struct *task, bool threadgroup_locked)
>  {
>         struct cgroup_subsys *ss;
>         int ssid;
> @@ -2913,8 +2946,8 @@ void cgroup_procs_write_finish(struct task_struct *task, bool locked)
>         /* release reference from cgroup_procs_write_start() */
>         put_task_struct(task);
>
> -       if (locked)
> -               percpu_up_write(&cgroup_threadgroup_rwsem);
> +       cgroup_attach_unlock(threadgroup_locked);
> +
>         for_each_subsys(ss, ssid)
>                 if (ss->post_attach)
>                         ss->post_attach();
> @@ -3000,8 +3033,7 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp)
>          * write-locking can be skipped safely.
>          */
>         has_tasks = !list_empty(&mgctx.preloaded_src_csets);
> -       if (has_tasks)
> -               percpu_down_write(&cgroup_threadgroup_rwsem);
> +       cgroup_attach_lock(has_tasks);
>
>         /* NULL dst indicates self on default hierarchy */
>         ret = cgroup_migrate_prepare_dst(&mgctx);
> @@ -3022,8 +3054,7 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp)
>         ret = cgroup_migrate_execute(&mgctx);
>  out_finish:
>         cgroup_migrate_finish(&mgctx);
> -       if (has_tasks)
> -               percpu_up_write(&cgroup_threadgroup_rwsem);
> +       cgroup_attach_unlock(has_tasks);

In kernel 5.15, I just call cgroup_attach_lock(true)/cgroup_attach_unlock(true)
here (sketched further below).

>         return ret;
>  }
>
> @@ -4971,13 +5002,13 @@ static ssize_t __cgroup_procs_write(struct kernfs_open_file *of, char *buf,
>         struct task_struct *task;
>         const struct cred *saved_cred;
>         ssize_t ret;
> -       bool locked;
> +       bool threadgroup_locked;
>
>         dst_cgrp = cgroup_kn_lock_live(of->kn, false);
>         if (!dst_cgrp)
>                 return -ENODEV;
>
> -       task = cgroup_procs_write_start(buf, threadgroup, &locked);
> +       task = cgroup_procs_write_start(buf, threadgroup, &threadgroup_locked);
>         ret = PTR_ERR_OR_ZERO(task);
>         if (ret)
>                 goto out_unlock;
> @@ -5003,7 +5034,7 @@ static ssize_t __cgroup_procs_write(struct kernfs_open_file *of, char *buf,
>         ret = cgroup_attach_task(dst_cgrp, task, threadgroup);
>
>  out_finish:
> -       cgroup_procs_write_finish(task, locked);
> +       cgroup_procs_write_finish(task, threadgroup_locked);
>  out_unlock:
>         cgroup_kn_unlock(of->kn);
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 58aadfda9b8b3..1f3a55297f39d 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -2289,7 +2289,7 @@ static void cpuset_attach(struct cgroup_taskset *tset)
>         cgroup_taskset_first(tset, &css);
>         cs = css_cs(css);
>
> -       cpus_read_lock();
> +       lockdep_assert_cpus_held();     /* see cgroup_attach_lock() */
>         percpu_down_write(&cpuset_rwsem);
>
>         guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
> @@ -2343,7 +2343,6 @@ static void cpuset_attach(struct cgroup_taskset *tset)
>                 wake_up(&cpuset_attach_wq);
>
>         percpu_up_write(&cpuset_rwsem);
> -       cpus_read_unlock();
>  }
>
>  /* The various types of files and directories in a cpuset file system */

I backported your patch to kernel 5.4 and kernel 5.15, passing true to
cgroup_attach_lock()/cgroup_attach_unlock() wherever the hunks conflicted
(see the sketch below), and the deadlock has not occurred since.
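
For reference, the cgroup_update_dfl_csses() hunk ends up looking
roughly like this in the backport (a sketch from memory rather than a
verbatim diff -- 5.4/5.15 write-lock the rwsem unconditionally here, so
there is no has_tasks flag to carry over):

	cgroup_attach_lock(true);	/* was percpu_down_write(&cgroup_threadgroup_rwsem) */

	/* NULL dst indicates self on default hierarchy */
	ret = cgroup_migrate_prepare_dst(&mgctx);
	if (ret)
		goto out_finish;

	/* ... css_set migration loop unchanged ... */

	ret = cgroup_migrate_execute(&mgctx);
out_finish:
	cgroup_migrate_finish(&mgctx);
	cgroup_attach_unlock(true);	/* was percpu_up_write(&cgroup_threadgroup_rwsem) */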

Reported-and-tested-by: Xuewen Yan <xuewen.yan@unisoc.com>

Thanks!

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH cgroup/for-6.0-fixes] cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock
@ 2022-08-17 17:40                           ` Tejun Heo
  0 siblings, 0 replies; 28+ messages in thread
From: Tejun Heo @ 2022-08-17 17:40 UTC (permalink / raw)
  To: Mukesh Ojha
  Cc: Michal Koutný,
	Xuewen Yan, Imran Khan, lizefan.x, hannes, tglx, steven.price,
	peterz, cgroups, linux-kernel, Zhao Gongyi, Zhang Qiao

On Mon, Aug 15, 2022 at 01:27:38PM -1000, Tejun Heo wrote:
> Bringing up a CPU may involve creating new tasks which requires read-locking
> threadgroup_rwsem, so threadgroup_rwsem nests inside cpus_read_lock().
> However, cpuset's ->attach(), which may be called with threadgroup_rwsem
> write-locked, also wants to disable CPU hotplug and acquires
> cpus_read_lock(), leading to a deadlock.
> 
> Fix it by guaranteeing that ->attach() is always called with CPU hotplug
> disabled and removing cpus_read_lock() call from cpuset_attach().
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>

Applied to cgroup/for-6.0-fixes w/ commit message and comment update
suggested by Xuewen and Fixes / stable tags added.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread

Thread overview: 28+ messages
     [not found] <8245b710-8acb-d8e6-7045-99a5f71dad4e@oracle.com>
2022-07-20  2:38 ` Query regarding deadlock involving cgroup_threadgroup_rwsem and cpu_hotplug_lock Imran Khan
2022-07-20  3:27   ` Imran Khan
2022-07-20  3:27     ` Imran Khan
2022-07-20 11:06     ` Mukesh Ojha
2022-07-20 11:06       ` Mukesh Ojha
2022-07-20 12:01       ` Mukesh Ojha
2022-07-20 12:01         ` Mukesh Ojha
2022-07-20 18:05         ` Tejun Heo
2022-07-20 18:05           ` Tejun Heo
2022-07-27 19:33           ` Tejun Heo
2022-07-27 19:33             ` Tejun Heo
2022-08-12 10:27             ` Mukesh Ojha
2022-08-12 10:27               ` Mukesh Ojha
2022-08-15  9:05               ` Michal Koutný
2022-08-15  9:25                 ` Xuewen Yan
2022-08-15  9:25                   ` Xuewen Yan
2022-08-15  9:39                   ` Michal Koutný
2022-08-15  9:39                     ` Michal Koutný
2022-08-15 10:59                     ` Mukesh Ojha
2022-08-15 10:59                       ` Mukesh Ojha
2022-08-15 23:27                       ` [PATCH cgroup/for-6.0-fixes] cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock Tejun Heo
2022-08-15 23:27                         ` Tejun Heo
2022-08-16 20:20                         ` Imran Khan
2022-08-16 20:20                           ` Imran Khan
2022-08-17  6:55                         ` Xuewen Yan
2022-08-17  6:55                           ` Xuewen Yan
2022-08-17 17:40                         ` Tejun Heo
2022-08-17 17:40                           ` Tejun Heo
