* Query regarding deadlock involving cgroup_threadgroup_rwsem and cpu_hotplug_lock
[not found] <8245b710-8acb-d8e6-7045-99a5f71dad4e@oracle.com>
@ 2022-07-20  2:38 ` Imran Khan
  2022-07-20  3:27 ` Imran Khan
0 siblings, 1 reply; 28+ messages in thread
From: Imran Khan @ 2022-07-20 2:38 UTC (permalink / raw)
To: Tejun Heo <tj@kernel.org>, lizefan.x, Johannes Weiner <hannes@cmpxchg.org>, Thomas Gleixner <tglx@linutronix.de>, steven.price, peterz@infradead.org
Cc: cgroups@vger.kernel.org, linux-kernel

Hello everyone,

I am seeing a deadlock between cgroup_threadgroup_rwsem and cpu_hotplug_lock in
the 5.4 kernel.

Due to some missing drivers I don't have this test setup for the latest upstream
kernel, but looking at the code the issue seems to be present in the latest
kernel as well. If needed I can provide stack traces and other relevant info
from the vmcore that I have got from the 5.4 setup.

The description of the problem is as follows (I am using 5.19-rc7 as reference
below):

__cgroup_procs_write acquires cgroup_threadgroup_rwsem via
cgroup_procs_write_start and then invokes cgroup_attach_task. Now
cgroup_attach_task can invoke the following call chain:

cgroup_attach_task --> cgroup_migrate --> cgroup_migrate_execute --> cpuset_attach

Here cpuset_attach tries to take cpu_hotplug_lock.

But by this time, if some other context

1. is already in the middle of cpu hotplug and has acquired cpu_hotplug_lock in
   _cpu_up, but
2. has not yet reached CPUHP_ONLINE state, and
3. one of the intermediate hotplug states (in my case CPUHP_AP_ONLINE_DYN) has
   a callback which involves creation of a thread (or invocation of copy_process
   via some other path),

then the invoked copy_process will get blocked on cgroup_threadgroup_rwsem in
the following call chain:

copy_process --> cgroup_can_fork --> cgroup_css_set_fork --> cgroup_threadgroup_change_begin

I am looking for suggestions to fix this deadlock.
Or if I am missing something in the above analysis and the above-mentioned
scenario can't happen in the latest upstream kernel, then please let me know,
as that would help me in backporting relevant changes to the 5.4 kernel,
because the issue definitely exists in 5.4.

Thanks,
-- Imran

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Query regarding deadlock involving cgroup_threadgroup_rwsem and cpu_hotplug_lock
@ 2022-07-20 11:06 ` Mukesh Ojha
0 siblings, 0 replies; 28+ messages in thread
From: Mukesh Ojha @ 2022-07-20 11:06 UTC (permalink / raw)
To: Imran Khan, tj, lizefan.x, hannes, tglx, steven.price, peterz
Cc: cgroups, linux-kernel

Hi,

On 7/20/2022 8:57 AM, Imran Khan wrote:
> Hello everyone,
>
> I am seeing a deadlock between cgroup_threadgroup_rwsem and cpu_hotplug_lock in
> 5.4 kernel.
>
> Due to some missing drivers I don't have this test setup for latest upstream
> kernel but looking at the code the issue seems to be present in the latest
> kernel as well. If needed I can provide stack traces and other relevant info
> from the vmcore that I have got from 5.4 setup.
>
> The description of the problem is as follows (I am using 5.19-rc7 as reference
> below):
>
> __cgroup_procs_write acquires cgroup_threadgroup_rwsem via
> cgroup_procs_write_start and then invokes cgroup_attach_task. Now
> cgroup_attach_task can invoke following call chain:
>
> cgroup_attach_task --> cgroup_migrate --> cgroup_migrate_execute --> cpuset_attach
>
> Here cpuset_attach tries to take cpu_hotplug_lock.
>
> But by this time if some other context
>
> 1. is already in the middle of cpu hotplug and has acquired cpu_hotplug_lock in
> _cpu_up but
> 2. has not yet reached CPUHP_ONLINE state and
> 3. one of the intermediate hotplug states (in my case CPUHP_AP_ONLINE_DYN ) has
> a callback which involves creation of a thread (or invocation of copy_process
> via some other path) the invoked copy_process will get blocked on
> cgroup_threadgroup_rwsem in following call chain:
>
> copy_process --> cgroup_can_fork --> cgroup_css_set_fork --> cgroup_threadgroup_change_begin

A similar discussion is at [1]; not sure about the conclusion.

[1] https://lore.kernel.org/lkml/20220705123705.764-1-xuewen.yan@unisoc.com/

-Mukesh

> I am looking for suggestions to fix this deadlock.
>
> Or if I am missing something in the above analysis and the above mention
> scenario can't happen in latest upstream kernel, then please let me know as that
> would help me in back porting relevant changes to 5.4 kernel because the issue
> definitely exists in 5.4 kernel.
>
> Thanks,
> -- Imran

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Query regarding deadlock involving cgroup_threadgroup_rwsem and cpu_hotplug_lock
@ 2022-07-20 12:01 ` Mukesh Ojha
0 siblings, 0 replies; 28+ messages in thread
From: Mukesh Ojha @ 2022-07-20 12:01 UTC (permalink / raw)
To: Imran Khan, tj, lizefan.x, hannes, tglx, steven.price, peterz
Cc: cgroups, linux-kernel

Looks like these patches are the fixes:

https://lore.kernel.org/all/YtDvN0wJ6CKaEPN8@slm.duckdns.org/#r

I will let Tejun confirm this.

-Mukesh

On 7/20/2022 4:36 PM, Mukesh Ojha wrote:
> Hi,
>
> On 7/20/2022 8:57 AM, Imran Khan wrote:
>> Hello everyone,
>>
>> I am seeing a deadlock between cgroup_threadgroup_rwsem and
>> cpu_hotplug_lock in 5.4 kernel.
>>
>> [...]
>>
>> I am looking for suggestions to fix this deadlock.
>
> Similar discussion is at [1], not sure on the conclusion.
>
> [1] https://lore.kernel.org/lkml/20220705123705.764-1-xuewen.yan@unisoc.com/
>
> -Mukesh

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Query regarding deadlock involving cgroup_threadgroup_rwsem and cpu_hotplug_lock
@ 2022-07-20 18:05 ` Tejun Heo
0 siblings, 0 replies; 28+ messages in thread
From: Tejun Heo @ 2022-07-20 18:05 UTC (permalink / raw)
To: Mukesh Ojha
Cc: Imran Khan, lizefan.x, hannes, tglx, steven.price, peterz, cgroups, linux-kernel

On Wed, Jul 20, 2022 at 05:31:51PM +0530, Mukesh Ojha wrote:
> Looks like these patches are the fixes.
>
> https://lore.kernel.org/all/YtDvN0wJ6CKaEPN8@slm.duckdns.org/#r
>
> Would let Tejun confirm this.

Yeah, looks like the same issue. I'll write up a patch later this week /
early next unless someone beats me to it.

Thanks.

-- 
tejun

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Query regarding deadlock involving cgroup_threadgroup_rwsem and cpu_hotplug_lock
@ 2022-07-27 19:33 ` Tejun Heo
0 siblings, 0 replies; 28+ messages in thread
From: Tejun Heo @ 2022-07-27 19:33 UTC (permalink / raw)
Cc: Imran Khan, lizefan.x, hannes, tglx, steven.price, peterz, cgroups, linux-kernel

On Wed, Jul 20, 2022 at 08:05:03AM -1000, Tejun Heo wrote:
> On Wed, Jul 20, 2022 at 05:31:51PM +0530, Mukesh Ojha wrote:
> > Looks like these patches are the fixes.
> >
> > https://lore.kernel.org/all/YtDvN0wJ6CKaEPN8@slm.duckdns.org/#r
> >
> > Would let Tejun confirm this.
>
> Yeah, looks like the same issue. I'll write up a patch later this week /
> early next unless someone beats me to it.

https://lore.kernel.org/lkml/20220705123705.764-1-xuewen.yan@unisoc.com/ is
the thread with the same issue. Let's follow up there.

Thanks.

-- 
tejun

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Query regarding deadlock involving cgroup_threadgroup_rwsem and cpu_hotplug_lock
@ 2022-08-12 10:27 ` Mukesh Ojha
0 siblings, 0 replies; 28+ messages in thread
From: Mukesh Ojha @ 2022-08-12 10:27 UTC (permalink / raw)
To: Tejun Heo
Cc: Imran Khan, lizefan.x, hannes, tglx, steven.price, peterz, cgroups, linux-kernel

Hi Tejun,

On 7/28/2022 1:03 AM, Tejun Heo wrote:
> On Wed, Jul 20, 2022 at 08:05:03AM -1000, Tejun Heo wrote:
>> On Wed, Jul 20, 2022 at 05:31:51PM +0530, Mukesh Ojha wrote:
>>> Looks like these patches are the fixes.
>>>
>>> https://lore.kernel.org/all/YtDvN0wJ6CKaEPN8@slm.duckdns.org/#r
>>>
>>> Would let Tejun confirm this.
>>
>> Yeah, looks like the same issue. I'll write up a patch later this week /
>> early next unless someone beats me to it.
>
> https://lore.kernel.org/lkml/20220705123705.764-1-xuewen.yan@unisoc.com/ is
> the thread with the same issue. Let's follow up there.

Since I am not part of the above thread, I am commenting here.

Your original patch [1] plus the revert of [2] fixes the issue, and this is
also confirmed here [3]. Can we get the proper fix merged into your tree?

[1] https://lore.kernel.org/lkml/YuGbYCfAG81mZBnN@slm.duckdns.org/

[2] https://lore.kernel.org/all/20220121101210.84926-1-zhangqiao22@huawei.com/

[3] https://lore.kernel.org/lkml/CAB8ipk-72V-bYRfL-VcSRSyXTeQqkBVj+1d5MHSVV5CTar9a0Q@mail.gmail.com/

-Mukesh

> Thanks.

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Query regarding deadlock involving cgroup_threadgroup_rwsem and cpu_hotplug_lock
2022-08-12 10:27 ` Mukesh Ojha
@ 2022-08-15  9:05 ` Michal Koutný
  2022-08-15  9:25   ` Xuewen Yan
0 siblings, 1 reply; 28+ messages in thread
From: Michal Koutný @ 2022-08-15 9:05 UTC (permalink / raw)
To: Mukesh Ojha
Cc: Tejun Heo, Imran Khan, lizefan.x, hannes, tglx, steven.price, peterz, cgroups, linux-kernel, Zhao Gongyi, Zhang Qiao

+Cc: Zhao Gongyi <zhaogongyi@huawei.com>, Zhang Qiao <zhangqiao22@huawei.com>

On Fri, Aug 12, 2022 at 03:57:00PM +0530, Mukesh Ojha <quic_mojha@quicinc.com> wrote:
> The original patch of yours [1] and the revert of [2] is fixing the issue
> and it is also confirmed here [3].
> Can we get proper fix merge on your tree?
>
> [1] https://lore.kernel.org/lkml/YuGbYCfAG81mZBnN@slm.duckdns.org/
>
> [2] https://lore.kernel.org/all/20220121101210.84926-1-zhangqiao22@huawei.com/

The revert + Tejun's patch looks fine wrt the problem of the reverted
patch (it just moves cpus_read_lock to upper callers).

I'd just suggest a comment that'd explicitly document also the lock
order that we stick to; IIUC, it should be:

  cpu_hotplug_lock  // cpus_read_lock
    cgroup_threadgroup_rwsem
      cpuset_rwsem

Michal

> [3] https://lore.kernel.org/lkml/CAB8ipk-72V-bYRfL-VcSRSyXTeQqkBVj+1d5MHSVV5CTar9a0Q@mail.gmail.com/
>
> -Mukesh

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Query regarding deadlock involving cgroup_threadgroup_rwsem and cpu_hotplug_lock
@ 2022-08-15 9:25 ` Xuewen Yan
0 siblings, 0 replies; 28+ messages in thread
From: Xuewen Yan @ 2022-08-15 9:25 UTC (permalink / raw)
To: Michal Koutný
Cc: Mukesh Ojha, Tejun Heo, Imran Khan, lizefan.x, hannes, tglx, steven.price, peterz, cgroups, linux-kernel, Zhao Gongyi, Zhang Qiao

Hi Michal

On Mon, Aug 15, 2022 at 5:06 PM Michal Koutný <mkoutny@suse.com> wrote:
>
> +Cc: Zhao Gongyi <zhaogongyi@huawei.com>, Zhang Qiao <zhangqiao22@huawei.com>
>
> On Fri, Aug 12, 2022 at 03:57:00PM +0530, Mukesh Ojha <quic_mojha@quicinc.com> wrote:
> > The original patch of yours [1] and the revert of [2] is fixing the issue
> > and it is also confirmed here [3].
> > Can we get proper fix merge on your tree?
> >
> > [1] https://lore.kernel.org/lkml/YuGbYCfAG81mZBnN@slm.duckdns.org/
> >
> > [2] https://lore.kernel.org/all/20220121101210.84926-1-zhangqiao22@huawei.com/
>
> The revert + Tejun's patch looks fine wrt the problem of the reverted
> patch (just moves cpus_read_lock to upper callers).

Do you mean that the problem should be fixed by [1] plus the revert of [2]?
I have only tested the case with [2] reverted. Do I need to test with both
[1] and [2]?

Thanks!

> I'd just suggest a comment that'd explicitly document also the lock
> order that we stick to, IIUC, it should be:
>
> cpu_hotplug_lock // cpus_read_lock
> cgroup_threadgroup_rwsem
> cpuset_rwsem
>
> Michal
>
> > [3] https://lore.kernel.org/lkml/CAB8ipk-72V-bYRfL-VcSRSyXTeQqkBVj+1d5MHSVV5CTar9a0Q@mail.gmail.com/
> >
> > -Mukesh

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Query regarding deadlock involving cgroup_threadgroup_rwsem and cpu_hotplug_lock
@ 2022-08-15 9:39 ` Michal Koutný
0 siblings, 0 replies; 28+ messages in thread
From: Michal Koutný @ 2022-08-15 9:39 UTC (permalink / raw)
To: Xuewen Yan
Cc: Mukesh Ojha, Tejun Heo, Imran Khan, lizefan.x, hannes, tglx, steven.price, peterz, cgroups, linux-kernel, Zhao Gongyi, Zhang Qiao

On Mon, Aug 15, 2022 at 05:25:52PM +0800, Xuewen Yan <xuewen.yan94@gmail.com> wrote:
> Your means is that the problem should be fixed by [1]+[2]'s revert ?

I understood that was already the combination you had tested.
You write in [T] that [1] alone causes (another) deadlock and therefore
the revert of [2] was suggested.

> I just tested the case which reverted the [2]. Need I test with [1] and [2]?

It'd be better (unless you haven't already :-); my reasoning is for the
[1]+[2] combo.

Thanks,
Michal

[T] https://lore.kernel.org/r/CAB8ipk_gCLtvEahsp2DvPJf4NxRsM8WCYmmH=yTd7zQE+81_Yg@mail.gmail.com/

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Query regarding deadlock involving cgroup_threadgroup_rwsem and cpu_hotplug_lock
@ 2022-08-15 10:59 ` Mukesh Ojha
0 siblings, 0 replies; 28+ messages in thread
From: Mukesh Ojha @ 2022-08-15 10:59 UTC (permalink / raw)
To: Michal Koutný, Xuewen Yan
Cc: Tejun Heo, Imran Khan, lizefan.x, hannes, tglx, steven.price, peterz, cgroups, linux-kernel, Zhao Gongyi, Zhang Qiao

On 8/15/2022 3:09 PM, Michal Koutný wrote:
> On Mon, Aug 15, 2022 at 05:25:52PM +0800, Xuewen Yan <xuewen.yan94@gmail.com> wrote:
>> Your means is that the problem should be fixed by [1]+[2]'s revert ?
>
> I understood that was already the combination you had tested.
> You write in [T] that [1] alone causes (another) deadlock and therefore
> the revert of [2] was suggested.
>
>> I just tested the case which reverted the [2]. Need I test with [1] and [2]?
>
> It'd be better (unless you haven't already :-), my reasoning is for the
> [1]+[2] combo.

Feel free to add my

Reported-and-tested-by: Mukesh Ojha <quic_mojha@quicinc.com>

-Mukesh

> Thanks,
> Michal
>
> [T] https://lore.kernel.org/r/CAB8ipk_gCLtvEahsp2DvPJf4NxRsM8WCYmmH=yTd7zQE+81_Yg@mail.gmail.com/

^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH cgroup/for-6.0-fixes] cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock @ 2022-08-15 23:27 ` Tejun Heo 0 siblings, 0 replies; 28+ messages in thread From: Tejun Heo @ 2022-08-15 23:27 UTC (permalink / raw) To: Mukesh Ojha Cc: Michal Koutný, Xuewen Yan, Imran Khan, lizefan.x, hannes, tglx, steven.price, peterz, cgroups, linux-kernel, Zhao Gongyi, Zhang Qiao Bringing up a CPU may involve creating new tasks which requires read-locking threadgroup_rwsem, so threadgroup_rwsem nests inside cpus_read_lock(). However, cpuset's ->attach(), which may be called with threadgroup_rwsem write-locked, also wants to disable CPU hotplug and acquires cpus_read_lock(), leading to a deadlock. Fix it by guaranteeing that ->attach() is always called with CPU hotplug disabled and removing the cpus_read_lock() call from cpuset_attach(). Signed-off-by: Tejun Heo <tj@kernel.org> --- Hello, sorry about the delay. So, the previous patch + the revert isn't quite correct because we sometimes elide both cpus_read_lock() and threadgroup_rwsem together and cpuset_attach() would end up running without CPU hotplug disabled. Can you please test whether this patch fixes the problem? Thanks. kernel/cgroup/cgroup.c | 77 ++++++++++++++++++++++++++++++++++--------------- kernel/cgroup/cpuset.c | 3 - 2 files changed, 55 insertions(+), 25 deletions(-) diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index ffaccd6373f1e..52502f34fae8c 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -2369,6 +2369,47 @@ int task_cgroup_path(struct task_struct *task, char *buf, size_t buflen) } EXPORT_SYMBOL_GPL(task_cgroup_path); +/** + * cgroup_attach_lock - Lock for ->attach() + * @lock_threadgroup: whether to down_write cgroup_threadgroup_rwsem + * + * cgroup migration sometimes needs to stabilize threadgroups against forks and + * exits by write-locking cgroup_threadgroup_rwsem. However, some ->attach() + * implementations (e.g. cpuset), also need to disable CPU hotplug.
+ * Unfortunately, letting ->attach() operations acquire cpus_read_lock() can + * lead to deadlocks. + * + * Bringing up a CPU may involve creating new tasks which requires read-locking + * threadgroup_rwsem, so threadgroup_rwsem nests inside cpus_read_lock(). If we + * call an ->attach() which acquires the cpus lock while write-locking + * threadgroup_rwsem, the locking order is reversed and we end up waiting for an + * on-going CPU hotplug operation which in turn is waiting for the + * threadgroup_rwsem to be released to create new tasks. For more details: + * + * http://lkml.kernel.org/r/20220711174629.uehfmqegcwn2lqzu@wubuntu + * + * Resolve the situation by always acquiring cpus_read_lock() before optionally + * write-locking cgroup_threadgroup_rwsem. This allows ->attach() to assume that + * CPU hotplug is disabled on entry. + */ +static void cgroup_attach_lock(bool lock_threadgroup) +{ + cpus_read_lock(); + if (lock_threadgroup) + percpu_down_write(&cgroup_threadgroup_rwsem); +} + +/** + * cgroup_attach_unlock - Undo cgroup_attach_lock() + * @lock_threadgroup: whether to up_write cgroup_threadgroup_rwsem + */ +static void cgroup_attach_unlock(bool lock_threadgroup) +{ + if (lock_threadgroup) + percpu_up_write(&cgroup_threadgroup_rwsem); + cpus_read_unlock(); +} + /** * cgroup_migrate_add_task - add a migration target task to a migration context * @task: target task @@ -2841,8 +2882,7 @@ int cgroup_attach_task(struct cgroup *dst_cgrp, struct task_struct *leader, } struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup, - bool *locked) - __acquires(&cgroup_threadgroup_rwsem) + bool *threadgroup_locked) { struct task_struct *tsk; pid_t pid; @@ -2859,12 +2899,8 @@ struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup, * Therefore, we can skip the global lock. 
*/ lockdep_assert_held(&cgroup_mutex); - if (pid || threadgroup) { - percpu_down_write(&cgroup_threadgroup_rwsem); - *locked = true; - } else { - *locked = false; - } + *threadgroup_locked = pid || threadgroup; + cgroup_attach_lock(*threadgroup_locked); rcu_read_lock(); if (pid) { @@ -2895,17 +2931,14 @@ struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup, goto out_unlock_rcu; out_unlock_threadgroup: - if (*locked) { - percpu_up_write(&cgroup_threadgroup_rwsem); - *locked = false; - } + cgroup_attach_unlock(*threadgroup_locked); + *threadgroup_locked = false; out_unlock_rcu: rcu_read_unlock(); return tsk; } -void cgroup_procs_write_finish(struct task_struct *task, bool locked) - __releases(&cgroup_threadgroup_rwsem) +void cgroup_procs_write_finish(struct task_struct *task, bool threadgroup_locked) { struct cgroup_subsys *ss; int ssid; @@ -2913,8 +2946,8 @@ void cgroup_procs_write_finish(struct task_struct *task, bool locked) /* release reference from cgroup_procs_write_start() */ put_task_struct(task); - if (locked) - percpu_up_write(&cgroup_threadgroup_rwsem); + cgroup_attach_unlock(threadgroup_locked); + for_each_subsys(ss, ssid) if (ss->post_attach) ss->post_attach(); @@ -3000,8 +3033,7 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp) * write-locking can be skipped safely. 
*/ has_tasks = !list_empty(&mgctx.preloaded_src_csets); - if (has_tasks) - percpu_down_write(&cgroup_threadgroup_rwsem); + cgroup_attach_lock(has_tasks); /* NULL dst indicates self on default hierarchy */ ret = cgroup_migrate_prepare_dst(&mgctx); @@ -3022,8 +3054,7 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp) ret = cgroup_migrate_execute(&mgctx); out_finish: cgroup_migrate_finish(&mgctx); - if (has_tasks) - percpu_up_write(&cgroup_threadgroup_rwsem); + cgroup_attach_unlock(has_tasks); return ret; } @@ -4971,13 +5002,13 @@ static ssize_t __cgroup_procs_write(struct kernfs_open_file *of, char *buf, struct task_struct *task; const struct cred *saved_cred; ssize_t ret; - bool locked; + bool threadgroup_locked; dst_cgrp = cgroup_kn_lock_live(of->kn, false); if (!dst_cgrp) return -ENODEV; - task = cgroup_procs_write_start(buf, threadgroup, &locked); + task = cgroup_procs_write_start(buf, threadgroup, &threadgroup_locked); ret = PTR_ERR_OR_ZERO(task); if (ret) goto out_unlock; @@ -5003,7 +5034,7 @@ static ssize_t __cgroup_procs_write(struct kernfs_open_file *of, char *buf, ret = cgroup_attach_task(dst_cgrp, task, threadgroup); out_finish: - cgroup_procs_write_finish(task, locked); + cgroup_procs_write_finish(task, threadgroup_locked); out_unlock: cgroup_kn_unlock(of->kn); diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index 58aadfda9b8b3..1f3a55297f39d 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -2289,7 +2289,7 @@ static void cpuset_attach(struct cgroup_taskset *tset) cgroup_taskset_first(tset, &css); cs = css_cs(css); - cpus_read_lock(); + lockdep_assert_cpus_held(); /* see cgroup_attach_lock() */ percpu_down_write(&cpuset_rwsem); guarantee_online_mems(cs, &cpuset_attach_nodemask_to); @@ -2343,7 +2343,6 @@ static void cpuset_attach(struct cgroup_taskset *tset) wake_up(&cpuset_attach_wq); percpu_up_write(&cpuset_rwsem); - cpus_read_unlock(); } /* The various types of files and directories in a cpuset file system */ ^ 
permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH cgroup/for-6.0-fixes] cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock @ 2022-08-16 20:20 ` Imran Khan 0 siblings, 0 replies; 28+ messages in thread From: Imran Khan @ 2022-08-16 20:20 UTC (permalink / raw) To: Tejun Heo, Mukesh Ojha Cc: Michal Koutný, Xuewen Yan, lizefan.x, hannes, tglx, steven.price, peterz, cgroups, linux-kernel, Zhao Gongyi, Zhang Qiao Hello Tejun, On 16/8/22 9:27 am, Tejun Heo wrote: > Bringing up a CPU may involve creating new tasks which requires read-locking > threadgroup_rwsem, so threadgroup_rwsem nests inside cpus_read_lock(). > However, cpuset's ->attach(), which may be called with threadgroup_rwsem > write-locked, also wants to disable CPU hotplug and acquires > cpus_read_lock(), leading to a deadlock. > > Fix it by guaranteeing that ->attach() is always called with CPU hotplug > disabled and removing the cpus_read_lock() call from cpuset_attach(). > > Signed-off-by: Tejun Heo <tj@kernel.org> > --- > Hello, sorry about the delay. > > So, the previous patch + the revert isn't quite correct because we sometimes > elide both cpus_read_lock() and threadgroup_rwsem together and > cpuset_attach() would end up running without CPU hotplug disabled. Can you > please test whether this patch fixes the problem? > This fixes the issue seen in my setup. As my setup is 5.4-based, I used cgroup_attach_lock/unlock(true) in the backported version of your patch. Feel free to add my Reviewed-and-tested-by: Imran Khan <imran.f.khan@oracle.com> Thanks, -- Imran ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH cgroup/for-6.0-fixes] cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock @ 2022-08-17 6:55 ` Xuewen Yan 0 siblings, 0 replies; 28+ messages in thread From: Xuewen Yan @ 2022-08-17 6:55 UTC (permalink / raw) To: Tejun Heo Cc: Mukesh Ojha, Michal Koutný, Imran Khan, lizefan.x, hannes, tglx, steven.price, peterz, cgroups, linux-kernel, Zhao Gongyi, Zhang Qiao, 王科 (Ke Wang), orson.zhai, Xuewen Yan Hi Tejun, On Tue, Aug 16, 2022 at 7:27 AM Tejun Heo <tj@kernel.org> wrote: > > Bringing up a CPU may involve creating new tasks which requires read-locking > threadgroup_rwsem, so threadgroup_rwsem nests inside cpus_read_lock(). Indeed, it involves creating new kthreads. And it is not only kthread creation but also kthread destruction that takes the lock. The backtrace is: __switch_to __schedule schedule percpu_rwsem_wait <<< wait for cgroup_threadgroup_rwsem __percpu_down_read exit_signals do_exit kthread > However, cpuset's ->attach(), which may be called with threadgroup_rwsem > write-locked, also wants to disable CPU hotplug and acquires > cpus_read_lock(), leading to a deadlock. > > Fix it by guaranteeing that ->attach() is always called with CPU hotplug > disabled and removing the cpus_read_lock() call from cpuset_attach(). > > Signed-off-by: Tejun Heo <tj@kernel.org> > --- > Hello, sorry about the delay. > > So, the previous patch + the revert isn't quite correct because we sometimes > elide both cpus_read_lock() and threadgroup_rwsem together and > cpuset_attach() would end up running without CPU hotplug disabled. Can you > please test whether this patch fixes the problem? > > Thanks.
> > kernel/cgroup/cgroup.c | 77 ++++++++++++++++++++++++++++++++++--------------- > kernel/cgroup/cpuset.c | 3 - > 2 files changed, 55 insertions(+), 25 deletions(-) > > diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c > index ffaccd6373f1e..52502f34fae8c 100644 > --- a/kernel/cgroup/cgroup.c > +++ b/kernel/cgroup/cgroup.c > @@ -2369,6 +2369,47 @@ int task_cgroup_path(struct task_struct *task, char *buf, size_t buflen) > } > EXPORT_SYMBOL_GPL(task_cgroup_path); > > +/** > + * cgroup_attach_lock - Lock for ->attach() > + * @lock_threadgroup: whether to down_write cgroup_threadgroup_rwsem > + * > + * cgroup migration sometimes needs to stabilize threadgroups against forks and > + * exits by write-locking cgroup_threadgroup_rwsem. However, some ->attach() > + * implementations (e.g. cpuset), also need to disable CPU hotplug. > + * Unfortunately, letting ->attach() operations acquire cpus_read_lock() can > + * lead to deadlocks. > + * > + * Bringing up a CPU may involve creating new tasks which requires read-locking Would it be better to change this to "creating new kthreads and destroying kthreads"? > + * threadgroup_rwsem, so threadgroup_rwsem nests inside cpus_read_lock(). If we > + * call an ->attach() which acquires the cpus lock while write-locking > + * threadgroup_rwsem, the locking order is reversed and we end up waiting for an > + * on-going CPU hotplug operation which in turn is waiting for the > + * threadgroup_rwsem to be released to create new tasks. For more details: > + * > + * http://lkml.kernel.org/r/20220711174629.uehfmqegcwn2lqzu@wubuntu > + * > + * Resolve the situation by always acquiring cpus_read_lock() before optionally > + * write-locking cgroup_threadgroup_rwsem. This allows ->attach() to assume that > + * CPU hotplug is disabled on entry.
> + */ > +static void cgroup_attach_lock(bool lock_threadgroup) > +{ > + cpus_read_lock(); > + if (lock_threadgroup) > + percpu_down_write(&cgroup_threadgroup_rwsem); > +} > + > +/** > + * cgroup_attach_unlock - Undo cgroup_attach_lock() > + * @lock_threadgroup: whether to up_write cgroup_threadgroup_rwsem > + */ > +static void cgroup_attach_unlock(bool lock_threadgroup) > +{ > + if (lock_threadgroup) > + percpu_up_write(&cgroup_threadgroup_rwsem); > + cpus_read_unlock(); > +} > + > /** > * cgroup_migrate_add_task - add a migration target task to a migration context > * @task: target task > @@ -2841,8 +2882,7 @@ int cgroup_attach_task(struct cgroup *dst_cgrp, struct task_struct *leader, > } > > struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup, > - bool *locked) > - __acquires(&cgroup_threadgroup_rwsem) > + bool *threadgroup_locked) > { > struct task_struct *tsk; > pid_t pid; > @@ -2859,12 +2899,8 @@ struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup, > * Therefore, we can skip the global lock. 
> */ > lockdep_assert_held(&cgroup_mutex); > - if (pid || threadgroup) { > - percpu_down_write(&cgroup_threadgroup_rwsem); > - *locked = true; > - } else { > - *locked = false; > - } > + *threadgroup_locked = pid || threadgroup; > + cgroup_attach_lock(*threadgroup_locked); > > rcu_read_lock(); > if (pid) { > @@ -2895,17 +2931,14 @@ struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup, > goto out_unlock_rcu; > > out_unlock_threadgroup: > - if (*locked) { > - percpu_up_write(&cgroup_threadgroup_rwsem); > - *locked = false; > - } > + cgroup_attach_unlock(*threadgroup_locked); > + *threadgroup_locked = false; > out_unlock_rcu: > rcu_read_unlock(); > return tsk; > } > > -void cgroup_procs_write_finish(struct task_struct *task, bool locked) > - __releases(&cgroup_threadgroup_rwsem) > +void cgroup_procs_write_finish(struct task_struct *task, bool threadgroup_locked) > { > struct cgroup_subsys *ss; > int ssid; > @@ -2913,8 +2946,8 @@ void cgroup_procs_write_finish(struct task_struct *task, bool locked) > /* release reference from cgroup_procs_write_start() */ > put_task_struct(task); > > - if (locked) > - percpu_up_write(&cgroup_threadgroup_rwsem); > + cgroup_attach_unlock(threadgroup_locked); > + > for_each_subsys(ss, ssid) > if (ss->post_attach) > ss->post_attach(); > @@ -3000,8 +3033,7 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp) > * write-locking can be skipped safely. 
> */ > has_tasks = !list_empty(&mgctx.preloaded_src_csets); > - if (has_tasks) > - percpu_down_write(&cgroup_threadgroup_rwsem); > + cgroup_attach_lock(has_tasks); > > /* NULL dst indicates self on default hierarchy */ > ret = cgroup_migrate_prepare_dst(&mgctx); > @@ -3022,8 +3054,7 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp) > ret = cgroup_migrate_execute(&mgctx); > out_finish: > cgroup_migrate_finish(&mgctx); > - if (has_tasks) > - percpu_up_write(&cgroup_threadgroup_rwsem); > + cgroup_attach_unlock(has_tasks); In kernel5.15, I just set cgroup_attach_lock/unlock(true). > return ret; > } > > @@ -4971,13 +5002,13 @@ static ssize_t __cgroup_procs_write(struct kernfs_open_file *of, char *buf, > struct task_struct *task; > const struct cred *saved_cred; > ssize_t ret; > - bool locked; > + bool threadgroup_locked; > > dst_cgrp = cgroup_kn_lock_live(of->kn, false); > if (!dst_cgrp) > return -ENODEV; > > - task = cgroup_procs_write_start(buf, threadgroup, &locked); > + task = cgroup_procs_write_start(buf, threadgroup, &threadgroup_locked); > ret = PTR_ERR_OR_ZERO(task); > if (ret) > goto out_unlock; > @@ -5003,7 +5034,7 @@ static ssize_t __cgroup_procs_write(struct kernfs_open_file *of, char *buf, > ret = cgroup_attach_task(dst_cgrp, task, threadgroup); > > out_finish: > - cgroup_procs_write_finish(task, locked); > + cgroup_procs_write_finish(task, threadgroup_locked); > out_unlock: > cgroup_kn_unlock(of->kn); > > diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c > index 58aadfda9b8b3..1f3a55297f39d 100644 > --- a/kernel/cgroup/cpuset.c > +++ b/kernel/cgroup/cpuset.c > @@ -2289,7 +2289,7 @@ static void cpuset_attach(struct cgroup_taskset *tset) > cgroup_taskset_first(tset, &css); > cs = css_cs(css); > > - cpus_read_lock(); > + lockdep_assert_cpus_held(); /* see cgroup_attach_lock() */ > percpu_down_write(&cpuset_rwsem); > > guarantee_online_mems(cs, &cpuset_attach_nodemask_to); > @@ -2343,7 +2343,6 @@ static void cpuset_attach(struct 
cgroup_taskset *tset) > wake_up(&cpuset_attach_wq); > > percpu_up_write(&cpuset_rwsem); > - cpus_read_unlock(); > } > > /* The various types of files and directories in a cpuset file system */ I backported your patch to kernel 5.4 and kernel 5.15, just using cgroup_attach_lock/unlock(true) where there were conflicts, and the deadlock has not occurred. Reported-and-tested-by: Xuewen Yan <xuewen.yan@unisoc.com> Thanks! ^ permalink raw reply [flat|nested] 28+ messages in thread
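The opposite-order acquisition described in the backtraces above is the classic ABBA pattern that lockdep reports. A toy order checker (my own sketch, vastly simpler than the kernel's lockdep; the enum names merely label the two lock classes from this thread) makes the inversion concrete: the hotplug/fork path records the edge cpu_hotplug_lock -> threadgroup_rwsem, so the pre-patch cpuset_attach() path, which acquires them in reverse, closes a cycle.

```c
#include <assert.h>

/* Toy lock-order checker (illustrative only; the kernel's lockdep is far
 * more general). The two classes model the locks from this thread:
 * CPU_HOTPLUG stands in for cpu_hotplug_lock, THREADGROUP for
 * cgroup_threadgroup_rwsem. */
enum lock_id { CPU_HOTPLUG, THREADGROUP, NLOCKS };

static int held[NLOCKS];		/* is this class currently held? */
static int after[NLOCKS][NLOCKS];	/* after[a][b]: b taken while a held */

/* Returns 0 on success, -1 if the acquisition reverses a recorded order. */
static int acquire(enum lock_id l)
{
	for (int h = 0; h < NLOCKS; h++) {
		if (!held[h] || h == (int)l)
			continue;
		if (after[l][h])	/* edge l -> h already exists: ABBA */
			return -1;
		after[h][l] = 1;	/* record edge h -> l */
	}
	held[l] = 1;
	return 0;
}

static void release(enum lock_id l)
{
	held[l] = 0;
}
```

Running the two paths through acquire() in sequence shows why only the second one trips the check: it is the first acquisition that contradicts an already-recorded ordering.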
* Re: [PATCH cgroup/for-6.0-fixes] cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock @ 2022-08-17 6:55 ` Xuewen Yan 0 siblings, 0 replies; 28+ messages in thread From: Xuewen Yan @ 2022-08-17 6:55 UTC (permalink / raw) To: Tejun Heo Cc: Mukesh Ojha, Michal Koutný, Imran Khan, lizefan.x-EC8Uxl6Npydl57MIdRCFDg, hannes-druUgvl0LCNAfugRpC6u6w, tglx-hfZtesqFncYOwBW4kG4KsQ, steven.price-5wv7dgnIgG8, peterz-wEGCiKHe2LqWVfeAwA7xHQ, cgroups-u79uwXL29TY76Z2rM5mHXA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Zhao Gongyi, Zhang Qiao, 王科 (Ke Wang), orson.zhai-1tVvrHeaX6nQT0dZR+AlfA, Xuewen Yan Hi Tejun On Tue, Aug 16, 2022 at 7:27 AM Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote: > > Bringing up a CPU may involve creating new tasks which requires read-locking > threadgroup_rwsem, so threadgroup_rwsem nests inside cpus_read_lock(). Indeed, it is creating new kthreads. And not only creating new kthread, but also destroying kthread. the backtrace is: __switch_to __schedule schedule percpu_rwsem_wait <<< wait for cgroup_threadgroup_rwsem __percpu_down_read exit_signals do_exit kthread > However, cpuset's ->attach(), which may be called with thredagroup_rwsem > write-locked, also wants to disable CPU hotplug and acquires > cpus_read_lock(), leading to a deadlock. > > Fix it by guaranteeing that ->attach() is always called with CPU hotplug > disabled and removing cpus_read_lock() call from cpuset_attach(). > > Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> > --- > Hello, sorry about the delay. > > So, the previous patch + the revert isn't quite correct because we sometimes > elide both cpus_read_lock() and threadgroup_rwsem together and > cpuset_attach() woudl end up running without CPU hotplug enabled. Can you > please test whether this patch fixes the problem? > > Thanks. 
> > kernel/cgroup/cgroup.c | 77 ++++++++++++++++++++++++++++++++++--------------- > kernel/cgroup/cpuset.c | 3 - > 2 files changed, 55 insertions(+), 25 deletions(-) > > diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c > index ffaccd6373f1e..52502f34fae8c 100644 > --- a/kernel/cgroup/cgroup.c > +++ b/kernel/cgroup/cgroup.c > @@ -2369,6 +2369,47 @@ int task_cgroup_path(struct task_struct *task, char *buf, size_t buflen) > } > EXPORT_SYMBOL_GPL(task_cgroup_path); > > +/** > + * cgroup_attach_lock - Lock for ->attach() > + * @lock_threadgroup: whether to down_write cgroup_threadgroup_rwsem > + * > + * cgroup migration sometimes needs to stabilize threadgroups against forks and > + * exits by write-locking cgroup_threadgroup_rwsem. However, some ->attach() > + * implementations (e.g. cpuset), also need to disable CPU hotplug. > + * Unfortunately, letting ->attach() operations acquire cpus_read_lock() can > + * lead to deadlocks. > + * > + * Bringing up a CPU may involve creating new tasks which requires read-locking Is it better to change to creating new kthreads and destroying kthreads? > + * threadgroup_rwsem, so threadgroup_rwsem nests inside cpus_read_lock(). If we > + * call an ->attach() which acquires the cpus lock while write-locking > + * threadgroup_rwsem, the locking order is reversed and we end up waiting for an > + * on-going CPU hotplug operation which in turn is waiting for the > + * threadgroup_rwsem to be released to create new tasks. For more details: > + * > + * http://lkml.kernel.org/r/20220711174629.uehfmqegcwn2lqzu@wubuntu > + * > + * Resolve the situation by always acquiring cpus_read_lock() before optionally > + * write-locking cgroup_threadgroup_rwsem. This allows ->attach() to assume that > + * CPU hotplug is disabled on entry. 
> + */ > +static void cgroup_attach_lock(bool lock_threadgroup) > +{ > + cpus_read_lock(); > + if (lock_threadgroup) > + percpu_down_write(&cgroup_threadgroup_rwsem); > +} > + > +/** > + * cgroup_attach_unlock - Undo cgroup_attach_lock() > + * @lock_threadgroup: whether to up_write cgroup_threadgroup_rwsem > + */ > +static void cgroup_attach_unlock(bool lock_threadgroup) > +{ > + if (lock_threadgroup) > + percpu_up_write(&cgroup_threadgroup_rwsem); > + cpus_read_unlock(); > +} > + > /** > * cgroup_migrate_add_task - add a migration target task to a migration context > * @task: target task > @@ -2841,8 +2882,7 @@ int cgroup_attach_task(struct cgroup *dst_cgrp, struct task_struct *leader, > } > > struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup, > - bool *locked) > - __acquires(&cgroup_threadgroup_rwsem) > + bool *threadgroup_locked) > { > struct task_struct *tsk; > pid_t pid; > @@ -2859,12 +2899,8 @@ struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup, > * Therefore, we can skip the global lock. 
> */ > lockdep_assert_held(&cgroup_mutex); > - if (pid || threadgroup) { > - percpu_down_write(&cgroup_threadgroup_rwsem); > - *locked = true; > - } else { > - *locked = false; > - } > + *threadgroup_locked = pid || threadgroup; > + cgroup_attach_lock(*threadgroup_locked); > > rcu_read_lock(); > if (pid) { > @@ -2895,17 +2931,14 @@ struct task_struct *cgroup_procs_write_start(char *buf, bool threadgroup, > goto out_unlock_rcu; > > out_unlock_threadgroup: > - if (*locked) { > - percpu_up_write(&cgroup_threadgroup_rwsem); > - *locked = false; > - } > + cgroup_attach_unlock(*threadgroup_locked); > + *threadgroup_locked = false; > out_unlock_rcu: > rcu_read_unlock(); > return tsk; > } > > -void cgroup_procs_write_finish(struct task_struct *task, bool locked) > - __releases(&cgroup_threadgroup_rwsem) > +void cgroup_procs_write_finish(struct task_struct *task, bool threadgroup_locked) > { > struct cgroup_subsys *ss; > int ssid; > @@ -2913,8 +2946,8 @@ void cgroup_procs_write_finish(struct task_struct *task, bool locked) > /* release reference from cgroup_procs_write_start() */ > put_task_struct(task); > > - if (locked) > - percpu_up_write(&cgroup_threadgroup_rwsem); > + cgroup_attach_unlock(threadgroup_locked); > + > for_each_subsys(ss, ssid) > if (ss->post_attach) > ss->post_attach(); > @@ -3000,8 +3033,7 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp) > * write-locking can be skipped safely. 
>  	 */
>  	has_tasks = !list_empty(&mgctx.preloaded_src_csets);
> -	if (has_tasks)
> -		percpu_down_write(&cgroup_threadgroup_rwsem);
> +	cgroup_attach_lock(has_tasks);
> 
>  	/* NULL dst indicates self on default hierarchy */
>  	ret = cgroup_migrate_prepare_dst(&mgctx);
> @@ -3022,8 +3054,7 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp)
>  	ret = cgroup_migrate_execute(&mgctx);
>  out_finish:
>  	cgroup_migrate_finish(&mgctx);
> -	if (has_tasks)
> -		percpu_up_write(&cgroup_threadgroup_rwsem);
> +	cgroup_attach_unlock(has_tasks);

In kernel 5.15, I just set cgroup_attach_lock/unlock(true).

>  	return ret;
>  }
> 
> @@ -4971,13 +5002,13 @@ static ssize_t __cgroup_procs_write(struct kernfs_open_file *of, char *buf,
>  	struct task_struct *task;
>  	const struct cred *saved_cred;
>  	ssize_t ret;
> -	bool locked;
> +	bool threadgroup_locked;
> 
>  	dst_cgrp = cgroup_kn_lock_live(of->kn, false);
>  	if (!dst_cgrp)
>  		return -ENODEV;
> 
> -	task = cgroup_procs_write_start(buf, threadgroup, &locked);
> +	task = cgroup_procs_write_start(buf, threadgroup, &threadgroup_locked);
>  	ret = PTR_ERR_OR_ZERO(task);
>  	if (ret)
>  		goto out_unlock;
> @@ -5003,7 +5034,7 @@ static ssize_t __cgroup_procs_write(struct kernfs_open_file *of, char *buf,
>  	ret = cgroup_attach_task(dst_cgrp, task, threadgroup);
> 
>  out_finish:
> -	cgroup_procs_write_finish(task, locked);
> +	cgroup_procs_write_finish(task, threadgroup_locked);
>  out_unlock:
>  	cgroup_kn_unlock(of->kn);
> 
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 58aadfda9b8b3..1f3a55297f39d 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -2289,7 +2289,7 @@ static void cpuset_attach(struct cgroup_taskset *tset)
>  	cgroup_taskset_first(tset, &css);
>  	cs = css_cs(css);
> 
> -	cpus_read_lock();
> +	lockdep_assert_cpus_held();	/* see cgroup_attach_lock() */
>  	percpu_down_write(&cpuset_rwsem);
> 
>  	guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
> @@ -2343,7 +2343,6 @@ static void cpuset_attach(struct cgroup_taskset *tset)
>  	wake_up(&cpuset_attach_wq);
> 
>  	percpu_up_write(&cpuset_rwsem);
> -	cpus_read_unlock();
>  }
> 
>  /* The various types of files and directories in a cpuset file system */

I backported your patch to kernel 5.4 and kernel 5.15, just setting cgroup_attach_lock/unlock(true) where there were conflicts, and the deadlock has not occurred.

Reported-and-tested-by: Xuewen Yan <xuewen.yan-1tVvrHeaX6nQT0dZR+AlfA@public.gmane.org>

Thanks

^ permalink raw reply	[flat|nested] 28+ messages in thread
* Re: [PATCH cgroup/for-6.0-fixes] cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock
@ 2022-08-17 17:40 ` Tejun Heo
  0 siblings, 0 replies; 28+ messages in thread

From: Tejun Heo @ 2022-08-17 17:40 UTC (permalink / raw)
To: Mukesh Ojha
Cc: Michal Koutný, Xuewen Yan, Imran Khan, lizefan.x, hannes, tglx, steven.price, peterz, cgroups, linux-kernel, Zhao Gongyi, Zhang Qiao

On Mon, Aug 15, 2022 at 01:27:38PM -1000, Tejun Heo wrote:
> Bringing up a CPU may involve creating new tasks, which requires read-locking
> threadgroup_rwsem, so threadgroup_rwsem nests inside cpus_read_lock().
> However, cpuset's ->attach(), which may be called with threadgroup_rwsem
> write-locked, also wants to disable CPU hotplug and acquires
> cpus_read_lock(), leading to a deadlock.
>
> Fix it by guaranteeing that ->attach() is always called with CPU hotplug
> disabled and removing the cpus_read_lock() call from cpuset_attach().
>
> Signed-off-by: Tejun Heo <tj@kernel.org>

Applied to cgroup/for-6.0-fixes with the commit message and comment update suggested by Xuewen, and with Fixes / stable tags added.

Thanks.

--
tejun

^ permalink raw reply	[flat|nested] 28+ messages in thread
end of thread, other threads:[~2022-08-17 17:40 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <8245b710-8acb-d8e6-7045-99a5f71dad4e@oracle.com>
2022-07-20  2:38 ` Query regarding deadlock involving cgroup_threadgroup_rwsem and cpu_hotplug_lock Imran Khan
2022-07-20  3:27 ` Imran Khan
2022-07-20 11:06 ` Mukesh Ojha
2022-07-20 12:01 ` Mukesh Ojha
2022-07-20 18:05 ` Tejun Heo
2022-07-27 19:33 ` Tejun Heo
2022-08-12 10:27 ` Mukesh Ojha
2022-08-15  9:05 ` Michal Koutný
2022-08-15  9:25 ` Xuewen Yan
2022-08-15  9:39 ` Michal Koutný
2022-08-15 10:59 ` Mukesh Ojha
2022-08-15 23:27 ` [PATCH cgroup/for-6.0-fixes] cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock Tejun Heo
2022-08-16 20:20 ` Imran Khan
2022-08-17  6:55 ` Xuewen Yan
2022-08-17 17:40 ` Tejun Heo