linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* BUG: HANG_DETECT waiting for migration_cpu_stop() complete
@ 2022-09-05  2:47 Jing-Ting Wu
  2022-09-05  6:44 ` Mukesh Ojha
  0 siblings, 1 reply; 13+ messages in thread
From: Jing-Ting Wu @ 2022-09-05  2:47 UTC (permalink / raw)
  To: Peter Zijlstra, Valentin Schneider, Tejun Heo
  Cc: wsd_upstream, linux-kernel, linux-arm-kernel, linux-mediatek,
	Jonathan.JMChen, chris.redpath, Dietmar Eggemann,
	Vincent Donnefort, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Christian Brauner, cgroups, lixiong.liu, wenju.xu

Hi,

We meet the HANG_DETECT happened in T SW version with kernel-5.15.
Many tasks have been blocked for a long time.


Root cause:
migration_cpu_stop() is not complete due to is_migration_disabled(p) is
true, complete is false and complete_all() never get executed.
It let other task wait the rwsem.

Detail:
system_server waiting for cgroup_threadgroup_rwsem.
OomAdjuster is holding the cgroup_threadgroup_rwsem and waiting for
cpuset_rwsem.
cpuset_hotplug_workfn is holding the cpuset_rwsem and waiting for
affine_move_task() complete.
affine_move_task() waiting for migration_cpu_stop() complete.

The backtrace of system_server:
__switch_to
__schedule
schedule
percpu_rwsem_wait
__percpu_down_read
cgroup_css_set_fork => wait for cgroup_threadgroup_rwsem
cgroup_can_fork
copy_process
kernel_clone

The backtrace of OomAdjuster:
__switch_to
__schedule
schedule
percpu_rwsem_wait
percpu_down_write
cpuset_can_attach => wait for cpuset_rwsem
cgroup_migrate_execute
cgroup_attach_task
__cgroup1_procs_write => hold cgroup_threadgroup_rwsem
cgroup1_procs_write
cgroup_file_write
kernfs_fop_write_iter
vfs_write
ksys_write

The backtrace of cpuset_hotplug_workfn:
__switch_to
__schedule
schedule
schedule_timeout
wait_for_common
affine_move_task => wait for complete
__set_cpus_allowed_ptr_locked
update_tasks_cpumask
cpuset_hotplug_update_tasks => hold cpuset_rwsem
cpuset_hotplug_workfn
process_one_work
worker_thread
kthread


In affine_move_task() will call migration_cpu_stop() and wait for it
complete.
In normal case, if migration_cpu_stop() complete it will inform
everyone that he is done.
But there is an exception case that will not notify.
If is_migration_disabled(p) is true and complete will always is false,
then complete_all() never get executed.

static int migration_cpu_stop(void *data)
{
...
    bool complete = false;
...

    if (task_rq(p) == rq) {
        if (is_migration_disabled(p))
              goto out; => is_migration_disabled(p) = true,
                           so complete = false.
            ...
        }
...

out:
...
    if (complete) => complete = false,
                     so complete_all() never get executed.
        complete_all(&pending->done);

        return 0;
}


Review the code, we found that there are many places can change
is_migration_disabled() value.
(such as: __rt_spin_lock(), rt_read_lock(), rt_write_lock(), ...)

Do you have any suggestion for this issue?
Thank you.




Best regards,
Jing-Ting Wu



_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BUG: HANG_DETECT waiting for migration_cpu_stop() complete
  2022-09-05  2:47 BUG: HANG_DETECT waiting for migration_cpu_stop() complete Jing-Ting Wu
@ 2022-09-05  6:44 ` Mukesh Ojha
  2022-09-05  8:22   ` Jing-Ting Wu
  0 siblings, 1 reply; 13+ messages in thread
From: Mukesh Ojha @ 2022-09-05  6:44 UTC (permalink / raw)
  To: Jing-Ting Wu, Peter Zijlstra, Valentin Schneider, Tejun Heo
  Cc: wsd_upstream, linux-kernel, linux-arm-kernel, linux-mediatek,
	Jonathan.JMChen, chris.redpath, Dietmar Eggemann,
	Vincent Donnefort, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Steven Rostedt, Ben Segall, Mel Gorman, Christian Brauner,
	cgroups, lixiong.liu, wenju.xu


This is fixed by this.

https://lore.kernel.org/lkml/YvrWaml3F+x9Dk+T@slm.duckdns.org/

-Mukesh

On 9/5/2022 8:17 AM, Jing-Ting Wu wrote:
> Hi,
> 
> We meet the HANG_DETECT happened in T SW version with kernel-5.15.
> Many tasks have been blocked for a long time.
> 
> 
> Root cause:
> migration_cpu_stop() is not complete due to is_migration_disabled(p) is
> true, complete is false and complete_all() never get executed.
> It let other task wait the rwsem.
> 
> Detail:
> system_server waiting for cgroup_threadgroup_rwsem.
> OomAdjuster is holding the cgroup_threadgroup_rwsem and waiting for
> cpuset_rwsem.
> cpuset_hotplug_workfn is holding the cpuset_rwsem and waiting for
> affine_move_task() complete.
> affine_move_task() waiting for migration_cpu_stop() complete.
> 
> The backtrace of system_server:
> __switch_to
> __schedule
> schedule
> percpu_rwsem_wait
> __percpu_down_read
> cgroup_css_set_fork => wait for cgroup_threadgroup_rwsem
> cgroup_can_fork
> copy_process
> kernel_clone
> 
> The backtrace of OomAdjuster:
> __switch_to
> __schedule
> schedule
> percpu_rwsem_wait
> percpu_down_write
> cpuset_can_attach => wait for cpuset_rwsem
> cgroup_migrate_execute
> cgroup_attach_task
> __cgroup1_procs_write => hold cgroup_threadgroup_rwsem
> cgroup1_procs_write
> cgroup_file_write
> kernfs_fop_write_iter
> vfs_write
> ksys_write
> 
> The backtrace of cpuset_hotplug_workfn:
> __switch_to
> __schedule
> schedule
> schedule_timeout
> wait_for_common
> affine_move_task => wait for complete
> __set_cpus_allowed_ptr_locked
> update_tasks_cpumask
> cpuset_hotplug_update_tasks => hold cpuset_rwsem
> cpuset_hotplug_workfn
> process_one_work
> worker_thread
> kthread
> 
> 
> In affine_move_task() will call migration_cpu_stop() and wait for it
> complete.
> In normal case, if migration_cpu_stop() complete it will inform
> everyone that he is done.
> But there is an exception case that will not notify.
> If is_migration_disabled(p) is true and complete will always is false,
> then complete_all() never get executed.
> 
> static int migration_cpu_stop(void *data)
> {
> ...
>      bool complete = false;
> ...
> 
>      if (task_rq(p) == rq) {
>          if (is_migration_disabled(p))
>                goto out; => is_migration_disabled(p) = true,
>                             so complete = false.
>              ...
>          }
> ...
> 
> out:
> ...
>      if (complete) => complete = false,
>                       so complete_all() never get executed.
>          complete_all(&pending->done);
> 
>          return 0;
> }
> 
> 
> Review the code, we found that there are many places can change
> is_migration_disabled() value.
> (such as: __rt_spin_lock(), rt_read_lock(), rt_write_lock(), ...)
> 
> Do you have any suggestion for this issue?
> Thank you.
> 
> 
> 
> 
> Best regards,
> Jing-Ting Wu
> 
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BUG: HANG_DETECT waiting for migration_cpu_stop() complete
  2022-09-05  6:44 ` Mukesh Ojha
@ 2022-09-05  8:22   ` Jing-Ting Wu
  2022-09-06 18:30     ` Tejun Heo
  0 siblings, 1 reply; 13+ messages in thread
From: Jing-Ting Wu @ 2022-09-05  8:22 UTC (permalink / raw)
  To: Mukesh Ojha, Peter Zijlstra, Valentin Schneider, Tejun Heo
  Cc: wsd_upstream, linux-kernel, linux-arm-kernel, linux-mediatek,
	Jonathan.JMChen, chris.redpath, Dietmar Eggemann,
	Vincent Donnefort, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Steven Rostedt, Ben Segall, Mel Gorman, Christian Brauner,
	cgroups, lixiong.liu, wenju.xu

Hi, Mukesh



https://lore.kernel.org/lkml/YvrWaml3F+x9Dk+T@slm.duckdns.org/ is for
fix cgroup_threadgroup_rwsem <-> cpus_read_lock() deadlock.
But this issue is cgroup_threadgroup_rwsem <-> cpuset_rwsem deadlock.

I think they are not same issue.
Do the patch is useful for this issue?



Best regards,
Jing-Ting Wu


On Mon, 2022-09-05 at 12:14 +0530, Mukesh Ojha wrote:
> This is fixed by this.
> 
> https://lore.kernel.org/lkml/YvrWaml3F+x9Dk+T@slm.duckdns.org/
> 
> -Mukesh
> 
> On 9/5/2022 8:17 AM, Jing-Ting Wu wrote:
> > Hi,
> > 
> > We meet the HANG_DETECT happened in T SW version with kernel-5.15.
> > Many tasks have been blocked for a long time.
> > 
> > 
> > Root cause:
> > migration_cpu_stop() is not complete due to
> > is_migration_disabled(p) is
> > true, complete is false and complete_all() never get executed.
> > It let other task wait the rwsem.
> > 
> > Detail:
> > system_server waiting for cgroup_threadgroup_rwsem.
> > OomAdjuster is holding the cgroup_threadgroup_rwsem and waiting for
> > cpuset_rwsem.
> > cpuset_hotplug_workfn is holding the cpuset_rwsem and waiting for
> > affine_move_task() complete.
> > affine_move_task() waiting for migration_cpu_stop() complete.
> > 
> > The backtrace of system_server:
> > __switch_to
> > __schedule
> > schedule
> > percpu_rwsem_wait
> > __percpu_down_read
> > cgroup_css_set_fork => wait for cgroup_threadgroup_rwsem
> > cgroup_can_fork
> > copy_process
> > kernel_clone
> > 
> > The backtrace of OomAdjuster:
> > __switch_to
> > __schedule
> > schedule
> > percpu_rwsem_wait
> > percpu_down_write
> > cpuset_can_attach => wait for cpuset_rwsem
> > cgroup_migrate_execute
> > cgroup_attach_task
> > __cgroup1_procs_write => hold cgroup_threadgroup_rwsem
> > cgroup1_procs_write
> > cgroup_file_write
> > kernfs_fop_write_iter
> > vfs_write
> > ksys_write
> > 
> > The backtrace of cpuset_hotplug_workfn:
> > __switch_to
> > __schedule
> > schedule
> > schedule_timeout
> > wait_for_common
> > affine_move_task => wait for complete
> > __set_cpus_allowed_ptr_locked
> > update_tasks_cpumask
> > cpuset_hotplug_update_tasks => hold cpuset_rwsem
> > cpuset_hotplug_workfn
> > process_one_work
> > worker_thread
> > kthread
> > 
> > 
> > In affine_move_task() will call migration_cpu_stop() and wait for
> > it
> > complete.
> > In normal case, if migration_cpu_stop() complete it will inform
> > everyone that he is done.
> > But there is an exception case that will not notify.
> > If is_migration_disabled(p) is true and complete will always is
> > false,
> > then complete_all() never get executed.
> > 
> > static int migration_cpu_stop(void *data)
> > {
> > ...
> >      bool complete = false;
> > ...
> > 
> >      if (task_rq(p) == rq) {
> >          if (is_migration_disabled(p))
> >                goto out; => is_migration_disabled(p) = true,
> >                             so complete = false.
> >              ...
> >          }
> > ...
> > 
> > out:
> > ...
> >      if (complete) => complete = false,
> >                       so complete_all() never get executed.
> >          complete_all(&pending->done);
> > 
> >          return 0;
> > }
> > 
> > 
> > Review the code, we found that there are many places can change
> > is_migration_disabled() value.
> > (such as: __rt_spin_lock(), rt_read_lock(), rt_write_lock(), ...)
> > 
> > Do you have any suggestion for this issue?
> > Thank you.
> > 
> > 
> > 
> > 
> > Best regards,
> > Jing-Ting Wu
> > 
> > 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BUG: HANG_DETECT waiting for migration_cpu_stop() complete
  2022-09-05  8:22   ` Jing-Ting Wu
@ 2022-09-06 18:30     ` Tejun Heo
  2022-09-06 20:01       ` Waiman Long
  0 siblings, 1 reply; 13+ messages in thread
From: Tejun Heo @ 2022-09-06 18:30 UTC (permalink / raw)
  To: Jing-Ting Wu
  Cc: Mukesh Ojha, Peter Zijlstra, Valentin Schneider, wsd_upstream,
	linux-kernel, linux-arm-kernel, linux-mediatek, Jonathan.JMChen,
	chris.redpath, Dietmar Eggemann, Vincent Donnefort, Ingo Molnar,
	Juri Lelli, Vincent Guittot, Steven Rostedt, Ben Segall,
	Mel Gorman, Christian Brauner, cgroups, lixiong.liu, wenju.xu,
	Waiman Long

Hello,

(cc'ing Waiman in case he has a better idea)

On Mon, Sep 05, 2022 at 04:22:29PM +0800, Jing-Ting Wu wrote:
> https://lore.kernel.org/lkml/YvrWaml3F+x9Dk+T@slm.duckdns.org/ is for
> fix cgroup_threadgroup_rwsem <-> cpus_read_lock() deadlock.
> But this issue is cgroup_threadgroup_rwsem <-> cpuset_rwsem deadlock.

If I'm understanding what you're writing correctly, this isn't a deadlock.
The cpuset_hotplug_workfn simply isn't being woken up while holding
cpuset_rwsem and others are just waiting for that lock to be released.

Thanks.

-- 
tejun

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BUG: HANG_DETECT waiting for migration_cpu_stop() complete
  2022-09-06 18:30     ` Tejun Heo
@ 2022-09-06 20:01       ` Waiman Long
  2022-09-06 20:40         ` Waiman Long
  0 siblings, 1 reply; 13+ messages in thread
From: Waiman Long @ 2022-09-06 20:01 UTC (permalink / raw)
  To: Tejun Heo, Jing-Ting Wu
  Cc: Mukesh Ojha, Peter Zijlstra, Valentin Schneider, wsd_upstream,
	linux-kernel, linux-arm-kernel, linux-mediatek, Jonathan.JMChen,
	chris.redpath, Dietmar Eggemann, Vincent Donnefort, Ingo Molnar,
	Juri Lelli, Vincent Guittot, Steven Rostedt, Ben Segall,
	Mel Gorman, Christian Brauner, cgroups, lixiong.liu, wenju.xu

On 9/6/22 14:30, Tejun Heo wrote:
> Hello,
>
> (cc'ing Waiman in case he has a better idea)
>
> On Mon, Sep 05, 2022 at 04:22:29PM +0800, Jing-Ting Wu wrote:
>> https://lore.kernel.org/lkml/YvrWaml3F+x9Dk+T@slm.duckdns.org/ is for
>> fix cgroup_threadgroup_rwsem <-> cpus_read_lock() deadlock.
>> But this issue is cgroup_threadgroup_rwsem <-> cpuset_rwsem deadlock.
> If I'm understanding what you're writing correctly, this isn't a deadlock.
> The cpuset_hotplug_workfn simply isn't being woken up while holding
> cpuset_rwsem and others are just waiting for that lock to be released.

I believe it is probably a bug in the scheduler core code. 
__set_cpus_allowed_ptr_locked() calls affine_move_task() to move to a 
random cpu within the new set allowable CPUs. However, if migration is 
disabled, it shouldn't call affine_move_task() at all. Instead, I would 
suggest that if the current cpu is within the new allowable cpus, it 
should just skip doing affine_move_task(). Otherwise, it should fail 
__set_cpus_allowed_ptr_locked().

My 2 cents.

Cheers,
Longman


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BUG: HANG_DETECT waiting for migration_cpu_stop() complete
  2022-09-06 20:01       ` Waiman Long
@ 2022-09-06 20:40         ` Waiman Long
  2022-09-06 20:50           ` Peter Zijlstra
  0 siblings, 1 reply; 13+ messages in thread
From: Waiman Long @ 2022-09-06 20:40 UTC (permalink / raw)
  To: Tejun Heo, Jing-Ting Wu
  Cc: Mukesh Ojha, Peter Zijlstra, Valentin Schneider, wsd_upstream,
	linux-kernel, linux-arm-kernel, linux-mediatek, Jonathan.JMChen,
	chris.redpath, Dietmar Eggemann, Vincent Donnefort, Ingo Molnar,
	Juri Lelli, Vincent Guittot, Steven Rostedt, Ben Segall,
	Mel Gorman, Christian Brauner, cgroups, lixiong.liu, wenju.xu


On 9/6/22 16:01, Waiman Long wrote:
> On 9/6/22 14:30, Tejun Heo wrote:
>> Hello,
>>
>> (cc'ing Waiman in case he has a better idea)
>>
>> On Mon, Sep 05, 2022 at 04:22:29PM +0800, Jing-Ting Wu wrote:
>>> https://lore.kernel.org/lkml/YvrWaml3F+x9Dk+T@slm.duckdns.org/ is for
>>> fix cgroup_threadgroup_rwsem <-> cpus_read_lock() deadlock.
>>> But this issue is cgroup_threadgroup_rwsem <-> cpuset_rwsem deadlock.
>> If I'm understanding what you're writing correctly, this isn't a 
>> deadlock.
>> The cpuset_hotplug_workfn simply isn't being woken up while holding
>> cpuset_rwsem and others are just waiting for that lock to be released.
>
> I believe it is probably a bug in the scheduler core code. 
> __set_cpus_allowed_ptr_locked() calls affine_move_task() to move to a 
> random cpu within the new set allowable CPUs. However, if migration is 
> disabled, it shouldn't call affine_move_task() at all. Instead, I 
> would suggest that if the current cpu is within the new allowable 
> cpus, it should just skip doing affine_move_task(). Otherwise, it 
> should fail __set_cpus_allowed_ptr_locked().

Maybe like

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 838623b68031..5d9ea1553ec0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2794,9 +2794,9 @@ static int __set_cpus_allowed_ptr_locked(struct 
task_struct *p,
                 if (cpumask_equal(&p->cpus_mask, new_mask))
                         goto out;

-               if (WARN_ON_ONCE(p == current &&
-                                is_migration_disabled(p) &&
-                                !cpumask_test_cpu(task_cpu(p), 
new_mask))) {
+               if (is_migration_disabled(p) &&
+                   !cpumask_test_cpu(task_cpu(p), new_mask)) {
+                       WARN_ON_ONCE(p == current);
                         ret = -EBUSY;
                         goto out;
                 }
@@ -2818,7 +2818,11 @@ static int __set_cpus_allowed_ptr_locked(struct 
task_struct *p,
         if (flags & SCA_USER)
                 user_mask = clear_user_cpus_ptr(p);

-       ret = affine_move_task(rq, p, rf, dest_cpu, flags);
+       if (!is_migration_disabled(p) || (flags & SCA_MIGRATE_ENABLE)) {
+               ret = affine_move_task(rq, p, rf, dest_cpu, flags);
+       } else {
+               task_rq_unlock(rq, p, rf);
+       }

         kfree(user_mask);

I haven't tested it myself, though.

Cheers,
Longman


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: BUG: HANG_DETECT waiting for migration_cpu_stop() complete
  2022-09-06 20:40         ` Waiman Long
@ 2022-09-06 20:50           ` Peter Zijlstra
  2022-09-06 21:02             ` Waiman Long
  2022-09-23 14:20             ` Mukesh Ojha
  0 siblings, 2 replies; 13+ messages in thread
From: Peter Zijlstra @ 2022-09-06 20:50 UTC (permalink / raw)
  To: Waiman Long
  Cc: Tejun Heo, Jing-Ting Wu, Mukesh Ojha, Valentin Schneider,
	wsd_upstream, linux-kernel, linux-arm-kernel, linux-mediatek,
	Jonathan.JMChen, chris.redpath, Dietmar Eggemann,
	Vincent Donnefort, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Steven Rostedt, Ben Segall, Mel Gorman, Christian Brauner,
	cgroups, lixiong.liu, wenju.xu

On Tue, Sep 06, 2022 at 04:40:03PM -0400, Waiman Long wrote:

I've not followed the earlier stuff due to being unreadable; just
reacting to this..

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 838623b68031..5d9ea1553ec0 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2794,9 +2794,9 @@ static int __set_cpus_allowed_ptr_locked(struct
> task_struct *p,
>                 if (cpumask_equal(&p->cpus_mask, new_mask))
>                         goto out;
> 
> -               if (WARN_ON_ONCE(p == current &&
> -                                is_migration_disabled(p) &&
> -                                !cpumask_test_cpu(task_cpu(p), new_mask)))
> {
> +               if (is_migration_disabled(p) &&
> +                   !cpumask_test_cpu(task_cpu(p), new_mask)) {
> +                       WARN_ON_ONCE(p == current);
>                         ret = -EBUSY;
>                         goto out;
>                 }
> @@ -2818,7 +2818,11 @@ static int __set_cpus_allowed_ptr_locked(struct
> task_struct *p,
>         if (flags & SCA_USER)
>                 user_mask = clear_user_cpus_ptr(p);
> 
> -       ret = affine_move_task(rq, p, rf, dest_cpu, flags);
> +       if (!is_migration_disabled(p) || (flags & SCA_MIGRATE_ENABLE)) {
> +               ret = affine_move_task(rq, p, rf, dest_cpu, flags);
> +       } else {
> +               task_rq_unlock(rq, p, rf);
> +       }

This cannot be right. There might be previous set_cpus_allowed_ptr()
callers that are blocked and waiting for the task to land on a valid
CPU.


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BUG: HANG_DETECT waiting for migration_cpu_stop() complete
  2022-09-06 20:50           ` Peter Zijlstra
@ 2022-09-06 21:02             ` Waiman Long
  2022-09-23 14:20             ` Mukesh Ojha
  1 sibling, 0 replies; 13+ messages in thread
From: Waiman Long @ 2022-09-06 21:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Tejun Heo, Jing-Ting Wu, Mukesh Ojha, Valentin Schneider,
	wsd_upstream, linux-kernel, linux-arm-kernel, linux-mediatek,
	Jonathan.JMChen, chris.redpath, Dietmar Eggemann,
	Vincent Donnefort, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Steven Rostedt, Ben Segall, Mel Gorman, Christian Brauner,
	cgroups, lixiong.liu, wenju.xu


On 9/6/22 16:50, Peter Zijlstra wrote:
> On Tue, Sep 06, 2022 at 04:40:03PM -0400, Waiman Long wrote:
>
> I've not followed the earlier stuff due to being unreadable; just
> reacting to this..
>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 838623b68031..5d9ea1553ec0 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -2794,9 +2794,9 @@ static int __set_cpus_allowed_ptr_locked(struct
>> task_struct *p,
>>                  if (cpumask_equal(&p->cpus_mask, new_mask))
>>                          goto out;
>>
>> -               if (WARN_ON_ONCE(p == current &&
>> -                                is_migration_disabled(p) &&
>> -                                !cpumask_test_cpu(task_cpu(p), new_mask)))
>> {
>> +               if (is_migration_disabled(p) &&
>> +                   !cpumask_test_cpu(task_cpu(p), new_mask)) {
>> +                       WARN_ON_ONCE(p == current);
>>                          ret = -EBUSY;
>>                          goto out;
>>                  }
>> @@ -2818,7 +2818,11 @@ static int __set_cpus_allowed_ptr_locked(struct
>> task_struct *p,
>>          if (flags & SCA_USER)
>>                  user_mask = clear_user_cpus_ptr(p);
>>
>> -       ret = affine_move_task(rq, p, rf, dest_cpu, flags);
>> +       if (!is_migration_disabled(p) || (flags & SCA_MIGRATE_ENABLE)) {
>> +               ret = affine_move_task(rq, p, rf, dest_cpu, flags);
>> +       } else {
>> +               task_rq_unlock(rq, p, rf);
>> +       }
> This cannot be right. There might be previous set_cpus_allowed_ptr()
> callers that are blocked and waiting for the task to land on a valid
> CPU.

You are probably right. I haven't fully understand all the migration 
disable code yet. However, if migration is disabled, there are some 
corner cases we need to handle properly.

Cheers,
Longman


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BUG: HANG_DETECT waiting for migration_cpu_stop() complete
  2022-09-06 20:50           ` Peter Zijlstra
  2022-09-06 21:02             ` Waiman Long
@ 2022-09-23 14:20             ` Mukesh Ojha
  2022-09-29 15:13               ` Mukesh Ojha
  1 sibling, 1 reply; 13+ messages in thread
From: Mukesh Ojha @ 2022-09-23 14:20 UTC (permalink / raw)
  To: Peter Zijlstra, Waiman Long
  Cc: Tejun Heo, Jing-Ting Wu, Valentin Schneider, wsd_upstream,
	linux-kernel, linux-arm-kernel, linux-mediatek, Jonathan.JMChen,
	chris.redpath, Dietmar Eggemann, Vincent Donnefort, Ingo Molnar,
	Juri Lelli, Vincent Guittot, Steven Rostedt, Ben Segall,
	Mel Gorman, Christian Brauner, cgroups, lixiong.liu, wenju.xu

Hi Peter,


On 9/7/2022 2:20 AM, Peter Zijlstra wrote:
> On Tue, Sep 06, 2022 at 04:40:03PM -0400, Waiman Long wrote:
> 
> I've not followed the earlier stuff due to being unreadable; just
> reacting to this..

We are able to reproduce this issue explained at this link

https://lore.kernel.org/lkml/88b2910181bda955ac46011b695c53f7da39ac47.camel@mediatek.com/


> 
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 838623b68031..5d9ea1553ec0 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -2794,9 +2794,9 @@ static int __set_cpus_allowed_ptr_locked(struct
>> task_struct *p,
>>                  if (cpumask_equal(&p->cpus_mask, new_mask))
>>                          goto out;
>>
>> -               if (WARN_ON_ONCE(p == current &&
>> -                                is_migration_disabled(p) &&
>> -                                !cpumask_test_cpu(task_cpu(p), new_mask)))
>> {
>> +               if (is_migration_disabled(p) &&
>> +                   !cpumask_test_cpu(task_cpu(p), new_mask)) {
>> +                       WARN_ON_ONCE(p == current);
>>                          ret = -EBUSY;
>>                          goto out;
>>                  }
>> @@ -2818,7 +2818,11 @@ static int __set_cpus_allowed_ptr_locked(struct
>> task_struct *p,
>>          if (flags & SCA_USER)
>>                  user_mask = clear_user_cpus_ptr(p);
>>
>> -       ret = affine_move_task(rq, p, rf, dest_cpu, flags);
>> +       if (!is_migration_disabled(p) || (flags & SCA_MIGRATE_ENABLE)) {
>> +               ret = affine_move_task(rq, p, rf, dest_cpu, flags);
>> +       } else {
>> +               task_rq_unlock(rq, p, rf);
>> +       }
> 
> This cannot be right. There might be previous set_cpus_allowed_ptr()
> callers that are blocked and waiting for the task to land on a valid
> CPU.
> 

Was thinking if just skipping as below will help here, well i am not sure .

But thinking what if we keep the task as it is on the same cpu and let's 
wait for migration to be enabled for the task to take care of it later.

------------------->O------------------------------------------

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d90d37c..7717733 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2390,8 +2390,10 @@ static int migration_cpu_stop(void *data)
          * we're holding p->pi_lock.
          */
         if (task_rq(p) == rq) {
-               if (is_migration_disabled(p))
+               if (is_migration_disabled(p)) {
+                       complete = true;
                         goto out;
+               }

                 if (pending) {


-Mukesh

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: BUG: HANG_DETECT waiting for migration_cpu_stop() complete
  2022-09-23 14:20             ` Mukesh Ojha
@ 2022-09-29 15:13               ` Mukesh Ojha
  0 siblings, 0 replies; 13+ messages in thread
From: Mukesh Ojha @ 2022-09-29 15:13 UTC (permalink / raw)
  To: Peter Zijlstra, Vincent Guittot, Ben Segall, Mel Gorman, Steven Rostedt
  Cc: Tejun Heo, Jing-Ting Wu, Valentin Schneider, wsd_upstream,
	linux-kernel, linux-arm-kernel, linux-mediatek, Jonathan.JMChen,
	chris.redpath, Dietmar Eggemann, Vincent Donnefort, Ingo Molnar,
	Juri Lelli, Christian Brauner, cgroups, lixiong.liu, wenju.xu

Hi All,

On 9/23/2022 7:50 PM, Mukesh Ojha wrote:
> Hi Peter,
> 
> 
> On 9/7/2022 2:20 AM, Peter Zijlstra wrote:
>> On Tue, Sep 06, 2022 at 04:40:03PM -0400, Waiman Long wrote:
>>
>> I've not followed the earlier stuff due to being unreadable; just
>> reacting to this..
> 
> We are able to reproduce this issue explained at this link
> 
> https://lore.kernel.org/lkml/88b2910181bda955ac46011b695c53f7da39ac47.camel@mediatek.com/ 
> 
> 
> 
>>
>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>> index 838623b68031..5d9ea1553ec0 100644
>>> --- a/kernel/sched/core.c
>>> +++ b/kernel/sched/core.c
>>> @@ -2794,9 +2794,9 @@ static int __set_cpus_allowed_ptr_locked(struct
>>> task_struct *p,
>>>                  if (cpumask_equal(&p->cpus_mask, new_mask))
>>>                          goto out;
>>>
>>> -               if (WARN_ON_ONCE(p == current &&
>>> -                                is_migration_disabled(p) &&
>>> -                                !cpumask_test_cpu(task_cpu(p), 
>>> new_mask)))
>>> {
>>> +               if (is_migration_disabled(p) &&
>>> +                   !cpumask_test_cpu(task_cpu(p), new_mask)) {
>>> +                       WARN_ON_ONCE(p == current);
>>>                          ret = -EBUSY;
>>>                          goto out;
>>>                  }
>>> @@ -2818,7 +2818,11 @@ static int __set_cpus_allowed_ptr_locked(struct
>>> task_struct *p,
>>>          if (flags & SCA_USER)
>>>                  user_mask = clear_user_cpus_ptr(p);
>>>
>>> -       ret = affine_move_task(rq, p, rf, dest_cpu, flags);
>>> +       if (!is_migration_disabled(p) || (flags & SCA_MIGRATE_ENABLE)) {
>>> +               ret = affine_move_task(rq, p, rf, dest_cpu, flags);
>>> +       } else {
>>> +               task_rq_unlock(rq, p, rf);
>>> +       }
>>
>> This cannot be right. There might be previous set_cpus_allowed_ptr()
>> callers that are blocked and waiting for the task to land on a valid
>> CPU.
>>
> 
> Was thinking if just skipping as below will help here, well i am not sure .
> 
> But thinking what if we keep the task as it is on the same cpu and let's 
> wait for migration to be enabled for the task to take care of it later.
> 
> ------------------->O------------------------------------------
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index d90d37c..7717733 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2390,8 +2390,10 @@ static int migration_cpu_stop(void *data)
>           * we're holding p->pi_lock.
>           */
>          if (task_rq(p) == rq) {
> -               if (is_migration_disabled(p))
> +               if (is_migration_disabled(p)) {
> +                       complete = true;
>                          goto out;
> +               }
> 
>                  if (pending) {
> 

Any suggestion on this bug ?


-Mukesh

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BUG: HANG_DETECT waiting for migration_cpu_stop() complete
  2023-03-22  9:37 Ryan Xiao (肖水林)
@ 2023-03-27  4:05 ` Ryan Xiao (肖水林)
  0 siblings, 0 replies; 13+ messages in thread
From: Ryan Xiao (肖水林) @ 2023-03-27  4:05 UTC (permalink / raw)
  To: tj, peterz, vschneid
  Cc: linux-arm-kernel, wsd_upstream, linux-kernel, linux-mediatek,
	Yongjun Luo (罗勇军)

On Wed, 2023-03-22 at 17:37 +0800, ryan xiao wrote:
Hi Sir:
    Do you have any suggestion for this issue?
Thank you
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* BUG: HANG_DETECT waiting for migration_cpu_stop() complete
@ 2023-03-22  9:37 Ryan Xiao (肖水林)
  2023-03-27  4:05 ` Ryan Xiao (肖水林)
  0 siblings, 1 reply; 13+ messages in thread
From: Ryan Xiao (肖水林) @ 2023-03-22  9:37 UTC (permalink / raw)
  To: tj, peterz, vschneid
  Cc: linux-arm-kernel, wsd_upstream, linux-kernel, linux-mediatek,
	Yongjun Luo (罗勇军)

Hi,

We meet the HANG_DETECT happened in T SW version with kernel-5.15.
Many tasks have been blocked for a long time.


Root cause:
migration_cpu_stop() is not complete due to is_migration_disabled(p) is
true, complete is false and complete_all() never get executed.
It let other task wait the rwsem.

Detail:
system_server waiting for cgroup_threadgroup_rwsem.
OomAdjuster is holding the cgroup_threadgroup_rwsem and waiting for
cpuset_rwsem.
cpuset_hotplug_workfn is holding the cpuset_rwsem and waiting for
affine_move_task() complete.
affine_move_task() waiting for migration_cpu_stop() complete.

The backtrace of system_server:
__switch_to
__schedule
schedule
percpu_rwsem_wait
__percpu_down_read
cgroup_css_set_fork => wait for cgroup_threadgroup_rwsem
cgroup_can_fork
copy_process
kernel_clone

The backtrace of OomAdjuster:
__switch_to
__schedule
schedule
percpu_rwsem_wait
percpu_down_write
cpuset_can_attach => wait for cpuset_rwsem
cgroup_migrate_execute
cgroup_attach_task
__cgroup1_procs_write => hold cgroup_threadgroup_rwsem
cgroup1_procs_write
cgroup_file_write
kernfs_fop_write_iter
vfs_write
ksys_write
The backtrace of cpuset_hotplug_workfn:
__switch_to
__schedule
schedule
schedule_timeout
wait_for_common
affine_move_task => wait for complete
__set_cpus_allowed_ptr_locked
update_tasks_cpumask
cpuset_hotplug_update_tasks => hold cpuset_rwsem
cpuset_hotplug_workfn
process_one_work
worker_thread
kthread


In affine_move_task() will call migration_cpu_stop() and wait for it
complete.
In normal case, if migration_cpu_stop() complete it will inform
everyone that he is done.
But there is an exception case that will not notify.
If is_migration_disabled(p) is true and complete will always is false,
then complete_all() never get executed.

static int migration_cpu_stop(void *data)
{
...
    bool complete = false;
...

    if (task_rq(p) == rq) {
        if (is_migration_disabled(p))
              goto out; => is_migration_disabled(p) = true,
                           so complete = false.
            ...
        }
...

out:
...
    if (complete) => complete = false,
                     so complete_all() never get executed.
        complete_all(&pending->done);

        return 0;
}
Review the code, we found that there are many places can change
is_migration_disabled() value.
(such as: __rt_spin_lock(), rt_read_lock(), rt_write_lock(), ...)

Do you have any suggestion for this issue?
Thank you.
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BUG: HANG_DETECT waiting for migration_cpu_stop() complete
       [not found] <20220907000741.2496-1-hdanton@sina.com>
@ 2022-09-22  5:40 ` Jing-Ting Wu
  0 siblings, 0 replies; 13+ messages in thread
From: Jing-Ting Wu @ 2022-09-22  5:40 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Peter Zijlstra, linux-kernel, linux-mm, Waiman Long,
	ValentinSchneider, TejunHeo, wsd_upstream, linux-arm-kernel,
	linux-mediatek, Jonathan.JMChen, chris.redpath, Dietmar Eggemann,
	Vincent Donnefort, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Christian Brauner, cgroups, lixiong.liu, wenju.xu

On Wed, 2022-09-07 at 08:07 +0800, Hillf Danton wrote:
> On 5 Sep 2022 10:47:36 +0800 Jing-Ting Wu <jing-ting.wu@mediatek.com>
> wrote
> > 
> > We meet the HANG_DETECT happened in T SW version with kernel-5.15.
> > Many tasks have been blocked for a long time.
> > 
> > Root cause:
> > migration_cpu_stop() is not complete due to
> > is_migration_disabled(p) is
> > true, complete is false and complete_all() never get executed.
> > It let other task wait the rwsem.
> 
> See if handing task over to stopper again in case of migration
> disabled
> could survive your tests.
> 
> Hillf
> 
> --- linux-5.15/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2322,9 +2322,7 @@ static int migration_cpu_stop(void *data
>  	 * holding rq->lock, if p->on_rq == 0 it cannot get enqueued
> because
>  	 * we're holding p->pi_lock.
>  	 */
> -	if (task_rq(p) == rq) {
> -		if (is_migration_disabled(p))
> -			goto out;
> +	if (task_rq(p) == rq && !is_migration_disabled(p)) {
>  
>  		if (pending) {
>  			p->migration_pending = NULL;

Because Peter have some concern for patch by Waiman.
We add Hillf's patch to our stability test.
But there are side effects after patched.
The warning appear once < two weeks. 

Backtrace as follows:
[name:panic&]WARNING: CPU: 6 PID: 32583 at affine_move_task
pc : affine_move_task
lr : __set_cpus_allowed_ptr_locked
Call trace:
affine_move_task
__set_cpus_allowed_ptr_locked
migrate_enable
__cgroup_bpf_run_filter_skb
ip_finish_output
ip_output


The root cause is when is_migration_disabled(p) is true,the patched
version will set p->migration_pending to NULL by migration_cpu_stop.
And in affine_move_task will raise a WARN_ON_ONCE(!pending).

Kernel-5.15/kernel/sched/core.c:
static int affine_move_task(struct rq *rq, struct task_struct *p,
struct rq_flags *rf, int dest_cpu, unsigned int flags) {
...
	If (WARN_ON_ONCE(!pending)) {
 	  Task_rq_unlock(rq,p,fr);
  	  return -EINVAL;
	}
...
}

But the tasks have not been migrated to the new affinity CPU, so there
should be pending tasks to be processed, so p->migration_pending should
not be NULL.



Without patch:
When is_migration_disabled is true, then goto out and not set p-
>migration_pending to NULL.

static int migration_cpu_stop(void *data) {
...
	If (task_rq(p) == rq) {
        	     if (is_migration_disabled(p))
                	           goto out;
	}
...
}


With patch:
When is_migration_disabled is true and pending is true, goto else if
flow. Because p->cpus_ptr not updated when migrate_disable, so this
condition is always true and p->migration_pending will set to NULL.

static int migration_cpu_stop(void *data) {
...
	If (task_rq(p) == rq && !is_migration_disabled(p) ) {
 	  ...
	} else if (pending) {
	  ...
	  If (cpumask_test_cpu(task_cpu(p), p-> cpus_ ptr)) { 
        	p->migration_pending = NULL;
      		 complete = true;
      		 goto out;
	}
...
}






Best regards,
Jing-Ting Wu



_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2023-03-27  4:26 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-05  2:47 BUG: HANG_DETECT waiting for migration_cpu_stop() complete Jing-Ting Wu
2022-09-05  6:44 ` Mukesh Ojha
2022-09-05  8:22   ` Jing-Ting Wu
2022-09-06 18:30     ` Tejun Heo
2022-09-06 20:01       ` Waiman Long
2022-09-06 20:40         ` Waiman Long
2022-09-06 20:50           ` Peter Zijlstra
2022-09-06 21:02             ` Waiman Long
2022-09-23 14:20             ` Mukesh Ojha
2022-09-29 15:13               ` Mukesh Ojha
     [not found] <20220907000741.2496-1-hdanton@sina.com>
2022-09-22  5:40 ` Jing-Ting Wu
2023-03-22  9:37 Ryan Xiao (肖水林)
2023-03-27  4:05 ` Ryan Xiao (肖水林)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).