From: Mukesh Ojha <quic_mojha@quicinc.com>
To: Peter Zijlstra <peterz@infradead.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>,
	Jing-Ting Wu <jing-ting.wu@mediatek.com>,
	Valentin Schneider <vschneid@redhat.com>,
	<wsd_upstream@mediatek.com>, <linux-kernel@vger.kernel.org>,
	<linux-arm-kernel@lists.infradead.org>,
	<linux-mediatek@lists.infradead.org>,
	<Jonathan.JMChen@mediatek.com>,
	"chris.redpath@arm.com" <chris.redpath@arm.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Vincent Donnefort <vdonnefort@gmail.com>,
	Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Christian Brauner <brauner@kernel.org>, <cgroups@vger.kernel.org>,
	<lixiong.liu@mediatek.com>, <wenju.xu@mediatek.com>
Subject: Re: BUG: HANG_DETECT waiting for migration_cpu_stop() complete
Date: Thu, 29 Sep 2022 20:43:43 +0530
Message-ID: <cdb597d4-6543-3e34-cbbd-6a776b0d6581@quicinc.com>
In-Reply-To: <e6153b89-1f41-3fff-241b-a767e41a1e7e@quicinc.com>

Hi All,

On 9/23/2022 7:50 PM, Mukesh Ojha wrote:
> Hi Peter,
> 
> 
> On 9/7/2022 2:20 AM, Peter Zijlstra wrote:
>> On Tue, Sep 06, 2022 at 04:40:03PM -0400, Waiman Long wrote:
>>
>> I've not followed the earlier stuff due to being unreadable; just
>> reacting to this..
> 
> We are able to reproduce the issue described at this link:
> 
> https://lore.kernel.org/lkml/88b2910181bda955ac46011b695c53f7da39ac47.camel@mediatek.com/ 
> 
> 
> 
>>
>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>> index 838623b68031..5d9ea1553ec0 100644
>>> --- a/kernel/sched/core.c
>>> +++ b/kernel/sched/core.c
>>> @@ -2794,9 +2794,9 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
>>>                  if (cpumask_equal(&p->cpus_mask, new_mask))
>>>                          goto out;
>>>
>>> -               if (WARN_ON_ONCE(p == current &&
>>> -                                is_migration_disabled(p) &&
>>> -                                !cpumask_test_cpu(task_cpu(p), new_mask))) {
>>> +               if (is_migration_disabled(p) &&
>>> +                   !cpumask_test_cpu(task_cpu(p), new_mask)) {
>>> +                       WARN_ON_ONCE(p == current);
>>>                          ret = -EBUSY;
>>>                          goto out;
>>>                  }
>>> @@ -2818,7 +2818,11 @@ static int __set_cpus_allowed_ptr_locked(struct task_struct *p,
>>>          if (flags & SCA_USER)
>>>                  user_mask = clear_user_cpus_ptr(p);
>>>
>>> -       ret = affine_move_task(rq, p, rf, dest_cpu, flags);
>>> +       if (!is_migration_disabled(p) || (flags & SCA_MIGRATE_ENABLE)) {
>>> +               ret = affine_move_task(rq, p, rf, dest_cpu, flags);
>>> +       } else {
>>> +               task_rq_unlock(rq, p, rf);
>>> +       }
>>
>> This cannot be right. There might be previous set_cpus_allowed_ptr()
>> callers that are blocked and waiting for the task to land on a valid
>> CPU.
>>
> 
> I was wondering whether simply skipping, as below, would help here; I am
> not sure.
> 
> Alternatively, what if we keep the task on its current CPU and wait for
> migration to be re-enabled for the task, letting that path take care of
> it later?
> 
> ------------------->O------------------------------------------
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index d90d37c..7717733 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2390,8 +2390,10 @@ static int migration_cpu_stop(void *data)
>           * we're holding p->pi_lock.
>           */
>          if (task_rq(p) == rq) {
> -               if (is_migration_disabled(p))
> +               if (is_migration_disabled(p)) {
> +                       complete = true;
>                          goto out;
> +               }
> 
>                  if (pending) {
> 
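
To make the question concrete, below is a heavily simplified userspace model (plain C, not kernel code) of only the early-exit branch touched by the diff above. The names task_model, pending_model, stopper_model and cpu_allowed are hypothetical, and treating `pending->done` as the completion that a blocked set_cpus_allowed_ptr() caller waits on is an assumption of this sketch. Locking, re-queuing and the other pending-request cases in the real migration_cpu_stop() are not modelled.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model, NOT kernel code. It mirrors only the
 * "task_rq(p) == rq && is_migration_disabled(p)" early exit discussed above.
 */
struct task_model {
	bool migration_disabled;  /* stands in for is_migration_disabled(p) */
	int cpu;                  /* stands in for task_cpu(p), 0..31 */
	unsigned int cpus_mask;   /* bit n set => CPU n is allowed */
};

struct pending_model {
	bool done;                /* assumed stand-in for pending->done */
};

/* True if the task currently sits on a CPU inside its allowed mask. */
static bool cpu_allowed(const struct task_model *p)
{
	return (p->cpus_mask & (1u << p->cpu)) != 0;
}

/*
 * One run of the modelled stopper. 'proposed' selects the change that
 * sets complete = true before bailing out on a migration-disabled task.
 */
static void stopper_model(struct task_model *p, struct pending_model *pending,
			  bool proposed)
{
	bool complete = false;

	if (p->migration_disabled) {
		if (proposed)
			complete = true;
		goto out;
	}

	/* Simplification: move to the lowest-numbered allowed CPU. */
	for (int cpu = 0; cpu < 32; cpu++) {
		if (p->cpus_mask & (1u << cpu)) {
			p->cpu = cpu;
			break;
		}
	}
	complete = true;
out:
	/* Models complete_all(): releases whoever waits on 'done'. */
	if (complete)
		pending->done = true;
}
```

With `.migration_disabled = true, .cpu = 3, .cpus_mask = 0x3`, the original branch leaves `done` false, so in this model the waiter keeps blocking; the proposed branch sets `done` even though the task is still on CPU 3, outside its new mask.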

Any suggestions on this bug?


-Mukesh
