* WARNING: CPU: 0 PID: 14 at kernel/kthread.c:501 kthread_park+0x44/0xa4 (was: Re: kthread: Simplify kthread_park() completion)
       [not found] <20180813220011.D03DE21C8F@pdx-korg-gitolite-1.ci.codeaurora.org>
@ 2018-08-21 18:38 ` Geert Uytterhoeven
  2018-08-23 13:15   ` Lorenzo Pieralisi
  0 siblings, 1 reply; 2+ messages in thread
From: Geert Uytterhoeven @ 2018-08-21 18:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mark Rutland, Lorenzo Pieralisi, Linux ARM, Linux-Renesas,
	Linux Kernel Mailing List

Hoi Peter,

On Tue, Aug 14, 2018 at 12:00 AM Linux Kernel Mailing List
<linux-kernel@vger.kernel.org> wrote:
> Web:        https://git.kernel.org/torvalds/c/f83ee19be4272564ad592ef90145db7295229490
> Commit:     f83ee19be4272564ad592ef90145db7295229490
> Parent:     167a88677b05d6a810f23b871cfb2b5db1808e60
> Refname:    refs/heads/master
> Author:     Peter Zijlstra <peterz@infradead.org>
> AuthorDate: Thu Jun 7 10:55:56 2018 +0200
> Committer:  Ingo Molnar <mingo@kernel.org>
> CommitDate: Tue Jul 3 09:20:44 2018 +0200
>
>     kthread: Simplify kthread_park() completion
>
>     Oleg explains the reason we could hit park+park is that
>     smpboot_update_cpumask_percpu_thread()'s
>
>       for_each_cpu_and(cpu, &tmp, cpu_online_mask)
>             smpboot_park_kthread();
>
>     turns into:
>
>       for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask, (void)and)
>             smpboot_park_kthread();
>
>     on UP, ignoring the mask. But since we just completely removed that
>     function, this is no longer relevant.
>
>     So revert commit:
>
>       b1f5b378e126 ("kthread: Allow kthread_park() on a parked kthread")
>
>     Suggested-by: Oleg Nesterov <oleg@redhat.com>
>     Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>     Cc: Linus Torvalds <torvalds@linux-foundation.org>
>     Cc: Peter Zijlstra <peterz@infradead.org>
>     Cc: Thomas Gleixner <tglx@linutronix.de>
>     Cc: linux-kernel@vger.kernel.org
>     Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---
>  kernel/kthread.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/kthread.c b/kernel/kthread.c
> index 750cb8082694..11b591ee51ab 100644
> --- a/kernel/kthread.c
> +++ b/kernel/kthread.c
> @@ -190,7 +190,7 @@ static void __kthread_parkme(struct kthread *self)
>                 if (!test_bit(KTHREAD_SHOULD_PARK, &self->flags))
>                         break;
>
> -               complete_all(&self->parked);
> +               complete(&self->parked);
>                 schedule();
>         }
>         __set_current_state(TASK_RUNNING);
> @@ -465,7 +465,6 @@ void kthread_unpark(struct task_struct *k)
>         if (test_bit(KTHREAD_IS_PER_CPU, &kthread->flags))
>                 __kthread_bind(k, kthread->cpu, TASK_PARKED);
>
> -       reinit_completion(&kthread->parked);
>         clear_bit(KTHREAD_SHOULD_PARK, &kthread->flags);
>         /*
>          * __kthread_parkme() will either see !SHOULD_PARK or get the wakeup.
> @@ -493,6 +492,9 @@ int kthread_park(struct task_struct *k)
>         if (WARN_ON(k->flags & PF_EXITING))
>                 return -ENOSYS;
>
> +       if (WARN_ON_ONCE(test_bit(KTHREAD_SHOULD_PARK, &kthread->flags)))
> +               return -EBUSY;
> +
>         set_bit(KTHREAD_SHOULD_PARK, &kthread->flags);
>         if (k != current) {
>                 wake_up_process(k);

The above WARN_ON_ONCE() triggers during psci_checker operation when booting
on R-Car Gen3 (arm64) SoCs where a trusted OS is resident on CPU0.

Reverting the commit fixes the issue.

Dmesg before/after on R-Car H3 ES2.0:

 psci: probing for conduit method from DT.
 psci: PSCIv1.0 detected in firmware.
 psci: Using standard PSCI v0.2 function IDs
 psci: Trusted OS resident on physical CPU 0x0
 psci: SMC Calling Convention v1.0
 ...
 psci_checker: Trying to turn off and on again group 0 (CPUs 0-3)
+WARNING: CPU: 0 PID: 14 at kernel/kthread.c:501 kthread_park+0x44/0xa4
+Modules linked in:
+CPU: 0 PID: 14 Comm: cpuhp/0 Not tainted 4.18.0-salvator-x-00407-gbc763e81b483a4e3 #170
+Hardware name: Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+ (DT)
+pstate: 80400005 (Nzcv daif +PAN -UAO)
+pc : kthread_park+0x44/0xa4
+lr : smpboot_park_threads+0x88/0x94
+sp : ffffff8009dcbca0
+x29: ffffff8009dcbca0 x28: ffffff80081156cc
+x27: ffffff800900e000 x26: 00000046f7027000
+x25: ffffff8008e9cfc0 x24: ffffffc6ffec3fc0
+x23: ffffff80090305b8 x22: 0000000000000000
+x21: ffffffc6fb8ab200 x20: 0000000000000001
+x19: ffffff80090269e8 x18: ffffffc6fb897348
+x17: 0000000000000000 x16: 0000000000000000
+x15: 0000000000000400 x14: 0000000000000400
+x13: 0000000000000400 x12: 0000000000000001
+x11: 0000000000000400 x10: 0000000000000400
+x9 : 0000000000000125 x8 : 0000000000000000
+x7 : ffffff80081156f8 x6 : 0000000000000001
+x5 : 0000000000000000 x4 : ffffff8009804b40
+x3 : 000000007cbd3c4e x2 : 38716e04a5aa3600
+x1 : 0000000004208040 x0 : ffffffc6fb95c240
+Call trace:
+ kthread_park+0x44/0xa4
+ smpboot_park_threads+0x88/0x94
+ cpuhp_invoke_callback+0x230/0xcfc
+ cpuhp_thread_fun+0xb8/0x1d8
+ smpboot_thread_fn+0x228/0x244
+ kthread+0x124/0x134
+ ret_from_fork+0x10/0x18
+irq event stamp: 3390
+hardirqs last  enabled at (3389): [<ffffff800818729c>] generic_exec_single+0x80/0x11c
+hardirqs last disabled at (3390): [<ffffff80080818cc>] do_debug_exception+0x5c/0x17c
+softirqs last  enabled at (1810): [<ffffff8008081da8>] __do_softirq+0x160/0x4ec
+softirqs last disabled at (1795): [<ffffff80080ed460>] irq_exit+0xa4/0x100
+---[ end trace 518ee2fb840813cb ]---
 CPU1: shutdown
 psci: CPU1 killed.
-NOHZ: local_softirq_pending 51
+NOHZ: local_softirq_pending 55
 CPU2: shutdown
 psci: CPU2 killed.
 NOHZ: local_softirq_pending 51
 CPU3: shutdown
 psci: CPU3 killed.
 Detected PIPT I-cache on CPU1
 CPU1: Booted secondary processor 0x0000000001 [0x411fd073]
 Detected PIPT I-cache on CPU2
 CPU2: Booted secondary processor 0x0000000002 [0x411fd073]
 Detected PIPT I-cache on CPU3
 CPU3: Booted secondary processor 0x0000000003 [0x411fd073]

The issue can also be seen on R-Car M3-W and M3-N.
It does not happen on R-Car H3 ES1.0 with an older firmware version, where
no trusted OS is running on CPU0 ("psci: Trusted OS migration not required").

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: WARNING: CPU: 0 PID: 14 at kernel/kthread.c:501 kthread_park+0x44/0xa4 (was: Re: kthread: Simplify kthread_park() completion)
  2018-08-21 18:38 ` WARNING: CPU: 0 PID: 14 at kernel/kthread.c:501 kthread_park+0x44/0xa4 (was: Re: kthread: Simplify kthread_park() completion) Geert Uytterhoeven
@ 2018-08-23 13:15   ` Lorenzo Pieralisi
  0 siblings, 0 replies; 2+ messages in thread
From: Lorenzo Pieralisi @ 2018-08-23 13:15 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Peter Zijlstra, Mark Rutland, Linux ARM, Linux-Renesas,
	Linux Kernel Mailing List

On Tue, Aug 21, 2018 at 08:38:39PM +0200, Geert Uytterhoeven wrote:
> Hoi Peter,
> 
> On Tue, Aug 14, 2018 at 12:00 AM Linux Kernel Mailing List
> <linux-kernel@vger.kernel.org> wrote:
> > Web:        https://git.kernel.org/torvalds/c/f83ee19be4272564ad592ef90145db7295229490
> > Commit:     f83ee19be4272564ad592ef90145db7295229490
> > Parent:     167a88677b05d6a810f23b871cfb2b5db1808e60
> > Refname:    refs/heads/master
> > Author:     Peter Zijlstra <peterz@infradead.org>
> > AuthorDate: Thu Jun 7 10:55:56 2018 +0200
> > Committer:  Ingo Molnar <mingo@kernel.org>
> > CommitDate: Tue Jul 3 09:20:44 2018 +0200
> >
> >     kthread: Simplify kthread_park() completion
> >
> >     Oleg explains the reason we could hit park+park is that
> >     smpboot_update_cpumask_percpu_thread()'s
> >
> >       for_each_cpu_and(cpu, &tmp, cpu_online_mask)
> >             smpboot_park_kthread();
> >
> >     turns into:
> >
> >       for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask, (void)and)
> >             smpboot_park_kthread();
> >
> >     on UP, ignoring the mask. But since we just completely removed that
> >     function, this is no longer relevant.
> >
> >     So revert commit:
> >
> >       b1f5b378e126 ("kthread: Allow kthread_park() on a parked kthread")
> >
> >     Suggested-by: Oleg Nesterov <oleg@redhat.com>
> >     Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> >     Cc: Linus Torvalds <torvalds@linux-foundation.org>
> >     Cc: Peter Zijlstra <peterz@infradead.org>
> >     Cc: Thomas Gleixner <tglx@linutronix.de>
> >     Cc: linux-kernel@vger.kernel.org
> >     Signed-off-by: Ingo Molnar <mingo@kernel.org>
> > ---
> >  kernel/kthread.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/kthread.c b/kernel/kthread.c
> > index 750cb8082694..11b591ee51ab 100644
> > --- a/kernel/kthread.c
> > +++ b/kernel/kthread.c
> > @@ -190,7 +190,7 @@ static void __kthread_parkme(struct kthread *self)
> >                 if (!test_bit(KTHREAD_SHOULD_PARK, &self->flags))
> >                         break;
> >
> > -               complete_all(&self->parked);
> > +               complete(&self->parked);
> >                 schedule();
> >         }
> >         __set_current_state(TASK_RUNNING);
> > @@ -465,7 +465,6 @@ void kthread_unpark(struct task_struct *k)
> >         if (test_bit(KTHREAD_IS_PER_CPU, &kthread->flags))
> >                 __kthread_bind(k, kthread->cpu, TASK_PARKED);
> >
> > -       reinit_completion(&kthread->parked);
> >         clear_bit(KTHREAD_SHOULD_PARK, &kthread->flags);
> >         /*
> >          * __kthread_parkme() will either see !SHOULD_PARK or get the wakeup.
> > @@ -493,6 +492,9 @@ int kthread_park(struct task_struct *k)
> >         if (WARN_ON(k->flags & PF_EXITING))
> >                 return -ENOSYS;
> >
> > +       if (WARN_ON_ONCE(test_bit(KTHREAD_SHOULD_PARK, &kthread->flags)))
> > +               return -EBUSY;
> > +
> >         set_bit(KTHREAD_SHOULD_PARK, &kthread->flags);
> >         if (k != current) {
> >                 wake_up_process(k);
> 
> The above WARN_ON_ONCE() triggers during psci_checker operation when booting
> on R-Car Gen3 (arm64) SoCs where a trusted OS is resident on CPU0.
> 
> Reverting the commit fixes the issue.
> 
> Dmesg before/after on R-Car H3 ES2.0:
> 
>  psci: probing for conduit method from DT.
>  psci: PSCIv1.0 detected in firmware.
>  psci: Using standard PSCI v0.2 function IDs
>  psci: Trusted OS resident on physical CPU 0x0
>  psci: SMC Calling Convention v1.0
>  ...
>  psci_checker: Trying to turn off and on again group 0 (CPUs 0-3)
> +WARNING: CPU: 0 PID: 14 at kernel/kthread.c:501 kthread_park+0x44/0xa4
> +Modules linked in:
> +CPU: 0 PID: 14 Comm: cpuhp/0 Not tainted 4.18.0-salvator-x-00407-gbc763e81b483a4e3 #170
> +Hardware name: Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+ (DT)
> +pstate: 80400005 (Nzcv daif +PAN -UAO)
> +pc : kthread_park+0x44/0xa4
> +lr : smpboot_park_threads+0x88/0x94
> +sp : ffffff8009dcbca0
> +x29: ffffff8009dcbca0 x28: ffffff80081156cc
> +x27: ffffff800900e000 x26: 00000046f7027000
> +x25: ffffff8008e9cfc0 x24: ffffffc6ffec3fc0
> +x23: ffffff80090305b8 x22: 0000000000000000
> +x21: ffffffc6fb8ab200 x20: 0000000000000001
> +x19: ffffff80090269e8 x18: ffffffc6fb897348
> +x17: 0000000000000000 x16: 0000000000000000
> +x15: 0000000000000400 x14: 0000000000000400
> +x13: 0000000000000400 x12: 0000000000000001
> +x11: 0000000000000400 x10: 0000000000000400
> +x9 : 0000000000000125 x8 : 0000000000000000
> +x7 : ffffff80081156f8 x6 : 0000000000000001
> +x5 : 0000000000000000 x4 : ffffff8009804b40
> +x3 : 000000007cbd3c4e x2 : 38716e04a5aa3600
> +x1 : 0000000004208040 x0 : ffffffc6fb95c240
> +Call trace:
> + kthread_park+0x44/0xa4
> + smpboot_park_threads+0x88/0x94
> + cpuhp_invoke_callback+0x230/0xcfc
> + cpuhp_thread_fun+0xb8/0x1d8
> + smpboot_thread_fn+0x228/0x244
> + kthread+0x124/0x134
> + ret_from_fork+0x10/0x18
> +irq event stamp: 3390
> +hardirqs last  enabled at (3389): [<ffffff800818729c>] generic_exec_single+0x80/0x11c
> +hardirqs last disabled at (3390): [<ffffff80080818cc>] do_debug_exception+0x5c/0x17c
> +softirqs last  enabled at (1810): [<ffffff8008081da8>] __do_softirq+0x160/0x4ec
> +softirqs last disabled at (1795): [<ffffff80080ed460>] irq_exit+0xa4/0x100
> +---[ end trace 518ee2fb840813cb ]---
>  CPU1: shutdown
>  psci: CPU1 killed.
> -NOHZ: local_softirq_pending 51
> +NOHZ: local_softirq_pending 55
>  CPU2: shutdown
>  psci: CPU2 killed.
>  NOHZ: local_softirq_pending 51
>  CPU3: shutdown
>  psci: CPU3 killed.
>  Detected PIPT I-cache on CPU1
>  CPU1: Booted secondary processor 0x0000000001 [0x411fd073]
>  Detected PIPT I-cache on CPU2
>  CPU2: Booted secondary processor 0x0000000002 [0x411fd073]
>  Detected PIPT I-cache on CPU3
>  CPU3: Booted secondary processor 0x0000000003 [0x411fd073]
> 
> The issue can also be seen on R-Car M3-W and M3-N.  It does not happen
> on R-Car H3 ES1.0 with an older firmware version, where no trusted OS
> is running on CPU0 ("psci: Trusted OS migration not required").

The problem is caused by __cpu_disable() returning -EPERM. I was
expecting the hotplug state machine to roll back the state, but it
seems that in _cpu_down() the callback following CPUHP_TEARDOWN_CPU
(i.e. CPUHP_AP_SMPBOOT_THREADS, which should call
smpboot_unpark_threads()) is skipped. I am still debugging the
hotplug state machine and am not familiar enough with that code
to pinpoint the issue, but I strongly suspect that reverting
the patch removes the warning but does _not_ fix the underlying
issue.
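
To make the suspected sequence concrete, here is a minimal user-space
sketch. It is not kernel code: toy_kthread, toy_park(), toy_unpark() and
toy_cpu_down() are hypothetical names invented for illustration, and the
model assumes the same CPU is taken down twice. It only shows how a
failed disable without a rollback leaves the park request set, so that
the next attempt trips a check like the one the patch adds to
kthread_park().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a per-CPU kthread; only the park request is modelled. */
struct toy_kthread {
	bool should_park;	/* models KTHREAD_SHOULD_PARK */
	int warnings;		/* how often the double-park check fired */
};

/* Models the check added by the patch: a second park returns -EBUSY. */
static int toy_park(struct toy_kthread *k)
{
	assert(k != NULL);
	if (k->should_park) {
		k->warnings++;
		return -EBUSY;
	}
	k->should_park = true;
	return 0;
}

/* Clears the park request, as an unpark on rollback would. */
static void toy_unpark(struct toy_kthread *k)
{
	assert(k != NULL);
	k->should_park = false;
}

/*
 * Toy CPU-down path: park the thread, then try to disable the CPU.
 * disable_err models the return value of __cpu_disable().
 * With rollback == false, a failed disable leaves the thread parked,
 * which is the behaviour suspected above (an assumption, not verified).
 */
static int toy_cpu_down(struct toy_kthread *k, int disable_err, bool rollback)
{
	toy_park(k);
	if (disable_err) {
		if (rollback)
			toy_unpark(k);
		return disable_err;
	}
	return 0;
}
```

Calling toy_cpu_down(&k, -EPERM, false) twice on a zero-initialised
toy_kthread bumps k.warnings on the second call; with rollback set to
true the counter stays at zero.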

You can easily reproduce the problem by trying to hotplug CPU0
out through sysfs (AFAICS it is also a problem on other
arches where __cpu_disable() may return an error).
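
A small sketch of that reproduction step, for anyone who prefers a test
program over a shell: the helper name request_cpu_offline() is
hypothetical, and it assumes the usual sysfs location
/sys/devices/system/cpu/cpu0/online and root privileges. Which errno
comes back on an affected system is not something I have verified.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Write "0" to a CPU's sysfs "online" file to ask for the CPU to be
 * taken offline. Returns 0 on success or a negative errno value.
 */
static int request_cpu_offline(const char *online_path)
{
	FILE *f = fopen(online_path, "w");
	int err = 0;

	if (!f)
		return -errno;
	if (fputs("0\n", f) == EOF)
		err = -errno;
	/*
	 * stdio buffers the write, so a kernel-side failure only shows
	 * up when the stream is flushed by fclose().
	 */
	if (fclose(f) == EOF && !err)
		err = -errno;
	return err;
}
```

Calling request_cpu_offline("/sys/devices/system/cpu/cpu0/online") twice
and then checking dmesg should be enough to see whether the second
attempt hits the warning.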

Lorenzo
