* [PATCH linux-next] powerpc: use raw_smp_processor_id in arch_touch_nmi_watchdog
@ 2022-07-14  1:31 Zhouyi Zhou
  2022-07-14  9:25 ` John Ogness
  0 siblings, 1 reply; 6+ messages in thread
From: Zhouyi Zhou @ 2022-07-14  1:31 UTC
  To: mpe, benh, paulus, npiggin, ldufour, pmladek, john.ogness,
	Julia.Lawall, linuxppc-dev, linux-kernel, lance, paulmck, rcu
  Cc: Zhouyi Zhou

Use raw_smp_processor_id() in arch_touch_nmi_watchdog(), because when it
is called from watchdog() (khungtaskd), the caller is preemptible.

Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
---
Dear PPC developers,

I found this bug while running rcutorture tests in a PPC VM at the Open
Source Lab of Oregon State University.

qemu-system-ppc64 -nographic -smp cores=4,threads=1 -net none -M pseries \
    -nodefaults -device spapr-vscsi -serial file:/tmp/console.log -m 2G \
    -kernel /home/ubuntu/linux-next/tools/testing/selftests/rcutorture/res/2022.07.08-22.36.11-torture/results-rcuscale-kvfree/TREE/vmlinux \
    -append "debug_boot_weak_hash panic=-1 console=ttyS0 rcuscale.kfree_rcu_test=1 rcuscale.kfree_nthreads=16 rcuscale.holdoff=20 rcuscale.kfree_loops=10000 torture.disable_onoff_at_boot rcuscale.shutdown=1 rcuscale.verbose=0"

tail /tmp/console.log
[ 1232.433552][   T41] BUG: using smp_processor_id() in preemptible [00000000] code: khungtaskd/41
[ 1232.439751][   T41] caller is arch_touch_nmi_watchdog+0x34/0xd0
[ 1232.440934][   T41] CPU: 3 PID: 41 Comm: khungtaskd Not tainted 5.19.0-rc5-next-20220708-dirty #106
[ 1232.442684][   T41] Call Trace:
[ 1232.443343][   T41] [c0000000029cbbb0] [c0000000006df360] dump_stack_lvl+0x74/0xa8 (unreliable)
[ 1232.445237][   T41] [c0000000029cbbf0] [c000000000d04f30] check_preemption_disabled+0x150/0x160
[ 1232.446926][   T41] [c0000000029cbc80] [c000000000035584] arch_touch_nmi_watchdog+0x34/0xd0
[ 1232.448532][   T41] [c0000000029cbcb0] [c0000000002068ac] watchdog+0x40c/0x5b0
[ 1232.451449][   T41] [c0000000029cbdc0] [c000000000139df4] kthread+0x144/0x170
[ 1232.452896][   T41] [c0000000029cbe10] [c00000000000cd54] ret_from_kernel_thread+0x5c/0x64

With this fix applied, the "BUG: using smp_processor_id() in preemptible
[00000000] code: khungtaskd/41" message no longer appears.

I also examined the other uses of smp_processor_id() in watchdog.c; they
are all protected by disabled preemption.

Kind Regards
Zhouyi
--
 arch/powerpc/kernel/watchdog.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 7d28b9553654..ab6b84e00311 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -450,7 +450,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 void arch_touch_nmi_watchdog(void)
 {
 	unsigned long ticks = tb_ticks_per_usec * wd_timer_period_ms * 1000;
-	int cpu = smp_processor_id();
+	int cpu = raw_smp_processor_id();
 	u64 tb;
 
 	if (!cpumask_test_cpu(cpu, &watchdog_cpumask))
-- 
2.25.1


* Re: [PATCH linux-next] powerpc: use raw_smp_processor_id in arch_touch_nmi_watchdog
  2022-07-14  1:31 [PATCH linux-next] powerpc: use raw_smp_processor_id in arch_touch_nmi_watchdog Zhouyi Zhou
@ 2022-07-14  9:25 ` John Ogness
  2022-07-14 10:01     ` Zhouyi Zhou
  0 siblings, 1 reply; 6+ messages in thread
From: John Ogness @ 2022-07-14  9:25 UTC
  To: Zhouyi Zhou, mpe, benh, paulus, npiggin, ldufour, pmladek,
	Julia.Lawall, linuxppc-dev, linux-kernel, lance, paulmck, rcu
  Cc: Zhouyi Zhou

On 2022-07-14, Zhouyi Zhou <zhouzhouyi@gmail.com> wrote:
> use raw_smp_processor_id() in arch_touch_nmi_watchdog
> because when called from watchdog, the cpu is preemptible.

I would expect the correct solution is to make it a non-migration
section. Something like the below (untested) patch.

John Ogness

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index bfc27496fe7e..9d34aa809241 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -450,17 +450,23 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 void arch_touch_nmi_watchdog(void)
 {
 	unsigned long ticks = tb_ticks_per_usec * wd_timer_period_ms * 1000;
-	int cpu = smp_processor_id();
+	int cpu;
 	u64 tb;
 
-	if (!cpumask_test_cpu(cpu, &watchdog_cpumask))
+	cpu = get_cpu();
+
+	if (!cpumask_test_cpu(cpu, &watchdog_cpumask)) {
+		goto out;
 		return;
+	}
 
 	tb = get_tb();
 	if (tb - per_cpu(wd_timer_tb, cpu) >= ticks) {
 		per_cpu(wd_timer_tb, cpu) = tb;
 		wd_smp_clear_cpu_pending(cpu);
 	}
+out:
+	put_cpu();
 }
 EXPORT_SYMBOL(arch_touch_nmi_watchdog);

* Re: [PATCH linux-next] powerpc: use raw_smp_processor_id in arch_touch_nmi_watchdog
  2022-07-14  9:25 ` John Ogness
@ 2022-07-14 10:01     ` Zhouyi Zhou
  0 siblings, 0 replies; 6+ messages in thread
From: Zhouyi Zhou @ 2022-07-14 10:01 UTC
  To: John Ogness
  Cc: Michael Ellerman, Benjamin Herrenschmidt, paulus,
	Nicholas Piggin, ldufour, pmladek, Julia.Lawall, linuxppc-dev,
	linux-kernel, lance, Paul E. McKenney, rcu

Thanks, John, for correcting me ;-)

On Thu, Jul 14, 2022 at 5:25 PM John Ogness <john.ogness@linutronix.de> wrote:
>
> On 2022-07-14, Zhouyi Zhou <zhouzhouyi@gmail.com> wrote:
> > use raw_smp_processor_id() in arch_touch_nmi_watchdog
> > because when called from watchdog, the cpu is preemptible.
>
> I would expect the correct solution is to make it a non-migration
> section. Something like the below (untested) patch.
I applied your patch (with one tiny modification: I removed the return
statement after "goto out;"), and it passed the test in the PPC VM at
the Open Source Lab of Oregon State University.

Tested-by: Zhouyi Zhou <zhouzhouyi@gmail.com>

Many thanks,
Kind regards,
Zhouyi
>
> John Ogness
>
> diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
> index bfc27496fe7e..9d34aa809241 100644
> --- a/arch/powerpc/kernel/watchdog.c
> +++ b/arch/powerpc/kernel/watchdog.c
> @@ -450,17 +450,23 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>  void arch_touch_nmi_watchdog(void)
>  {
>         unsigned long ticks = tb_ticks_per_usec * wd_timer_period_ms * 1000;
> -       int cpu = smp_processor_id();
> +       int cpu;
>         u64 tb;
>
> -       if (!cpumask_test_cpu(cpu, &watchdog_cpumask))
> +       cpu = get_cpu();
> +
> +       if (!cpumask_test_cpu(cpu, &watchdog_cpumask)) {
> +               goto out;
>                 return;
I think we should remove the return statement here.
> +       }
>
>         tb = get_tb();
>         if (tb - per_cpu(wd_timer_tb, cpu) >= ticks) {
>                 per_cpu(wd_timer_tb, cpu) = tb;
>                 wd_smp_clear_cpu_pending(cpu);
>         }
> +out:
> +       put_cpu();
>  }
>  EXPORT_SYMBOL(arch_touch_nmi_watchdog);

* Re: [PATCH linux-next] powerpc: use raw_smp_processor_id in arch_touch_nmi_watchdog
  2022-07-14 10:01     ` Zhouyi Zhou
@ 2022-07-14 11:46       ` John Ogness
  0 siblings, 0 replies; 6+ messages in thread
From: John Ogness @ 2022-07-14 11:46 UTC
  To: Zhouyi Zhou
  Cc: Michael Ellerman, Benjamin Herrenschmidt, paulus,
	Nicholas Piggin, ldufour, pmladek, Julia.Lawall, linuxppc-dev,
	linux-kernel, lance, Paul E. McKenney, rcu

On 2022-07-14, Zhouyi Zhou <zhouzhouyi@gmail.com> wrote:
> Thank John for correcting me ;-)

After looking more closely, I do not think disabling migration is the
correct fix either.

The per-cpu variable @wd_timer_tb is written from two functions:

- watchdog_timer_interrupt() <-- irq handler
- arch_touch_nmi_watchdog()  <-- called from preemptible context

Since watchdog_timer_interrupt() is called from irq context, I expect
that interrupts need to be disabled for the update in
arch_touch_nmi_watchdog(). Perhaps using a per-cpu local_lock_t with
local_lock_irqsave() to protect write access to @wd_timer_tb?

John Ogness
