All of lore.kernel.org
 help / color / mirror / Atom feed
* fix BUG: using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
@ 2010-08-13 10:21 Sergey Senozhatsky
  2010-08-16  8:22 ` Peter Zijlstra
  2010-08-18 19:33 ` Andrew Morton
  0 siblings, 2 replies; 41+ messages in thread
From: Sergey Senozhatsky @ 2010-08-13 10:21 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Peter Zijlstra, Ingo Molnar, linux-kernel

Hello,

Got this traces today:

[   67.703556] BUG: using smp_processor_id() in preemptible [00000000] code: s2disk/5139
[   67.703563] caller is touch_nmi_watchdog+0x15/0x2c
[   67.703566] Pid: 5139, comm: s2disk Not tainted 2.6.36-rc0-git12-07921-g60bf26a-dirty #116
[   67.703568] Call Trace:
[   67.703575]  [<ffffffff811f6bf1>] debug_smp_processor_id+0xc9/0xe4
[   67.703578]  [<ffffffff81092766>] touch_nmi_watchdog+0x15/0x2c
[   67.703584]  [<ffffffff81222950>] acpi_os_stall+0x34/0x40
[   67.703589]  [<ffffffff812398d2>] acpi_ex_system_do_stall+0x34/0x38
[   67.703591]  [<ffffffff81238396>] acpi_ex_opcode_1A_0T_0R+0x6d/0xa1
[   67.703595]  [<ffffffff8122e280>] acpi_ds_exec_end_op+0xf8/0x578
[   67.703598]  [<ffffffff812457f9>] acpi_ps_parse_loop+0x88a/0xa55
[   67.703604]  [<ffffffff81244a00>] acpi_ps_parse_aml+0x104/0x3c4
[   67.703607]  [<ffffffff81246198>] acpi_ps_execute_method+0x20f/0x2f3
[   67.703610]  [<ffffffff8124021f>] acpi_ns_evaluate+0x18b/0x2d2
[   67.703614]  [<ffffffff8123fad0>] acpi_evaluate_object+0x1b8/0x2fc
[   67.703617]  [<ffffffff8123e020>] ? acpi_get_sleep_type_data+0x21c/0x236
[   67.703620]  [<ffffffff8123d9fb>] acpi_enter_sleep_state_prep+0x61/0xd9
[   67.703623]  [<ffffffff81224205>] acpi_sleep_prepare+0x4f/0x56
[   67.703626]  [<ffffffff81224268>] __acpi_pm_prepare+0x13/0x2e
[   67.703629]  [<ffffffff81224448>] acpi_pm_prepare+0xe/0x1f
[   67.703632]  [<ffffffff81224466>] acpi_hibernation_pre_snapshot+0xd/0x1e
[   67.703637]  [<ffffffff81071b80>] hibernation_snapshot+0xaf/0x258
[   67.703641]  [<ffffffff81074dca>] snapshot_ioctl+0x25c/0x547
[   67.703645]  [<ffffffff81056efc>] ? __srcu_read_unlock+0x3b/0x57
[   67.703649]  [<ffffffff810e7f7d>] vfs_ioctl+0x31/0xa2
[   67.703652]  [<ffffffff810e88dc>] do_vfs_ioctl+0x47c/0x4af
[   67.703655]  [<ffffffff8125ee3c>] ? n_tty_write+0x0/0x35e
[   67.703659]  [<ffffffff8100203a>] ? sysret_check+0x2e/0x69
[   67.703662]  [<ffffffff810e8960>] sys_ioctl+0x51/0x75
[   67.703665]  [<ffffffff81002002>] system_call_fastpath+0x16/0x1b


[   67.703668] BUG: using smp_processor_id() in preemptible [00000000] code: s2disk/5139
[   67.703670] caller is touch_softlockup_watchdog+0x15/0x2b
[   67.703672] Pid: 5139, comm: s2disk Not tainted 2.6.36-rc0-git12-07921-g60bf26a-dirty #116
[   67.703674] Call Trace:
[   67.703677]  [<ffffffff811f6bf1>] debug_smp_processor_id+0xc9/0xe4
[   67.703680]  [<ffffffff8109273b>] touch_softlockup_watchdog+0x15/0x2b
[   67.703682]  [<ffffffff81092779>] touch_nmi_watchdog+0x28/0x2c
[   67.703685]  [<ffffffff81222950>] acpi_os_stall+0x34/0x40
[   67.703688]  [<ffffffff812398d2>] acpi_ex_system_do_stall+0x34/0x38
[   67.703690]  [<ffffffff81238396>] acpi_ex_opcode_1A_0T_0R+0x6d/0xa1
[   67.703693]  [<ffffffff8122e280>] acpi_ds_exec_end_op+0xf8/0x578
[   67.703696]  [<ffffffff812457f9>] acpi_ps_parse_loop+0x88a/0xa55
[   67.703699]  [<ffffffff81244a00>] acpi_ps_parse_aml+0x104/0x3c4
[   67.703702]  [<ffffffff81246198>] acpi_ps_execute_method+0x20f/0x2f3
[   67.703705]  [<ffffffff8124021f>] acpi_ns_evaluate+0x18b/0x2d2
[   67.703708]  [<ffffffff8123fad0>] acpi_evaluate_object+0x1b8/0x2fc
[   67.703710]  [<ffffffff8123e020>] ? acpi_get_sleep_type_data+0x21c/0x236
[   67.703714]  [<ffffffff8123d9fb>] acpi_enter_sleep_state_prep+0x61/0xd9
[   67.703717]  [<ffffffff81224205>] acpi_sleep_prepare+0x4f/0x56
[   67.703719]  [<ffffffff81224268>] __acpi_pm_prepare+0x13/0x2e
[   67.703722]  [<ffffffff81224448>] acpi_pm_prepare+0xe/0x1f
[   67.703725]  [<ffffffff81224466>] acpi_hibernation_pre_snapshot+0xd/0x1e
[   67.703728]  [<ffffffff81071b80>] hibernation_snapshot+0xaf/0x258
[   67.703731]  [<ffffffff81074dca>] snapshot_ioctl+0x25c/0x547
[   67.703733]  [<ffffffff81056efc>] ? __srcu_read_unlock+0x3b/0x57
[   67.703736]  [<ffffffff810e7f7d>] vfs_ioctl+0x31/0xa2
[   67.703739]  [<ffffffff810e88dc>] do_vfs_ioctl+0x47c/0x4af
[   67.703741]  [<ffffffff8125ee3c>] ? n_tty_write+0x0/0x35e
[   67.703744]  [<ffffffff8100203a>] ? sysret_check+0x2e/0x69
[   67.703747]  [<ffffffff810e8960>] sys_ioctl+0x51/0x75
[   67.703750]  [<ffffffff81002002>] system_call_fastpath+0x16/0x1b



Please kindly review the patch:

Avoid using smp_processor_id in touch_softlockup_watchdog and touch_nmi_watchdog.
Patch also "removes" second call to smp_processor_id in __touch_watchdog
(smp_processor_id itself and smp_processor_id in __get_cpu_var). 

---

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 613bc1f..8822f1e 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -116,13 +116,14 @@ static unsigned long get_sample_period(void)
 static void __touch_watchdog(void)
 {
 	int this_cpu = smp_processor_id();
-
-	__get_cpu_var(watchdog_touch_ts) = get_timestamp(this_cpu);
+	per_cpu(watchdog_touch_ts, this_cpu) = get_timestamp(this_cpu);
 }
 
 void touch_softlockup_watchdog(void)
 {
-	__get_cpu_var(watchdog_touch_ts) = 0;
+	int this_cpu = get_cpu();
+	per_cpu(watchdog_touch_ts, this_cpu) = 0;
+	put_cpu();
 }
 EXPORT_SYMBOL(touch_softlockup_watchdog);
 
@@ -142,7 +143,9 @@ void touch_all_softlockup_watchdogs(void)
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
 void touch_nmi_watchdog(void)
 {
-	__get_cpu_var(watchdog_nmi_touch) = true;
+	int this_cpu = get_cpu();
+	per_cpu(watchdog_nmi_touch, this_cpu) = true;
+	put_cpu();
 	touch_softlockup_watchdog();
 }
 EXPORT_SYMBOL(touch_nmi_watchdog);


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: fix BUG: using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
  2010-08-13 10:21 fix BUG: using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog Sergey Senozhatsky
@ 2010-08-16  8:22 ` Peter Zijlstra
  2010-08-16 13:34   ` Don Zickus
  2010-08-18 19:33 ` Andrew Morton
  1 sibling, 1 reply; 41+ messages in thread
From: Peter Zijlstra @ 2010-08-16  8:22 UTC (permalink / raw)
  To: Sergey Senozhatsky; +Cc: Andrew Morton, Ingo Molnar, linux-kernel, Don Zickus

On Fri, 2010-08-13 at 13:21 +0300, Sergey Senozhatsky wrote:

> [   67.703556] BUG: using smp_processor_id() in preemptible [00000000] code: s2disk/5139
> [   67.703563] caller is touch_nmi_watchdog+0x15/0x2c
> [   67.703566] Pid: 5139, comm: s2disk Not tainted 2.6.36-rc0-git12-07921-g60bf26a-dirty #116
> [   67.703568] Call Trace:
> [   67.703575]  [<ffffffff811f6bf1>] debug_smp_processor_id+0xc9/0xe4
> [   67.703578]  [<ffffffff81092766>] touch_nmi_watchdog+0x15/0x2c
> [   67.703584]  [<ffffffff81222950>] acpi_os_stall+0x34/0x40
> [   67.703589]  [<ffffffff812398d2>] acpi_ex_system_do_stall+0x34/0x38

Which could mean two things, either ACPI got funny on us, or Don's new
watchdog stuff has a hole in it.


> ---
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 613bc1f..8822f1e 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -116,13 +116,14 @@ static unsigned long get_sample_period(void)
>  static void __touch_watchdog(void)
>  {
>  	int this_cpu = smp_processor_id();
> -
> -	__get_cpu_var(watchdog_touch_ts) = get_timestamp(this_cpu);
> +	per_cpu(watchdog_touch_ts, this_cpu) = get_timestamp(this_cpu);
>  }

That change seems sensible enough..

>  void touch_softlockup_watchdog(void)
>  {
> -	__get_cpu_var(watchdog_touch_ts) = 0;
> +	int this_cpu = get_cpu();
> +	per_cpu(watchdog_touch_ts, this_cpu) = 0;
> +	put_cpu();
>  }
>  EXPORT_SYMBOL(touch_softlockup_watchdog);
>  
> @@ -142,7 +143,9 @@ void touch_all_softlockup_watchdogs(void)
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
>  void touch_nmi_watchdog(void)
>  {
> -	__get_cpu_var(watchdog_nmi_touch) = true;
> +	int this_cpu = get_cpu();
> +	per_cpu(watchdog_nmi_touch, this_cpu) = true;
> +	put_cpu();
>  	touch_softlockup_watchdog();
>  }
>  EXPORT_SYMBOL(touch_nmi_watchdog);

These other two really are about assumptions we make on the call sites,
which at the very least are violated by ACPI.

Don/Ingo, remember if we require touch_*_watchdog callers to have
preemption disabled? Or is the proposed patch sensible?


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: fix BUG: using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
  2010-08-16  8:22 ` Peter Zijlstra
@ 2010-08-16 13:34   ` Don Zickus
  2010-08-16 13:46     ` Peter Zijlstra
  2010-08-16 14:06     ` Yong Zhang
  0 siblings, 2 replies; 41+ messages in thread
From: Don Zickus @ 2010-08-16 13:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sergey Senozhatsky, Andrew Morton, Ingo Molnar, linux-kernel

On Mon, Aug 16, 2010 at 10:22:50AM +0200, Peter Zijlstra wrote:
> On Fri, 2010-08-13 at 13:21 +0300, Sergey Senozhatsky wrote:
> 
> > [   67.703556] BUG: using smp_processor_id() in preemptible [00000000] code: s2disk/5139
> > [   67.703563] caller is touch_nmi_watchdog+0x15/0x2c
> > [   67.703566] Pid: 5139, comm: s2disk Not tainted 2.6.36-rc0-git12-07921-g60bf26a-dirty #116
> > [   67.703568] Call Trace:
> > [   67.703575]  [<ffffffff811f6bf1>] debug_smp_processor_id+0xc9/0xe4
> > [   67.703578]  [<ffffffff81092766>] touch_nmi_watchdog+0x15/0x2c
> > [   67.703584]  [<ffffffff81222950>] acpi_os_stall+0x34/0x40
> > [   67.703589]  [<ffffffff812398d2>] acpi_ex_system_do_stall+0x34/0x38
> 
> Which could mean two things, either ACPI got funny on us, or Don's new
> watchdog stuff has a hole in it.

it could. :-)

> 
> 
> > ---
> > 
> > diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> > index 613bc1f..8822f1e 100644
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -116,13 +116,14 @@ static unsigned long get_sample_period(void)
> >  static void __touch_watchdog(void)
> >  {
> >  	int this_cpu = smp_processor_id();
> > -
> > -	__get_cpu_var(watchdog_touch_ts) = get_timestamp(this_cpu);
> > +	per_cpu(watchdog_touch_ts, this_cpu) = get_timestamp(this_cpu);
> >  }
> 
> That change seems sensible enough..

ok.

> 
> >  void touch_softlockup_watchdog(void)
> >  {
> > -	__get_cpu_var(watchdog_touch_ts) = 0;
> > +	int this_cpu = get_cpu();
> > +	per_cpu(watchdog_touch_ts, this_cpu) = 0;
> > +	put_cpu();
> >  }
> >  EXPORT_SYMBOL(touch_softlockup_watchdog);
> >  
> > @@ -142,7 +143,9 @@ void touch_all_softlockup_watchdogs(void)
> >  #ifdef CONFIG_HARDLOCKUP_DETECTOR
> >  void touch_nmi_watchdog(void)
> >  {
> > -	__get_cpu_var(watchdog_nmi_touch) = true;
> > +	int this_cpu = get_cpu();
> > +	per_cpu(watchdog_nmi_touch, this_cpu) = true;
> > +	put_cpu();
> >  	touch_softlockup_watchdog();
> >  }
> >  EXPORT_SYMBOL(touch_nmi_watchdog);
> 
> These other two really are about assumptions we make on the call sites,
> which at the very least are violated by ACPI.
> 
> Don/Ingo, remember if we require touch_*_watchdog callers to have
> preemption disabled? Or is the proposed patch sensible?

I don't recall any requirement to have preemption disabled when using
those functions.  It seems sensible to put it in the
touch_{softlockup|nmi}_watchdog code.

I assume the reason for having preemption disabled when using
smp_processor_id() is that the code could migrate to another cpu when
rescheduled?

I don't see a problem with the patch, but my low level understanding of
the __get_cpu_var vs. per_cpu isn't very strong.

Cheers,
Don

> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: fix BUG: using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
  2010-08-16 13:34   ` Don Zickus
@ 2010-08-16 13:46     ` Peter Zijlstra
  2010-08-16 14:08       ` [PATCH] fix BUG " Sergey Senozhatsky
  2010-08-16 14:12       ` fix BUG: using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog Don Zickus
  2010-08-16 14:06     ` Yong Zhang
  1 sibling, 2 replies; 41+ messages in thread
From: Peter Zijlstra @ 2010-08-16 13:46 UTC (permalink / raw)
  To: Don Zickus; +Cc: Sergey Senozhatsky, Andrew Morton, Ingo Molnar, linux-kernel

On Mon, 2010-08-16 at 09:34 -0400, Don Zickus wrote:
> > Don/Ingo, remember if we require touch_*_watchdog callers to have
> > preemption disabled? Or is the proposed patch sensible?
> 
> I don't recall any requirement to have preemption disabled when using
> those functions.  It seems sensible to put it in the
> touch_{softlockup|nmi}_watchdog code.

OK, in that case the patch looks sensible.

> I assume the reason for having preemption disabled when using
> smp_processor_id() is that the code could migrate to another cpu when
> rescheduled?

Right, if you can freely schedule, you can get migrated, which means you
can get migrated between having determined the return value and using
it, at which point the computed value is meaningless.

> I don't see a problem with the patch, but my low level understanding of
> the __get_cpu_var vs. per_cpu isn't very strong.

__get_cpu_var() gets you the value on the current cpu, per_cpu() takes a
cpu argument.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: fix BUG: using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
  2010-08-16 13:34   ` Don Zickus
  2010-08-16 13:46     ` Peter Zijlstra
@ 2010-08-16 14:06     ` Yong Zhang
  1 sibling, 0 replies; 41+ messages in thread
From: Yong Zhang @ 2010-08-16 14:06 UTC (permalink / raw)
  To: Don Zickus
  Cc: Peter Zijlstra, Sergey Senozhatsky, Andrew Morton, Ingo Molnar,
	linux-kernel

On Mon, Aug 16, 2010 at 09:34:52AM -0400, Don Zickus wrote:
> I don't recall any requirement to have preemption disabled when using
> those functions.

Isn't that implicit? I mean the caller of touch_{softlockup|nmi}_watchdog
will sticky to that cpu before it finish running.

> It seems sensible to put it in the
> touch_{softlockup|nmi}_watchdog code.

I don't think so. Such as:

	...
      preempt_disable()                <===A
      touch_{softlockup|nmi}_watchdog  <===B
      preempt_enable()                 <===C
        ...

You just scroll A and C into B, but what will happen before preempt
occur before A?

> 
> I assume the reason for having preemption disabled when using
> smp_processor_id() is that the code could migrate to another cpu when
> rescheduled?

If the migration could happen, then we could touch the wrong cpu-data,
and the detection on the original cpu will trigger anyway.

> 
> I don't see a problem with the patch, but my low level understanding of
> the __get_cpu_var vs. per_cpu isn't very strong.

Maybe we should use __raw_get_cpu_var() instead.

Thanks,
Yong

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
  2010-08-16 13:46     ` Peter Zijlstra
@ 2010-08-16 14:08       ` Sergey Senozhatsky
  2010-08-16 14:30         ` Don Zickus
  2010-08-17  2:59         ` Frederic Weisbecker
  2010-08-16 14:12       ` fix BUG: using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog Don Zickus
  1 sibling, 2 replies; 41+ messages in thread
From: Sergey Senozhatsky @ 2010-08-16 14:08 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Don Zickus, Andrew Morton, Ingo Molnar, linux-kernel

Fix: acpi_os_stall calls touch_nmi_watchdog and touch_softlockup_watchdog
with preemption enabled causing 'BUG: using smp_processor_id() in preemptible
code'.
     
Patch also removes double smp_processor_id call (smp_processor_id itself and in
__get_cpu_var) in __touch_watchdog.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

---

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 613bc1f..8822f1e 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -116,13 +116,14 @@ static unsigned long get_sample_period(void)
 static void __touch_watchdog(void)
 {
 	int this_cpu = smp_processor_id();
-
-	__get_cpu_var(watchdog_touch_ts) = get_timestamp(this_cpu);
+	per_cpu(watchdog_touch_ts, this_cpu) = get_timestamp(this_cpu);
 }
 
 void touch_softlockup_watchdog(void)
 {
-	__get_cpu_var(watchdog_touch_ts) = 0;
+	int this_cpu = get_cpu();
+	per_cpu(watchdog_touch_ts, this_cpu) = 0;
+	put_cpu();
 }
 EXPORT_SYMBOL(touch_softlockup_watchdog);
 
@@ -142,7 +143,9 @@ void touch_all_softlockup_watchdogs(void)
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
 void touch_nmi_watchdog(void)
 {
-	__get_cpu_var(watchdog_nmi_touch) = true;
+	int this_cpu = get_cpu();
+	per_cpu(watchdog_nmi_touch, this_cpu) = true;
+	put_cpu();
 	touch_softlockup_watchdog();
 }
 EXPORT_SYMBOL(touch_nmi_watchdog);


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: fix BUG: using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
  2010-08-16 13:46     ` Peter Zijlstra
  2010-08-16 14:08       ` [PATCH] fix BUG " Sergey Senozhatsky
@ 2010-08-16 14:12       ` Don Zickus
  2010-08-16 14:29         ` Peter Zijlstra
  1 sibling, 1 reply; 41+ messages in thread
From: Don Zickus @ 2010-08-16 14:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sergey Senozhatsky, Andrew Morton, Ingo Molnar, linux-kernel

On Mon, Aug 16, 2010 at 03:46:58PM +0200, Peter Zijlstra wrote:
> > I don't see a problem with the patch, but my low level understanding of
> > the __get_cpu_var vs. per_cpu isn't very strong.
> 
> __get_cpu_var() gets you the value on the current cpu, per_cpu() takes a
> cpu argument.

Well I know that much. :-)  It seems that __get_cpu_var depends on
preemption being disabled whereas per_cpu does not?  Though for some
reason I thought __get_cpu_var would be more atomic when it grabbed the
current cpu such that you wouldn't need to disable preemption.  Guess not.

Cheers,
Don


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: fix BUG: using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
  2010-08-16 14:12       ` fix BUG: using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog Don Zickus
@ 2010-08-16 14:29         ` Peter Zijlstra
  0 siblings, 0 replies; 41+ messages in thread
From: Peter Zijlstra @ 2010-08-16 14:29 UTC (permalink / raw)
  To: Don Zickus; +Cc: Sergey Senozhatsky, Andrew Morton, Ingo Molnar, linux-kernel

On Mon, 2010-08-16 at 10:12 -0400, Don Zickus wrote:
> On Mon, Aug 16, 2010 at 03:46:58PM +0200, Peter Zijlstra wrote:
> > > I don't see a problem with the patch, but my low level understanding of
> > > the __get_cpu_var vs. per_cpu isn't very strong.
> > 
> > __get_cpu_var() gets you the value on the current cpu, per_cpu() takes a
> > cpu argument.
> 
> Well I know that much. :-)  It seems that __get_cpu_var depends on
> preemption being disabled whereas per_cpu does not?  Though for some
> reason I thought __get_cpu_var would be more atomic when it grabbed the
> current cpu such that you wouldn't need to disable preemption.  Guess not.

Indeed, it can't be implemented atomically on all smp systems, hence its
really nothing other than a 'convenient' short for per_cpu(foo,
smp_processor_id()).

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
  2010-08-16 14:08       ` [PATCH] fix BUG " Sergey Senozhatsky
@ 2010-08-16 14:30         ` Don Zickus
  2010-08-17  4:27           ` Yong Zhang
  2010-08-17  2:59         ` Frederic Weisbecker
  1 sibling, 1 reply; 41+ messages in thread
From: Don Zickus @ 2010-08-16 14:30 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Peter Zijlstra, Andrew Morton, Ingo Molnar, linux-kernel, fweisbec

cc'ing Frederic

On Mon, Aug 16, 2010 at 05:08:29PM +0300, Sergey Senozhatsky wrote:
> Fix: acpi_os_stall calls touch_nmi_watchdog and touch_softlockup_watchdog
> with preemption enabled causing 'BUG: using smp_processor_id() in preemptible
> code'.
>      
> Patch also removes double smp_processor_id call (smp_processor_id itself and in
> __get_cpu_var) in __touch_watchdog.
> 
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

Acked-by: Don Zickus <dzickus@redhat.com>

> 
> ---
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 613bc1f..8822f1e 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -116,13 +116,14 @@ static unsigned long get_sample_period(void)
>  static void __touch_watchdog(void)
>  {
>  	int this_cpu = smp_processor_id();
> -
> -	__get_cpu_var(watchdog_touch_ts) = get_timestamp(this_cpu);
> +	per_cpu(watchdog_touch_ts, this_cpu) = get_timestamp(this_cpu);
>  }
>  
>  void touch_softlockup_watchdog(void)
>  {
> -	__get_cpu_var(watchdog_touch_ts) = 0;
> +	int this_cpu = get_cpu();
> +	per_cpu(watchdog_touch_ts, this_cpu) = 0;
> +	put_cpu();
>  }
>  EXPORT_SYMBOL(touch_softlockup_watchdog);
>  
> @@ -142,7 +143,9 @@ void touch_all_softlockup_watchdogs(void)
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
>  void touch_nmi_watchdog(void)
>  {
> -	__get_cpu_var(watchdog_nmi_touch) = true;
> +	int this_cpu = get_cpu();
> +	per_cpu(watchdog_nmi_touch, this_cpu) = true;
> +	put_cpu();
>  	touch_softlockup_watchdog();
>  }
>  EXPORT_SYMBOL(touch_nmi_watchdog);
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
  2010-08-16 14:08       ` [PATCH] fix BUG " Sergey Senozhatsky
  2010-08-16 14:30         ` Don Zickus
@ 2010-08-17  2:59         ` Frederic Weisbecker
  2010-08-17  3:16           ` Yong Zhang
  2010-08-17  7:56           ` [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog (v2) Sergey Senozhatsky
  1 sibling, 2 replies; 41+ messages in thread
From: Frederic Weisbecker @ 2010-08-17  2:59 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Peter Zijlstra, Don Zickus, Andrew Morton, Ingo Molnar, linux-kernel

On Mon, Aug 16, 2010 at 05:08:29PM +0300, Sergey Senozhatsky wrote:
>  void touch_softlockup_watchdog(void)
>  {
> -	__get_cpu_var(watchdog_touch_ts) = 0;
> +	int this_cpu = get_cpu();
> +	per_cpu(watchdog_touch_ts, this_cpu) = 0;
> +	put_cpu();
>  }



If preemption is disabled and you deal with the current cpu,
then please use __get_cpu_var, it makes the code more
readable:


void touch_softlockup_watchdog(void)
{
	preempt_disable();
	__get_cpu_var(watchdog_touch_ts) = 0;
	preempt_enable();
}


Same below.

Thanks.


>  EXPORT_SYMBOL(touch_softlockup_watchdog);
>  
> @@ -142,7 +143,9 @@ void touch_all_softlockup_watchdogs(void)
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
>  void touch_nmi_watchdog(void)
>  {
> -	__get_cpu_var(watchdog_nmi_touch) = true;
> +	int this_cpu = get_cpu();
> +	per_cpu(watchdog_nmi_touch, this_cpu) = true;
> +	put_cpu();
>  	touch_softlockup_watchdog();
>  }
>  EXPORT_SYMBOL(touch_nmi_watchdog);
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
  2010-08-17  2:59         ` Frederic Weisbecker
@ 2010-08-17  3:16           ` Yong Zhang
  2010-08-17  8:39             ` Sergey Senozhatsky
  2010-08-17  7:56           ` [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog (v2) Sergey Senozhatsky
  1 sibling, 1 reply; 41+ messages in thread
From: Yong Zhang @ 2010-08-17  3:16 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Sergey Senozhatsky, Peter Zijlstra, Don Zickus, Andrew Morton,
	Ingo Molnar, linux-kernel

On Tue, Aug 17, 2010 at 10:59 AM, Frederic Weisbecker
<fweisbec@gmail.com> wrote:
> If preemption is disabled and you deal with the current cpu,
> then please use __get_cpu_var, it makes the code more
> readable:
>
>
> void touch_softlockup_watchdog(void)
> {
>        preempt_disable();
>        __(watchdog_touch_ts) = 0;
>        preempt_enable();
> }

Why not use __raw_get_cpu_var() instead?
You know adding preempt protection in touch_softlockup_watchdog()
just suppress the warning. Am I missing something?

Thanks,
Yong

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
  2010-08-16 14:30         ` Don Zickus
@ 2010-08-17  4:27           ` Yong Zhang
  0 siblings, 0 replies; 41+ messages in thread
From: Yong Zhang @ 2010-08-17  4:27 UTC (permalink / raw)
  To: Don Zickus
  Cc: Sergey Senozhatsky, Peter Zijlstra, Andrew Morton, Ingo Molnar,
	linux-kernel, fweisbec

On Mon, Aug 16, 2010 at 10:30 PM, Don Zickus <dzickus@redhat.com> wrote:
>> Patch also removes double smp_processor_id call (smp_processor_id itself and in
>> __get_cpu_var) in __touch_watchdog.

After checking touch_softlockup_watchdog() and touch_nmi_watchdog() in
before version, __raw_get_cpu_var() is used there.

Thanks,
Yong

>>
>> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
>
> Acked-by: Don Zickus <dzickus@redhat.com>
>
>>
>> ---
>>
>> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>> index 613bc1f..8822f1e 100644
>> --- a/kernel/watchdog.c
>> +++ b/kernel/watchdog.c
>> @@ -116,13 +116,14 @@ static unsigned long get_sample_period(void)
>>  static void __touch_watchdog(void)
>>  {
>>       int this_cpu = smp_processor_id();
>> -
>> -     __get_cpu_var(watchdog_touch_ts) = get_timestamp(this_cpu);
>> +     per_cpu(watchdog_touch_ts, this_cpu) = get_timestamp(this_cpu);
>>  }
>>
>>  void touch_softlockup_watchdog(void)
>>  {
>> -     __get_cpu_var(watchdog_touch_ts) = 0;
>> +     int this_cpu = get_cpu();
>> +     per_cpu(watchdog_touch_ts, this_cpu) = 0;
>> +     put_cpu();
>>  }
>>  EXPORT_SYMBOL(touch_softlockup_watchdog);
>>
>> @@ -142,7 +143,9 @@ void touch_all_softlockup_watchdogs(void)
>>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
>>  void touch_nmi_watchdog(void)
>>  {
>> -     __get_cpu_var(watchdog_nmi_touch) = true;
>> +     int this_cpu = get_cpu();
>> +     per_cpu(watchdog_nmi_touch, this_cpu) = true;
>> +     put_cpu();
>>       touch_softlockup_watchdog();
>>  }
>>  EXPORT_SYMBOL(touch_nmi_watchdog);
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog (v2)
  2010-08-17  2:59         ` Frederic Weisbecker
  2010-08-17  3:16           ` Yong Zhang
@ 2010-08-17  7:56           ` Sergey Senozhatsky
  1 sibling, 0 replies; 41+ messages in thread
From: Sergey Senozhatsky @ 2010-08-17  7:56 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Peter Zijlstra, Don Zickus, Andrew Morton, Ingo Molnar, linux-kernel

Fix: acpi_os_stall calls touch_nmi_watchdog and touch_softlockup_watchdog
with preemption enabled causing 'BUG: using smp_processor_id() in preemptible 
code'.

Patch also removes double smp_processor_id call (smp_processor_id itself and
in __get_cpu_var) in __touch_watchdog.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

---

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 613bc1f..cb4f4d4 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -116,13 +116,14 @@ static unsigned long get_sample_period(void)
 static void __touch_watchdog(void)
 {
 	int this_cpu = smp_processor_id();
-
-	__get_cpu_var(watchdog_touch_ts) = get_timestamp(this_cpu);
+	per_cpu(watchdog_touch_ts, this_cpu) = get_timestamp(this_cpu);
 }
 
 void touch_softlockup_watchdog(void)
 {
+	preempt_disable();
 	__get_cpu_var(watchdog_touch_ts) = 0;
+	preempt_enable();
 }
 EXPORT_SYMBOL(touch_softlockup_watchdog);
 
@@ -142,7 +143,10 @@ void touch_all_softlockup_watchdogs(void)
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
 void touch_nmi_watchdog(void)
 {
+	preempt_disable();
 	__get_cpu_var(watchdog_nmi_touch) = true;
+	preempt_enable();
+
 	touch_softlockup_watchdog();
 }
 EXPORT_SYMBOL(touch_nmi_watchdog);


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
  2010-08-17  3:16           ` Yong Zhang
@ 2010-08-17  8:39             ` Sergey Senozhatsky
  2010-08-17  9:05               ` Yong Zhang
  0 siblings, 1 reply; 41+ messages in thread
From: Sergey Senozhatsky @ 2010-08-17  8:39 UTC (permalink / raw)
  To: Yong Zhang
  Cc: Frederic Weisbecker, Peter Zijlstra, Don Zickus, Andrew Morton,
	Ingo Molnar, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 923 bytes --]

Hello,

On (08/17/10 11:16), Yong Zhang wrote:
> On Tue, Aug 17, 2010 at 10:59 AM, Frederic Weisbecker
> <fweisbec@gmail.com> wrote:
> > If preemption is disabled and you deal with the current cpu,
> > then please use __get_cpu_var, it makes the code more
> > readable:
> >
> >
> > void touch_softlockup_watchdog(void)
> > {
> >        preempt_disable();
> >        __(watchdog_touch_ts) = 0;
> >        preempt_enable();
> > }
> 
> Why not use __raw_get_cpu_var() instead?
> You know adding preempt protection in touch_softlockup_watchdog()
> just suppress the warning. Am I missing something?
> 

Sorry, my low level understanding of the __raw_get_cpu_var isn't very strong.
I assume it uses current_thread_info()->cpu in some cases (right?) or 
percpu_from_op.


Should it be
acpi_os_stall
	preepmt_disable
	touch_nmi_watchdog
		touch_softlockup_watchdog
	preempt_enable

?

	Sergey

[-- Attachment #2: Type: application/pgp-signature, Size: 316 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
  2010-08-17  8:39             ` Sergey Senozhatsky
@ 2010-08-17  9:05               ` Yong Zhang
  2010-08-17  9:24                 ` Sergey Senozhatsky
  0 siblings, 1 reply; 41+ messages in thread
From: Yong Zhang @ 2010-08-17  9:05 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Frederic Weisbecker, Peter Zijlstra, Don Zickus, Andrew Morton,
	Ingo Molnar, linux-kernel

On Tue, Aug 17, 2010 at 4:39 PM, Sergey Senozhatsky
<sergey.senozhatsky@gmail.com> wrote:
>> Why not use __raw_get_cpu_var() instead?
>> You know adding preempt protection in touch_softlockup_watchdog()
>> just suppress the warning. Am I missing something?
>>
>
> Sorry, my low level understanding of the __raw_get_cpu_var isn't very strong.
> I assume it uses current_thread_info()->cpu in some cases (right?) or
> percpu_from_op.

The difference is __raw_get_cpu_var() is using raw_smp_processor_id().

>
>
> Should it be
> acpi_os_stall
>        preepmt_disable
>        touch_nmi_watchdog
>                touch_softlockup_watchdog
>        preempt_enable

Actually I don't think this is helpful for the whole function. Because
if acpi_os_stall()
migrate(I don't know if it could) to another CPU just before
preepmt_disable(), we'll
be on the wrong way. Adding preempt protection is just smoothing the warning.

So I prefer using __raw_get_cpu_var() as what we have been done before.

Thanks,
Yong

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
  2010-08-17  9:05               ` Yong Zhang
@ 2010-08-17  9:24                 ` Sergey Senozhatsky
  2010-08-17  9:37                   ` Yong Zhang
  0 siblings, 1 reply; 41+ messages in thread
From: Sergey Senozhatsky @ 2010-08-17  9:24 UTC (permalink / raw)
  To: Yong Zhang
  Cc: Sergey Senozhatsky, Frederic Weisbecker, Peter Zijlstra,
	Don Zickus, Andrew Morton, Ingo Molnar, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1491 bytes --]

On (08/17/10 17:05), Yong Zhang wrote:
> >> Why not use __raw_get_cpu_var() instead?
> >> You know adding preempt protection in touch_softlockup_watchdog()
> >> just suppress the warning. Am I missing something?
> >>
> >
> > Sorry, my low level understanding of the __raw_get_cpu_var isn't very strong.
> > I assume it uses current_thread_info()->cpu in some cases (right?) or
> > percpu_from_op.
> 
> The difference is __raw_get_cpu_var() is using raw_smp_processor_id().
> 
> >
> >
> > Should it be
> > acpi_os_stall
> >        preepmt_disable
> >        touch_nmi_watchdog
> >                touch_softlockup_watchdog
> >        preempt_enable
> 
> Actually I don't think this is helpful for the whole function. Because
> if acpi_os_stall()
> migrate(I don't know if it could) to another CPU just before
> preepmt_disable(), we'll
> be on the wrong way. Adding preempt protection is just smoothing the warning.
>

OK. Suppose (I don't know if it could) migration has happen 

acpi_os_stall
	__migration__
	touch_nmi_watchdog

How calling raw_smp_processor_id() (which is current_thread_info()->cpu)
vs. preepmt_disable - smp_processor_id() will give us different CPUs?
 
> So I prefer using __raw_get_cpu_var() as what we have been done before.
> 

Hm...

26e09c6eee14f4827b55137ba0eedc4e77cd50ab

 static void __touch_watchdog(void)
 {
-       int this_cpu = raw_smp_processor_id();
+       int this_cpu = smp_processor_id();


	Sergey

[-- Attachment #2: Type: application/pgp-signature, Size: 316 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
  2010-08-17  9:24                 ` Sergey Senozhatsky
@ 2010-08-17  9:37                   ` Yong Zhang
  2010-08-17 10:28                     ` Sergey Senozhatsky
  2010-08-17 10:39                     ` Sergey Senozhatsky
  0 siblings, 2 replies; 41+ messages in thread
From: Yong Zhang @ 2010-08-17  9:37 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Frederic Weisbecker, Peter Zijlstra, Don Zickus, Andrew Morton,
	Ingo Molnar, linux-kernel

On Tue, Aug 17, 2010 at 5:24 PM, Sergey Senozhatsky
<sergey.senozhatsky@gmail.com> wrote:
> OK. Suppose (I don't know if it could) migration has happen
>
> acpi_os_stall
>        __migration__
>        touch_nmi_watchdog
>
> How calling raw_smp_processor_id() (which is current_thread_info()->cpu)
> vs. preepmt_disable - smp_processor_id() will give us different CPUs?

I don't mean you will get different CPUS(sorry for my poor english).
I mean if the migration could happen, you want to touch_nmi_watchdog()
on CPU A(otherwise the watchdog will shout on us), but eventually we
touch_nmi_watchdog() on CPU B(because of migration),
and this is not what we want.

So preempt_disable() is redundant here.

>
>> So I prefer using __raw_get_cpu_var() as what we have been done before.
>>
>
> Hm...
>
> 26e09c6eee14f4827b55137ba0eedc4e77cd50ab

f69bcf60c3f17aa367e16eef7bc6ab001ea6d58a
2508ce1845a3b256798532b2c6b7997c2dc6533b

you can get the previous touch_*_watchdog there.

Thanks,
Yong

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
  2010-08-17  9:37                   ` Yong Zhang
@ 2010-08-17 10:28                     ` Sergey Senozhatsky
  2010-08-17 12:48                       ` Yong Zhang
  2010-08-17 10:39                     ` Sergey Senozhatsky
  1 sibling, 1 reply; 41+ messages in thread
From: Sergey Senozhatsky @ 2010-08-17 10:28 UTC (permalink / raw)
  To: Yong Zhang
  Cc: Sergey Senozhatsky, Frederic Weisbecker, Peter Zijlstra,
	Don Zickus, Andrew Morton, Ingo Molnar, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1364 bytes --]

On (08/17/10 17:37), Yong Zhang wrote:
> On Tue, Aug 17, 2010 at 5:24 PM, Sergey Senozhatsky
> <sergey.senozhatsky@gmail.com> wrote:
> > OK. Suppose (I don't know if it could) migration has happen
> >
> > acpi_os_stall
> >        __migration__
> >        touch_nmi_watchdog
> >
> > How calling raw_smp_processor_id() (which is current_thread_info()->cpu)
> > vs. preepmt_disable - smp_processor_id() will give us different CPUs?
> 
> I don't mean you will get different CPUS(sorry for my poor english).
> I mean if the migration could happen, you want to touch_nmi_watchdog()
> on CPU A(otherwise the watchdog will shout on us), but eventually we
> touch_nmi_watchdog() on CPU B(because of migration),
> and this is not what we want.
> 
> So preempt_disable() is redundant here.
>

Shouldn't we be for sure not preepmtible when calling __raw_get_cpu_var?

preempt_disable is reduntant here because current_thread_info()->cpu is 
atomic and we just don't want preempt_(enable|disable) overhead?

	Sergey
 
> >
> >> So I prefer using __raw_get_cpu_var() as what we have been done before.
> >>
> >
> > Hm...
> >
> > 26e09c6eee14f4827b55137ba0eedc4e77cd50ab
> 
> f69bcf60c3f17aa367e16eef7bc6ab001ea6d58a
> 2508ce1845a3b256798532b2c6b7997c2dc6533b
> 
> you can get the previous touch_*_watchdog there.
> 
> Thanks,
> Yong
> 

[-- Attachment #2: Type: application/pgp-signature, Size: 316 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
  2010-08-17  9:37                   ` Yong Zhang
  2010-08-17 10:28                     ` Sergey Senozhatsky
@ 2010-08-17 10:39                     ` Sergey Senozhatsky
  2010-08-17 12:56                       ` Yong Zhang
  2010-08-17 13:13                       ` Don Zickus
  1 sibling, 2 replies; 41+ messages in thread
From: Sergey Senozhatsky @ 2010-08-17 10:39 UTC (permalink / raw)
  To: Yong Zhang
  Cc: Frederic Weisbecker, Peter Zijlstra, Don Zickus, Andrew Morton,
	Ingo Molnar, linux-kernel

Please kindly review.

---

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 613bc1f..22dd388 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -116,13 +116,12 @@ static unsigned long get_sample_period(void)
 static void __touch_watchdog(void)
 {
 	int this_cpu = smp_processor_id();
-
-	__get_cpu_var(watchdog_touch_ts) = get_timestamp(this_cpu);
+	per_cpu(watchdog_touch_ts, this_cpu) = get_timestamp(this_cpu);
 }
 
 void touch_softlockup_watchdog(void)
 {
-	__get_cpu_var(watchdog_touch_ts) = 0;
+	__raw_get_cpu_var(watchdog_touch_ts) = 0;
 }
 EXPORT_SYMBOL(touch_softlockup_watchdog);
 
@@ -142,7 +141,7 @@ void touch_all_softlockup_watchdogs(void)
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
 void touch_nmi_watchdog(void)
 {
-	__get_cpu_var(watchdog_nmi_touch) = true;
+	__raw_get_cpu_var(watchdog_nmi_touch) = true;
 	touch_softlockup_watchdog();
 }
 EXPORT_SYMBOL(touch_nmi_watchdog);


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
  2010-08-17 10:28                     ` Sergey Senozhatsky
@ 2010-08-17 12:48                       ` Yong Zhang
  0 siblings, 0 replies; 41+ messages in thread
From: Yong Zhang @ 2010-08-17 12:48 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Frederic Weisbecker, Peter Zijlstra, Don Zickus, Andrew Morton,
	Ingo Molnar, linux-kernel

On Tue, Aug 17, 2010 at 01:28:19PM +0300, Sergey Senozhatsky wrote:
> > So preempt_disable() is redundant here.
> >
> 
> Shouldn't we be for sure not preepmtible when calling __raw_get_cpu_var?

IMHO, it's the caller's responsibility.

> 
> preempt_disable is reduntant here because current_thread_info()->cpu is 
> atomic and we just don't want preempt_(enable|disable) overhead?

Yep.

Thanks,
Yong

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
  2010-08-17 10:39                     ` Sergey Senozhatsky
@ 2010-08-17 12:56                       ` Yong Zhang
  2010-08-17 13:13                       ` Don Zickus
  1 sibling, 0 replies; 41+ messages in thread
From: Yong Zhang @ 2010-08-17 12:56 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Frederic Weisbecker, Peter Zijlstra, Don Zickus, Andrew Morton,
	Ingo Molnar, linux-kernel

On Tue, Aug 17, 2010 at 01:39:48PM +0300, Sergey Senozhatsky wrote:
> Please kindly review.
> 
> ---
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 613bc1f..22dd388 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -116,13 +116,12 @@ static unsigned long get_sample_period(void)
>  static void __touch_watchdog(void)
>  {
>  	int this_cpu = smp_processor_id();
> -
> -	__get_cpu_var(watchdog_touch_ts) = get_timestamp(this_cpu);
> +	per_cpu(watchdog_touch_ts, this_cpu) = get_timestamp(this_cpu);
>  }

The two caller of __touch_watchdog() is:
1)watchdog_timer_fn(): it's preempt disabled when called.
2)watchdog(): it's bound to one cpu. Then means using smp_processor_id()
              safely.

So I think this change is needless, but anyway it's harmless.

Below looks fine to me.
But you still need comments from others.

Thanks,
Yong

>  
>  void touch_softlockup_watchdog(void)
>  {
> -	__get_cpu_var(watchdog_touch_ts) = 0;
> +	__raw_get_cpu_var(watchdog_touch_ts) = 0;
>  }
>  EXPORT_SYMBOL(touch_softlockup_watchdog);
>  
> @@ -142,7 +141,7 @@ void touch_all_softlockup_watchdogs(void)
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
>  void touch_nmi_watchdog(void)
>  {
> -	__get_cpu_var(watchdog_nmi_touch) = true;
> +	__raw_get_cpu_var(watchdog_nmi_touch) = true;
>  	touch_softlockup_watchdog();
>  }
>  EXPORT_SYMBOL(touch_nmi_watchdog);

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
  2010-08-17 10:39                     ` Sergey Senozhatsky
  2010-08-17 12:56                       ` Yong Zhang
@ 2010-08-17 13:13                       ` Don Zickus
  2010-08-18  2:48                         ` Frederic Weisbecker
  1 sibling, 1 reply; 41+ messages in thread
From: Don Zickus @ 2010-08-17 13:13 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Yong Zhang, Frederic Weisbecker, Peter Zijlstra, Andrew Morton,
	Ingo Molnar, linux-kernel

On Tue, Aug 17, 2010 at 01:39:48PM +0300, Sergey Senozhatsky wrote:
> Please kindly review.

I don't have a deep enough understanding of the subtleties between
per_cpu, __get_cpu_var, and __raw_get_cpu_var to really say which is
correct.  To me, all three versions of your patch look they do the same
thing.

Technically, it seems like preempt_disable/enable would be the correct
thing to do.  But as someone pointed out earlier, if the code is preempted
and switches cpu, then the touch_*_watchdog effectively becomes a no-op
(which I guess it can do even with the preempt_disable/enable surrounding
it).  So I have no idea.  I am going to wait for smarter people than me to
provide an opinion. :-)

Cheers,
Don

> 
> ---
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 613bc1f..22dd388 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -116,13 +116,12 @@ static unsigned long get_sample_period(void)
>  static void __touch_watchdog(void)
>  {
>  	int this_cpu = smp_processor_id();
> -
> -	__get_cpu_var(watchdog_touch_ts) = get_timestamp(this_cpu);
> +	per_cpu(watchdog_touch_ts, this_cpu) = get_timestamp(this_cpu);
>  }
>  
>  void touch_softlockup_watchdog(void)
>  {
> -	__get_cpu_var(watchdog_touch_ts) = 0;
> +	__raw_get_cpu_var(watchdog_touch_ts) = 0;
>  }
>  EXPORT_SYMBOL(touch_softlockup_watchdog);
>  
> @@ -142,7 +141,7 @@ void touch_all_softlockup_watchdogs(void)
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
>  void touch_nmi_watchdog(void)
>  {
> -	__get_cpu_var(watchdog_nmi_touch) = true;
> +	__raw_get_cpu_var(watchdog_nmi_touch) = true;
>  	touch_softlockup_watchdog();
>  }
>  EXPORT_SYMBOL(touch_nmi_watchdog);
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
  2010-08-17 13:13                       ` Don Zickus
@ 2010-08-18  2:48                         ` Frederic Weisbecker
  2010-08-18 20:01                           ` Andrew Morton
  0 siblings, 1 reply; 41+ messages in thread
From: Frederic Weisbecker @ 2010-08-18  2:48 UTC (permalink / raw)
  To: Don Zickus, Len Brown
  Cc: Sergey Senozhatsky, Yong Zhang, Peter Zijlstra, Andrew Morton,
	Ingo Molnar, linux-kernel

On Tue, Aug 17, 2010 at 09:13:20AM -0400, Don Zickus wrote:
> On Tue, Aug 17, 2010 at 01:39:48PM +0300, Sergey Senozhatsky wrote:
> > Please kindly review.
> 
> I don't have a deep enough understanding of the subtleties between
> per_cpu, __get_cpu_var, and __raw_get_cpu_var to really say which is
> correct.  To me, all three versions of your patch look they do the same
> thing.
> 
> Technically, it seems like preempt_disable/enable would be the correct
> thing to do.  But as someone pointed out earlier, if the code is preempted
> and switches cpu, then the touch_*_watchdog effectively becomes a no-op
> (which I guess it can do even with the preempt_disable/enable surrounding
> it).  So I have no idea.  I am going to wait for smarter people than me to
> provide an opinion. :-)
> 
> Cheers,
> Don


(Adding Len Brown in Cc.

Len, this is about acpi_os_stall() that touches the watchdog while
running in a preemptable section, this triggers warnings because of
the use of local cpu accessors. We are debating about the appropriate
way to solve this).

The more I think about it, the more I think that doesn't make sense
to have touch_nmi_watchdog() callable from preemptable code.

It is buggy by nature.

If you run in a preemptable section, then interrupts can fire, and if
they can, the nmi watchdog is fine and doesn't need to be touched.

Here the problem is more in the softlockup watchdog, because even if you
run in a preemptable section, if you run a !CONFIG_PREEMPT kernel, then
you can't be preempted and the watchdog won't be scheduled until the
udelay loop finishes. But to solve that you would need cond_resched()
calls, not touching the watchdog.

Because touching the softlockup watchdog doesn't make sense either
if you can migrate: you can run the udelay on CPU 0, then migrate on
CPU 1 and call touch_softlockup_watchdog() from there. Which makes
definetely no sense. This is buggy.

And because we want to avoid such buggy uses of the touch_whatever_watchdog()
APIs, these function must continue to check they are called from non-preemptable
code. Randomly touching the watchdog could hide real lockups to the user.

The problem is on the caller. Considering such udelays loop:

* if it's in a irq disabled section, call touch_nmi_watchdog(), because this
  could prevent the nmi watchdog irq from firing
* if it's in a non-preemptable section, call touch_softlockup_watchdog(), because
  this could prevent the softlockup watchdog task from beeing scheduled
* if it's from a preemptable task context, this should call cond_resched() to
  avoid huge latencies on !CONFIG_PREEMPT


But acpi_os_stall() seem to be called from 4 different places, and these places
may run in different context like the above described.

The ACPI code should probably use more specific busy-loop APIs, depending on the
context it runs.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: fix BUG: using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
  2010-08-13 10:21 fix BUG: using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog Sergey Senozhatsky
  2010-08-16  8:22 ` Peter Zijlstra
@ 2010-08-18 19:33 ` Andrew Morton
  2010-08-18 21:44   ` Cyrill Gorcunov
  2010-09-22  9:00   ` [PATCH] avoid second smp_processor_id() call in __touch_watchdog Sergey Senozhatsky
  1 sibling, 2 replies; 41+ messages in thread
From: Andrew Morton @ 2010-08-18 19:33 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Don Zickus,
	Cyrill Gorcunov, Frederic Weisbecker

On Fri, 13 Aug 2010 13:21:58 +0300
Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:

> Hello,
> 
> Got this traces today:
> 
> ...
> 
> Avoid using smp_processor_id in touch_softlockup_watchdog and touch_nmi_watchdog.
> Patch also "removes" second call to smp_processor_id in __touch_watchdog
> (smp_processor_id itself and smp_processor_id in __get_cpu_var). 
> 
> ---
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 613bc1f..8822f1e 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -116,13 +116,14 @@ static unsigned long get_sample_period(void)
>  static void __touch_watchdog(void)
>  {
>  	int this_cpu = smp_processor_id();
> -
> -	__get_cpu_var(watchdog_touch_ts) = get_timestamp(this_cpu);
> +	per_cpu(watchdog_touch_ts, this_cpu) = get_timestamp(this_cpu);
>  }

Fair enough, although strictly speaking this should be done in a
separate and later patch.

>  void touch_softlockup_watchdog(void)
>  {
> -	__get_cpu_var(watchdog_touch_ts) = 0;
> +	int this_cpu = get_cpu();
> +	per_cpu(watchdog_touch_ts, this_cpu) = 0;
> +	put_cpu();
>  }
>  EXPORT_SYMBOL(touch_softlockup_watchdog);
>  
> @@ -142,7 +143,9 @@ void touch_all_softlockup_watchdogs(void)
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
>  void touch_nmi_watchdog(void)
>  {
> -	__get_cpu_var(watchdog_nmi_touch) = true;
> +	int this_cpu = get_cpu();
> +	per_cpu(watchdog_nmi_touch, this_cpu) = true;
> +	put_cpu();
>  	touch_softlockup_watchdog();
>  }
>  EXPORT_SYMBOL(touch_nmi_watchdog);

Why did this start happening?  Surely we've called
touch_softlockup_watchdog() from within preemptible code before now. 
Methinks that

: commit 26e09c6eee14f4827b55137ba0eedc4e77cd50ab
: Author:     Don Zickus <dzickus@redhat.com>
: AuthorDate: Mon May 17 18:06:04 2010 -0400
: Commit:     Frederic Weisbecker <fweisbec@gmail.com>
: CommitDate: Wed May 19 11:32:14 2010 +0200
: 
:     lockup_detector: Convert per_cpu to __get_cpu_var for readability

was simply broken?  That would be strange, given that it's been sitting
around since May 17.

If we don't want to revert 26e09c6eee14f4827b55137ba0eedc4e77cd50ab
then I'd suggest that we simply switch to raw_smp_processor_id(): those
newly-added get_cpu/put_cpu calls don't do anything useful.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
  2010-08-18  2:48                         ` Frederic Weisbecker
@ 2010-08-18 20:01                           ` Andrew Morton
  2010-08-19  2:27                             ` Don Zickus
  2010-08-20  2:57                             ` Don Zickus
  0 siblings, 2 replies; 41+ messages in thread
From: Andrew Morton @ 2010-08-18 20:01 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Don Zickus, Len Brown, Sergey Senozhatsky, Yong Zhang,
	Peter Zijlstra, Ingo Molnar, linux-kernel, linux-acpi,
	Andy Grover

On Wed, 18 Aug 2010 04:48:05 +0200
Frederic Weisbecker <fweisbec@gmail.com> wrote:

> On Tue, Aug 17, 2010 at 09:13:20AM -0400, Don Zickus wrote:
> > On Tue, Aug 17, 2010 at 01:39:48PM +0300, Sergey Senozhatsky wrote:
> > > Please kindly review.
> > 
> > I don't have a deep enough understanding of the subtleties between
> > per_cpu, __get_cpu_var, and __raw_get_cpu_var to really say which is
> > correct.  To me, all three versions of your patch look they do the same
> > thing.
> > 
> > Technically, it seems like preempt_disable/enable would be the correct
> > thing to do.  But as someone pointed out earlier, if the code is preempted
> > and switches cpu, then the touch_*_watchdog effectively becomes a no-op
> > (which I guess it can do even with the preempt_disable/enable surrounding
> > it).  So I have no idea.  I am going to wait for smarter people than me to
> > provide an opinion. :-)
> > 
> > Cheers,
> > Don
> 
> 
> (Adding Len Brown in Cc.

I'm not sure who looks after osl.c.  I added linux-acpi to cc.

> Len, this is about acpi_os_stall() that touches the watchdog while
> running in a preemptable section, this triggers warnings because of
> the use of local cpu accessors. We are debating about the appropriate
> way to solve this).
> 
> The more I think about it, the more I think that doesn't make sense
> to have touch_nmi_watchdog() callable from preemptable code.
> 
> It is buggy by nature.
> 
> If you run in a preemptable section, then interrupts can fire, and if
> they can, the nmi watchdog is fine and doesn't need to be touched.
> 
> Here the problem is more in the softlockup watchdog, because even if you
> run in a preemptable section, if you run a !CONFIG_PREEMPT kernel, then
> you can't be preempted and the watchdog won't be scheduled until the
> udelay loop finishes. But to solve that you would need cond_resched()
> calls, not touching the watchdog.
> 
> Because touching the softlockup watchdog doesn't make sense either
> if you can migrate: you can run the udelay on CPU 0, then migrate on
> CPU 1 and call touch_softlockup_watchdog() from there. Which makes
> definetely no sense. This is buggy.
> 
> And because we want to avoid such buggy uses of the touch_whatever_watchdog()
> APIs, these function must continue to check they are called from non-preemptable
> code. Randomly touching the watchdog could hide real lockups to the user.
> 
> The problem is on the caller. Considering such udelays loop:
> 
> * if it's in a irq disabled section, call touch_nmi_watchdog(), because this
>   could prevent the nmi watchdog irq from firing
> * if it's in a non-preemptable section, call touch_softlockup_watchdog(), because
>   this could prevent the softlockup watchdog task from beeing scheduled
> * if it's from a preemptable task context, this should call cond_resched() to
>   avoid huge latencies on !CONFIG_PREEMPT
> 
> 
> But acpi_os_stall() seem to be called from 4 different places, and these places
> may run in different context like the above described.
> 
> The ACPI code should probably use more specific busy-loop APIs, depending on the
> context it runs.

The touch_nmi_watchdog() was added to acpi_os_stall() by little old me
in 2003.  It was committed by Andy with the patch title "ACPI:
Correctly handle NMI watchdog during long stalls (Andrew Morton)".  My
title was "ACPI poweroff trigers the NMI watchdog".  My changelog was

	ACPI poweroff trigers the NMI watchdog.  Fix.

(My spelling has improved with age).

So.  If we remove that touch, will poweroff still trigger the NMI? 
Dunno.


The surprise new requirement that touch_nmi_watchdog() be called from
non-preemptible code does seem to make sense IMO.  It's hard to see why
anyone would be touching the watchdog unless he's spinning in irqs-off
code.  Except, of course, when we have a utility function which can be
called from wither irqs-on or irqs-off: acpi_os_stall().

That being said, it's not good to introduce new API requirements by
accident!  An audit of all callers should first be performed, at least.


The surprise new requirement that touch_softlockup_watchdog() be called
from non-preemptible code doesn't make sense IMO.  If I have a piece of
code in the kernel which I expect to sit in TASK_UNINTERRUPTIBLE state
for three minutes waiting for my egg to boil, I should be able to do
that and I should be able to touch the softlockup detector without
needing to go non-preemptible.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: fix BUG: using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
  2010-08-18 19:33 ` Andrew Morton
@ 2010-08-18 21:44   ` Cyrill Gorcunov
  2010-09-22  9:00   ` [PATCH] avoid second smp_processor_id() call in __touch_watchdog Sergey Senozhatsky
  1 sibling, 0 replies; 41+ messages in thread
From: Cyrill Gorcunov @ 2010-08-18 21:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Sergey Senozhatsky, Peter Zijlstra, Ingo Molnar, linux-kernel,
	Don Zickus, Frederic Weisbecker

On Wed, Aug 18, 2010 at 12:33:46PM -0700, Andrew Morton wrote:
> On Fri, 13 Aug 2010 13:21:58 +0300
> Sergey Senozhatsky <sergey.senozhatsky@gmail.com> wrote:
> 
> > Hello,
> > 
> > Got this traces today:
> > 
> > ...
> > 

...

> >  void touch_softlockup_watchdog(void)
> >  {
> > -	__get_cpu_var(watchdog_touch_ts) = 0;
> > +	int this_cpu = get_cpu();
> > +	per_cpu(watchdog_touch_ts, this_cpu) = 0;
> > +	put_cpu();
> >  }
> >  EXPORT_SYMBOL(touch_softlockup_watchdog);
> >  
> > @@ -142,7 +143,9 @@ void touch_all_softlockup_watchdogs(void)
> >  #ifdef CONFIG_HARDLOCKUP_DETECTOR
> >  void touch_nmi_watchdog(void)
> >  {
> > -	__get_cpu_var(watchdog_nmi_touch) = true;
> > +	int this_cpu = get_cpu();
> > +	per_cpu(watchdog_nmi_touch, this_cpu) = true;
> > +	put_cpu();
> >  	touch_softlockup_watchdog();
> >  }
> >  EXPORT_SYMBOL(touch_nmi_watchdog);
> 
> Why did this start happening?  Surely we've called
> touch_softlockup_watchdog() from within preemptible code before now. 

indeed, and we've been using __raw interface before (2.6.18)

void touch_softlockup_watchdog(void)
{
	__raw_get_cpu_var(touch_timestamp) = jiffies;
}

> Methinks that
> 
> : commit 26e09c6eee14f4827b55137ba0eedc4e77cd50ab
> : Author:     Don Zickus <dzickus@redhat.com>
> : AuthorDate: Mon May 17 18:06:04 2010 -0400
> : Commit:     Frederic Weisbecker <fweisbec@gmail.com>
> : CommitDate: Wed May 19 11:32:14 2010 +0200
> : 
> :     lockup_detector: Convert per_cpu to __get_cpu_var for readability
> 
> was simply broken?  That would be strange, given that it's been sitting
> around since May 17.
> 
> If we don't want to revert 26e09c6eee14f4827b55137ba0eedc4e77cd50ab
> then I'd suggest that we simply switch to raw_smp_processor_id(): those
> newly-added get_cpu/put_cpu calls don't do anything useful.
>

I think it's fine to use __get_cpu_var in touch_nmi_watchdog (which should not
be called with irq enabled since it ticks anyway then, at least on x86) for hardware
nmi watchdog, can't conclude anything about softlockup (except that we had __raw
interface before) since I'm not familiar with soflockup watchdog at moment.

	-- Cyrill

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
  2010-08-18 20:01                           ` Andrew Morton
@ 2010-08-19  2:27                             ` Don Zickus
  2010-08-20  2:57                             ` Don Zickus
  1 sibling, 0 replies; 41+ messages in thread
From: Don Zickus @ 2010-08-19  2:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Frederic Weisbecker, Len Brown, Sergey Senozhatsky, Yong Zhang,
	Peter Zijlstra, Ingo Molnar, linux-kernel, linux-acpi,
	Andy Grover

On Wed, Aug 18, 2010 at 01:01:56PM -0700, Andrew Morton wrote:
> The surprise new requirement that touch_nmi_watchdog() be called from
> non-preemptible code does seem to make sense IMO.  It's hard to see why
> anyone would be touching the watchdog unless he's spinning in irqs-off
> code.  Except, of course, when we have a utility function which can be
> called from wither irqs-on or irqs-off: acpi_os_stall().
> 
> That being said, it's not good to introduce new API requirements by
> accident!  An audit of all callers should first be performed, at least.
> 
> 
> The surprise new requirement that touch_softlockup_watchdog() be called
> from non-preemptible code doesn't make sense IMO.  If I have a piece of
> code in the kernel which I expect to sit in TASK_UNINTERRUPTIBLE state
> for three minutes waiting for my egg to boil, I should be able to do
> that and I should be able to touch the softlockup detector without
> needing to go non-preemptible.

Wow.  So after re-reading what the original touch_*_watchdog code did and what I
copied to kernel/watchdog.c, I'm a little embarrassed on how I managed to
mangle the internals of both those functions.

While the idea is the same, the semantics are clearly different.

touch_nmi_watchdog had a for_each_cpu_present loop, which means it didn't
have to deal with the preempt issue.

touch_softlockup_watchdog used __raw_get_cpu_var to excuse itself from
dealing with the preempt issue.

I'll put together a patch that brings those functions back in line with
what they used to be.  Sorry for the trouble.

Cheers,
Don


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
  2010-08-18 20:01                           ` Andrew Morton
  2010-08-19  2:27                             ` Don Zickus
@ 2010-08-20  2:57                             ` Don Zickus
  2010-08-20  3:42                               ` Andrew Morton
                                                 ` (2 more replies)
  1 sibling, 3 replies; 41+ messages in thread
From: Don Zickus @ 2010-08-20  2:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Frederic Weisbecker, Len Brown, Sergey Senozhatsky, Yong Zhang,
	Peter Zijlstra, Ingo Molnar, linux-kernel, linux-acpi,
	Andy Grover

On Wed, Aug 18, 2010 at 01:01:56PM -0700, Andrew Morton wrote:
> The surprise new requirement that touch_nmi_watchdog() be called from
> non-preemptible code does seem to make sense IMO.  It's hard to see why
> anyone would be touching the watchdog unless he's spinning in irqs-off
> code.  Except, of course, when we have a utility function which can be
> called from wither irqs-on or irqs-off: acpi_os_stall().
> 
> That being said, it's not good to introduce new API requirements by
> accident!  An audit of all callers should first be performed, at least.
> 
> 
> The surprise new requirement that touch_softlockup_watchdog() be called
> from non-preemptible code doesn't make sense IMO.  If I have a piece of
> code in the kernel which I expect to sit in TASK_UNINTERRUPTIBLE state
> for three minutes waiting for my egg to boil, I should be able to do
> that and I should be able to touch the softlockup detector without
> needing to go non-preemptible.

Ok, so here is my patch that syncs the touch_*_watchdog back in line with
the old semantics.  Hopefully this will undo any harm I caused.

------------cut -->---------------------------

>From b372e821c804982438db090db6b4a2f753c78091 Mon Sep 17 00:00:00 2001
From: Don Zickus <dzickus@redhat.com>
Date: Thu, 19 Aug 2010 22:48:26 -0400
Subject: [PATCH] [lockup detector] sync touch_*_watchdog back to old semantics

During my rewrite, the semantics of touch_nmi_watchdog and
touch_softlockup_watchdog changed enough to break some drivers
(mostly over preemptable regions).

This change brings those touch_*_watchdog functions back in line
to how they used to work.

Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 kernel/watchdog.c |   17 ++++++++++++-----
 1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 613bc1f..99e35a2 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -122,7 +122,7 @@ static void __touch_watchdog(void)
 
 void touch_softlockup_watchdog(void)
 {
-	__get_cpu_var(watchdog_touch_ts) = 0;
+	__raw_get_cpu_var(watchdog_touch_ts) = 0;
 }
 EXPORT_SYMBOL(touch_softlockup_watchdog);
 
@@ -142,7 +142,14 @@ void touch_all_softlockup_watchdogs(void)
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
 void touch_nmi_watchdog(void)
 {
-	__get_cpu_var(watchdog_nmi_touch) = true;
+	if (watchdog_enabled) {
+		unsigned cpu;
+
+		for_each_present_cpu(cpu) {
+			if (per_cpu(watchdog_nmi_touch, cpu) != true)
+				per_cpu(watchdog_nmi_touch, cpu) = true;
+		}
+	}
 	touch_softlockup_watchdog();
 }
 EXPORT_SYMBOL(touch_nmi_watchdog);
@@ -430,6 +437,9 @@ static int watchdog_enable(int cpu)
 		wake_up_process(p);
 	}
 
+	/* if any cpu succeeds, watchdog is considered enabled for the system */
+	watchdog_enabled = 1;
+
 	return 0;
 }
 
@@ -452,9 +462,6 @@ static void watchdog_disable(int cpu)
 		per_cpu(softlockup_watchdog, cpu) = NULL;
 		kthread_stop(p);
 	}
-
-	/* if any cpu succeeds, watchdog is considered enabled for the system */
-	watchdog_enabled = 1;
 }
 
 static void watchdog_enable_all_cpus(void)
-- 
1.7.2.1



^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
  2010-08-20  2:57                             ` Don Zickus
@ 2010-08-20  3:42                               ` Andrew Morton
  2010-08-20 12:34                                 ` Don Zickus
  2010-08-26 17:17                                 ` acpi_os_stall() and touch_nmi_watchdog() (was Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog) Len Brown
  2010-08-20 15:02                               ` [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog Yong Zhang
  2010-08-26 10:14                               ` Maxim Levitsky
  2 siblings, 2 replies; 41+ messages in thread
From: Andrew Morton @ 2010-08-20  3:42 UTC (permalink / raw)
  To: Don Zickus
  Cc: Frederic Weisbecker, Len Brown, Sergey Senozhatsky, Yong Zhang,
	Peter Zijlstra, Ingo Molnar, linux-kernel, linux-acpi,
	Andy Grover, H. Peter Anvin

On Thu, 19 Aug 2010 22:57:49 -0400 Don Zickus <dzickus@redhat.com> wrote:

> On Wed, Aug 18, 2010 at 01:01:56PM -0700, Andrew Morton wrote:
> > The surprise new requirement that touch_nmi_watchdog() be called from
> > non-preemptible code does seem to make sense IMO.  It's hard to see why
> > anyone would be touching the watchdog unless he's spinning in irqs-off
> > code.  Except, of course, when we have a utility function which can be
> > called from wither irqs-on or irqs-off: acpi_os_stall().
> > 
> > That being said, it's not good to introduce new API requirements by
> > accident!  An audit of all callers should first be performed, at least.
> > 
> > 
> > The surprise new requirement that touch_softlockup_watchdog() be called
> > from non-preemptible code doesn't make sense IMO.  If I have a piece of
> > code in the kernel which I expect to sit in TASK_UNINTERRUPTIBLE state
> > for three minutes waiting for my egg to boil, I should be able to do
> > that and I should be able to touch the softlockup detector without
> > needing to go non-preemptible.
> 
> Ok, so here is my patch that syncs the touch_*_watchdog back in line with
> the old semantics.  Hopefully this will undo any harm I caused.
> 
> ------------cut -->---------------------------
> 
> >From b372e821c804982438db090db6b4a2f753c78091 Mon Sep 17 00:00:00 2001
> From: Don Zickus <dzickus@redhat.com>
> Date: Thu, 19 Aug 2010 22:48:26 -0400
> Subject: [PATCH] [lockup detector] sync touch_*_watchdog back to old semantics
> 
> During my rewrite, the semantics of touch_nmi_watchdog and
> touch_softlockup_watchdog changed enough to break some drivers
> (mostly over preemptable regions).
> 
> This change brings those touch_*_watchdog functions back in line
> to how they used to work.
> 
> Signed-off-by: Don Zickus <dzickus@redhat.com>
> ---
>  kernel/watchdog.c |   17 ++++++++++++-----
>  1 files changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 613bc1f..99e35a2 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -122,7 +122,7 @@ static void __touch_watchdog(void)
>  
>  void touch_softlockup_watchdog(void)
>  {
> -	__get_cpu_var(watchdog_touch_ts) = 0;
> +	__raw_get_cpu_var(watchdog_touch_ts) = 0;
>  }
>  EXPORT_SYMBOL(touch_softlockup_watchdog);
>  
> @@ -142,7 +142,14 @@ void touch_all_softlockup_watchdogs(void)
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
>  void touch_nmi_watchdog(void)
>  {
> -	__get_cpu_var(watchdog_nmi_touch) = true;
> +	if (watchdog_enabled) {
> +		unsigned cpu;
> +
> +		for_each_present_cpu(cpu) {
> +			if (per_cpu(watchdog_nmi_touch, cpu) != true)
> +				per_cpu(watchdog_nmi_touch, cpu) = true;
> +		}
> +	}
>  	touch_softlockup_watchdog();
>  }
>  EXPORT_SYMBOL(touch_nmi_watchdog);
> @@ -430,6 +437,9 @@ static int watchdog_enable(int cpu)
>  		wake_up_process(p);
>  	}
>  
> +	/* if any cpu succeeds, watchdog is considered enabled for the system */
> +	watchdog_enabled = 1;
> +
>  	return 0;
>  }
>  
> @@ -452,9 +462,6 @@ static void watchdog_disable(int cpu)
>  		per_cpu(softlockup_watchdog, cpu) = NULL;
>  		kthread_stop(p);
>  	}
> -
> -	/* if any cpu succeeds, watchdog is considered enabled for the system */
> -	watchdog_enabled = 1;
>  }
>  
>  static void watchdog_enable_all_cpus(void)

hm, the code seems a bit screwy.  Maybe it was always thus.

watchdog_enabled gets set in the per-cpu function but it gets cleared
in the all-cpus function.  Asymmetric.

Also afacit the action of cpu-hotunplug+cpu-hotplug will reenable the
watchdog on a CPU which was supposed to have it disabled.  Perhaps you
could recheck that and make sure it all makes sense - perhaps we need a
separate state variable which is purely "current setting of
/proc/sys/kernel/nmi_watchdog" and doesn't get altered internally.

Anyway, I'll be disappearing for a few days so perhaps Frederic or hpa
can help get this all fixed/merged up?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
  2010-08-20  3:42                               ` Andrew Morton
@ 2010-08-20 12:34                                 ` Don Zickus
  2010-08-26 17:17                                 ` acpi_os_stall() and touch_nmi_watchdog() (was Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog) Len Brown
  1 sibling, 0 replies; 41+ messages in thread
From: Don Zickus @ 2010-08-20 12:34 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Frederic Weisbecker, Len Brown, Sergey Senozhatsky, Yong Zhang,
	Peter Zijlstra, Ingo Molnar, linux-kernel, linux-acpi,
	Andy Grover, H. Peter Anvin

On Thu, Aug 19, 2010 at 08:42:56PM -0700, Andrew Morton wrote:
> On Thu, 19 Aug 2010 22:57:49 -0400 Don Zickus <dzickus@redhat.com> wrote:
> 
> > On Wed, Aug 18, 2010 at 01:01:56PM -0700, Andrew Morton wrote:
> > @@ -430,6 +437,9 @@ static int watchdog_enable(int cpu)
> >  		wake_up_process(p);
> >  	}
> >  
> > +	/* if any cpu succeeds, watchdog is considered enabled for the system */
> > +	watchdog_enabled = 1;
> > +
> >  	return 0;
> >  }
> >  
> > @@ -452,9 +462,6 @@ static void watchdog_disable(int cpu)
> >  		per_cpu(softlockup_watchdog, cpu) = NULL;
> >  		kthread_stop(p);
> >  	}
> > -
> > -	/* if any cpu succeeds, watchdog is considered enabled for the system */
> > -	watchdog_enabled = 1;
> >  }
> >  
> >  static void watchdog_enable_all_cpus(void)
> 
> hm, the code seems a bit screwy.  Maybe it was always thus.

No, watchdog_enabled was something newly created for the lockup dectector.

> 
> watchdog_enabled gets set in the per-cpu function but it gets cleared
> in the all-cpus function.  Asymmetric.

Yes it is by design.  I was using watchdog_enabled as a global state
variable.  As soon as one cpu was enabled, I would set the bit.  But only
if all the cpus disabled the watchdog would I clear the bit.

> 
> Also afacit the action of cpu-hotunplug+cpu-hotplug will reenable the
> watchdog on a CPU which was supposed to have it disabled.  Perhaps you
> could recheck that and make sure it all makes sense - perhaps we need a
> separate state variable which is purely "current setting of
> /proc/sys/kernel/nmi_watchdog" and doesn't get altered internally.

I wasn't tracking it on a per cpu basis.  I didn't see a need to.  The
watchdog should globally be on/off across the system.  If a system comes
up and one of the cpus could not bring the watchdog online for some
reason, then that is a problem.  If a cpu-hotunplug+cpu-hotplug fixes it,
all the better. :-)

Also, if I wanted to track it per cpu, there is a bunch of status bits in
per-cpu variables that could let the code know whether a particular cpu
watchdog is on/off for either hardlockup or softlockup.

Cheers,
Don

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
  2010-08-20  2:57                             ` Don Zickus
  2010-08-20  3:42                               ` Andrew Morton
@ 2010-08-20 15:02                               ` Yong Zhang
  2010-08-26 10:14                               ` Maxim Levitsky
  2 siblings, 0 replies; 41+ messages in thread
From: Yong Zhang @ 2010-08-20 15:02 UTC (permalink / raw)
  To: Don Zickus
  Cc: Andrew Morton, Frederic Weisbecker, Len Brown,
	Sergey Senozhatsky, Peter Zijlstra, Ingo Molnar, linux-kernel,
	linux-acpi, Andy Grover

On Thu, Aug 19, 2010 at 10:57:49PM -0400, Don Zickus wrote:
> On Wed, Aug 18, 2010 at 01:01:56PM -0700, Andrew Morton wrote:
> > The surprise new requirement that touch_nmi_watchdog() be called from
> > non-preemptible code does seem to make sense IMO.  It's hard to see why
> > anyone would be touching the watchdog unless he's spinning in irqs-off
> > code.  Except, of course, when we have a utility function which can be
> > called from wither irqs-on or irqs-off: acpi_os_stall().
> > 
> > That being said, it's not good to introduce new API requirements by
> > accident!  An audit of all callers should first be performed, at least.
> > 
> > 
> > The surprise new requirement that touch_softlockup_watchdog() be called
> > from non-preemptible code doesn't make sense IMO.  If I have a piece of
> > code in the kernel which I expect to sit in TASK_UNINTERRUPTIBLE state
> > for three minutes waiting for my egg to boil, I should be able to do
> > that and I should be able to touch the softlockup detector without
> > needing to go non-preemptible.
> 
> Ok, so here is my patch that syncs the touch_*_watchdog back in line with
> the old semantics.  Hopefully this will undo any harm I caused.
> 
> ------------cut -->---------------------------
> 
> >From b372e821c804982438db090db6b4a2f753c78091 Mon Sep 17 00:00:00 2001
> From: Don Zickus <dzickus@redhat.com>
> Date: Thu, 19 Aug 2010 22:48:26 -0400
> Subject: [PATCH] [lockup detector] sync touch_*_watchdog back to old semantics
> 
> During my rewrite, the semantics of touch_nmi_watchdog and
> touch_softlockup_watchdog changed enough to break some drivers
> (mostly over preemptable regions).
> 
> This change brings those touch_*_watchdog functions back in line
> to how they used to work.

This one looks good to me.
Thank you Don.

-Yong

> 
> Signed-off-by: Don Zickus <dzickus@redhat.com>
> ---
>  kernel/watchdog.c |   17 ++++++++++++-----
>  1 files changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 613bc1f..99e35a2 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -122,7 +122,7 @@ static void __touch_watchdog(void)
>  
>  void touch_softlockup_watchdog(void)
>  {
> -	__get_cpu_var(watchdog_touch_ts) = 0;
> +	__raw_get_cpu_var(watchdog_touch_ts) = 0;
>  }
>  EXPORT_SYMBOL(touch_softlockup_watchdog);
>  
> @@ -142,7 +142,14 @@ void touch_all_softlockup_watchdogs(void)
>  #ifdef CONFIG_HARDLOCKUP_DETECTOR
>  void touch_nmi_watchdog(void)
>  {
> -	__get_cpu_var(watchdog_nmi_touch) = true;
> +	if (watchdog_enabled) {
> +		unsigned cpu;
> +
> +		for_each_present_cpu(cpu) {
> +			if (per_cpu(watchdog_nmi_touch, cpu) != true)
> +				per_cpu(watchdog_nmi_touch, cpu) = true;
> +		}
> +	}
>  	touch_softlockup_watchdog();
>  }
>  EXPORT_SYMBOL(touch_nmi_watchdog);
> @@ -430,6 +437,9 @@ static int watchdog_enable(int cpu)
>  		wake_up_process(p);
>  	}
>  
> +	/* if any cpu succeeds, watchdog is considered enabled for the system */
> +	watchdog_enabled = 1;
> +
>  	return 0;
>  }
>  
> @@ -452,9 +462,6 @@ static void watchdog_disable(int cpu)
>  		per_cpu(softlockup_watchdog, cpu) = NULL;
>  		kthread_stop(p);
>  	}
> -
> -	/* if any cpu succeeds, watchdog is considered enabled for the system */
> -	watchdog_enabled = 1;
>  }
>  
>  static void watchdog_enable_all_cpus(void)
> -- 
> 1.7.2.1

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
  2010-08-20  2:57                             ` Don Zickus
  2010-08-20  3:42                               ` Andrew Morton
  2010-08-20 15:02                               ` [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog Yong Zhang
@ 2010-08-26 10:14                               ` Maxim Levitsky
  2010-08-26 14:40                                 ` Don Zickus
  2 siblings, 1 reply; 41+ messages in thread
From: Maxim Levitsky @ 2010-08-26 10:14 UTC (permalink / raw)
  To: Don Zickus
  Cc: Andrew Morton, Frederic Weisbecker, Len Brown,
	Sergey Senozhatsky, Yong Zhang, Peter Zijlstra, Ingo Molnar,
	linux-kernel, linux-acpi, Andy Grover

On Thu, 2010-08-19 at 22:57 -0400, Don Zickus wrote: 
> On Wed, Aug 18, 2010 at 01:01:56PM -0700, Andrew Morton wrote:
> > The surprise new requirement that touch_nmi_watchdog() be called from
> > non-preemptible code does seem to make sense IMO.  It's hard to see why
> > anyone would be touching the watchdog unless he's spinning in irqs-off
> > code.  Except, of course, when we have a utility function which can be
> > called from wither irqs-on or irqs-off: acpi_os_stall().
> > 
> > That being said, it's not good to introduce new API requirements by
> > accident!  An audit of all callers should first be performed, at least.
> > 
> > 
> > The surprise new requirement that touch_softlockup_watchdog() be called
> > from non-preemptible code doesn't make sense IMO.  If I have a piece of
> > code in the kernel which I expect to sit in TASK_UNINTERRUPTIBLE state
> > for three minutes waiting for my egg to boil, I should be able to do
> > that and I should be able to touch the softlockup detector without
> > needing to go non-preemptible.
> 
> Ok, so here is my patch that syncs the touch_*_watchdog back in line with
> the old semantics.  Hopefully this will undo any harm I caused.

Was this patch forgotten?

Best regards,
Maxim Levitsky


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog
  2010-08-26 10:14                               ` Maxim Levitsky
@ 2010-08-26 14:40                                 ` Don Zickus
  0 siblings, 0 replies; 41+ messages in thread
From: Don Zickus @ 2010-08-26 14:40 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: Andrew Morton, Frederic Weisbecker, Len Brown,
	Sergey Senozhatsky, Yong Zhang, Peter Zijlstra, Ingo Molnar,
	linux-kernel, linux-acpi, Andy Grover

On Thu, Aug 26, 2010 at 01:14:31PM +0300, Maxim Levitsky wrote:
> On Thu, 2010-08-19 at 22:57 -0400, Don Zickus wrote: 
> > On Wed, Aug 18, 2010 at 01:01:56PM -0700, Andrew Morton wrote:
> > > The surprise new requirement that touch_nmi_watchdog() be called from
> > > non-preemptible code does seem to make sense IMO.  It's hard to see why
> > > anyone would be touching the watchdog unless he's spinning in irqs-off
> > > code.  Except, of course, when we have a utility function which can be
> > > called from wither irqs-on or irqs-off: acpi_os_stall().
> > > 
> > > That being said, it's not good to introduce new API requirements by
> > > accident!  An audit of all callers should first be performed, at least.
> > > 
> > > 
> > > The surprise new requirement that touch_softlockup_watchdog() be called
> > > from non-preemptible code doesn't make sense IMO.  If I have a piece of
> > > code in the kernel which I expect to sit in TASK_UNINTERRUPTIBLE state
> > > for three minutes waiting for my egg to boil, I should be able to do
> > > that and I should be able to touch the softlockup detector without
> > > needing to go non-preemptible.
> > 
> > Ok, so here is my patch that syncs the touch_*_watchdog back in line with
> > the old semantics.  Hopefully this will undo any harm I caused.
> 
> Was this patch forgotten?

Hm, apparently it was separated out by the mail server.  Here it is again
with some of the headers removed I guess.

Cheers,
Don


From: Don Zickus <dzickus@redhat.com>
Date: Thu, 19 Aug 2010 22:48:26 -0400
Subject: [PATCH] [lockup detector] sync touch_*_watchdog back to old semantics

During my rewrite, the semantics of touch_nmi_watchdog and
touch_softlockup_watchdog changed enough to break some drivers
(mostly over preemptable regions).

This change brings those touch_*_watchdog functions back in line
to how they used to work.

Signed-off-by: Don Zickus <dzickus@redhat.com>
---
 kernel/watchdog.c |   17 ++++++++++++-----
 1 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 613bc1f..99e35a2 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -122,7 +122,7 @@ static void __touch_watchdog(void)
 
 void touch_softlockup_watchdog(void)
 {
-	__get_cpu_var(watchdog_touch_ts) = 0;
+	__raw_get_cpu_var(watchdog_touch_ts) = 0;
 }
 EXPORT_SYMBOL(touch_softlockup_watchdog);
 
@@ -142,7 +142,14 @@ void touch_all_softlockup_watchdogs(void)
 #ifdef CONFIG_HARDLOCKUP_DETECTOR
 void touch_nmi_watchdog(void)
 {
-	__get_cpu_var(watchdog_nmi_touch) = true;
+	if (watchdog_enabled) {
+		unsigned cpu;
+
+		for_each_present_cpu(cpu) {
+			if (per_cpu(watchdog_nmi_touch, cpu) != true)
+				per_cpu(watchdog_nmi_touch, cpu) = true;
+		}
+	}
 	touch_softlockup_watchdog();
 }
 EXPORT_SYMBOL(touch_nmi_watchdog);
@@ -430,6 +437,9 @@ static int watchdog_enable(int cpu)
 		wake_up_process(p);
 	}
 
+	/* if any cpu succeeds, watchdog is considered enabled for the system */
+	watchdog_enabled = 1;
+
 	return 0;
 }
 
@@ -452,9 +462,6 @@ static void watchdog_disable(int cpu)
 		per_cpu(softlockup_watchdog, cpu) = NULL;
 		kthread_stop(p);
 	}
-
-	/* if any cpu succeeds, watchdog is considered enabled for the system */
-	watchdog_enabled = 1;
 }
 
 static void watchdog_enable_all_cpus(void)
-- 
1.7.2.1

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* acpi_os_stall() and touch_nmi_watchdog() (was Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog)
  2010-08-20  3:42                               ` Andrew Morton
  2010-08-20 12:34                                 ` Don Zickus
@ 2010-08-26 17:17                                 ` Len Brown
  1 sibling, 0 replies; 41+ messages in thread
From: Len Brown @ 2010-08-26 17:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Don Zickus, Frederic Weisbecker, Len Brown, Sergey Senozhatsky,
	Yong Zhang, Peter Zijlstra, Ingo Molnar, linux-kernel,
	linux-acpi, Andy Grover, H. Peter Anvin

acpi_os_stall() is used in two ways.

The typical way is what triggered this e-mail thread.
It implements the AML "Stall()" operator, and is called
with interrupts enabled with durations <= 100 usec.
So one would expect it to be identical to udelay().

The exception case is when ACPICA calls it with interrupts off
and huge durations when we wrote the poweroff or sleep
register, yet we find outselves still running...

Apparently akpm added touch_nmi_watchdog() to keep the
watchdog from firing in this exception case.

Is it useful to have the watchdog running when
we are waiting for firmware to poweroff the machine?
If no, maybe we should turn it off as part of the shutdown
process rather than using yet another invocation
of touch_nmi_watchdog()?

Is calling delay() with IRQs disabled the best thing
we can do after we ask the firmware to cut power
and it takes a long time?

thanks,
Len Brown, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] avoid second smp_processor_id() call in __touch_watchdog
  2010-08-18 19:33 ` Andrew Morton
  2010-08-18 21:44   ` Cyrill Gorcunov
@ 2010-09-22  9:00   ` Sergey Senozhatsky
  2010-09-22 14:41     ` Don Zickus
                       ` (2 more replies)
  1 sibling, 3 replies; 41+ messages in thread
From: Sergey Senozhatsky @ 2010-09-22  9:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Don Zickus,
	Cyrill Gorcunov, Frederic Weisbecker

Hello,

Per our previous conversation:

Andrew Morton wrote:
> Fair enough, although strictly speaking this should be done in a
> separate and later patch.
> 


Avoid double smp_processor_id() call in __touch_watchdog (smp_processor_id() 
itself and later call in __get_cpu_var())

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

---

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 7f9c3c5..03d97c5 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -116,8 +116,7 @@ static unsigned long get_sample_period(void)
 static void __touch_watchdog(void)
 {
 	int this_cpu = smp_processor_id();
-
-	__get_cpu_var(watchdog_touch_ts) = get_timestamp(this_cpu);
+	per_cpu(watchdog_touch_ts, this_cpu) = get_timestamp(this_cpu);
 }
 
 void touch_softlockup_watchdog(void)


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH] avoid second smp_processor_id() call in __touch_watchdog
  2010-09-22  9:00   ` [PATCH] avoid second smp_processor_id() call in __touch_watchdog Sergey Senozhatsky
@ 2010-09-22 14:41     ` Don Zickus
  2010-09-22 16:27     ` Frederic Weisbecker
  2010-09-24 19:34     ` Don Zickus
  2 siblings, 0 replies; 41+ messages in thread
From: Don Zickus @ 2010-09-22 14:41 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Peter Zijlstra, Ingo Molnar, linux-kernel,
	Cyrill Gorcunov, Frederic Weisbecker

On Wed, Sep 22, 2010 at 12:00:12PM +0300, Sergey Senozhatsky wrote:
> Hello,
> 
> Per our previous conversation:
> 
> Andrew Morton wrote:
> > Fair enough, although strictly speaking this should be done in a
> > separate and later patch.
> > 
> 
> 
> Avoid double smp_processor_id() call in __touch_watchdog (smp_processor_id() 
> itself and later call in __get_cpu_var())
> 
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

Thanks!  I'll apply it.

Cheers,
Don
> 
> ---
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 7f9c3c5..03d97c5 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -116,8 +116,7 @@ static unsigned long get_sample_period(void)
>  static void __touch_watchdog(void)
>  {
>  	int this_cpu = smp_processor_id();
> -
> -	__get_cpu_var(watchdog_touch_ts) = get_timestamp(this_cpu);
> +	per_cpu(watchdog_touch_ts, this_cpu) = get_timestamp(this_cpu);
>  }
>  
>  void touch_softlockup_watchdog(void)
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] avoid second smp_processor_id() call in __touch_watchdog
  2010-09-22  9:00   ` [PATCH] avoid second smp_processor_id() call in __touch_watchdog Sergey Senozhatsky
  2010-09-22 14:41     ` Don Zickus
@ 2010-09-22 16:27     ` Frederic Weisbecker
  2010-09-22 16:39       ` Peter Zijlstra
  2010-09-24 19:34     ` Don Zickus
  2 siblings, 1 reply; 41+ messages in thread
From: Frederic Weisbecker @ 2010-09-22 16:27 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Peter Zijlstra, Ingo Molnar, linux-kernel,
	Don Zickus, Cyrill Gorcunov

On Wed, Sep 22, 2010 at 12:00:12PM +0300, Sergey Senozhatsky wrote:
> Hello,
> 
> Per our previous conversation:
> 
> Andrew Morton wrote:
> > Fair enough, although strictly speaking this should be done in a
> > separate and later patch.
> > 
> 
> 
> Avoid double smp_processor_id() call in __touch_watchdog (smp_processor_id() 
> itself and later call in __get_cpu_var())
> 
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> 
> ---
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 7f9c3c5..03d97c5 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -116,8 +116,7 @@ static unsigned long get_sample_period(void)
>  static void __touch_watchdog(void)
>  {
>  	int this_cpu = smp_processor_id();
> -
> -	__get_cpu_var(watchdog_touch_ts) = get_timestamp(this_cpu);
> +	per_cpu(watchdog_touch_ts, this_cpu) = get_timestamp(this_cpu);



I'm not sure we want this. This is called by the watchdog internally,
from the timer or the cpu bound thread, so we probably should better
keep __get_cpu_var() because it checks that we are not in a preemptable
section.

Most of the time, accessing local data using per_cpu instead
of __get_cpu_var is wrong.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] avoid second smp_processor_id() call in __touch_watchdog
  2010-09-22 16:27     ` Frederic Weisbecker
@ 2010-09-22 16:39       ` Peter Zijlstra
  2010-09-22 16:47         ` Frederic Weisbecker
  0 siblings, 1 reply; 41+ messages in thread
From: Peter Zijlstra @ 2010-09-22 16:39 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Sergey Senozhatsky, Andrew Morton, Ingo Molnar, linux-kernel,
	Don Zickus, Cyrill Gorcunov

On Wed, 2010-09-22 at 18:27 +0200, Frederic Weisbecker wrote:
> 
> I'm not sure we want this. This is called by the watchdog internally,
> from the timer or the cpu bound thread, so we probably should better
> keep __get_cpu_var() because it checks that we are not in a preemptable
> section. 

The smp_processor_id() right at the start already does that.

That said, I doubt it really matter one way or the other, compilers have
been known to do CSE for quite a while.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] avoid second smp_processor_id() call in __touch_watchdog
  2010-09-22 16:39       ` Peter Zijlstra
@ 2010-09-22 16:47         ` Frederic Weisbecker
  0 siblings, 0 replies; 41+ messages in thread
From: Frederic Weisbecker @ 2010-09-22 16:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sergey Senozhatsky, Andrew Morton, Ingo Molnar, linux-kernel,
	Don Zickus, Cyrill Gorcunov

On Wed, Sep 22, 2010 at 06:39:19PM +0200, Peter Zijlstra wrote:
> On Wed, 2010-09-22 at 18:27 +0200, Frederic Weisbecker wrote:
> > 
> > I'm not sure we want this. This is called by the watchdog internally,
> > from the timer or the cpu bound thread, so we probably should better
> > keep __get_cpu_var() because it checks that we are not in a preemptable
> > section. 
> 
> The smp_processor_id() right at the start already does that.
> 
> That said, I doubt it really matter one way or the other, compilers have
> been known to do CSE for quite a while.


I don't mind personally. We indeed have this smp_processor_id() that
does the check already. But that's also for readability: reviewers
that are used to deal with per cpu datas are also used to see
per_cpu() for remote percpu data access and get_cpu_var() for local
percpu.

Plus some archs may override their __my_cpu_offset implementation
to provide a faster access.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] avoid second smp_processor_id() call in __touch_watchdog
  2010-09-22  9:00   ` [PATCH] avoid second smp_processor_id() call in __touch_watchdog Sergey Senozhatsky
  2010-09-22 14:41     ` Don Zickus
  2010-09-22 16:27     ` Frederic Weisbecker
@ 2010-09-24 19:34     ` Don Zickus
  2010-09-25 17:43       ` Sergey Senozhatsky
  2 siblings, 1 reply; 41+ messages in thread
From: Don Zickus @ 2010-09-24 19:34 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Peter Zijlstra, Ingo Molnar, linux-kernel,
	Cyrill Gorcunov, Frederic Weisbecker

On Wed, Sep 22, 2010 at 12:00:12PM +0300, Sergey Senozhatsky wrote:
> Hello,
> 
> Per our previous conversation:
> 
> Andrew Morton wrote:
> > Fair enough, although strictly speaking this should be done in a
> > separate and later patch.
> > 
> 
> 
> Avoid double smp_processor_id() call in __touch_watchdog (smp_processor_id() 
> itself and later call in __get_cpu_var())

Dropping this patch then based on Peter's and Frederic's feedback that the
compiler probably already optimizes for this, leaving readability as a good
excuse not to change anything.

Cheers,
Don

> 
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> 
> ---
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 7f9c3c5..03d97c5 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -116,8 +116,7 @@ static unsigned long get_sample_period(void)
>  static void __touch_watchdog(void)
>  {
>  	int this_cpu = smp_processor_id();
> -
> -	__get_cpu_var(watchdog_touch_ts) = get_timestamp(this_cpu);
> +	per_cpu(watchdog_touch_ts, this_cpu) = get_timestamp(this_cpu);
>  }
>  
>  void touch_softlockup_watchdog(void)
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] avoid second smp_processor_id() call in __touch_watchdog
  2010-09-24 19:34     ` Don Zickus
@ 2010-09-25 17:43       ` Sergey Senozhatsky
  0 siblings, 0 replies; 41+ messages in thread
From: Sergey Senozhatsky @ 2010-09-25 17:43 UTC (permalink / raw)
  To: Don Zickus
  Cc: Sergey Senozhatsky, Andrew Morton, Peter Zijlstra, Ingo Molnar,
	linux-kernel, Cyrill Gorcunov, Frederic Weisbecker

[-- Attachment #1: Type: text/plain, Size: 2428 bytes --]

On (09/24/10 15:34), Don Zickus wrote:
> On Wed, Sep 22, 2010 at 12:00:12PM +0300, Sergey Senozhatsky wrote:
> > Hello,
> > 
> > Per our previous conversation:
> > 
> > Andrew Morton wrote:
> > > Fair enough, although strictly speaking this should be done in a
> > > separate and later patch.
> > > 
> > 
> > 
> > Avoid double smp_processor_id() call in __touch_watchdog (smp_processor_id() 
> > itself and later call in __get_cpu_var())
> 
> Dropping this patch then based on Peter's and Frederic's feedback that the
> compiler probably already optimizes for this, leaving readability as a good
> excuse not to change anything.
>

Hello,

gcc 4.5.1, x86_64

Dump of assembler code for function __touch_watchdog:
   <+0>:	push   %rbp
   <+1>:	mov    %rsp,%rbp
   <+4>:	push   %r12
   <+6>:	push   %rbx
   <+7>:	mov    $0x0,%rbx
   <+14>:	callq  0x43 <__touch_watchdog+19>
   <+19>:	mov    %eax,%r12d
   <+22>:	callq  0x4b <__touch_watchdog+27>
   <+27>:	mov    %eax,%eax
   <+29>:	mov    %r12d,%edi
   <+32>:	add    0x0(,%rax,8),%rbx
   <+40>:	callq  0x5d <__touch_watchdog+45>
   <+45>:	shr    $0x1e,%rax
   <+49>:	mov    %rax,(%rbx)
   <+52>:	pop    %rbx
   <+53>:	pop    %r12
   <+55>:	leaveq 
   <+56>:	retq   


patched __touch_watchdog:
Dump of assembler code for function __touch_watchdog:
   <+0>:	push   %rbp
   <+1>:	mov    %rsp,%rbp
   <+4>:	push   %rbx
   <+5>:	mov    $0x0,%rbx
   <+12>:	sub    $0x8,%rsp
   <+16>:	callq  0x45 <__touch_watchdog+21>
   <+21>:	movslq %eax,%rdx
   <+24>:	mov    %eax,%edi
   <+26>:	add    0x0(,%rdx,8),%rbx
   <+34>:	callq  0x57 <__touch_watchdog+39>
   <+39>:	shr    $0x1e,%rax
   <+43>:	mov    %rax,(%rbx)
   <+46>:	pop    %rax
   <+47>:	pop    %rbx
   <+48>:	leaveq 
   <+49>:	retq
 

	Sergey

> > 
> > Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> > 
> > ---
> > 
> > diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> > index 7f9c3c5..03d97c5 100644
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -116,8 +116,7 @@ static unsigned long get_sample_period(void)
> >  static void __touch_watchdog(void)
> >  {
> >  	int this_cpu = smp_processor_id();
> > -
> > -	__get_cpu_var(watchdog_touch_ts) = get_timestamp(this_cpu);
> > +	per_cpu(watchdog_touch_ts, this_cpu) = get_timestamp(this_cpu);
> >  }
> >  
> >  void touch_softlockup_watchdog(void)
> > 
> 

[-- Attachment #2: Type: application/pgp-signature, Size: 316 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2010-09-25 17:43 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-13 10:21 fix BUG: using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog Sergey Senozhatsky
2010-08-16  8:22 ` Peter Zijlstra
2010-08-16 13:34   ` Don Zickus
2010-08-16 13:46     ` Peter Zijlstra
2010-08-16 14:08       ` [PATCH] fix BUG " Sergey Senozhatsky
2010-08-16 14:30         ` Don Zickus
2010-08-17  4:27           ` Yong Zhang
2010-08-17  2:59         ` Frederic Weisbecker
2010-08-17  3:16           ` Yong Zhang
2010-08-17  8:39             ` Sergey Senozhatsky
2010-08-17  9:05               ` Yong Zhang
2010-08-17  9:24                 ` Sergey Senozhatsky
2010-08-17  9:37                   ` Yong Zhang
2010-08-17 10:28                     ` Sergey Senozhatsky
2010-08-17 12:48                       ` Yong Zhang
2010-08-17 10:39                     ` Sergey Senozhatsky
2010-08-17 12:56                       ` Yong Zhang
2010-08-17 13:13                       ` Don Zickus
2010-08-18  2:48                         ` Frederic Weisbecker
2010-08-18 20:01                           ` Andrew Morton
2010-08-19  2:27                             ` Don Zickus
2010-08-20  2:57                             ` Don Zickus
2010-08-20  3:42                               ` Andrew Morton
2010-08-20 12:34                                 ` Don Zickus
2010-08-26 17:17                                 ` acpi_os_stall() and touch_nmi_watchdog() (was Re: [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog) Len Brown
2010-08-20 15:02                               ` [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog Yong Zhang
2010-08-26 10:14                               ` Maxim Levitsky
2010-08-26 14:40                                 ` Don Zickus
2010-08-17  7:56           ` [PATCH] fix BUG using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog (v2) Sergey Senozhatsky
2010-08-16 14:12       ` fix BUG: using smp_processor_id() in touch_nmi_watchdog and touch_softlockup_watchdog Don Zickus
2010-08-16 14:29         ` Peter Zijlstra
2010-08-16 14:06     ` Yong Zhang
2010-08-18 19:33 ` Andrew Morton
2010-08-18 21:44   ` Cyrill Gorcunov
2010-09-22  9:00   ` [PATCH] avoid second smp_processor_id() call in __touch_watchdog Sergey Senozhatsky
2010-09-22 14:41     ` Don Zickus
2010-09-22 16:27     ` Frederic Weisbecker
2010-09-22 16:39       ` Peter Zijlstra
2010-09-22 16:47         ` Frederic Weisbecker
2010-09-24 19:34     ` Don Zickus
2010-09-25 17:43       ` Sergey Senozhatsky

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.