linux-kernel.vger.kernel.org archive mirror
* [1/4] 2.6.22-rc3: known regressions
@ 2007-05-29 12:52 Michal Piotrowski
  2007-05-29 14:34 ` Jan Kara
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Michal Piotrowski @ 2007-05-29 12:52 UTC (permalink / raw)
  To: Linus Torvalds, Andrew Morton, LKML, Miklos Szeredi, Ingo Molnar,
	linux-acpi, Len Brown, Thierry Volpiatto, alsa-devel,
	Takashi Iwai, Jaroslav Kysela, Ben Collins, Jan Kara,
	linux-fsdevel, Florin Iucha, Sam Ravnborg, Andrey Borzenkov

Hi all,

Here is a list of some known regressions in 2.6.22-rc3.

Feel free to add new regressions/remove fixed etc.
http://kernelnewbies.org/known_regressions



Unclassified

Subject    : long freezes on thinkpad t60
References : http://lkml.org/lkml/2007/5/24/100
Submitter  : Miklos Szeredi <miklos@szeredi.hu>
Handled-By : Ingo Molnar <mingo@elte.hu>
Status     : problem is being debugged



ACPI

Subject    : unable to shutdown on kernel 2.6.22-rc2
References : http://bugzilla.kernel.org/show_bug.cgi?id=8516
Submitter  : Thierry Volpiatto <tvolpiatt@neuf.fr>
Status     : Unknown



ALSA

Subject    : snd-aoa causes badness in lib/kref.c:33
References : http://bugzilla.kernel.org/show_bug.cgi?id=8513
Submitter  : Ben Collins <bcollins@ubuntu.com>
Status     : Unknown



File systems

Subject    : Oops in dentry_iput with 2.6.22-rc2 on AMD64
References : http://lkml.org/lkml/2007/5/22/4
Submitter  : Florin Iucha <florin@iucha.net>
Status     : Unknown



Kbuild

Subject    : make M=$PWD modules_install does nothing
References : http://lkml.org/lkml/2007/5/27/190
Submitter  : Andrey Borzenkov <arvidjaar@mail.ru>
Status     : Unknown



Regards,
Michal

--
"What I missed most was your silence."
-- Andrzej Sapkowski "Coś więcej"


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [1/4] 2.6.22-rc3: known regressions
  2007-05-29 12:52 [1/4] 2.6.22-rc3: known regressions Michal Piotrowski
@ 2007-05-29 14:34 ` Jan Kara
  2007-05-29 14:41   ` Florin Iucha
  2007-05-30  4:35 ` Sam Ravnborg
  2007-06-03 13:02 ` Udo A. Steinberg
  2 siblings, 1 reply; 18+ messages in thread
From: Jan Kara @ 2007-05-29 14:34 UTC (permalink / raw)
  To: Michal Piotrowski
  Cc: Linus Torvalds, Andrew Morton, LKML, Miklos Szeredi, Ingo Molnar,
	linux-acpi, Len Brown, Thierry Volpiatto, alsa-devel,
	Takashi Iwai, Jaroslav Kysela, Ben Collins, Jan Kara,
	linux-fsdevel, Florin Iucha, Sam Ravnborg, Andrey Borzenkov

  Hi,

On Tue 29-05-07 14:52:53, Michal Piotrowski wrote:
> Here is a list of some known regressions in 2.6.22-rc3.
> 
> Feel free to add new regressions/remove fixed etc.
> http://kernelnewbies.org/known_regressions
> 
  <snip>
> File systems
> 
> Subject    : Oops in dentry_iput with 2.6.22-rc2 on AMD64
> References : http://lkml.org/lkml/2007/5/22/4
> Submitter  : Florin Iucha <florin@iucha.net>
> Status     : Unknown
  Actually, the bug seems to be unreproducible and it has probably been a
1-bit flip. So I'd be reluctant to call it a regression...

								Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [1/4] 2.6.22-rc3: known regressions
  2007-05-29 14:34 ` Jan Kara
@ 2007-05-29 14:41   ` Florin Iucha
  0 siblings, 0 replies; 18+ messages in thread
From: Florin Iucha @ 2007-05-29 14:41 UTC (permalink / raw)
  To: Jan Kara
  Cc: Michal Piotrowski, Linus Torvalds, Andrew Morton, LKML,
	Miklos Szeredi, Ingo Molnar, linux-acpi, Len Brown,
	Thierry Volpiatto, alsa-devel, Takashi Iwai, Jaroslav Kysela,
	Ben Collins, linux-fsdevel, Sam Ravnborg, Andrey Borzenkov

[-- Attachment #1: Type: text/plain, Size: 707 bytes --]

On Tue, May 29, 2007 at 04:34:59PM +0200, Jan Kara wrote:
> On Tue 29-05-07 14:52:53, Michal Piotrowski wrote:
> > Here is a list of some known regressions in 2.6.22-rc3.
> > 
> > Subject    : Oops in dentry_iput with 2.6.22-rc2 on AMD64
> > References : http://lkml.org/lkml/2007/5/22/4
> > Submitter  : Florin Iucha <florin@iucha.net>
> > Status     : Unknown
>   Actually, the bug seems to be unreproducible and it has probably been a
> 1-bit flip. So I'd be reluctant to call it a regression...

I agree with this statement.  I'll ping Michal and Jan if the oops
resurfaces.

florin

-- 
Bruce Schneier expects the Spanish Inquisition.
      http://geekz.co.uk/schneierfacts/fact/163

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [1/4] 2.6.22-rc3: known regressions
  2007-05-29 12:52 [1/4] 2.6.22-rc3: known regressions Michal Piotrowski
  2007-05-29 14:34 ` Jan Kara
@ 2007-05-30  4:35 ` Sam Ravnborg
  2007-06-03 13:02 ` Udo A. Steinberg
  2 siblings, 0 replies; 18+ messages in thread
From: Sam Ravnborg @ 2007-05-30  4:35 UTC (permalink / raw)
  To: Michal Piotrowski
  Cc: Linus Torvalds, Andrew Morton, LKML, Miklos Szeredi, Ingo Molnar,
	linux-acpi, Len Brown, Thierry Volpiatto, alsa-devel,
	Takashi Iwai, Jaroslav Kysela, Ben Collins, Jan Kara,
	linux-fsdevel, Florin Iucha, Andrey Borzenkov

On Tue, May 29, 2007 at 02:52:53PM +0200, Michal Piotrowski wrote:
> Hi all,
> 
> Here is a list of some known regressions in 2.6.22-rc3.
> 
> 
> Kbuild
> 
> Subject    : make M=$PWD modules_install does nothing
> References : http://lkml.org/lkml/2007/5/27/190
> Submitter  : Andrey Borzenkov <arvidjaar@mail.ru>
> Status     : Unknown
Closed - see http://lkml.org/lkml/2007/5/29/497

	Sam

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [1/4] 2.6.22-rc3: known regressions
  2007-05-29 12:52 [1/4] 2.6.22-rc3: known regressions Michal Piotrowski
  2007-05-29 14:34 ` Jan Kara
  2007-05-30  4:35 ` Sam Ravnborg
@ 2007-06-03 13:02 ` Udo A. Steinberg
  2007-06-08  6:02   ` [PATCH] Fix interchanged parameters to release_{evntsel,perfctr}_nmi Björn Steinbrink
  2 siblings, 1 reply; 18+ messages in thread
From: Udo A. Steinberg @ 2007-06-03 13:02 UTC (permalink / raw)
  To: Michal Piotrowski; +Cc: Linus Torvalds, Andrew Morton, LKML, Ingo Molnar

[-- Attachment #1: Type: text/plain, Size: 1814 bytes --]

On Tue, 29 May 2007 14:52:53 +0200 Michal Piotrowski (MP) wrote:

MP> Here is a list of some known regressions in 2.6.22-rc3.
MP> 
MP> Feel free to add new regressions/remove fixed etc.
MP> http://kernelnewbies.org/known_regressions

Here's another 2.6.22-rc3 regression. It was OK on 2.6.21. I believe it
was triggered by: echo 0 > /proc/sys/kernel/nmi_watchdog


------------[ cut here ]------------
kernel BUG at arch/i386/kernel/cpu/perfctr-watchdog.c:126!
invalid opcode: 0000 [#1]
PREEMPT 
Modules linked in:
CPU:    0
EIP:    0060:[<c010cae5>]    Not tainted VLI
EFLAGS: 00010286   (2.6.22-rc3 #2)
EIP is at release_evntsel_nmi+0x16/0x22
eax: 000000c1   ebx: 080f7408   ecx: c04296e0   edx: ffffff3b
esi: 00000001   edi: f69d4240   ebp: 00000002   esp: f6962f30
ds: 007b   es: 007b   fs: 0000  gs: 0033  ss: 0068
Process rc.M (pid: 1281, ti=f6962000 task=f706c030 task.ti=f6962000)
Stack: c010cb60 c010cda3 c0110abe 080f7408 f6962f64 f6962fa0 c042ab68 ffffffff 
       c01853a8 080f7408 f6962f64 f6962fa0 080f7408 00000002 c042a774 f69d4240 
       080f7408 c0185339 00000002 c0156d33 f6962fa0 f7fcccb4 f69d4240 fffffff7 
Call Trace:
 [<c010cb60>] single_msr_unreserve+0xd/0x1a
 [<c010cda3>] disable_lapic_nmi_watchdog+0x2b/0x39
 [<c0110abe>] proc_nmi_enabled+0xa0/0xbd
 [<c01853a8>] proc_sys_write+0x6f/0x8c
 [<c0185339>] proc_sys_write+0x0/0x8c
 [<c0156d33>] vfs_write+0x8a/0x10c
 [<c01571ef>] sys_write+0x41/0x67
 [<c0103c30>] syscall_call+0x7/0xb
 =======================
Code: 00 c7 04 24 f6 5d 3c c0 e8 7d e0 00 00 83 ca ff 89 d0 5a 59 c3 8b 0d 28
6e 48 c0 31 d2 85 c9 74 0e 89 c2 2b 51 18 83 fa 42 76 04 <0f> 0b eb fe 0f b3 15
38 6e 48 c0 c3 8b 0d 28 6e 48 c0 31 d2 85
EIP: [<c010cae5>] release_evntsel_nmi+0x16/0x22 SS:ESP 0068:f6962f30

Cheers,

	- Udo

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] Fix interchanged parameters to release_{evntsel,perfctr}_nmi
  2007-06-03 13:02 ` Udo A. Steinberg
@ 2007-06-08  6:02   ` Björn Steinbrink
  2007-06-08  6:41     ` Andrew Morton
  0 siblings, 1 reply; 18+ messages in thread
From: Björn Steinbrink @ 2007-06-08  6:02 UTC (permalink / raw)
  To: Udo A. Steinberg
  Cc: Michal Piotrowski, Linus Torvalds, Andrew Morton, LKML, Ingo Molnar, ak

On 2007.06.03 15:02:46 +0200, Udo A. Steinberg wrote:
> On Tue, 29 May 2007 14:52:53 +0200 Michal Piotrowski (MP) wrote:
> 
> MP> Here is a list of some known regressions in 2.6.22-rc3.
> MP> 
> MP> Feel free to add new regressions/remove fixed etc.
> MP> http://kernelnewbies.org/known_regressions
> 
> Here's another 2.6.22-rc3 regression. It was OK on 2.6.21. I believe it
> was triggered by: echo 0 > /proc/sys/kernel/nmi_watchdog
> 
> 
> ------------[ cut here ]------------
> kernel BUG at arch/i386/kernel/cpu/perfctr-watchdog.c:126!
> invalid opcode: 0000 [#1]
> PREEMPT 
> Modules linked in:
> CPU:    0
> EIP:    0060:[<c010cae5>]    Not tainted VLI
> EFLAGS: 00010286   (2.6.22-rc3 #2)
> EIP is at release_evntsel_nmi+0x16/0x22
> eax: 000000c1   ebx: 080f7408   ecx: c04296e0   edx: ffffff3b
> esi: 00000001   edi: f69d4240   ebp: 00000002   esp: f6962f30
> ds: 007b   es: 007b   fs: 0000  gs: 0033  ss: 0068
> Process rc.M (pid: 1281, ti=f6962000 task=f706c030 task.ti=f6962000)
> Stack: c010cb60 c010cda3 c0110abe 080f7408 f6962f64 f6962fa0 c042ab68 ffffffff 
>        c01853a8 080f7408 f6962f64 f6962fa0 080f7408 00000002 c042a774 f69d4240 
>        080f7408 c0185339 00000002 c0156d33 f6962fa0 f7fcccb4 f69d4240 fffffff7 
> Call Trace:
>  [<c010cb60>] single_msr_unreserve+0xd/0x1a
>  [<c010cda3>] disable_lapic_nmi_watchdog+0x2b/0x39
>  [<c0110abe>] proc_nmi_enabled+0xa0/0xbd
>  [<c01853a8>] proc_sys_write+0x6f/0x8c
>  [<c0185339>] proc_sys_write+0x0/0x8c
>  [<c0156d33>] vfs_write+0x8a/0x10c
>  [<c01571ef>] sys_write+0x41/0x67
>  [<c0103c30>] syscall_call+0x7/0xb
>  =======================
> Code: 00 c7 04 24 f6 5d 3c c0 e8 7d e0 00 00 83 ca ff 89 d0 5a 59 c3 8b 0d 28
> 6e 48 c0 31 d2 85 c9 74 0e 89 c2 2b 51 18 83 fa 42 76 04 <0f> 0b eb fe 0f b3 15
> 38 6e 48 c0 c3 8b 0d 28 6e 48 c0 31 d2 85
> EIP: [<c010cae5>] release_evntsel_nmi+0x16/0x22 SS:ESP 0068:f6962f30

The culprit seems to be 09198e68501a7e34737cd9264d266f42429abcdc:
[PATCH] i386: Clean up NMI watchdog code

In two places, the parameters to release_{evntsel,perfctr}_nmi
got interchanged during the cleanup. Unfortunately, the NMI watchdog
doesn't want to be enabled on my T43 at all (or I just have no idea what
magic is required to make it happy), so this patch is untested. Could you
give it a spin?

Thanks,
Björn


From: Björn Steinbrink <B.Steinbrink@gmx.de>

Fix interchanged parameters to release_{evntsel,perfctr}_nmi.

Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
---
diff --git a/arch/i386/kernel/cpu/perfctr-watchdog.c b/arch/i386/kernel/cpu/perfctr-watchdog.c
index 2b04c8f..e490ac2 100644
--- a/arch/i386/kernel/cpu/perfctr-watchdog.c
+++ b/arch/i386/kernel/cpu/perfctr-watchdog.c
@@ -276,8 +276,8 @@ static int single_msr_reserve(void)
 
 static void single_msr_unreserve(void)
 {
-	release_evntsel_nmi(wd_ops->perfctr);
-	release_perfctr_nmi(wd_ops->evntsel);
+	release_evntsel_nmi(wd_ops->evntsel);
+	release_perfctr_nmi(wd_ops->perfctr);
 }
 
 static void single_msr_rearm(struct nmi_watchdog_ctlblk *wd, unsigned nmi_hz)
@@ -475,10 +475,10 @@ static void p4_unreserve(void)
 {
 #ifdef CONFIG_SMP
 	if (smp_num_siblings > 1)
-		release_evntsel_nmi(MSR_P4_IQ_PERFCTR1);
+		release_perfctr_nmi(MSR_P4_IQ_PERFCTR1);
 #endif
-	release_evntsel_nmi(MSR_P4_IQ_PERFCTR0);
-	release_perfctr_nmi(MSR_P4_CRU_ESCR0);
+	release_evntsel_nmi(MSR_P4_CRU_ESCR0);
+	release_perfctr_nmi(MSR_P4_IQ_PERFCTR0);
 }
 
 static void p4_rearm(struct nmi_watchdog_ctlblk *wd, unsigned nmi_hz)

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] Fix interchanged parameters to release_{evntsel,perfctr}_nmi
  2007-06-08  6:02   ` [PATCH] Fix interchanged parameters to release_{evntsel,perfctr}_nmi Björn Steinbrink
@ 2007-06-08  6:41     ` Andrew Morton
  2007-06-08 10:58       ` Ingo Molnar
  0 siblings, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2007-06-08  6:41 UTC (permalink / raw)
  To: Björn Steinbrink
  Cc: Udo A. Steinberg, Michal Piotrowski, Linus Torvalds, LKML,
	Ingo Molnar, ak

On Fri, 8 Jun 2007 08:02:44 +0200 Björn Steinbrink <B.Steinbrink@gmx.de> wrote:

> Fix interchanged parameters to release_{evntsel,perfctr}_nmi.
> 
> Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
> ---
> diff --git a/arch/i386/kernel/cpu/perfctr-watchdog.c b/arch/i386/kernel/cpu/perfctr-watchdog.c
> index 2b04c8f..e490ac2 100644
> --- a/arch/i386/kernel/cpu/perfctr-watchdog.c
> +++ b/arch/i386/kernel/cpu/perfctr-watchdog.c
> @@ -276,8 +276,8 @@ static int single_msr_reserve(void)
>  
>  static void single_msr_unreserve(void)
>  {
> -	release_evntsel_nmi(wd_ops->perfctr);
> -	release_perfctr_nmi(wd_ops->evntsel);
> +	release_evntsel_nmi(wd_ops->evntsel);
> +	release_perfctr_nmi(wd_ops->perfctr);
>  }
>  
>  static void single_msr_rearm(struct nmi_watchdog_ctlblk *wd, unsigned nmi_hz)
> @@ -475,10 +475,10 @@ static void p4_unreserve(void)
>  {
>  #ifdef CONFIG_SMP
>  	if (smp_num_siblings > 1)
> -		release_evntsel_nmi(MSR_P4_IQ_PERFCTR1);
> +		release_perfctr_nmi(MSR_P4_IQ_PERFCTR1);
>  #endif
> -	release_evntsel_nmi(MSR_P4_IQ_PERFCTR0);
> -	release_perfctr_nmi(MSR_P4_CRU_ESCR0);
> +	release_evntsel_nmi(MSR_P4_CRU_ESCR0);
> +	release_perfctr_nmi(MSR_P4_IQ_PERFCTR0);
>  }
>  
>  static void p4_rearm(struct nmi_watchdog_ctlblk *wd, unsigned nmi_hz)

Half of this (the first hunk) has been in Andi's tree for a day or two.

I shall drop Andi's patch, queue this one up and shall send this off to Linus if
nothing else happens in the next couple of days.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] Fix interchanged parameters to release_{evntsel,perfctr}_nmi
  2007-06-08  6:41     ` Andrew Morton
@ 2007-06-08 10:58       ` Ingo Molnar
  2007-06-08 18:44         ` [PATCH 0/2] i386: Fix two more NMI watchdog bugs Björn Steinbrink
  0 siblings, 1 reply; 18+ messages in thread
From: Ingo Molnar @ 2007-06-08 10:58 UTC (permalink / raw)
  To: Andrew Morton, Andi Kleen
  Cc: Björn Steinbrink, Udo A. Steinberg, Michal Piotrowski,
	Linus Torvalds, LKML, ak


* Andrew Morton <akpm@linux-foundation.org> wrote:

> Half of this (the first hunk) has been in Andi's tree for a day or 
> two.
> 
> I shall drop Andi's patch, queue this one up and shall send this off 
> to Linus if nothing else happens in the next couple of days.

This patch does not fix the NMI watchdog bootup lockup I can reproduce;
it still occurs in -rc4 too. Andi, could you please react to my report? 
See the "2.6.22-rc3 nmi watchdog hang" thread on lkml.

	Ingo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 0/2] i386: Fix two more NMI watchdog bugs
  2007-06-08 10:58       ` Ingo Molnar
@ 2007-06-08 18:44         ` Björn Steinbrink
  2007-06-08 18:46           ` [PATCH 1/2] i386: Fix NMI watchdog not reserving its MSRs Björn Steinbrink
                             ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Björn Steinbrink @ 2007-06-08 18:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, Andi Kleen, Udo A. Steinberg, Michal Piotrowski,
	Linus Torvalds, LKML, ak

Hi Ingo,

On 2007.06.08 12:58:08 +0200, Ingo Molnar wrote:
> 
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > Half of this (the first hunk) has been in Andi's tree for a day or 
> > two.
> > 
> > I shall drop Andi's patch, queue this one up and shall send this off 
> > to Linus if nothing else happens in the next couple of days.
> 
> This patch does not fix the NMI watchdog bootup lockup I can reproduce;
> it still occurs in -rc4 too. Andi, could you please react to my report? 
> See the "2.6.22-rc3 nmi watchdog hang" thread on lkml.

OK, so after figuring out again how to enable the NMI watchdog, I found
a few more bugs. One is pretty clear: a function is called directly where
the wrapper should be used, causing a(nother) BUG() when the watchdog is
disabled using /proc/sys/...

The other is less clear (to me). It seems like the perfect candidate to
muck up the watchdog, but I can't get it to do that. At system boot,
the MSRs are no longer reserved, so some other subsystem might mess with
them. The only suitable subsystem I found was oprofile, though, and I
could neither get that to reproduce the hang here, nor does oprofile show
up in your logs.

Anyway, both are bugs and should be fixed. Maybe we're even lucky and it
fixes your hang. *fingers crossed*

Björn

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] i386: Fix NMI watchdog not reserving its MSRs
  2007-06-08 18:44         ` [PATCH 0/2] i386: Fix two more NMI watchdog bugs Björn Steinbrink
@ 2007-06-08 18:46           ` Björn Steinbrink
  2007-06-08 18:50           ` [PATCH 2/2] i386: Use the right wrapper to disable the NMI watchdog Björn Steinbrink
  2007-06-08 20:43           ` [PATCH 0/2] i386: Fix two more NMI watchdog bugs Ingo Molnar
  2 siblings, 0 replies; 18+ messages in thread
From: Björn Steinbrink @ 2007-06-08 18:46 UTC (permalink / raw)
  To: Ingo Molnar, Andrew Morton, Andi Kleen, Udo A. Steinberg,
	Michal Piotrowski, Linus Torvalds, LKML, ak

At system boot time, the NMI watchdog no longer reserves its MSRs,
allowing other subsystems to mess with them. Fix that.

Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
---
diff --git a/arch/i386/kernel/cpu/perfctr-watchdog.c b/arch/i386/kernel/cpu/perfctr-watchdog.c
index 2b04c8f..f0b6763 100644
--- a/arch/i386/kernel/cpu/perfctr-watchdog.c
+++ b/arch/i386/kernel/cpu/perfctr-watchdog.c
@@ -614,6 +614,12 @@ int lapic_watchdog_init(unsigned nmi_hz)
 		probe_nmi_watchdog();
 		if (!wd_ops)
 			return -1;
+
+		if (!wd_ops->reserve()) {
+			printk(KERN_ERR
+				"NMI watchdog: cannot reserve perfctrs\n");
+			return -1;
+		}
 	}
 
 	if (!(wd_ops->setup(nmi_hz))) {

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] i386: Use the right wrapper to disable the NMI watchdog
  2007-06-08 18:44         ` [PATCH 0/2] i386: Fix two more NMI watchdog bugs Björn Steinbrink
  2007-06-08 18:46           ` [PATCH 1/2] i386: Fix NMI watchdog not reserving its MSRs Björn Steinbrink
@ 2007-06-08 18:50           ` Björn Steinbrink
  2007-06-08 20:43           ` [PATCH 0/2] i386: Fix two more NMI watchdog bugs Ingo Molnar
  2 siblings, 0 replies; 18+ messages in thread
From: Björn Steinbrink @ 2007-06-08 18:50 UTC (permalink / raw)
  To: Ingo Molnar, Andrew Morton, Andi Kleen, Udo A. Steinberg,
	Michal Piotrowski, Linus Torvalds, LKML, ak

When disabled through /proc/sys/kernel/nmi_watchdog, the NMI watchdog
uses the stop() method directly, which does not decrement the activity
counter, leading to a BUG(). Use the wrapper function instead to fix
that.

Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
---
diff --git a/arch/i386/kernel/cpu/perfctr-watchdog.c b/arch/i386/kernel/cpu/perfctr-watchdog.c
index 2b04c8f..f0b6763 100644
--- a/arch/i386/kernel/cpu/perfctr-watchdog.c
+++ b/arch/i386/kernel/cpu/perfctr-watchdog.c
@@ -28,7 +28,7 @@ struct wd_ops {
 	void (*unreserve)(void);
 	int (*setup)(unsigned nmi_hz);
 	void (*rearm)(struct nmi_watchdog_ctlblk *wd, unsigned nmi_hz);
-	void (*stop)(void *);
+	void (*stop)(void);
 	unsigned perfctr;
 	unsigned evntsel;
 	u64 checkbit;
@@ -142,7 +142,7 @@ void disable_lapic_nmi_watchdog(void)
 	if (atomic_read(&nmi_active) <= 0)
 		return;
 
-	on_each_cpu(wd_ops->stop, NULL, 0, 1);
+	on_each_cpu(stop_apic_nmi_watchdog, NULL, 0, 1);
 	wd_ops->unreserve();
 
 	BUG_ON(atomic_read(&nmi_active) != 0);
@@ -255,7 +255,7 @@ static int setup_k7_watchdog(unsigned nmi_hz)
 	return 1;
 }
 
-static void single_msr_stop_watchdog(void *arg)
+static void single_msr_stop_watchdog(void)
 {
 	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
 
@@ -442,7 +442,7 @@ static int setup_p4_watchdog(unsigned nmi_hz)
 	return 1;
 }
 
-static void stop_p4_watchdog(void *arg)
+static void stop_p4_watchdog(void)
 {
 	struct nmi_watchdog_ctlblk *wd = &__get_cpu_var(nmi_watchdog_ctlblk);
 	wrmsr(wd->cccr_msr, 0, 0);
@@ -628,7 +634,7 @@ int lapic_watchdog_init(unsigned nmi_hz)
 void lapic_watchdog_stop(void)
 {
 	if (wd_ops)
-		wd_ops->stop(NULL);
+		wd_ops->stop();
 }
 
 unsigned lapic_adjust_nmi_hz(unsigned hz)

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/2] i386: Fix two more NMI watchdog bugs
  2007-06-08 18:44         ` [PATCH 0/2] i386: Fix two more NMI watchdog bugs Björn Steinbrink
  2007-06-08 18:46           ` [PATCH 1/2] i386: Fix NMI watchdog not reserving its MSRs Björn Steinbrink
  2007-06-08 18:50           ` [PATCH 2/2] i386: Use the right wrapper to disable the NMI watchdog Björn Steinbrink
@ 2007-06-08 20:43           ` Ingo Molnar
  2007-06-08 20:49             ` Udo A. Steinberg
  2007-06-09  2:27             ` [PATCH] i386: Fix the K7 NMI watchdog checkbit Björn Steinbrink
  2 siblings, 2 replies; 18+ messages in thread
From: Ingo Molnar @ 2007-06-08 20:43 UTC (permalink / raw)
  To: Björn Steinbrink, Andrew Morton, Andi Kleen, Udo A. Steinberg,
	Michal Piotrowski, Linus Torvalds, LKML, ak


* Björn Steinbrink <B.Steinbrink@gmx.de> wrote:

> Anyway, both are bugs and should be fixed. Maybe we're even lucky and 
> it fixes your hang. *fingers crossed*

just to make it clear: the NMI watchdog was working perfectly fine on 
that box (in v2.6.21 and in dozens of kernel releases before that, for 
multiple years) before Andi's cleanup patch. So let's find that bug first
or revert the cleanups.

	Ingo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/2] i386: Fix two more NMI watchdog bugs
  2007-06-08 20:43           ` [PATCH 0/2] i386: Fix two more NMI watchdog bugs Ingo Molnar
@ 2007-06-08 20:49             ` Udo A. Steinberg
  2007-06-08 20:57               ` Andrew Morton
  2007-06-08 22:28               ` Andi Kleen
  2007-06-09  2:27             ` [PATCH] i386: Fix the K7 NMI watchdog checkbit Björn Steinbrink
  1 sibling, 2 replies; 18+ messages in thread
From: Udo A. Steinberg @ 2007-06-08 20:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Björn Steinbrink, Andrew Morton, Andi Kleen, Michal Piotrowski,
	Linus Torvalds, LKML, ak

[-- Attachment #1: Type: text/plain, Size: 1104 bytes --]

On Fri, 8 Jun 2007 22:43:25 +0200 Ingo Molnar (IM) wrote:

IM> 
IM> * Björn Steinbrink <B.Steinbrink@gmx.de> wrote:
IM> 
IM> > Anyway, both are bugs and should be fixed. Maybe we're even lucky and 
IM> > it fixes your hang. *fingers crossed*
IM> 
IM> just to make it clear: the NMI watchdog was working perfectly fine on 
IM> that box (in v2.6.21 and in dozens of kernel releases before that, for 
IM> multiple years) before Andi's cleanup patch. So let's find that bug first
IM> or revert the cleanups.
IM> 
IM> 	Ingo

None of the patches posted by Björn fix the kernel BUG at
arch/i386/kernel/cpu/perfctr-watchdog.c:126! that occurs when doing
echo 0 > /proc/sys/kernel/nmi_watchdog

Call Trace:
 [<c010c429>] single_msr_unreserve+0xd/0x1a
 [<c010c668>] disable_lapic_nmi_watchdog+0x27/0x35
 [<c0110ac6>] proc_nmi_enabled+0xa0/0xbd
 [<c018550c>] proc_sys_write+0x6f/0x8c
 [<c018549d>] proc_sys_write+0x0/0x8c
 [<c0156e5b>] vfs_write+0x8a/0x10c
 [<c0157317>] sys_write+0x41/0x67
 [<c0103c30>] syscall_call+0x7/0xb

Andi, did you have a patch for that?

Cheers,

	- Udo

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/2] i386: Fix two more NMI watchdog bugs
  2007-06-08 20:49             ` Udo A. Steinberg
@ 2007-06-08 20:57               ` Andrew Morton
  2007-06-08 21:13                 ` Udo A. Steinberg
  2007-06-08 22:28               ` Andi Kleen
  1 sibling, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2007-06-08 20:57 UTC (permalink / raw)
  To: Udo A. Steinberg
  Cc: Ingo Molnar, Björn Steinbrink, Andi Kleen, Michal Piotrowski,
	Linus Torvalds, LKML, ak

On Fri, 8 Jun 2007 22:49:11 +0200
"Udo A. Steinberg" <us15@os.inf.tu-dresden.de> wrote:

> On Fri, 8 Jun 2007 22:43:25 +0200 Ingo Molnar (IM) wrote:
> 
> IM> 
> IM> * Björn Steinbrink <B.Steinbrink@gmx.de> wrote:
> IM> 
> IM> > Anyway, both are bugs and should be fixed. Maybe we're even lucky and 
> IM> > it fixes your hang. *fingers crossed*
> IM> 
> IM> just to make it clear: the NMI watchdog was working perfectly fine on 
> IM> that box (in v2.6.21 and in dozens of kernel releases before that, for 
> IM> multiple years) before Andi's cleanup patch. So let's find that bug first
> IM> or revert the cleanups.
> IM> 
> IM> 	Ingo
> 
> None of the patches posted by Björn fix the kernel BUG at
> arch/i386/kernel/cpu/perfctr-watchdog.c:126! that occurs when doing
> echo 0 > /proc/sys/kernel/nmi_watchdog
> 
> Call Trace:
>  [<c010c429>] single_msr_unreserve+0xd/0x1a
>  [<c010c668>] disable_lapic_nmi_watchdog+0x27/0x35
>  [<c0110ac6>] proc_nmi_enabled+0xa0/0xbd
>  [<c018550c>] proc_sys_write+0x6f/0x8c
>  [<c018549d>] proc_sys_write+0x0/0x8c
>  [<c0156e5b>] vfs_write+0x8a/0x10c
>  [<c0157317>] sys_write+0x41/0x67
>  [<c0103c30>] syscall_call+0x7/0xb
> 
> Andi, did you have a patch for that?
> 

This?


From: Bjorn Steinbrink <B.Steinbrink@gmx.de>

Fix oops triggered during: echo 0 > /proc/sys/kernel/nmi_watchdog

The culprit seems to be 09198e68501a7e34737cd9264d266f42429abcdc:
[PATCH] i386: Clean up NMI watchdog code

In two places, the parameters to release_{evntsel,perfctr}_nmi
got interchanged during the cleanup.

Fix interchanged parameters to release_{evntsel,perfctr}_nmi.

Signed-off-by: Bjorn Steinbrink <B.Steinbrink@gmx.de>
Cc: Stephane Eranian <eranian@hpl.hp.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 arch/i386/kernel/cpu/perfctr-watchdog.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff -puN arch/i386/kernel/cpu/perfctr-watchdog.c~fix-interchanged-parameters-to-release_evntselperfctr_nmi arch/i386/kernel/cpu/perfctr-watchdog.c
--- a/arch/i386/kernel/cpu/perfctr-watchdog.c~fix-interchanged-parameters-to-release_evntselperfctr_nmi
+++ a/arch/i386/kernel/cpu/perfctr-watchdog.c
@@ -276,8 +276,8 @@ static int single_msr_reserve(void)
 
 static void single_msr_unreserve(void)
 {
-	release_evntsel_nmi(wd_ops->perfctr);
-	release_perfctr_nmi(wd_ops->evntsel);
+	release_evntsel_nmi(wd_ops->evntsel);
+	release_perfctr_nmi(wd_ops->perfctr);
 }
 
 static void single_msr_rearm(struct nmi_watchdog_ctlblk *wd, unsigned nmi_hz)
@@ -475,10 +475,10 @@ static void p4_unreserve(void)
 {
 #ifdef CONFIG_SMP
 	if (smp_num_siblings > 1)
-		release_evntsel_nmi(MSR_P4_IQ_PERFCTR1);
+		release_perfctr_nmi(MSR_P4_IQ_PERFCTR1);
 #endif
-	release_evntsel_nmi(MSR_P4_IQ_PERFCTR0);
-	release_perfctr_nmi(MSR_P4_CRU_ESCR0);
+	release_evntsel_nmi(MSR_P4_CRU_ESCR0);
+	release_perfctr_nmi(MSR_P4_IQ_PERFCTR0);
 }
 
 static void p4_rearm(struct nmi_watchdog_ctlblk *wd, unsigned nmi_hz)
_


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/2] i386: Fix two more NMI watchdog bugs
  2007-06-08 20:57               ` Andrew Morton
@ 2007-06-08 21:13                 ` Udo A. Steinberg
  0 siblings, 0 replies; 18+ messages in thread
From: Udo A. Steinberg @ 2007-06-08 21:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ingo Molnar, Björn Steinbrink, Andi Kleen, Michal Piotrowski,
	Linus Torvalds, LKML, ak

[-- Attachment #1: Type: text/plain, Size: 1798 bytes --]

On Fri, 8 Jun 2007 13:57:27 -0700 Andrew Morton (AM) wrote:

AM> On Fri, 8 Jun 2007 22:49:11 +0200
AM> "Udo A. Steinberg" <us15@os.inf.tu-dresden.de> wrote:
AM> 
AM> > On Fri, 8 Jun 2007 22:43:25 +0200 Ingo Molnar (IM) wrote:
AM> > 
AM> > IM> 
AM> > IM> * Björn Steinbrink <B.Steinbrink@gmx.de> wrote:
AM> > IM> 
AM> > IM> > Anyway, both are bugs and should be fixed. Maybe we're even lucky
AM> > IM> > and it fixes your hang. *fingers crossed*
AM> > IM> 
AM> > IM> just to make it clear: the NMI watchdog was working perfectly fine on 
AM> > IM> that box (in v2.6.21 and in dozens of kernel releases before that,
AM> > IM> for multiple years) before Andi's cleanup patch. So let's find that
AM> > IM> bug first or revert the cleanups.
AM> > IM> 
AM> > IM> 	Ingo
AM> > 
AM> > None of the patches posted by Björn fix the kernel BUG at
AM> > arch/i386/kernel/cpu/perfctr-watchdog.c:126! that occurs when doing
AM> > echo 0 > /proc/sys/kernel/nmi_watchdog
AM> > 
AM> > Call Trace:
AM> >  [<c010c429>] single_msr_unreserve+0xd/0x1a
AM> >  [<c010c668>] disable_lapic_nmi_watchdog+0x27/0x35
AM> >  [<c0110ac6>] proc_nmi_enabled+0xa0/0xbd
AM> >  [<c018550c>] proc_sys_write+0x6f/0x8c
AM> >  [<c018549d>] proc_sys_write+0x0/0x8c
AM> >  [<c0156e5b>] vfs_write+0x8a/0x10c
AM> >  [<c0157317>] sys_write+0x41/0x67
AM> >  [<c0103c30>] syscall_call+0x7/0xb
AM> > 
AM> > Andi, did you have a patch for that?
AM> > 
AM> 
AM> This?
AM> 
AM> 
AM> From: Bjorn Steinbrink <B.Steinbrink@gmx.de>
AM> 
AM> Fix oops triggered during: echo 0 > /proc/sys/kernel/nmi_watchdog
AM> 
AM> The culprit seems to be 09198e68501a7e34737cd9264d266f42429abcdc:

This alone does not help, but in combination with the other two patches 
the problem no longer occurs. 

Thanks,

	- Udo

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/2] i386: Fix two more NMI watchdog bugs
  2007-06-08 20:49             ` Udo A. Steinberg
  2007-06-08 20:57               ` Andrew Morton
@ 2007-06-08 22:28               ` Andi Kleen
  1 sibling, 0 replies; 18+ messages in thread
From: Andi Kleen @ 2007-06-08 22:28 UTC (permalink / raw)
  To: Udo A. Steinberg
  Cc: Ingo Molnar, Björn Steinbrink, Andrew Morton, Andi Kleen,
	Michal Piotrowski, Linus Torvalds, LKML


> None of the patches posted by Björn fix the kernel BUG at
> arch/i386/kernel/cpu/perfctr-watchdog.c:126! that occurs when doing
> echo 0 > /proc/sys/kernel/nmi_watchdog
>
> Call Trace:
>  [<c010c429>] single_msr_unreserve+0xd/0x1a
>  [<c010c668>] disable_lapic_nmi_watchdog+0x27/0x35
>  [<c0110ac6>] proc_nmi_enabled+0xa0/0xbd
>  [<c018550c>] proc_sys_write+0x6f/0x8c
>  [<c018549d>] proc_sys_write+0x0/0x8c
>  [<c0156e5b>] vfs_write+0x8a/0x10c
>  [<c0157317>] sys_write+0x41/0x67
>  [<c0103c30>] syscall_call+0x7/0xb
>
> Andi, did you have a patch for that?

ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/disable-watchdog

-Andi

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] i386: Fix the K7 NMI watchdog checkbit
  2007-06-08 20:43           ` [PATCH 0/2] i386: Fix two more NMI watchdog bugs Ingo Molnar
  2007-06-08 20:49             ` Udo A. Steinberg
@ 2007-06-09  2:27             ` Björn Steinbrink
  2007-06-09  2:33               ` Björn Steinbrink
  1 sibling, 1 reply; 18+ messages in thread
From: Björn Steinbrink @ 2007-06-09  2:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, Andi Kleen, Udo A. Steinberg, Michal Piotrowski,
	Linus Torvalds, LKML, ak, dzickus

On 2007.06.08 22:43:25 +0200, Ingo Molnar wrote:
> 
> * Björn Steinbrink <B.Steinbrink@gmx.de> wrote:
> 
> > Anyway, both are bugs and should be fixed. Maybe we're even lucky and 
> > it fixes your hang. *fingers crossed*
> 
> just to make it clear: the NMI watchdog was working perfectly fine on 
> that box (in v2.6.21 and in dozens of kernel releases before that, for 
> multiple years) before Andi's cleanup patch. So let's find that bug first
> or revert the cleanups.

Might have been pure luck. ;-) The culprit seems to be commit
b7471c6da94d30d3deadc55986cc38d1ff57f9ca (from Sep 2006), which
introduced the check bit to figure out whether an NMI was generated by the
watchdog timer. While the performance counter register on K7 is 64 bits
wide, the upper 16 bits are reserved, so using bit 63 as the check
bit is wrong. A quick check using /dev/cpu/0/msr shows that
here the upper 16 bits are always zero; chances are that this is
not deterministic and you got a 1 in bit 63 due to some random change.

Björn



The performance counters on K7 are only 48 bits wide, so using bit 63 to
check if the counter overflowed is wrong. Let's use bit 47 instead.

Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
---
diff --git a/arch/i386/kernel/cpu/perfctr-watchdog.c b/arch/i386/kernel/cpu/perfctr-watchdog.c
index 2b04c8f..82c6967 100644
--- a/arch/i386/kernel/cpu/perfctr-watchdog.c
+++ b/arch/i386/kernel/cpu/perfctr-watchdog.c
@@ -294,7 +294,7 @@ static struct wd_ops k7_wd_ops = {
 	.stop = single_msr_stop_watchdog,
 	.perfctr = MSR_K7_PERFCTR0,
 	.evntsel = MSR_K7_EVNTSEL0,
-	.checkbit = 1ULL<<63,
+	.checkbit = 1ULL<<47,
 };
 
 /* Intel Model 6 (PPro+,P2,P3,P-M,Core1) */


* Re: [PATCH] i386: Fix the K7 NMI watchdog checkbit
  2007-06-09  2:27             ` [PATCH] i386: Fix the K7 NMI watchdog checkbit Björn Steinbrink
@ 2007-06-09  2:33               ` Björn Steinbrink
  0 siblings, 0 replies; 18+ messages in thread
From: Björn Steinbrink @ 2007-06-09  2:33 UTC (permalink / raw)
  To: Ingo Molnar, Andrew Morton, Andi Kleen, Udo A. Steinberg,
	Michal Piotrowski, Linus Torvalds, LKML, ak, dzickus

On 2007.06.09 04:27:10 +0200, Björn Steinbrink wrote:
> On 2007.06.08 22:43:25 +0200, Ingo Molnar wrote:
> > 
> > * Björn Steinbrink <B.Steinbrink@gmx.de> wrote:
> > 
> > > Anyway, both are bugs and should be fixed. Maybe we're even lucky and 
> > > it fixes your hang. *fingers crossed*
> > 
> > just to make it clear: the NMI watchdog was working perfectly fine on
> > that box (in v2.6.21 and in dozens of kernel releases before that, for
> > multiple years) before Andi's cleanup patch. So let's find that bug first
> > or revert the cleanups.
> 
> Might have been pure luck. ;-) The culprit seems to be commit
> b7471c6da94d30d3deadc55986cc38d1ff57f9ca (from Sep 2006), which
> introduced the check bit to figure out whether an NMI was generated by
> the watchdog timer. While the performance counter register on K7 is 64
> bits wide, the upper 16 bits are reserved, so using bit 63 as the check
> bit is wrong. A quick check using /dev/cpu/0/msr shows that the upper
> 16 bits are always zero here, but chances are that this is not
> deterministic and you got a 1 in bit 63 due to some random change.

Hrmpf... Should've read the AMD docs first, not some random website. The
upper bits are "read as zero", so while that was another bug fix, it's
unlikely to help in your case. :-(

Björn


end of thread

Thread overview: 18+ messages
2007-05-29 12:52 [1/4] 2.6.22-rc3: known regressions Michal Piotrowski
2007-05-29 14:34 ` Jan Kara
2007-05-29 14:41   ` Florin Iucha
2007-05-30  4:35 ` Sam Ravnborg
2007-06-03 13:02 ` Udo A. Steinberg
2007-06-08  6:02   ` [PATCH] Fix interchanged parameters to release_{evntsel,perfctr}_nmi Björn Steinbrink
2007-06-08  6:41     ` Andrew Morton
2007-06-08 10:58       ` Ingo Molnar
2007-06-08 18:44         ` [PATCH 0/2] i386: Fix two more NMI watchdog bugs Björn Steinbrink
2007-06-08 18:46           ` [PATCH 1/2] i386: Fix NMI watchdog not reserving its MSRs Björn Steinbrink
2007-06-08 18:50           ` [PATCH 2/2] i386: Use the right wrapper to disable the NMI watchdog Björn Steinbrink
2007-06-08 20:43           ` [PATCH 0/2] i386: Fix two more NMI watchdog bugs Ingo Molnar
2007-06-08 20:49             ` Udo A. Steinberg
2007-06-08 20:57               ` Andrew Morton
2007-06-08 21:13                 ` Udo A. Steinberg
2007-06-08 22:28               ` Andi Kleen
2007-06-09  2:27             ` [PATCH] i386: Fix the K7 NMI watchdog checkbit Björn Steinbrink
2007-06-09  2:33               ` Björn Steinbrink
