linux-kernel.vger.kernel.org archive mirror
* Warning in during hotplug on 2.6.27-rc2-git5
@ 2008-08-12 22:00 Mark Langsdorf
  2008-08-14 15:53 ` Rafael J. Wysocki
  0 siblings, 1 reply; 18+ messages in thread
From: Mark Langsdorf @ 2008-08-12 22:00 UTC (permalink / raw)
  To: linux-kernel

I'm seeing the following error message when I hotunplug and replug
a cpu in 2.6.27-rc2-git5.  The system becomes unstable almost
immediately afterwards.
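
For reference, the call trace in this report enters through store_online() from a sysfs write issued by bash, which suggests the CPU was cycled through its "online" control file. A minimal C sketch of that offline/online cycle, assuming the usual /sys/devices/system/cpu/cpuN/online path; the helper names cpu_online_path() and set_cpu_online() are hypothetical and not part of the report:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Build the sysfs path of a CPU's "online" control file.
 * Returns the path length on success, -1 if the buffer is too small.
 */
static int cpu_online_path(char *buf, size_t len, unsigned int cpu)
{
    int n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%u/online", cpu);

    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/*
 * Write "0" (offline) or "1" (online) to the given control file.
 * Needs root on a real system. Returns 0 on success, -1 on failure.
 */
static int set_cpu_online(const char *path, int online)
{
    FILE *f;
    int ok;

    assert(path != NULL);
    f = fopen(path, "w");
    if (!f)
        return -1;
    ok = fputs(online ? "1" : "0", f) >= 0;
    /* A failed hotplug request surfaces when the buffered write is flushed. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling set_cpu_online(path, 0) followed by set_cpu_online(path, 1) for cpu 4, as root, would mirror the unplug/replug sequence described above; on an affected kernel this may destabilize the machine.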

------------[ cut here ]------------
WARNING: at fs/sysfs/dir.c:463 sysfs_add_one+0x33/0x39()
sysfs: duplicate filename 'machinecheck4' can not be created
Modules linked in: cpufreq_ondemand cpufreq_userspace cpufreq_powersave powernow_k8 freq_table sr_mod af_packet button battery ac loop dm_mod usb_storage usbhid ff_memless tg3 libphy ide_pci_generic shpchp ehci_hcd i2c_piix4 i2c_core pci_hotplug ohci_hcd usbcore ide_cd_mod cdrom floppy mptctl ext3 jbd edd fan thermal processor mptsas mptscsih mptbase scsi_transport_sas sg sata_svw libata dock serverworks sd_mod scsi_mod ide_disk ide_core
Pid: 4838, comm: bash Not tainted 2.6.27-rc2-git5-pn_test #2

Call Trace:
 [<ffffffff8023194f>] warn_slowpath+0xb4/0xde
 [<ffffffff80304cf7>] rb_insert_color+0x61/0xda
 [<ffffffff80304cf7>] rb_insert_color+0x61/0xda
 [<ffffffff80306c27>] vsnprintf+0x568/0x5b1
 [<ffffffff80226c3f>] hrtick_start_fair+0x10d/0x171
 [<ffffffff80301b45>] idr_get_empty_slot+0x164/0x243
 [<ffffffff80301d1a>] ida_get_new_above+0xf6/0x182
 [<ffffffff802a038a>] find_inode+0x28/0x6d
 [<ffffffff802d272c>] sysfs_ilookup_test+0x0/0xf
 [<ffffffff802d2928>] sysfs_find_dirent+0x1b/0x2f
 [<ffffffff802d29de>] sysfs_add_one+0x33/0x39
 [<ffffffff802d2ee3>] create_dir+0x4f/0x87
 [<ffffffff802d2f50>] sysfs_create_dir+0x35/0x4a
 [<ffffffff80302757>] kobject_get+0x12/0x17
 [<ffffffff80302890>] kobject_add_internal+0xcf/0x18a
 [<ffffffff80302a66>] kobject_init_and_add+0x5b/0x68
 [<ffffffff8022d5ab>] set_cpus_allowed_ptr+0x119/0x126
 [<ffffffff8038397a>] sysdev_register+0x5a/0xb5
 [<ffffffff804172df>] mce_create_device+0xb4/0x156
 [<ffffffff804173ac>] mce_cpu_callback+0x2b/0x9b
 [<ffffffff8041fb7d>] notifier_call_chain+0x29/0x4c
 [<ffffffff8041a2e7>] _cpu_up+0xc8/0x102
 [<ffffffff8041a375>] cpu_up+0x54/0x77
 [<ffffffff8040ffee>] store_online+0x43/0x67
 [<ffffffff802d20a1>] sysfs_write_file+0xd2/0x110
 [<ffffffff8028f1a5>] vfs_write+0xad/0x156
 [<ffffffff8028f692>] sys_write+0x45/0x6e
 [<ffffffff8020bdeb>] system_call_fastpath+0x16/0x1b

---[ end trace 48036b92036180e0 ]---
kobject_add_internal failed for machinecheck4 with -EEXIST, don't try to register things with the same name in the same directory.
Pid: 4838, comm: bash Tainted: G        W 2.6.27-rc2-git5-pn_test #2

Call Trace:
 [<ffffffff8030290c>] kobject_add_internal+0x14b/0x18a
 [<ffffffff80302a66>] kobject_init_and_add+0x5b/0x68
 [<ffffffff8022d5ab>] set_cpus_allowed_ptr+0x119/0x126
 [<ffffffff8038397a>] sysdev_register+0x5a/0xb5
 [<ffffffff804172df>] mce_create_device+0xb4/0x156
 [<ffffffff804173ac>] mce_cpu_callback+0x2b/0x9b
 [<ffffffff8041fb7d>] notifier_call_chain+0x29/0x4c
 [<ffffffff8041a2e7>] _cpu_up+0xc8/0x102
 [<ffffffff8041a375>] cpu_up+0x54/0x77
 [<ffffffff8040ffee>] store_online+0x43/0x67
 [<ffffffff802d20a1>] sysfs_write_file+0xd2/0x110
 [<ffffffff8028f1a5>] vfs_write+0xad/0x156
 [<ffffffff8028f692>] sys_write+0x45/0x6e
 [<ffffffff8020bdeb>] system_call_fastpath+0x16/0x1b

BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
IP: [<ffffffff802d2a11>] sysfs_addrm_start+0x2d/0x99
PGD 22a87c067 PUD 22c9ef067 PMD 0 
Oops: 0000 [1] SMP 
CPU 1 
Modules linked in: cpufreq_ondemand cpufreq_userspace cpufreq_powersave powernow_k8 freq_table sr_mod af_packet button battery ac loop dm_mod usb_storage usbhid ff_memless tg3 libphy ide_pci_generic shpchp ehci_hcd i2c_piix4 i2c_core pci_hotplug ohci_hcd usbcore ide_cd_mod cdrom floppy mptctl ext3 jbd edd fan thermal processor mptsas mptscsih mptbase scsi_transport_sas sg sata_svw libata dock serverworks sd_mod scsi_mod ide_disk ide_core
Pid: 4838, comm: bash Tainted: G        W 2.6.27-rc2-git5-pn_test #2
RIP: 0010:[<ffffffff802d2a11>]  [<ffffffff802d2a11>] sysfs_addrm_start+0x2d/0x99
RSP: 0018:ffff88022d0dbb38  EFLAGS: 00010246
RAX: ffff880028047710 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 00000acb2c4dfcc0 RSI: 0000000000000000 RDI: ffff88022fc3b000
RBP: ffff88022d0dbb58 R08: 0000000000000080 R09: ffff88022d0dba58
R10: ffff880028047710 R11: 00000acb2c4dfcc0 R12: 00000000fffffff4
R13: 0000000000000000 R14: ffff88022d0dbbb0 R15: 0000000000000004
FS:  00007f106d7496d0(0000) GS:ffff88022f0938c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000038 CR3: 000000022c8d0000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process bash (pid: 4838, threadinfo ffff88022d0da000, task ffff88022d6b4d90)
Stack:  00000000fffffff4 ffff88022e3dfc80 ffff88022dc0c870 ffffffff802d2ed8
 0000000000000000 0000000000000000 0000000000000000 0000000000000000
 ffff88022e3dfc80 ffff88022d0dbd18 00000000fffffffe 0000000000000004
Call Trace:
 [<ffffffff802d2ed8>] ? create_dir+0x44/0x87
 [<ffffffff802d2f50>] ? sysfs_create_dir+0x35/0x4a
 [<ffffffff80302757>] ? kobject_get+0x12/0x17
 [<ffffffff80302890>] ? kobject_add_internal+0xcf/0x18a
 [<ffffffff80302d57>] ? kobject_add+0x74/0x7c
 [<ffffffff8041bfc5>] ? thread_return+0x3e/0xa3
 [<ffffffff8030254e>] ? kobject_init_internal+0x12/0x2c
 [<ffffffff803025c6>] ? kobject_init+0x41/0x69
 [<ffffffff80302619>] ? kobject_create+0x2b/0x30
 [<ffffffff80302d8d>] ? kobject_create_and_add+0x2e/0x5b
 [<ffffffff804178c3>] ? threshold_create_device+0x1aa/0x32a
 [<ffffffff80417aa2>] ? threshold_cpu_callback+0x5f/0x2ca
 [<ffffffff804172df>] ? mce_create_device+0xb4/0x156
 [<ffffffff8041fb7d>] ? notifier_call_chain+0x29/0x4c
 [<ffffffff8041a2e7>] ? _cpu_up+0xc8/0x102
 [<ffffffff8041a375>] ? cpu_up+0x54/0x77
 [<ffffffff8040ffee>] ? store_online+0x43/0x67
 [<ffffffff802d20a1>] ? sysfs_write_file+0xd2/0x110
 [<ffffffff8028f1a5>] ? vfs_write+0xad/0x156
 [<ffffffff8028f692>] ? sys_write+0x45/0x6e
 [<ffffffff8020bdeb>] ? system_call_fastpath+0x16/0x1b


Code: c0 b9 08 00 00 00 fc 53 48 89 fd 48 89 f3 48 83 ec 08 f3 ab 48 89 75 00 48 c7 c7 e0 e7 51 80 e8 c2 9c 14 00 48 8b 3d 57 45 37 00 <48> 8b 73 38 48 89 d9 48 c7 c2 2c 27 2d 80 e8 1e db fc ff 48 85 
RIP  [<ffffffff802d2a11>] sysfs_addrm_start+0x2d/0x99
 RSP <ffff88022d0dbb38>
CR2: 0000000000000038
---[ end trace 48036b92036180e0 ]---





* Re: Warning in during hotplug on 2.6.27-rc2-git5
  2008-08-12 22:00 Warning in during hotplug on 2.6.27-rc2-git5 Mark Langsdorf
@ 2008-08-14 15:53 ` Rafael J. Wysocki
  2008-08-14 16:06   ` Langsdorf, Mark
  2008-08-14 16:34   ` Andi Kleen
  0 siblings, 2 replies; 18+ messages in thread
From: Rafael J. Wysocki @ 2008-08-14 15:53 UTC (permalink / raw)
  To: Mark Langsdorf, Andi Kleen
  Cc: linux-kernel, Greg KH, Ingo Molnar, Andrew Morton

On Wednesday, 13 of August 2008, Mark Langsdorf wrote:
> I'm seeing the following error message when I hotunplug and replug
> a cpu in 2.6.27-rc2-git5.  The system becomes unstable almost
> immediately afterwards.

Hm, it seems that MCE is involved somehow.  Andi, can you have a look at this,
please?






* RE: Warning in during hotplug on 2.6.27-rc2-git5
  2008-08-14 15:53 ` Rafael J. Wysocki
@ 2008-08-14 16:06   ` Langsdorf, Mark
  2008-08-14 16:34   ` Andi Kleen
  1 sibling, 0 replies; 18+ messages in thread
From: Langsdorf, Mark @ 2008-08-14 16:06 UTC (permalink / raw)
  To: Rafael J. Wysocki, Andi Kleen
  Cc: linux-kernel, Greg KH, Ingo Molnar, Andrew Morton

> On Wednesday, 13 of August 2008, Mark Langsdorf wrote:
> > I'm seeing the following error message when I hotunplug and replug
> > a cpu in 2.6.27-rc2-git5.  The system becomes unstable almost
> > immediately afterwards.
> 
> Hm, it seems that MCE is involved somehow.  Andi, can you 
> have a look at this, please?

Disabling MCE did eliminate the error message.

-Mark Langsdorf
Operating System Research Center
AMD



* Re: Warning in during hotplug on 2.6.27-rc2-git5
  2008-08-14 15:53 ` Rafael J. Wysocki
  2008-08-14 16:06   ` Langsdorf, Mark
@ 2008-08-14 16:34   ` Andi Kleen
  2008-08-14 18:35     ` Langsdorf, Mark
  1 sibling, 1 reply; 18+ messages in thread
From: Andi Kleen @ 2008-08-14 16:34 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Mark Langsdorf, Andi Kleen, linux-kernel, Greg KH, Ingo Molnar,
	Andrew Morton

On Thu, Aug 14, 2008 at 05:53:57PM +0200, Rafael J. Wysocki wrote:
> On Wednesday, 13 of August 2008, Mark Langsdorf wrote:
> > I'm seeing the following error message when I hotunplug and replug
> > a cpu in 2.6.27-rc2-git5.  The system becomes unstable almost
> > immediately afterwards.
> 
> Hm, it seems that MCE is involved somehow.  Andi, can you have a look at this,
> please?

FWIW the mce code here actually hasn't changed for a long time.


> > ------------[ cut here ]------------
> > WARNING: at fs/sysfs/dir.c:463 sysfs_add_one+0x33/0x39()
> > sysfs: duplicate filename 'machinecheck4' can not be created

The only way I could see that happening is if CPU_DEAD/CPU_ONLINE
is not properly balanced.

-Andi


* RE: Warning in during hotplug on 2.6.27-rc2-git5
  2008-08-14 16:34   ` Andi Kleen
@ 2008-08-14 18:35     ` Langsdorf, Mark
  2008-08-16 19:28       ` Rafael J. Wysocki
                         ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Langsdorf, Mark @ 2008-08-14 18:35 UTC (permalink / raw)
  To: Andi Kleen, Rafael J. Wysocki
  Cc: linux-kernel, Greg KH, Ingo Molnar, Andrew Morton

> On Thu, Aug 14, 2008 at 05:53:57PM +0200, Rafael J. Wysocki wrote:
> > On Wednesday, 13 of August 2008, Mark Langsdorf wrote:
> > > I'm seeing the following error message when I hotunplug and replug
> > > a cpu in 2.6.27-rc2-git5.  The system becomes unstable almost
> > > immediately afterwards.
> > 
> > Hm, it seems that MCE is involved somehow.  Andi, can you have a look at this,
> > please?
> 
> FWIW the mce code here actually hasn't changed for a long time.
> 
> 
> > > ------------[ cut here ]------------
> > > WARNING: at fs/sysfs/dir.c:463 sysfs_add_one+0x33/0x39()
> > > sysfs: duplicate filename 'machinecheck4' can not be created
> 
> The only way I could see that happening is if CPU_DEAD/CPU_ONLINE
> is not properly balanced.

I'm still seeing it on 2.6.27-rc2, even with the 
patch here http://lkml.org/lkml/2008/7/30/171 and the
wbinvd_halt code patch applied.  Maybe something else
broke in some of the recent hotplug changes?

-Mark Langsdorf
Operating System Research Center
AMD



* Re: Warning in during hotplug on 2.6.27-rc2-git5
  2008-08-14 18:35     ` Langsdorf, Mark
@ 2008-08-16 19:28       ` Rafael J. Wysocki
  2008-08-16 20:18         ` Greg KH
  2008-08-17  2:25       ` Andi Kleen
  2008-08-17 21:53       ` Rafael J. Wysocki
  2 siblings, 1 reply; 18+ messages in thread
From: Rafael J. Wysocki @ 2008-08-16 19:28 UTC (permalink / raw)
  To: Langsdorf, Mark
  Cc: Andi Kleen, linux-kernel, Greg KH, Ingo Molnar, Andrew Morton

On Thursday, 14 of August 2008, Langsdorf, Mark wrote:
> > On Thu, Aug 14, 2008 at 05:53:57PM +0200, Rafael J. Wysocki wrote:
> > > On Wednesday, 13 of August 2008, Mark Langsdorf wrote:
> > > > I'm seeing the following error message when I hotunplug and replug
> > > > a cpu in 2.6.27-rc2-git5.  The system becomes unstable almost
> > > > immediately afterwards.
> > > 
> > > Hm, it seems that MCE is involved somehow.  Andi, can you have a look at this,
> > > please?
> > 
> > FWIW the mce code here actually hasn't changed for a long time.
> > 
> > 
> > > > ------------[ cut here ]------------
> > > > WARNING: at fs/sysfs/dir.c:463 sysfs_add_one+0x33/0x39()
> > > > sysfs: duplicate filename 'machinecheck4' can not be created
> > 
> > The only way I could see that happening is if CPU_DEAD/CPU_ONLINE
> > is not properly balanced.
> 
> I'm still seeing it on 2.6.27-rc2, even with the 
> patch here http://lkml.org/lkml/2008/7/30/171 and the
> wbinvd_halt code patch applied.  Maybe something else
> broke in some of the recent hotplug changes?

My guess is that MCE does something that is not allowed by sysfs any more.

Thanks,
Rafael


* Re: Warning in during hotplug on 2.6.27-rc2-git5
  2008-08-16 19:28       ` Rafael J. Wysocki
@ 2008-08-16 20:18         ` Greg KH
  2008-08-16 21:34           ` Rafael J. Wysocki
  2008-08-17  2:23           ` Andi Kleen
  0 siblings, 2 replies; 18+ messages in thread
From: Greg KH @ 2008-08-16 20:18 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Langsdorf, Mark, Andi Kleen, linux-kernel, Ingo Molnar, Andrew Morton

On Sat, Aug 16, 2008 at 09:28:24PM +0200, Rafael J. Wysocki wrote:
> On Thursday, 14 of August 2008, Langsdorf, Mark wrote:
> > > On Thu, Aug 14, 2008 at 05:53:57PM +0200, Rafael J. Wysocki wrote:
> > > > On Wednesday, 13 of August 2008, Mark Langsdorf wrote:
> > > > > I'm seeing the following error message when I hotunplug and replug
> > > > > a cpu in 2.6.27-rc2-git5.  The system becomes unstable almost
> > > > > immediately afterwards.
> > > > 
> > > > Hm, it seems that MCE is involved somehow.  Andi, can you have a look at this,
> > > > please?
> > > 
> > > FWIW the mce code here actually hasn't changed for a long time.
> > > 
> > > 
> > > > > ------------[ cut here ]------------
> > > > > WARNING: at fs/sysfs/dir.c:463 sysfs_add_one+0x33/0x39()
> > > > > sysfs: duplicate filename 'machinecheck4' can not be created
> > > 
> > > The only way I could see that happening is if CPU_DEAD/CPU_ONLINE
> > > is not properly balanced.
> > 
> > I'm still seeing it on 2.6.27-rc2, even with the 
> > patch here http://lkml.org/lkml/2008/7/30/171 and the
> > wbinvd_halt code patch applied.  Maybe something else
> > broke in some of the recent hotplug changes?
> 
> My guess is that MCE does something that is not allowed by sysfs any more.

Hm, sysfs hasn't changed any in 2.6.27-rcX that I know of.

thanks,

greg k-h


* Re: Warning in during hotplug on 2.6.27-rc2-git5
  2008-08-16 20:18         ` Greg KH
@ 2008-08-16 21:34           ` Rafael J. Wysocki
  2008-08-18 15:32             ` Langsdorf, Mark
  2008-08-17  2:23           ` Andi Kleen
  1 sibling, 1 reply; 18+ messages in thread
From: Rafael J. Wysocki @ 2008-08-16 21:34 UTC (permalink / raw)
  To: Greg KH, Langsdorf, Mark
  Cc: Andi Kleen, linux-kernel, Ingo Molnar, Andrew Morton

On Saturday, 16 of August 2008, Greg KH wrote:
> On Sat, Aug 16, 2008 at 09:28:24PM +0200, Rafael J. Wysocki wrote:
> > On Thursday, 14 of August 2008, Langsdorf, Mark wrote:
> > > > On Thu, Aug 14, 2008 at 05:53:57PM +0200, Rafael J. Wysocki wrote:
> > > > > On Wednesday, 13 of August 2008, Mark Langsdorf wrote:
> > > > > > I'm seeing the following error message when I hotunplug and replug
> > > > > > a cpu in 2.6.27-rc2-git5.  The system becomes unstable almost
> > > > > > immediately afterwards.
> > > > > 
> > > > > Hm, it seems that MCE is involved somehow.  Andi, can you have a look at this,
> > > > > please?
> > > > 
> > > > FWIW the mce code here actually hasn't changed for a long time.
> > > > 
> > > > 
> > > > > > ------------[ cut here ]------------
> > > > > > WARNING: at fs/sysfs/dir.c:463 sysfs_add_one+0x33/0x39()
> > > > > > sysfs: duplicate filename 'machinecheck4' can not be created
> > > > 
> > > > The only way I could see that happening is if CPU_DEAD/CPU_ONLINE
> > > > is not properly balanced.
> > > 
> > > I'm still seeing it on 2.6.27-rc2, even with the 
> > > patch here http://lkml.org/lkml/2008/7/30/171 and the
> > > wbinvd_halt code patch applied.  Maybe something else
> > > broke in some of the recent hotplug changes?
> > 
> > My guess is that MCE does something that is not allowed by sysfs any more.
> 
> Hm, sysfs hasn't changed any in 2.6.27-rcX that I know of.

Hmm.  Mark, what kind of a system is this?  Is it a 2 quad-core CPU system
or similar ('machinecheck4' in your trace seems to imply something like this)?

Rafael


* Re: Warning in during hotplug on 2.6.27-rc2-git5
  2008-08-16 20:18         ` Greg KH
  2008-08-16 21:34           ` Rafael J. Wysocki
@ 2008-08-17  2:23           ` Andi Kleen
  2008-08-17 17:25             ` Rafael J. Wysocki
  1 sibling, 1 reply; 18+ messages in thread
From: Andi Kleen @ 2008-08-17  2:23 UTC (permalink / raw)
  To: Greg KH
  Cc: Rafael J. Wysocki, Langsdorf, Mark, Andi Kleen, linux-kernel,
	Ingo Molnar, Andrew Morton

> > > I'm still seeing it on 2.6.27-rc2, even with the 
> > > patch here http://lkml.org/lkml/2008/7/30/171 and the
> > > wbinvd_halt code patch applied.  Maybe something else
> > > broke in some of the recent hotplug changes?
> > 
> > My guess is that MCE does something that is not allowed by sysfs any more.
> 
> Hm, sysfs hasn't changed any in 2.6.27-rcX that I know of.

mce hasn't either in this regard. My current theory is that the CPU
up/down notifiers are not balanced anymore (as in duplicated up events).

Would need to add printk to verify that.

-Andi


* Re: Warning in during hotplug on 2.6.27-rc2-git5
  2008-08-14 18:35     ` Langsdorf, Mark
  2008-08-16 19:28       ` Rafael J. Wysocki
@ 2008-08-17  2:25       ` Andi Kleen
  2008-08-17 21:53       ` Rafael J. Wysocki
  2 siblings, 0 replies; 18+ messages in thread
From: Andi Kleen @ 2008-08-17  2:25 UTC (permalink / raw)
  To: Langsdorf, Mark
  Cc: Andi Kleen, Rafael J. Wysocki, linux-kernel, Greg KH,
	Ingo Molnar, Andrew Morton

> I'm still seeing it on 2.6.27-rc2, even with the 
> patch here http://lkml.org/lkml/2008/7/30/171 and the
> wbinvd_halt code patch applied.  Maybe something else
> broke in some of the recent hotplug changes?

Mark, can you please put a printk

printk("mce_cpu_callback action %lu cpu %u\n", action, cpu);

into arch/x86/kernel/cpu/mcheck/mce_64.c:mce_cpu_callback() 
and post the log including the warning? 

That would verify my theory.
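
One way to read such a log is to replay the (action, cpu) pairs through a small balance check. The following user-space C sketch is an illustration only: the enum values and the hp_track() helper are hypothetical and do not correspond to the kernel's CPU_* notifier constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_CPUS 8

/* Hypothetical event codes for this sketch; not the kernel's CPU_* values. */
enum hp_event { HP_ONLINE, HP_DEAD };

/* Tracks whether each CPU currently has its per-CPU device registered. */
struct hp_tracker {
    bool registered[MAX_CPUS];
};

/*
 * Feed one logged event into the tracker.
 * Returns false when the event is unbalanced: an "online" for a CPU that
 * is already registered, or a "dead" for one that is not.
 */
static bool hp_track(struct hp_tracker *t, enum hp_event ev, unsigned int cpu)
{
    assert(cpu < MAX_CPUS);

    if (ev == HP_ONLINE) {
        if (t->registered[cpu]) {
            printf("cpu %u: duplicate online event\n", cpu);
            return false;
        }
        t->registered[cpu] = true;
        return true;
    }

    if (!t->registered[cpu]) {
        printf("cpu %u: dead event without prior online\n", cpu);
        return false;
    }
    t->registered[cpu] = false;
    return true;
}
```

If the log shows two online events for the same CPU with no dead event in between, hp_track() returns false on the second one, which is the pattern the duplicated-up-events theory predicts.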

-Andi



* Re: Warning in during hotplug on 2.6.27-rc2-git5
  2008-08-17  2:23           ` Andi Kleen
@ 2008-08-17 17:25             ` Rafael J. Wysocki
  2008-08-17 19:23               ` Rafael J. Wysocki
  0 siblings, 1 reply; 18+ messages in thread
From: Rafael J. Wysocki @ 2008-08-17 17:25 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Greg KH, Langsdorf, Mark, linux-kernel, Ingo Molnar, Andrew Morton

On Sunday, 17 of August 2008, Andi Kleen wrote:
> > > > I'm still seeing it on 2.6.27-rc2, even with the 
> > > > patch here http://lkml.org/lkml/2008/7/30/171 and the
> > > > wbinvd_halt code patch applied.  Maybe something else
> > > > broke in some of the recent hotplug changes?
> > > 
> > > My guess is that MCE does something that is not allowed by sysfs any more.
> > 
> > Hm, sysfs hasn't changed any in 2.6.27-rcX that I know of.
> 
> mce hasn't either in this regard. My current theory is that the CPU 
> up/down notifiers are not balanced anymore (as in duplicated up events) 

It doesn't look like this is the case.  Moreover, had that been the case, we'd
have had many reports from people doing suspend/hibernation, but it doesn't
happen.

I think that cpu_down() fails for some reason and that causes the subsequent
onlining to fail.  I'd like to find out what's the root cause of that.

Thanks,
Rafael


* Re: Warning in during hotplug on 2.6.27-rc2-git5
  2008-08-17 17:25             ` Rafael J. Wysocki
@ 2008-08-17 19:23               ` Rafael J. Wysocki
  0 siblings, 0 replies; 18+ messages in thread
From: Rafael J. Wysocki @ 2008-08-17 19:23 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Greg KH, Langsdorf, Mark, linux-kernel, Ingo Molnar, Andrew Morton

On Sunday, 17 of August 2008, Rafael J. Wysocki wrote:
> On Sunday, 17 of August 2008, Andi Kleen wrote:
> > > > > I'm still seeing it on 2.6.27-rc2, even with the 
> > > > > patch here http://lkml.org/lkml/2008/7/30/171 and the
> > > > > wbinvd_halt code patch applied.  Maybe something else
> > > > > broke in some of the recent hotplug changes?
> > > > 
> > > > My guess is that MCE does something that is not allowed by sysfs any more.
> > > 
> > > Hm, sysfs hasn't changed any in 2.6.27-rcX that I know of.
> > 
> > mce hasn't either in this regard. My current theory is that the CPU 
> > up/down notifiers are not balanced anymore (as in duplicated up events) 
> 
> It doesn't look like this is the case.  Moreover, had that been the case, we'd
> have had many reports from people doing suspend/hibernation, but it doesn't
> happen.
> 
> I think that cpu_down() fails for some reason and that causes the subsequent
> onlining to fail.

Well, no.  If my understanding of the CPU hotplug code is correct, this is not
possible.

The next possibility is that mce_attributes[i] is NULL for some 'i', although
there are non-NULL values for some j > i.  In that case, mce_remove_device()
would fail to remove device_mce for the given CPU and the subsequent
mce_create_device() would cause the observed failure.
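
To illustrate the hypothesis: code that walks a NULL-terminated table stops at the first NULL slot and never reaches the entries behind it. The user-space C sketch below uses hypothetical names (struct attr, visit_until_null(), count_populated()) and is not the kernel's mce_remove_device() code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an attribute; not the kernel's sysdev_attribute. */
struct attr {
    const char *name;
};

/*
 * Walk a table the way NULL-terminated tables are usually walked:
 * stop at the first NULL slot. Returns how many entries were visited.
 */
static size_t visit_until_null(struct attr *const table[], size_t capacity)
{
    size_t i;

    assert(table != NULL);
    for (i = 0; i < capacity && table[i] != NULL; i++)
        ; /* a real remove path would tear down table[i] here */
    return i;
}

/* Count every non-NULL slot, whether or not a hole precedes it. */
static size_t count_populated(struct attr *const table[], size_t capacity)
{
    size_t i, n = 0;

    assert(table != NULL);
    for (i = 0; i < capacity; i++)
        if (table[i] != NULL)
            n++;
    return n;
}
```

With a table such as { &a, NULL, &b, &c, NULL }, the walk visits one entry while three are populated, so two entries would be left behind by a teardown loop of this shape.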

Thanks,
Rafael


* Re: Warning in during hotplug on 2.6.27-rc2-git5
  2008-08-14 18:35     ` Langsdorf, Mark
  2008-08-16 19:28       ` Rafael J. Wysocki
  2008-08-17  2:25       ` Andi Kleen
@ 2008-08-17 21:53       ` Rafael J. Wysocki
  2008-08-18 13:11         ` Langsdorf, Mark
  2 siblings, 1 reply; 18+ messages in thread
From: Rafael J. Wysocki @ 2008-08-17 21:53 UTC (permalink / raw)
  To: Langsdorf, Mark
  Cc: Andi Kleen, linux-kernel, Greg KH, Ingo Molnar, Andrew Morton

On Thursday, 14 of August 2008, Langsdorf, Mark wrote:
> > On Thu, Aug 14, 2008 at 05:53:57PM +0200, Rafael J. Wysocki wrote:
> > > On Wednesday, 13 of August 2008, Mark Langsdorf wrote:
> > > > I'm seeing the following error message when I hotunplug and replug
> > > > a cpu in 2.6.27-rc2-git5.  The system becomes unstable almost
> > > > immediately afterwards.
> > > 
> > > Hm, it seems that MCE is involved somehow.  Andi, can you have a look at this,
> > > please?
> > 
> > FWIW the mce code here actually hasn't changed for a long time.
> > 
> > 
> > > > ------------[ cut here ]------------
> > > > WARNING: at fs/sysfs/dir.c:463 sysfs_add_one+0x33/0x39()
> > > > sysfs: duplicate filename 'machinecheck4' can not be created
> > 
> > The only way I could see that happening is if CPU_DEAD/CPU_ONLINE
> > is not properly balanced.
> 
> I'm still seeing it on 2.6.27-rc2, even with the 
> patch here http://lkml.org/lkml/2008/7/30/171 and the
> wbinvd_halt code patch applied.  Maybe something else
> broke in some of the recent hotplug changes?

Mark, have you tried to test with commit 34ae7f35a21694aa5cb8829dc5142c39d73d6ba0
(your "preregister support for powernow-k8" patch) reverted and MCE enabled?

Rafael


* RE: Warning in during hotplug on 2.6.27-rc2-git5
  2008-08-17 21:53       ` Rafael J. Wysocki
@ 2008-08-18 13:11         ` Langsdorf, Mark
  2008-08-18 14:38           ` Rafael J. Wysocki
  0 siblings, 1 reply; 18+ messages in thread
From: Langsdorf, Mark @ 2008-08-18 13:11 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Andi Kleen, linux-kernel, Greg KH, Ingo Molnar, Andrew Morton

> > > The only way I could see that happening is if CPU_DEAD/CPU_ONLINE
> > > is not properly balanced.
> > 
> > I'm still seeing it on 2.6.27-rc2, even with the 
> > patch here http://lkml.org/lkml/2008/7/30/171 and the
> > wbinvd_halt code patch applied.  Maybe something else
> > broke in some of the recent hotplug changes?
> 
> Mark, have you tried to test with commit 
> 34ae7f35a21694aa5cb8829dc5142c39d73d6ba0
> (your "preregister support for powernow-k8" patch) reverted 
> and MCE enabled?

Yes, it's a failure on a clean 2.6.27-rc2 installation.

-Mark Langsdorf
Operating System Research Center
AMD



* Re: Warning in during hotplug on 2.6.27-rc2-git5
  2008-08-18 13:11         ` Langsdorf, Mark
@ 2008-08-18 14:38           ` Rafael J. Wysocki
  2008-08-18 16:29             ` Langsdorf, Mark
  0 siblings, 1 reply; 18+ messages in thread
From: Rafael J. Wysocki @ 2008-08-18 14:38 UTC (permalink / raw)
  To: Langsdorf, Mark
  Cc: Andi Kleen, linux-kernel, Greg KH, Ingo Molnar, Andrew Morton

On Monday, 18 of August 2008, Langsdorf, Mark wrote:
> > > > The only way I could see that happening is if CPU_DEAD/CPU_ONLINE
> > > > is not properly balanced.
> > > 
> > > I'm still seeing it on 2.6.27-rc2, even with the 
> > > patch here http://lkml.org/lkml/2008/7/30/171 and the
> > > wbinvd_halt code patch applied.  Maybe something else
> > > broke in some of the recent hotplug changes?
> > 
> > Mark, have you tried to test with commit 
> > 34ae7f35a21694aa5cb8829dc5142c39d73d6ba0
> > (your "preregister support for powernow-k8" patch) reverted 
> > and MCE enabled?
> 
> Yes, it's a failure on a clean 2.6.27-rc2 installation.

Ah, the commit was done after -rc2, sorry.

I have a couple of questions, then:
- Does it fail identically for all CPUs or for CPU4 and above only?
- Did previous kernels, most importantly 2.6.26, work correctly?

Also, can you please attach dmesg output, including the offlining of a CPU
that would later lead to the problem with cpu_up(), to the Bugzilla entry at
http://bugzilla.kernel.org/show_bug.cgi?id=11337 ?

Thanks,
Rafael


* RE: Warning in during hotplug on 2.6.27-rc2-git5
  2008-08-16 21:34           ` Rafael J. Wysocki
@ 2008-08-18 15:32             ` Langsdorf, Mark
  0 siblings, 0 replies; 18+ messages in thread
From: Langsdorf, Mark @ 2008-08-18 15:32 UTC (permalink / raw)
  To: Rafael J. Wysocki, Greg KH
  Cc: Andi Kleen, linux-kernel, Ingo Molnar, Andrew Morton

> > > My guess is that MCE does something that is not allowed by
> > > sysfs any more.
> > 
> > Hm, sysfs hasn't changed any in 2.6.27-rcX that I know of.
> 
> Hmm.  Mark, what kind of a system is this?  Is it a 2 
> quad-core CPU system or similar ('machinecheck4' in your
> trace seems to imply something like this)?

It's a commercial Tyan 2 socket motherboard with a modified
BIOS to support Family 10h processors.  It is a dual socket
with quad-cores in it.

-Mark Langsdorf
Operating System Research Center
AMD



* RE: Warning in during hotplug on 2.6.27-rc2-git5
  2008-08-18 14:38           ` Rafael J. Wysocki
@ 2008-08-18 16:29             ` Langsdorf, Mark
  2008-08-18 16:42               ` Rafael J. Wysocki
  0 siblings, 1 reply; 18+ messages in thread
From: Langsdorf, Mark @ 2008-08-18 16:29 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Andi Kleen, linux-kernel, Greg KH, Ingo Molnar, Andrew Morton

> > > > I'm still seeing it on 2.6.27-rc2, even with the 
> > > > patch here http://lkml.org/lkml/2008/7/30/171 and the
> > > > wbinvd_halt code patch applied.  Maybe something else
> > > > broke in some of the recent hotplug changes?
>
> I have a couple of questions, then:
> - Does it fail identically for all CPUs or for CPU4 and above only?
> - Did previous kernels, most importantly 2.6.26, work correctly?

It turns out 2.6.26 also fails.  It is only failing for CPU4+,
I'm not sure why that would be significant.

> Also, can you please attach dmesg output, including the 
> offlining of a CPU that would later lead to the problem
> with cpu_up(), to the Bugzilla entry at
> http://bugzilla.kernel.org/show_bug.cgi?id=11337 ?

I did.

-Mark Langsdorf
Operating System Research Center
AMD



* Re: Warning in during hotplug on 2.6.27-rc2-git5
  2008-08-18 16:29             ` Langsdorf, Mark
@ 2008-08-18 16:42               ` Rafael J. Wysocki
  0 siblings, 0 replies; 18+ messages in thread
From: Rafael J. Wysocki @ 2008-08-18 16:42 UTC (permalink / raw)
  To: Langsdorf, Mark
  Cc: Andi Kleen, linux-kernel, Greg KH, Ingo Molnar, Andrew Morton

On Monday, 18 of August 2008, Langsdorf, Mark wrote:
> > > > > I'm still seeing it on 2.6.27-rc2, even with the 
> > > > > patch here http://lkml.org/lkml/2008/7/30/171 and the
> > > > > wbinvd_halt code patch applied.  Maybe something else
> > > > > broke in some of the recent hotplug changes?
> >
> > I have a couple of questions, then:
> > - Does it fail identically for all CPUs or for CPU4 and above only?
> > - Did previous kernels, most importantly 2.6.26, work correctly?
> 
> It turns out 2.6.26 also fails.

OK, so I'll drop the bug from the list of recent regressions.

> It is only failing for CPU4+, I'm not sure why that would be significant.

Because I cannot reproduce it on a single-socket AMD quad-core. :-)

> > Also, can you please attach dmesg output, including the 
> > offlining of a CPU that would later lead to the problem
> > with cpu_up(), to the Bugzilla entry at
> > http://bugzilla.kernel.org/show_bug.cgi?id=11337 ?
> 
> I did.

Thanks.  I think we can further debug this using Bugzilla if you don't mind.

Best,
Rafael



Thread overview: 18+ messages
2008-08-12 22:00 Warning in during hotplug on 2.6.27-rc2-git5 Mark Langsdorf
2008-08-14 15:53 ` Rafael J. Wysocki
2008-08-14 16:06   ` Langsdorf, Mark
2008-08-14 16:34   ` Andi Kleen
2008-08-14 18:35     ` Langsdorf, Mark
2008-08-16 19:28       ` Rafael J. Wysocki
2008-08-16 20:18         ` Greg KH
2008-08-16 21:34           ` Rafael J. Wysocki
2008-08-18 15:32             ` Langsdorf, Mark
2008-08-17  2:23           ` Andi Kleen
2008-08-17 17:25             ` Rafael J. Wysocki
2008-08-17 19:23               ` Rafael J. Wysocki
2008-08-17  2:25       ` Andi Kleen
2008-08-17 21:53       ` Rafael J. Wysocki
2008-08-18 13:11         ` Langsdorf, Mark
2008-08-18 14:38           ` Rafael J. Wysocki
2008-08-18 16:29             ` Langsdorf, Mark
2008-08-18 16:42               ` Rafael J. Wysocki
