linux-kernel.vger.kernel.org archive mirror
* threshold_init_device/kobject_uevent_env oops
@ 2008-01-25 21:05 Yinghai Lu
  2008-01-25 22:15 ` Greg KH
  0 siblings, 1 reply; 15+ messages in thread
From: Yinghai Lu @ 2008-01-25 21:05 UTC
  To: Greg Kroah-Hartman, Ingo Molnar; +Cc: Linux Kernel Mailing List

current linus tree + x86.git

got

Calling initcall 0xffffffff80b93d98: threshold_init_device+0x0/0x3f()
BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
IP: [<ffffffff80458e20>] kobject_uevent_env+0x2a/0x3d9
PGD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.24-smp-g075f8dcd-dirty #2
RIP: 0010:[<ffffffff80458e20>]  [<ffffffff80458e20>] kobject_uevent_env+0x2a/0x3d9
RSP: 0000:ffff81042643db50  EFLAGS: 00010286
RAX: 0000000000000018 RBX: 0000000000000000 RCX: 00000000c0000410
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000018
RBP: 0000000000000018 R08: ffff810824964410 R09: 0000000024964410
R10: ffffc20002c0a6a8 R11: ffff810824964410 R12: 0000000000000000
R13: 0000000000000008 R14: 0000000000000000 R15: 0000000000000004
FS:  0000000000000000(0000) GS:ffffffff80b42000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000040 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff81042643c000, task ffff81082643a000)
Stack:  ffff810824964410 00000000802c5ca6 ffffffff809961f5 ffffffff80a8ad00
 ffff810824964500 0000000000000000 ffff810824964410 ffff81082653f0c0
 0000000000000004 ffffffff802c5842 0000000024964410 00000000c000040f
Call Trace:
 [<ffffffff802c5842>] ? sysfs_add_file+0x5b/0x81
 [<ffffffff802180dc>] ? allocate_threshold_blocks+0x190/0x1bc
 [<ffffffff802180ca>] ? allocate_threshold_blocks+0x17e/0x1bc
 [<ffffffff802180ca>] ? allocate_threshold_blocks+0x17e/0x1bc
 [<ffffffff802180ca>] ? allocate_threshold_blocks+0x17e/0x1bc
 [<ffffffff802180ca>] ? allocate_threshold_blocks+0x17e/0x1bc
 [<ffffffff802180ca>] ? allocate_threshold_blocks+0x17e/0x1bc
 [<ffffffff802180ca>] ? allocate_threshold_blocks+0x17e/0x1bc
 [<ffffffff802180ca>] ? allocate_threshold_blocks+0x17e/0x1bc
 [<ffffffff802180ca>] ? allocate_threshold_blocks+0x17e/0x1bc
 [<ffffffff80218349>] ? threshold_create_device+0x241/0x330
 [<ffffffff8023bd80>] ? __mod_timer+0xbd/0xcd
 [<ffffffff80b93dae>] ? threshold_init_device+0x16/0x3f
 [<ffffffff80b8b6e3>] ? kernel_init+0x175/0x2e1
 [<ffffffff8020ccd8>] ? child_rip+0xa/0x12
 [<ffffffff80b8b56e>] ? kernel_init+0x0/0x2e1
 [<ffffffff8020ccce>] ? child_rip+0x0/0x12


Code: c3 41 57 89 f0 41 56 41 55 41 54 55 48 89 fd 53 48 89 d3 48 83 ec 58 48 8b 04 c5 00 11 84 80 89 74 24 0c 48 89 44 24 10 48 89 f8 <4c> 8b 70 28 4d 85 f6 75 14 48 8b 40 20 48 85 c0 75 ee 41 bd ea
RIP  [<ffffffff80458e20>] kobject_uevent_env+0x2a/0x3d9
 RSP <ffff81042643db50>
CR2: 0000000000000040
---[ end trace 778e504de7e3b1e3 ]---


* Re: threshold_init_device/kobject_uevent_env oops
  2008-01-25 21:05 threshold_init_device/kobject_uevent_env oops Yinghai Lu
@ 2008-01-25 22:15 ` Greg KH
  2008-01-25 22:35   ` Ingo Molnar
  0 siblings, 1 reply; 15+ messages in thread
From: Greg KH @ 2008-01-25 22:15 UTC
  To: Yinghai Lu; +Cc: Ingo Molnar, Linux Kernel Mailing List

On Fri, Jan 25, 2008 at 01:05:40PM -0800, Yinghai Lu wrote:
> current linus tree + x86.git
> 
> got
> 
> Calling initcall 0xffffffff80b93d98: threshold_init_device+0x0/0x3f()
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
> IP: [<ffffffff80458e20>] kobject_uevent_env+0x2a/0x3d9

Does this happen on just Linus's tree?

Can you send me a .config file for this?

What is threshold_init()?  Is it something new in the x86.git tree?

thanks,

greg k-h


* Re: threshold_init_device/kobject_uevent_env oops
  2008-01-25 22:15 ` Greg KH
@ 2008-01-25 22:35   ` Ingo Molnar
  2008-01-25 22:47     ` Greg KH
  2008-01-25 23:08     ` Greg KH
  0 siblings, 2 replies; 15+ messages in thread
From: Ingo Molnar @ 2008-01-25 22:35 UTC
  To: Greg KH; +Cc: Yinghai Lu, Linux Kernel Mailing List, Linus Torvalds


* Greg KH <gregkh@suse.de> wrote:

> On Fri, Jan 25, 2008 at 01:05:40PM -0800, Yinghai Lu wrote:
> > current linus tree + x86.git
> > 
> > got
> > 
> > Calling initcall 0xffffffff80b93d98: threshold_init_device+0x0/0x3f()
> > BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
> > IP: [<ffffffff80458e20>] kobject_uevent_env+0x2a/0x3d9
> 
> Does this happen on just Linus's tree?
> 
> Can you send me a .config file for this?
> 
> What is threshold_init()?  Is it something new in the x86.git tree?

no. A quick grep shows that it is in a file that _your_ changes in 
Linus' latest have touched:

  arch/x86/kernel/cpu/mcheck/mce_amd_64.c

via:

 Author: Greg Kroah-Hartman <gregkh@suse.de>
 Date:   Thu Dec 20 08:13:05 2007 -0800

     Kobject: convert arch/* from kobject_unregister() to kobject_put()

 Author: Greg Kroah-Hartman <gregkh@suse.de>
 Date:   Wed Dec 19 09:23:20 2007 -0800

     Kobject: change arch/x86/kernel/cpu/mcheck/mce_amd_64.c to use kobject_init_ 

 commit a521cf209c6e7042f85b2c5b16da3ffa8004fb7b
 Author: Greg Kroah-Hartman <gregkh@suse.de>
 Date:   Wed Dec 19 09:23:20 2007 -0800

     Kobject: change arch/x86/kernel/cpu/mcheck/mce_amd_64.c to use kobject_creat

x86.git changed nothing to cause a crash in kobject_uevent_env(), and 
nothing has changed anything near this code anyway.

We haven't had a runtime (non-boot related) crash in x86.git for quite
some time. It's rock solid and dependable, and the only significant
changes today were your upstream kobject commits.

	Ingo


* Re: threshold_init_device/kobject_uevent_env oops
  2008-01-25 22:35   ` Ingo Molnar
@ 2008-01-25 22:47     ` Greg KH
  2008-01-25 22:50       ` Greg KH
  2008-01-25 23:12       ` Yinghai Lu
  2008-01-25 23:08     ` Greg KH
  1 sibling, 2 replies; 15+ messages in thread
From: Greg KH @ 2008-01-25 22:47 UTC
  To: Ingo Molnar; +Cc: Yinghai Lu, Linux Kernel Mailing List, Linus Torvalds

On Fri, Jan 25, 2008 at 11:35:56PM +0100, Ingo Molnar wrote:
> 
> * Greg KH <gregkh@suse.de> wrote:
> 
> > On Fri, Jan 25, 2008 at 01:05:40PM -0800, Yinghai Lu wrote:
> > > current linus tree + x86.git
> > > 
> > > got
> > > 
> > > Calling initcall 0xffffffff80b93d98: threshold_init_device+0x0/0x3f()
> > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
> > > IP: [<ffffffff80458e20>] kobject_uevent_env+0x2a/0x3d9
> > 
> > Does this happen on just Linus's tree?
> > 
> > Can you send me a .config file for this?
> > 
> > What is threshold_init()?  Is it something new in the x86.git tree?
> 
> no. A quick grep shows that it is in a file that _your_ changes in 
> Linus' latest have touched:
> 
>   arch/x86/kernel/cpu/mcheck/mce_amd_64.c

Ok, those are pretty much just search/and/replace type changes, but I
have been running x86-64 boxes with these changes in place.

That's why I'm interested if Linus's tree right now shows this problem,
and if I can get a .config of the offending kernel to try to reproduce
it and fix it myself.

thanks,

greg k-h


* Re: threshold_init_device/kobject_uevent_env oops
  2008-01-25 22:47     ` Greg KH
@ 2008-01-25 22:50       ` Greg KH
  2008-01-26  6:04         ` Yinghai Lu
  2008-01-25 23:12       ` Yinghai Lu
  1 sibling, 1 reply; 15+ messages in thread
From: Greg KH @ 2008-01-25 22:50 UTC
  To: Ingo Molnar; +Cc: Yinghai Lu, Linux Kernel Mailing List, Linus Torvalds

On Fri, Jan 25, 2008 at 02:47:11PM -0800, Greg KH wrote:
> On Fri, Jan 25, 2008 at 11:35:56PM +0100, Ingo Molnar wrote:
> > 
> > * Greg KH <gregkh@suse.de> wrote:
> > 
> > > On Fri, Jan 25, 2008 at 01:05:40PM -0800, Yinghai Lu wrote:
> > > > current linus tree + x86.git
> > > > 
> > > > got
> > > > 
> > > > Calling initcall 0xffffffff80b93d98: threshold_init_device+0x0/0x3f()
> > > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
> > > > IP: [<ffffffff80458e20>] kobject_uevent_env+0x2a/0x3d9
> > > 
> > > Does this happen on just Linus's tree?
> > > 
> > > Can you send me a .config file for this?
> > > 
> > > What is threshold_init()?  Is it something new in the x86.git tree?
> > 
> > no. A quick grep shows that it is in a file that _your_ changes in 
> > Linus' latest have touched:
> > 
> >   arch/x86/kernel/cpu/mcheck/mce_amd_64.c
> 
> Ok, those are pretty much just search/and/replace type changes, but I
> have been running x86-64 boxes with these changes in place.

Oh wait, I do see a change.  We are now (finally) emitting a kobject
uevent for these devices, which somehow the code can't handle properly.

Let me go poke this some more, unfortunately I don't have any AMD 64
boxes here anymore, only Intel based processors, so I can't run this
module...

thanks,

greg k-h


* Re: threshold_init_device/kobject_uevent_env oops
  2008-01-25 22:35   ` Ingo Molnar
  2008-01-25 22:47     ` Greg KH
@ 2008-01-25 23:08     ` Greg KH
  2008-01-25 23:20       ` Yinghai Lu
  1 sibling, 1 reply; 15+ messages in thread
From: Greg KH @ 2008-01-25 23:08 UTC
  To: Ingo Molnar, jacob.shin
  Cc: Yinghai Lu, Linux Kernel Mailing List, Linus Torvalds

On Fri, Jan 25, 2008 at 11:35:56PM +0100, Ingo Molnar wrote:
> 
> * Greg KH <gregkh@suse.de> wrote:
> 
> > On Fri, Jan 25, 2008 at 01:05:40PM -0800, Yinghai Lu wrote:
> > > current linus tree + x86.git
> > > 
> > > got
> > > 
> > > Calling initcall 0xffffffff80b93d98: threshold_init_device+0x0/0x3f()
> > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
> > > IP: [<ffffffff80458e20>] kobject_uevent_env+0x2a/0x3d9
> > 
> > Does this happen on just Linus's tree?
> > 
> > Can you send me a .config file for this?
> > 
> > What is threshold_init()?  Is it something new in the x86.git tree?
> 
> no. A quick grep shows that it is in a file that _your_ changes in 
> Linus' latest have touched:
> 
>   arch/x86/kernel/cpu/mcheck/mce_amd_64.c

In looking at this code some more, I'm a bit confused.  We have an array
of kobjects in per_cpu(threshold_banks, cpu)[bank]->kobj

Now the kobject in the struct threshold_bank structure is a "static"
one, one that should govern the lifecycle of the object, yet there is no
release function for it at all.  I don't see a way for it to ever be
properly torn down.
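
For readers unfamiliar with the point about a missing release function:
an embedded, reference-counted object normally frees its container from
a release callback when the last reference is dropped.  A minimal
userspace sketch of that pattern follows.  Every name in it
(mock_kobject, mock_owner, mock_kobject_put, ...) is hypothetical; it
imitates the idea only and is not the kernel's kobject API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted core object. */
struct mock_kobject {
	int refcount;
	void (*release)(struct mock_kobject *kobj);
};

/* Hypothetical container that embeds the counted object. */
struct mock_owner {
	int id;
	struct mock_kobject kobj;
};

static int owners_released;	/* how many containers have been freed */

/* Release hook: recover the container from the embedded member, free it. */
static void mock_owner_release(struct mock_kobject *kobj)
{
	struct mock_owner *owner = (struct mock_owner *)
		((char *)kobj - offsetof(struct mock_owner, kobj));

	owners_released++;
	free(owner);
}

static void mock_kobject_get(struct mock_kobject *kobj)
{
	kobj->refcount++;
}

/* Drop one reference; the last put triggers the release hook, if any. */
static void mock_kobject_put(struct mock_kobject *kobj)
{
	assert(kobj->refcount > 0);
	if (--kobj->refcount == 0 && kobj->release)
		kobj->release(kobj);
}

/* Allocate a container holding one initial reference. */
static struct mock_owner *mock_owner_create(int id)
{
	struct mock_owner *owner = malloc(sizeof(*owner));

	if (!owner)
		return NULL;
	owner->id = id;
	owner->kobj.refcount = 1;
	owner->kobj.release = mock_owner_release;
	return owner;
}
```

Without the release hook, the final mock_kobject_put() in this sketch
would drop the count to zero and nothing would ever free the container,
which is the kind of gap being described above.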

But it's the bank kobjects that are dynamic.  They look to be created
properly, and the life cycle is correct because they are now initialized
correctly by the kobject core.  But in which order are things
initialized by the cpu code?

Ideally the banks are created before the blocks, but by the error
message that we have here, I'm not so sure about this.  Can anyone
confirm this order is always correct?

Can someone send me the sysfs 'tree' output of what these kobjects are
supposed to be looking like?

Also, can someone enable CONFIG_KOBJECT_DEBUG and send me the output of
the startup of this code?  That should help explain what order things
are happening in.

Actually the debug output with this oops would be great, it should show
the offending logic pretty well.

thanks,

greg k-h


* Re: threshold_init_device/kobject_uevent_env oops
  2008-01-25 22:47     ` Greg KH
  2008-01-25 22:50       ` Greg KH
@ 2008-01-25 23:12       ` Yinghai Lu
  1 sibling, 0 replies; 15+ messages in thread
From: Yinghai Lu @ 2008-01-25 23:12 UTC
  To: Greg KH; +Cc: Ingo Molnar, Linux Kernel Mailing List, Linus Torvalds

On Jan 25, 2008 2:47 PM, Greg KH <gregkh@suse.de> wrote:
> On Fri, Jan 25, 2008 at 11:35:56PM +0100, Ingo Molnar wrote:
> >
> > * Greg KH <gregkh@suse.de> wrote:
> >
> > > On Fri, Jan 25, 2008 at 01:05:40PM -0800, Yinghai Lu wrote:
> > > > current linus tree + x86.git
> > > >
> > > > got
> > > >
> > > > Calling initcall 0xffffffff80b93d98: threshold_init_device+0x0/0x3f()
> > > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
> > > > IP: [<ffffffff80458e20>] kobject_uevent_env+0x2a/0x3d9
> > >
> > > Does this happen on just Linus's tree?
> > >
> > > Can you send me a .config file for this?
> > >
> > > What is threshold_init()?  Is it something new in the x86.git tree?
> >
> > no. A quick grep shows that it is in a file that _your_ changes in
> > Linus' latest have touched:
> >
> >   arch/x86/kernel/cpu/mcheck/mce_amd_64.c
>
> Ok, those are pretty much just search/and/replace type changes, but I
> have been running x86-64 boxes with these changes in place.
>
> That's why I'm interested if Linus's tree right now shows this problem,
> and if I can get a .config of the offending kernel to try to reproduce
> it and fix it myself.

Calling initcall 0xffffffff80ba1dee: threshold_init_device+0x0/0x3f()
Unable to handle kernel NULL pointer dereference at 0000000000000040 RIP:
 [<ffffffff8045d2e8>] kobject_uevent_env+0x2a/0x3dd
PGD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.24-smp-g99f1c97d-dirty #1
RIP: 0010:[<ffffffff8045d2e8>]  [<ffffffff8045d2e8>] kobject_uevent_env+0x2a/0x3dd
RSP: 0000:ffff81042643db50  EFLAGS: 00010286
RAX: 0000000000000018 RBX: 0000000000000000 RCX: 00000000c0000410
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000018
RBP: 0000000000000018 R08: ffff810824960dc0 R09: 0000000024960dc0
R10: ffffc200025299d0 R11: ffff810824960dc0 R12: 0000000000000000
R13: 0000000000000008 R14: 0000000000000000 R15: 0000000000000004
FS:  0000000000000000(0000) GS:ffffffff80b4f000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000040 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff81042643c000, task ffff81082643a000)
Stack:  ffff810824960dc0 00000000802c7ffe ffffffff809a28be ffffffff80a96f10
 ffff810824960eb0 0000000000000000 ffff810824960dc0 ffff8108265120c0
 0000000000000004 ffffffff802c7b93 0000000024960dc0 00000000c000040f
Call Trace:
 [<ffffffff802c7b93>] sysfs_add_file+0x5b/0x81
 [<ffffffff80215fcb>] allocate_threshold_blocks+0x184/0x1b0
 [<ffffffff80215fb9>] allocate_threshold_blocks+0x172/0x1b0
 [<ffffffff80215fb9>] allocate_threshold_blocks+0x172/0x1b0
 [<ffffffff80215fb9>] allocate_threshold_blocks+0x172/0x1b0
 [<ffffffff80215fb9>] allocate_threshold_blocks+0x172/0x1b0
 [<ffffffff80215fb9>] allocate_threshold_blocks+0x172/0x1b0
 [<ffffffff80215fb9>] allocate_threshold_blocks+0x172/0x1b0
 [<ffffffff80215fb9>] allocate_threshold_blocks+0x172/0x1b0
 [<ffffffff80215fb9>] allocate_threshold_blocks+0x172/0x1b0
 [<ffffffff80216236>] threshold_create_device+0x23f/0x32e
 [<ffffffff8023fede>] __mod_timer+0xc3/0xd3
 [<ffffffff80ba1e04>] threshold_init_device+0x16/0x3f
 [<ffffffff80b9963a>] kernel_init+0x175/0x2e1
 [<ffffffff8020cd48>] child_rip+0xa/0x12
 [<ffffffff80b994c5>] kernel_init+0x0/0x2e1
 [<ffffffff8020cd3e>] child_rip+0x0/0x12


Code: 4c 8b 70 28 4d 85 f6 75 14 48 8b 40 20 48 85 c0 75 ee 41 bd
RIP  [<ffffffff8045d2e8>] kobject_uevent_env+0x2a/0x3dd
 RSP <ffff81042643db50>
CR2: 0000000000000040
---[ end trace 778e504de7e3b1e3 ]---
Kernel panic - not syncing: Attempted to kill init!


* Re: threshold_init_device/kobject_uevent_env oops
  2008-01-25 23:08     ` Greg KH
@ 2008-01-25 23:20       ` Yinghai Lu
  2008-01-26  6:22         ` Greg KH
  0 siblings, 1 reply; 15+ messages in thread
From: Yinghai Lu @ 2008-01-25 23:20 UTC
  To: Greg KH
  Cc: Ingo Molnar, jacob.shin, Linux Kernel Mailing List, Linus Torvalds

On Jan 25, 2008 3:08 PM, Greg KH <gregkh@suse.de> wrote:
> On Fri, Jan 25, 2008 at 11:35:56PM +0100, Ingo Molnar wrote:
..
> Also, can someone enable CONFIG_KOBJECT_DEBUG and send me the output of
> the startup of this code?  That should help explain what order things
> are happening in.

Calling initcall 0xffffffff80ba1dee: threshold_init_device+0x0/0x3f()
kobject: 'threshold_bank4' (ffff8108265450c0): kobject_add_internal: parent: 'machinecheck0', set: '<NULL>'
kobject: 'misc0' (ffff810425497418): kobject_add_internal: parent: 'threshold_bank4', set: '<NULL>'
kobject: 'misc1' (ffff810425497498): kobject_add_internal: parent: 'threshold_bank4', set: '<NULL>'
kobject: 'misc2' (ffff810425497518): kobject_add_internal: parent: 'threshold_bank4', set: '<NULL>'
Unable to handle kernel NULL pointer dereference at 0000000000000018 RIP:
 [<ffffffff8045d443>] kobject_uevent_env+0x31/0x45f
PGD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.24-smp-g99f1c97d-dirty #2
RIP: 0010:[<ffffffff8045d443>]  [<ffffffff8045d443>] kobject_uevent_env+0x31/0x45f
RSP: 0000:ffff81042645bb50  EFLAGS: 00010286
RAX: ffffffff809a2906 RBX: 0000000000000000 RCX: ffffffff8084b050
RDX: 0000000000000018 RSI: 0000000000000000 RDI: 0000000000000018
RBP: 0000000000000018 R08: ffff8108265ffaf0 R09: 00000000265ffaf0
R10: ffffc20002eb9ad0 R11: ffff8108265ffaf0 R12: 0000000000000000
R13: 0000000000000008 R14: 0000000000000000 R15: 0000000000000004
FS:  0000000000000000(0000) GS:ffffffff80b4f000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000018 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff81042645a000, task ffff81082643c000)
Stack:  ffff8108265ffaf0 00000000802c7ffe ffffffff809a2906 ffffffff80a96f10
 ffff8108265ffc30 0000000000000000 ffff8108265ffaf0 ffff8108265450c0
 0000000000000004 ffffffff802c7b93 00000000265ffaf0 00000000c000040f
Call Trace:
 [<ffffffff802c7b93>] sysfs_add_file+0x5b/0x81
 [<ffffffff80215fcb>] allocate_threshold_blocks+0x184/0x1b0
 [<ffffffff80215fb9>] allocate_threshold_blocks+0x172/0x1b0
 [<ffffffff80215fb9>] allocate_threshold_blocks+0x172/0x1b0
 [<ffffffff80215fb9>] allocate_threshold_blocks+0x172/0x1b0
 [<ffffffff80215fb9>] allocate_threshold_blocks+0x172/0x1b0
 [<ffffffff80215fb9>] allocate_threshold_blocks+0x172/0x1b0
 [<ffffffff80215fb9>] allocate_threshold_blocks+0x172/0x1b0
 [<ffffffff80215fb9>] allocate_threshold_blocks+0x172/0x1b0
 [<ffffffff80215fb9>] allocate_threshold_blocks+0x172/0x1b0
 [<ffffffff80216236>] threshold_create_device+0x23f/0x32e
 [<ffffffff8023fede>] __mod_timer+0xc3/0xd3
 [<ffffffff80ba1e04>] threshold_init_device+0x16/0x3f
 [<ffffffff80b9963a>] kernel_init+0x175/0x2e1
 [<ffffffff8020cd48>] child_rip+0xa/0x12
 [<ffffffff80b994c5>] kernel_init+0x0/0x2e1
 [<ffffffff8020cd3e>] child_rip+0x0/0x12


Code: 48 8b 37 31 c0 48 c7 c7 34 68 9d 80 e8 80 aa dd ff 48 89 e8
RIP  [<ffffffff8045d443>] kobject_uevent_env+0x31/0x45f
 RSP <ffff81042645bb50>
CR2: 0000000000000018
---[ end trace 778e504de7e3b1e3 ]---


* Re: threshold_init_device/kobject_uevent_env oops
  2008-01-25 22:50       ` Greg KH
@ 2008-01-26  6:04         ` Yinghai Lu
  2008-01-26  6:14           ` Greg KH
  0 siblings, 1 reply; 15+ messages in thread
From: Yinghai Lu @ 2008-01-26  6:04 UTC
  To: Greg KH; +Cc: Ingo Molnar, Linux Kernel Mailing List, Linus Torvalds

On Jan 25, 2008 2:50 PM, Greg KH <gregkh@suse.de> wrote:
> On Fri, Jan 25, 2008 at 02:47:11PM -0800, Greg KH wrote:
> > On Fri, Jan 25, 2008 at 11:35:56PM +0100, Ingo Molnar wrote:
> > >
> > > * Greg KH <gregkh@suse.de> wrote:
> > >
> > > > On Fri, Jan 25, 2008 at 01:05:40PM -0800, Yinghai Lu wrote:
> > > > > current linus tree + x86.git
> > > > >
> > > > > got
> > > > >
> > > > > Calling initcall 0xffffffff80b93d98: threshold_init_device+0x0/0x3f()
> > > > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
> > > > > IP: [<ffffffff80458e20>] kobject_uevent_env+0x2a/0x3d9
> > > >
> > > > Does this happen on just Linus's tree?
> > > >
> > > > Can you send me a .config file for this?
> > > >
> > > > What is threshold_init()?  Is it something new in the x86.git tree?
> > >
> > > no. A quick grep shows that it is in a file that _your_ changes in
> > > Linus' latest have touched:
> > >
> > >   arch/x86/kernel/cpu/mcheck/mce_amd_64.c
> >
> > Ok, those are pretty much just search/and/replace type changes, but I
> > have been running x86-64 boxes with these changes in place.
>
> Oh wait, I do see a change.  We are now (finally) emitting a kobject
> uevent for these devices, which somehow the code can't handle properly.
>
> Let me go poke this some more, unfortunately I don't have any AMD 64
> boxes here anymore, only Intel based processors, so I can't run this
> module...

it only happens with AMD Quad Core CPU or Fam 10h.

works well with AMD opteron Rev E, and Rev F.

So you may need to have access to a new system with a quad core cpu.

YH


* Re: threshold_init_device/kobject_uevent_env oops
  2008-01-26  6:04         ` Yinghai Lu
@ 2008-01-26  6:14           ` Greg KH
  2008-01-26  7:08             ` Yinghai Lu
  0 siblings, 1 reply; 15+ messages in thread
From: Greg KH @ 2008-01-26  6:14 UTC
  To: Yinghai Lu, jacob.shin
  Cc: Ingo Molnar, Linux Kernel Mailing List, Linus Torvalds

On Fri, Jan 25, 2008 at 10:04:19PM -0800, Yinghai Lu wrote:
> On Jan 25, 2008 2:50 PM, Greg KH <gregkh@suse.de> wrote:
> > On Fri, Jan 25, 2008 at 02:47:11PM -0800, Greg KH wrote:
> > > On Fri, Jan 25, 2008 at 11:35:56PM +0100, Ingo Molnar wrote:
> > > >
> > > > * Greg KH <gregkh@suse.de> wrote:
> > > >
> > > > > On Fri, Jan 25, 2008 at 01:05:40PM -0800, Yinghai Lu wrote:
> > > > > > current linus tree + x86.git
> > > > > >
> > > > > > got
> > > > > >
> > > > > > Calling initcall 0xffffffff80b93d98: threshold_init_device+0x0/0x3f()
> > > > > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
> > > > > > IP: [<ffffffff80458e20>] kobject_uevent_env+0x2a/0x3d9
> > > > >
> > > > > Does this happen on just Linus's tree?
> > > > >
> > > > > Can you send me a .config file for this?
> > > > >
> > > > > What is threshold_init()?  Is it something new in the x86.git tree?
> > > >
> > > > no. A quick grep shows that it is in a file that _your_ changes in
> > > > Linus' latest have touched:
> > > >
> > > >   arch/x86/kernel/cpu/mcheck/mce_amd_64.c
> > >
> > > Ok, those are pretty much just search/and/replace type changes, but I
> > > have been running x86-64 boxes with these changes in place.
> >
> > Oh wait, I do see a change.  We are now (finally) emitting a kobject
> > uevent for these devices, which somehow the code can't handle properly.
> >
> > Let me go poke this some more, unfortunately I don't have any AMD 64
> > boxes here anymore, only Intel based processors, so I can't run this
> > module...
> 
> it only happens with AMD Quad Core CPU or Fam 10h.
> 
> works well with AMD opteron Rev E, and Rev F.

So this only dies on a multi-core system?  Or do 2 processor boxes
work, but not 4?

> So you may need to have access to a new system with a quad core cpu.

Ugh, that's not good.

The kobjects here are really not making much sense.

Jacob, any hints on exactly what you were trying to do with these
kobjects?  What's the end goal here, and why didn't you just use a
struct device instead?

The mce_amd_64.c file is the only thing in the tree using this userspace
API, can you please document it in Documentation/ABI so that others can
understand what it is used for, what files are expected, and what values
in the files are?

thanks,

greg k-h


* Re: threshold_init_device/kobject_uevent_env oops
  2008-01-25 23:20       ` Yinghai Lu
@ 2008-01-26  6:22         ` Greg KH
  0 siblings, 0 replies; 15+ messages in thread
From: Greg KH @ 2008-01-26  6:22 UTC
  To: Yinghai Lu
  Cc: Ingo Molnar, jacob.shin, Linux Kernel Mailing List, Linus Torvalds

On Fri, Jan 25, 2008 at 03:20:45PM -0800, Yinghai Lu wrote:
> On Jan 25, 2008 3:08 PM, Greg KH <gregkh@suse.de> wrote:
> > On Fri, Jan 25, 2008 at 11:35:56PM +0100, Ingo Molnar wrote:
> ..
> > Also, can someone enable CONFIG_KOBJECT_DEBUG and send me the output of
> > the startup of this code?  That should help explain what order things
> > are happening in.
> 
> Calling initcall 0xffffffff80ba1dee: threshold_init_device+0x0/0x3f()
> kobject: 'threshold_bank4' (ffff8108265450c0): kobject_add_internal: parent: 'machinecheck0', set: '<NULL>'
> kobject: 'misc0' (ffff810425497418): kobject_add_internal: parent: 'threshold_bank4', set: '<NULL>'
> kobject: 'misc1' (ffff810425497498): kobject_add_internal: parent: 'threshold_bank4', set: '<NULL>'
> kobject: 'misc2' (ffff810425497518): kobject_add_internal: parent: 'threshold_bank4', set: '<NULL>'
> Unable to handle kernel NULL pointer dereference at 0000000000000018 RIP: [<ffffffff8045d443>] kobject_uevent_env+0x31/0x45f

2 of these work just fine, and the third blows up in kobject_uevent().
So weird, let me dig further...

Hm, it's when we unwind that we blow up on the kobject_uevent, as that's
the first time it is called (gotta love recursion here...)  So it is
really never working for these objects at all, what a mess.

As a work-around for now, you can probably just comment out the
kobject_uevent() call in the file arch/x86/kernel/cpu/mcheck/mce_amd_64.c
and everything should work just fine.  As there never really was an event
being properly generated before, no one would miss it now :)

I'll keep digging...

thanks,

greg k-h


* Re: threshold_init_device/kobject_uevent_env oops
  2008-01-26  6:14           ` Greg KH
@ 2008-01-26  7:08             ` Yinghai Lu
  2008-01-26  7:24               ` Greg KH
  0 siblings, 1 reply; 15+ messages in thread
From: Yinghai Lu @ 2008-01-26  7:08 UTC
  To: Greg KH
  Cc: jacob.shin, Ingo Molnar, Linux Kernel Mailing List, Linus Torvalds

On Jan 25, 2008 10:14 PM, Greg KH <gregkh@suse.de> wrote:
>
> On Fri, Jan 25, 2008 at 10:04:19PM -0800, Yinghai Lu wrote:
> > On Jan 25, 2008 2:50 PM, Greg KH <gregkh@suse.de> wrote:
> > > On Fri, Jan 25, 2008 at 02:47:11PM -0800, Greg KH wrote:
> > > > On Fri, Jan 25, 2008 at 11:35:56PM +0100, Ingo Molnar wrote:
> > > > >
> > > > > * Greg KH <gregkh@suse.de> wrote:
> > > > >
> > > > > > On Fri, Jan 25, 2008 at 01:05:40PM -0800, Yinghai Lu wrote:
> > > > > > > current linus tree + x86.git
> > > > > > >
> > > > > > > got
> > > > > > >
> > > > > > > Calling initcall 0xffffffff80b93d98: threshold_init_device+0x0/0x3f()
> > > > > > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
> > > > > > > IP: [<ffffffff80458e20>] kobject_uevent_env+0x2a/0x3d9
> > > > > >
> > > > > > Does this happen on just Linus's tree?
> > > > > >
> > > > > > Can you send me a .config file for this?
> > > > > >
> > > > > > What is threshold_init()?  Is it something new in the x86.git tree?
> > > > >
> > > > > no. A quick grep shows that it is in a file that _your_ changes in
> > > > > Linus' latest have touched:
> > > > >
> > > > >   arch/x86/kernel/cpu/mcheck/mce_amd_64.c
> > > >
> > > > Ok, those are pretty much just search/and/replace type changes, but I
> > > > have been running x86-64 boxes with these changes in place.
> > >
> > > Oh wait, I do see a change.  We are now (finally) emitting a kobject
> > > uevent for these devices, which somehow the code can't handle properly.
> > >
> > > Let me go poke this some more, unfortunately I don't have any AMD 64
> > > boxes here anymore, only Intel based processors, so I can't run this
> > > module...
> >
> > it only happens with AMD Quad Core CPU or Fam 10h.
> >
> > works well with AMD opteron Rev E, and Rev F.
>
> So this only dies on a multi-core system?  Or do 2 processor boxes
> work, but not 4?

2 sockets x quad core will fail (fam 10h)
2 sockets x dual core works....( rev E, and rev F opteron)

there are some changes between opteron and fam10h.  fam10h may have
more local vectors for MCE...
or more banks and blocks...

will look at AMD64 Bios and kernel porting guide for Fam 10h again..

wonder if your code uncovered some bugs ...

YH


* Re: threshold_init_device/kobject_uevent_env oops
  2008-01-26  7:08             ` Yinghai Lu
@ 2008-01-26  7:24               ` Greg KH
  2008-01-26  7:35                 ` Greg KH
  2008-01-26 21:26                 ` Yinghai Lu
  0 siblings, 2 replies; 15+ messages in thread
From: Greg KH @ 2008-01-26  7:24 UTC
  To: Yinghai Lu
  Cc: jacob.shin, Ingo Molnar, Linux Kernel Mailing List, Linus Torvalds

On Fri, Jan 25, 2008 at 11:08:53PM -0800, Yinghai Lu wrote:
> On Jan 25, 2008 10:14 PM, Greg KH <gregkh@suse.de> wrote:
> >
> > On Fri, Jan 25, 2008 at 10:04:19PM -0800, Yinghai Lu wrote:
> > > On Jan 25, 2008 2:50 PM, Greg KH <gregkh@suse.de> wrote:
> > > > On Fri, Jan 25, 2008 at 02:47:11PM -0800, Greg KH wrote:
> > > > > On Fri, Jan 25, 2008 at 11:35:56PM +0100, Ingo Molnar wrote:
> > > > > >
> > > > > > * Greg KH <gregkh@suse.de> wrote:
> > > > > >
> > > > > > > On Fri, Jan 25, 2008 at 01:05:40PM -0800, Yinghai Lu wrote:
> > > > > > > > current linus tree + x86.git
> > > > > > > >
> > > > > > > > got
> > > > > > > >
> > > > > > > > Calling initcall 0xffffffff80b93d98: threshold_init_device+0x0/0x3f()
> > > > > > > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
> > > > > > > > IP: [<ffffffff80458e20>] kobject_uevent_env+0x2a/0x3d9
> > > > > > >
> > > > > > > Does this happen on just Linus's tree?
> > > > > > >
> > > > > > > Can you send me a .config file for this?
> > > > > > >
> > > > > > > What is threshold_init()?  Is it something new in the x86.git tree?
> > > > > >
> > > > > > no. A quick grep shows that it is in a file that _your_ changes in
> > > > > > Linus' latest have touched:
> > > > > >
> > > > > >   arch/x86/kernel/cpu/mcheck/mce_amd_64.c
> > > > >
> > > > > Ok, those are pretty much just search/and/replace type changes, but I
> > > > > have been running x86-64 boxes with these changes in place.
> > > >
> > > > Oh wait, I do see a change.  We are now (finally) emitting a kobject
> > > > uevent for these devices, which somehow the code can't handle properly.
> > > >
> > > > Let me go poke this some more, unfortunately I don't have any AMD 64
> > > > boxes here anymore, only Intel based processors, so I can't run this
> > > > module...
> > >
> > > it only happens with AMD Quad Core CPU or Fam 10h.
> > >
> > > works well with AMD opteron Rev E, and Rev F.
> >
> > So this only dies on a multi-core system?  Or do 2 processor boxes
> > work, but not 4?
> 
> 2 sockets x quad core will fail (fam 10h)
> 2 sockets x dual core works....( rev E, and rev F opteron)
> 
> there are some changes between opteron and fam10h.  fam10h may have
> more local vectors for MCE...
> or more banks and blocks...
> 
> will look at AMD64 Bios and kernel porting guide for Fam 10h again..
> 
> wonder if your code uncovered some bugs ...

No, the logic in this function is just crazy.  It's recursive, but we
can skip the creation of the kobject, and of the whole threshold_block,
if some conditions are met.  That's why we see
allocate_threshold_blocks so many times in the callstack, yet only a few
kobjects created.

Then we blow up in kobject_uevent_env() on the first debug printk.
Which means that we are just passing in garbage.
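
The register dumps earlier in the thread are consistent with that
reading.  A minimal sketch of the arithmetic, under two assumptions that
nothing in this thread confirms: that kobj sits 0x18 bytes into struct
threshold_block in the crashing builds, and that the debug build faults
on a load through the kobject pointer itself.  The bytes "<4c> 8b 70 28"
in the first oops decode to a load from 0x28(%rax).  The struct names
and padding below are hypothetical stand-ins, not the kernel layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures.  Only the offset of
 * kobj matters; the padding is an assumption chosen so that kobj lands
 * at 0x18, the value seen in RDI in the oops reports. */
struct fake_kobject {
	const char *name;
	char rest[0x40];
};

struct fake_threshold_block {
	char before_kobj[0x18];
	struct fake_kobject kobj;
};

/* Value that "&b->kobj" would have for a given b, computed with
 * offsetof so the sketch never touches an invalid pointer. */
static uintptr_t kobj_address(uintptr_t b)
{
	return b + offsetof(struct fake_threshold_block, kobj);
}

/* Address touched by a load at a fixed displacement from a base. */
static uintptr_t load_address(uintptr_t base, uintptr_t disp)
{
	return base + disp;
}
```

With b == NULL, kobj_address(0) is 0x18, matching RDI in all three
traces.  load_address(0x18, 0x28) is 0x40, matching CR2 in the
non-debug oopses, and load_address(0x18, 0) is 0x18, matching CR2 in
the CONFIG_KOBJECT_DEBUG run.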

Let me know if the patch below fixes this for you, I think it should, as
there is a code path where b is NULL and then we call kobject_uevent.

Man, this is one time that comments in code would have been very nice to
have, and why forward goto's into major code blocks are just evil...

thanks,

greg k-h

diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd_64.c b/arch/x86/kernel/cpu/mcheck/mce_amd_64.c
index 7535887..8a7f204 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd_64.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd_64.c
@@ -450,7 +450,8 @@ recurse:
 	if (err)
 		goto out_free;
 
-	kobject_uevent(&b->kobj, KOBJ_ADD);
+	if (b && &b->kobj)
+		kobject_uevent(&b->kobj, KOBJ_ADD);
 
 	return err;
 

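The failure mode described in the message above can be sketched in user space. This is a simplified illustration under stated assumptions, not the kernel's allocate_threshold_blocks(): `struct block`, `notify()`, `alloc_blocks()` and the `present[]` array are hypothetical stand-ins. It shows how a forward goto past the allocation leaves `b` NULL on some levels of the recursion, and why the notification call needs the `if (b)` guard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct threshold_block. */
struct block {
	int id;
};

static int events_sent;	/* counts notifications, for illustration only */

/* Stand-in for kobject_uevent(): must never be handed a NULL block. */
static void notify(struct block *b)
{
	assert(b != NULL);
	(void)b->id;
	events_sent++;
}

/*
 * Walk ids [id, max).  present[id] == 0 models the "conditions" under
 * which the real function skips creating a block for this level.
 */
static int alloc_blocks(const int *present, int id, int max)
{
	struct block *b = NULL;
	int err;

	if (id >= max)
		return 0;

	if (!present[id])
		goto recurse;	/* forward jump: b is still NULL below */

	b = malloc(sizeof(*b));
	if (!b)
		return -1;
	b->id = id;

recurse:
	err = alloc_blocks(present, id + 1, max);
	if (err)
		goto out_free;

	if (b)			/* without this guard we would call notify(NULL) */
		notify(b);

	free(b);		/* sketch only: real code would keep the block */
	return 0;

out_free:
	free(b);
	return err;
}
```

With `present = {1, 0, 1, 0}`, a call to `alloc_blocks(present, 0, 4)` recurses four levels deep but sends only two notifications, which mirrors the "many stack frames, few kobjects" observation.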

* Re: threshold_init_device/kobject_uevent_env oops
  2008-01-26  7:24               ` Greg KH
@ 2008-01-26  7:35                 ` Greg KH
  2008-01-26 21:26                 ` Yinghai Lu
  1 sibling, 0 replies; 15+ messages in thread
From: Greg KH @ 2008-01-26  7:35 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: jacob.shin, Ingo Molnar, Linux Kernel Mailing List, Linus Torvalds

On Fri, Jan 25, 2008 at 11:24:55PM -0800, Greg KH wrote:
> On Fri, Jan 25, 2008 at 11:08:53PM -0800, Yinghai Lu wrote:
> > On Jan 25, 2008 10:14 PM, Greg KH <gregkh@suse.de> wrote:
> > >
> > > On Fri, Jan 25, 2008 at 10:04:19PM -0800, Yinghai Lu wrote:
> > > > On Jan 25, 2008 2:50 PM, Greg KH <gregkh@suse.de> wrote:
> > > > > On Fri, Jan 25, 2008 at 02:47:11PM -0800, Greg KH wrote:
> > > > > > On Fri, Jan 25, 2008 at 11:35:56PM +0100, Ingo Molnar wrote:
> > > > > > >
> > > > > > > * Greg KH <gregkh@suse.de> wrote:
> > > > > > >
> > > > > > > > On Fri, Jan 25, 2008 at 01:05:40PM -0800, Yinghai Lu wrote:
> > > > > > > > > current linus tree + x86.git
> > > > > > > > >
> > > > > > > > > got
> > > > > > > > >
> > > > > > > > > Calling initcall 0xffffffff80b93d98: threshold_init_device+0x0/0x3f()
> > > > > > > > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
> > > > > > > > > IP: [<ffffffff80458e20>] kobject_uevent_env+0x2a/0x3d9
> > > > > > > >
> > > > > > > > Does this happen on just Linus's tree?
> > > > > > > >
> > > > > > > > Can you send me a .config file for this?
> > > > > > > >
> > > > > > > > What is threshold_init()?  Is it something new in the x86.git tree?
> > > > > > >
> > > > > > > no. A quick grep shows that it is in a file that _your_ changes in
> > > > > > > Linus' latest have touched:
> > > > > > >
> > > > > > >   arch/x86/kernel/cpu/mcheck/mce_amd_64.c
> > > > > >
> > > > > > Ok, those are pretty much just search/and/replace type changes, but I
> > > > > > have been running x86-64 boxes with these changes in place.
> > > > >
> > > > > Oh wait, I do see a change.  We are now (finally) emitting a kobject
> > > > > uevent for these devices, which somehow the code can't handle properly.
> > > > >
> > > > > > Let me go poke this some more, unfortunately I don't have any AMD 64
> > > > > boxes here anymore, only Intel based processors, so I can't run this
> > > > > module...
> > > >
> > > > it only happens with AMD Quad Core CPU or Fam 10h.
> > > >
> > > > works well with AMD opteron Rev E, and Rev F.
> > >
> > > So this only dies on a multi-core system?  Or does 2 processor boxes
> > > work, but not 4?
> > 
> > 2 sockets x quad core will fail (fam 10h)
> > 2 sockets x dual core works....( rev E, and rev F opteron)
> > 
> > there are some changes between opteron and fam10h.  fam10h may have
> > more local vectors for MCE...
> > or more banks and blocks...
> > 
> > will look at AMD64 Bios and kernel porting guide for Fam 10h again..
> > 
> > wonder if your code uncover some bugs ...
> 
> No, the logic in this function is just crazy.  It's recursive, but we
> can circumvent the creation for the kobject and whole creation of the
> threshold_block if some conditions are met.  That's why we see the
> allocate_threshold_blocks so many times in the callstack, yet only a few
> kobjects created.
> 
> Then we blow up in kobject_uevent_env() on the first debug printk.
> Which means that we are just passing in garbage.
> 
> Let me know if the patch below fixes this for you, I think it should, as
> there is a code path where b is NULL and then we call kobject_uevent.
> 
> Man, this is one time that comments in code would have been very nice to
> have, and why forward goto's into major code blocks are just evil...
> 
> thanks,
> 
> greg k-h
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd_64.c b/arch/x86/kernel/cpu/mcheck/mce_amd_64.c
> index 7535887..8a7f204 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce_amd_64.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce_amd_64.c
> @@ -450,7 +450,8 @@ recurse:
>  	if (err)
>  		goto out_free;
>  
> -	kobject_uevent(&b->kobj, KOBJ_ADD);
> +	if (b && &b->kobj)
> +		kobject_uevent(&b->kobj, KOBJ_ADD);
>  
>  	return err;
>  

Actually the second test doesn't make sense, it can just be:


diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd_64.c b/arch/x86/kernel/cpu/mcheck/mce_amd_64.c
index 7535887..8a7f204 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd_64.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd_64.c
@@ -450,7 +450,8 @@ recurse:
 	if (err)
 		goto out_free;
 
-	kobject_uevent(&b->kobj, KOBJ_ADD);
+	if (b)
+		kobject_uevent(&b->kobj, KOBJ_ADD);
 
 	return err;
 

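Why the `&b->kobj` test in the first version of the patch adds nothing can be sketched as follows. The struct layouts here are hypothetical stand-ins, not the kernel's real definitions, and the address arithmetic is done on integers so that no NULL pointer is ever dereferenced. The address of an embedded member is the base address plus a constant offset, so it is non-NULL whenever the offset is nonzero, even if the base is NULL. The oops at the top of the thread reports a fault address of 0x40 rather than 0, which would be consistent with small offsets being added to a NULL base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts for illustration only. */
struct fake_kobject {
	const char *name;
	void *parent;
};

struct fake_threshold_block {
	unsigned int block;
	unsigned int bank;
	unsigned long address;
	struct fake_kobject kobj;	/* embedded, not first member */
};

/* The sketch relies on kobj not being the first member. */
static void check_layout(void)
{
	assert(offsetof(struct fake_threshold_block, kobj) != 0);
}

/* Address that &b->kobj denotes for a block located at `base`. */
static uintptr_t kobj_addr(uintptr_t base)
{
	return base + offsetof(struct fake_threshold_block, kobj);
}

/* Second half of "b && &b->kobj": true even for a NULL base. */
static int second_test(uintptr_t base)
{
	return kobj_addr(base) != 0;
}
```

Since `second_test()` is true for every base in this layout, only the `b` test can reject anything, which is the simplification the revised patch makes.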

* Re: threshold_init_device/kobject_uevent_env oops
  2008-01-26  7:24               ` Greg KH
  2008-01-26  7:35                 ` Greg KH
@ 2008-01-26 21:26                 ` Yinghai Lu
  1 sibling, 0 replies; 15+ messages in thread
From: Yinghai Lu @ 2008-01-26 21:26 UTC (permalink / raw)
  To: Greg KH
  Cc: jacob.shin, Ingo Molnar, Linux Kernel Mailing List, Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 2959 bytes --]

On Jan 25, 2008 11:24 PM, Greg KH <gregkh@suse.de> wrote:
>
> On Fri, Jan 25, 2008 at 11:08:53PM -0800, Yinghai Lu wrote:
> > On Jan 25, 2008 10:14 PM, Greg KH <gregkh@suse.de> wrote:
> > >
> > > On Fri, Jan 25, 2008 at 10:04:19PM -0800, Yinghai Lu wrote:
> > > > On Jan 25, 2008 2:50 PM, Greg KH <gregkh@suse.de> wrote:
> > > > > On Fri, Jan 25, 2008 at 02:47:11PM -0800, Greg KH wrote:
> > > > > > On Fri, Jan 25, 2008 at 11:35:56PM +0100, Ingo Molnar wrote:
> > > > > > >
> > > > > > > * Greg KH <gregkh@suse.de> wrote:
> > > > > > >
> > > > > > > > On Fri, Jan 25, 2008 at 01:05:40PM -0800, Yinghai Lu wrote:
> > > > > > > > > current linus tree + x86.git
> > > > > > > > >
> > > > > > > > > got
> > > > > > > > >
> > > > > > > > > Calling initcall 0xffffffff80b93d98: threshold_init_device+0x0/0x3f()
> > > > > > > > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
> > > > > > > > > IP: [<ffffffff80458e20>] kobject_uevent_env+0x2a/0x3d9
> > > > > > > >
> > > > > > > > Does this happen on just Linus's tree?
> > > > > > > >
> > > > > > > > Can you send me a .config file for this?
> > > > > > > >
> > > > > > > > What is threshold_init()?  Is it something new in the x86.git tree?
> > > > > > >
> > > > > > > no. A quick grep shows that it is in a file that _your_ changes in
> > > > > > > Linus' latest have touched:
> > > > > > >
> > > > > > >   arch/x86/kernel/cpu/mcheck/mce_amd_64.c
> > > > > >
> > > > > > Ok, those are pretty much just search/and/replace type changes, but I
> > > > > > have been running x86-64 boxes with these changes in place.
> > > > >
> > > > > Oh wait, I do see a change.  We are now (finally) emitting a kobject
> > > > > uevent for these devices, which somehow the code can't handle properly.
> > > > >
> > > > > > Let me go poke this some more, unfortunately I don't have any AMD 64
> > > > > boxes here anymore, only Intel based processors, so I can't run this
> > > > > module...
> > > >
> > > > it only happens with AMD Quad Core CPU or Fam 10h.
> > > >
> > > > works well with AMD opteron Rev E, and Rev F.
> > >
> > > So this only dies on a multi-core system?  Or does 2 processor boxes
> > > work, but not 4?
> >
> > 2 sockets x quad core will fail (fam 10h)
> > 2 sockets x dual core works....( rev E, and rev F opteron)
> >
> > there are some changes between opteron and fam10h.  fam10h may have
> > more local vectors for MCE...
> > or more banks and blocks...
> >
> > will look at AMD64 Bios and kernel porting guide for Fam 10h again..
> >
> > wonder if your code uncover some bugs ...
>
> No, the logic in this function is just crazy.  It's recursive, but we
> can circumvent the creation for the kobject and whole creation of the
> threshold_block if some conditions are met.  That's why we see the
> allocate_threshold_blocks so many times in the callstack, yet only a few
> kobjects created.

I produced one patch that removes the recursion.  I will test it and your
patch on Monday.

YH

[-- Attachment #2: mce_check_amd64.patch --]
[-- Type: text/x-patch; name=mce_check_amd64.patch, Size: 936 bytes --]

diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd_64.c b/arch/x86/kernel/cpu/mcheck/mce_amd_64.c
index 65621fd..5c4cb21 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd_64.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd_64.c
@@ -553,8 +552,9 @@ static __cpuinit int threshold_create_device(unsigned int cpu)
 	unsigned int bank;
 	int err = 0;
 
+	printk(KERN_DEBUG "threshold_create_device: cpu %d, bank_map=%02x\n", cpu, per_cpu(bank_map,cpu));
 	for (bank = 0; bank < NR_BANKS; ++bank) {
-		if (!(per_cpu(bank_map, cpu) & 1 << bank))
+		if (!(per_cpu(bank_map, cpu) & (1 << bank)))
 			continue;
 		err = threshold_create_bank(cpu, bank);
 		if (err)
@@ -637,7 +637,7 @@ static void threshold_remove_device(unsigned int cpu)
 	unsigned int bank;
 
 	for (bank = 0; bank < NR_BANKS; ++bank) {
-		if (!(per_cpu(bank_map, cpu) & 1 << bank))
+		if (!(per_cpu(bank_map, cpu) & (1 << bank)))
 			continue;
 		threshold_remove_bank(cpu, bank);
 	}

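A note on the two `bank_map` hunks above: in C, `<<` binds more tightly than `&`, so `map & 1 << bank` and `map & (1 << bank)` evaluate identically, and the added parentheses appear to be a readability change rather than a behavioral one. A minimal sketch that checks this over a small range; `SKETCH_NR_BANKS` and the function names are hypothetical and do not come from the kernel:

```c
#include <assert.h>

/* Hypothetical bank count for the sketch; the kernel has its own NR_BANKS. */
#define SKETCH_NR_BANKS 6

/* Spelling used before the patch. */
static int bank_enabled_unparenthesized(unsigned int map, unsigned int bank)
{
	return (map & 1 << bank) != 0;
}

/* Spelling used after the patch. */
static int bank_enabled_parenthesized(unsigned int map, unsigned int bank)
{
	return (map & (1 << bank)) != 0;
}

/* Assert that both spellings agree for every map/bank pair in range. */
static void check_spellings_agree(void)
{
	unsigned int map, bank;

	for (map = 0; map < (1u << SKETCH_NR_BANKS); map++)
		for (bank = 0; bank < SKETCH_NR_BANKS; bank++)
			assert(bank_enabled_unparenthesized(map, bank) ==
			       bank_enabled_parenthesized(map, bank));
}
```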

end of thread, other threads:[~2008-01-26 21:26 UTC | newest]

Thread overview: 15+ messages
2008-01-25 21:05 threshold_init_device/kobject_uevent_env oops Yinghai Lu
2008-01-25 22:15 ` Greg KH
2008-01-25 22:35   ` Ingo Molnar
2008-01-25 22:47     ` Greg KH
2008-01-25 22:50       ` Greg KH
2008-01-26  6:04         ` Yinghai Lu
2008-01-26  6:14           ` Greg KH
2008-01-26  7:08             ` Yinghai Lu
2008-01-26  7:24               ` Greg KH
2008-01-26  7:35                 ` Greg KH
2008-01-26 21:26                 ` Yinghai Lu
2008-01-25 23:12       ` Yinghai Lu
2008-01-25 23:08     ` Greg KH
2008-01-25 23:20       ` Yinghai Lu
2008-01-26  6:22         ` Greg KH
