linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* v2.6.28-rc7: error in panic code? (NULL pointer dereference at 0000004c)
@ 2008-12-18 22:06 Vegard Nossum
  2008-12-19 17:19 ` Vegard Nossum
  0 siblings, 1 reply; 15+ messages in thread
From: Vegard Nossum @ 2008-12-18 22:06 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Hi,

With such a patch:

diff --git a/init/main.c b/init/main.c
index 7e117a2..2f93119 100644
--- a/init/main.c
+++ b/init/main.c
@@ -465,6 +465,8 @@ static void noinline __init_refok rest_init(void)
 {
        int pid;

+       *(char *) NULL = 0;
+
        kernel_thread(kernel_init, NULL, CLONE_FS | CLONE_SIGHAND);
        numa_default_policy();
        pid = kernel_thread(kthreadd, NULL, CLONE_FS | CLONE_FILES);

...I would expect a page fault and that's that. So panic() is called,
but it causes a new page fault somewhere. Here is the log:

(this part is correct and expected)

[    0.031003] BUG: unable to handle kernel NULL pointer dereference at 00000000
[    0.033997] IP: [<c13b448b>] rest_init+0xf/0x57
[    0.035997] *pde = 00000000
[    0.037289] Oops: 0002 [#1] SMP
[    0.037994] last sysfs file:
[    0.037994] Modules linked in:
[    0.037994]
[    0.037994] Pid: 0, comm: swapper Not tainted (2.6.28-rc7 #181) 945P-A
[    0.037994] EIP: 0060:[<c13b448b>] EFLAGS: 00010246 CPU: 0
[    0.037994] EIP is at rest_init+0xf/0x57
[    0.037994] EAX: c16631e3 EBX: 00000040 ECX: 00000a00 EDX: 00000000
[    0.037994] ESI: 00099800 EDI: c160a000 EBP: c165dfd0 ESP: c165dfd0
[    0.037994]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[    0.037994] Process swapper (pid: 0, ti=c165c000 task=c156e334
task.ti=c165c000)
[    0.037994] Stack:
[    0.037994]  c165dfe0 c16637af c1691768 00000000 c165dff8 c1663080
0175a000 00000000
[    0.037994]  c14ceaae 00020800 01b7e003 00000000
[    0.037994] Call Trace:
[    0.037994]  [<c16637af>] ? start_kernel+0x2a2/0x2a7
[    0.037994]  [<c1663080>] ? __init_begin+0x80/0x88
[    0.037994] Code: 00 8b 43 04 8d 56 04 89 50 04 89 46 04 8d 43 04
89 46 08 89 53 04 fe 03 5b 5e 5d c3 55 b9 00 0a 00 00 8
9 e5 31 d2 b8 e3 31 66 c1 <c6> 05 00 00 00 00 00 e8 00 df c4 ff b9 00
06 00 00 31 d2 b8 af
[    0.037994] EIP: [<c13b448b>] rest_init+0xf/0x57 SS:ESP 0068:c165dfd0
[    0.038004] ---[ end trace 4eaa2a86a8e2da22 ]---
[    0.038998] Kernel panic - not syncing: Attempted to kill the idle task!

And now the unexpected part:

[    0.039999] Rebooting in 10 seconds..<1>BUG: unable to handle
kernel NULL pointer dereference at 0000004c
[    0.040993] IP: [<c13b41dc>] klist_next+0x10/0x8d
[    0.040993] *pde = 00000000
[    0.040993] Oops: 0000 [#2] SMP
[    0.040993] last sysfs file:
[    0.040993] Modules linked in:
[    0.040993]
[    0.040993] Pid: 0, comm: swapper Tainted: G      D    (2.6.28-rc7
#181) 945P-A
[    0.040993] EIP: 0060:[<c13b41dc>] EFLAGS: 00010286 CPU: 0
[    0.040993] EIP is at klist_next+0x10/0x8d
[    0.040993] EAX: 0000003c EBX: c165dd60 ECX: 00000000 EDX: c165dd60
[    0.040993] ESI: c165dd60 EDI: 00000000 EBP: c165dd58 ESP: c165dd48
[    0.040993]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[    0.040993] Process swapper (pid: 0, ti=c165c000 task=c156e334
task.ti=c165c000)
[    0.040993] Stack:
[    0.040993]  c1026a97 c165dd60 c165dd60 00000000 c165dd74 c11be196
0000003c 00000000
[    0.040993]  c13dbde0 00001078 00000100 c165dd84 c114ed26 c114e458
c13dbde0 c165dd9c
[    0.040993]  c1152546 ffffffff c13dbde0 00002710 0000000b c165ddac
c115259a ffffffff
[    0.040993] Call Trace:
[    0.040993]  [<c1026a97>] ? release_console_sem+0x16c/0x199
[    0.040993]  [<c11be196>] ? bus_find_device+0x4e/0x6e
[    0.040993]  [<c114ed26>] ? no_pci_devices+0x17/0x2d
[    0.040993]  [<c114e458>] ? find_anything+0x0/0xa
[    0.040993]  [<c1152546>] ? pci_get_subsys+0x15/0x5b
[    0.040993]  [<c115259a>] ? pci_get_device+0xe/0x10
[    0.040993]  [<c1013342>] ? mach_reboot_fixups+0x27/0x3c
[    0.040993]  [<c100fdfb>] ? native_machine_emergency_restart+0x3e/0xd7
[    0.040993]  [<c100fc60>] ? machine_emergency_restart+0x9/0xb
[    0.040993]  [<c1032b4b>] ? emergency_restart+0x8/0xa
[    0.040993]  [<c13cf607>] ? panic+0xb9/0xd6
[    0.040993]  [<c1028ee0>] ? do_exit+0x5b/0x740
[    0.040993]  [<c13cf633>] ? printk+0xf/0x11
[    0.040993]  [<c10263df>] ? print_oops_end_marker+0x1e/0x23
[    0.040993]  [<c13d1c1e>] ? oops_end+0x7f/0x87
[    0.040993]  [<c1005a08>] ? die+0x5b/0x63
[    0.040993]  [<c13d3061>] ? do_page_fault+0x581/0x66f
[    0.040993]  [<c103a537>] ? sched_clock_cpu+0x136/0x142
[    0.040993]  [<c103a537>] ? sched_clock_cpu+0x136/0x142
[    0.040993]  [<c1039165>] ? ktime_get+0x13/0x2f
[    0.040993]  [<c103a562>] ? sched_clock_idle_sleep_event+0xe/0x10
[    0.040993]  [<c102a86f>] ? __do_softirq+0x119/0x121
[    0.040993]  [<c116e282>] ? acpi_hw_low_level_read+0x3b/0x68
[    0.040993]  [<c116e34f>] ? acpi_hw_register_read+0xa0/0x112
[    0.040993]  [<c116e4f0>] ? acpi_get_register_unlocked+0x2c/0x48
[    0.040993]  [<c1161aa7>] ? acpi_os_release_lock+0x8/0xa
[    0.040993]  [<c116e67b>] ? acpi_get_register+0x2d/0x34
[    0.040993]  [<c13d2ae0>] ? do_page_fault+0x0/0x66f
[    0.040993]  [<c13d13da>] ? error_code+0x72/0x78
[    0.040993]  [<c16631e3>] ? kernel_init+0x0/0x148
[    0.040993]  [<c13b448b>] ? rest_init+0xf/0x57
[    0.040993]  [<c16637af>] ? start_kernel+0x2a2/0x2a7
[    0.040993]  [<c1663080>] ? __init_begin+0x80/0x88
[    0.040993] Code: 89 4a 04 74 08 8d 41 0c e8 fa 04 d9 ff 5d c3 55
31 c9 89 e5 e8 e0 ff ff ff 5d c3 55 89 e5 57 56 89 c6 5
3 83 ec 04 8b 00 8b 7e 04 <8b> 50 10 89 55 f0 e8 7b cf 01 00 85 ff 74
23 8b 47 04 ba ec 42
[    0.040993] EIP: [<c13b41dc>] klist_next+0x10/0x8d SS:ESP 0068:c165dd48

I know this is not much to fuzz about since it was artificially
induced with the NULL pointer dereference, but what if such an error
(a real one) made it into the kernel, it could scroll away the real
oops. Anyway -- to reproduce, apply the patch and boot with panic=10
(1 also works). Thanks for the attention,


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: v2.6.28-rc7: error in panic code? (NULL pointer dereference at 0000004c)
  2008-12-18 22:06 v2.6.28-rc7: error in panic code? (NULL pointer dereference at 0000004c) Vegard Nossum
@ 2008-12-19 17:19 ` Vegard Nossum
  2008-12-19 17:39   ` Greg KH
  0 siblings, 1 reply; 15+ messages in thread
From: Vegard Nossum @ 2008-12-19 17:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Linux Kernel Mailing List

On Thu, Dec 18, 2008 at 11:06 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
> Hi,
>
> With such a patch:
>
> diff --git a/init/main.c b/init/main.c
> index 7e117a2..2f93119 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -465,6 +465,8 @@ static void noinline __init_refok rest_init(void)
>  {
>        int pid;
>
> +       *(char *) NULL = 0;
> +
>        kernel_thread(kernel_init, NULL, CLONE_FS | CLONE_SIGHAND);
>        numa_default_policy();
>        pid = kernel_thread(kthreadd, NULL, CLONE_FS | CLONE_FILES);
>
> ...I would expect a page fault and that's that. So panic() is called,
> but it causes a new page fault somewhere. Here is the log:
>
> (this part is correct and expected)
>
> [    0.031003] BUG: unable to handle kernel NULL pointer dereference at 00000000
> [    0.033997] IP: [<c13b448b>] rest_init+0xf/0x57
> [    0.035997] *pde = 00000000
> [    0.037289] Oops: 0002 [#1] SMP
> [    0.037994] last sysfs file:
> [    0.037994] Modules linked in:
> [    0.037994]
> [    0.037994] Pid: 0, comm: swapper Not tainted (2.6.28-rc7 #181) 945P-A
> [    0.037994] EIP: 0060:[<c13b448b>] EFLAGS: 00010246 CPU: 0
> [    0.037994] EIP is at rest_init+0xf/0x57
> [    0.037994] EAX: c16631e3 EBX: 00000040 ECX: 00000a00 EDX: 00000000
> [    0.037994] ESI: 00099800 EDI: c160a000 EBP: c165dfd0 ESP: c165dfd0
> [    0.037994]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [    0.037994] Process swapper (pid: 0, ti=c165c000 task=c156e334
> task.ti=c165c000)
> [    0.037994] Stack:
> [    0.037994]  c165dfe0 c16637af c1691768 00000000 c165dff8 c1663080
> 0175a000 00000000
> [    0.037994]  c14ceaae 00020800 01b7e003 00000000
> [    0.037994] Call Trace:
> [    0.037994]  [<c16637af>] ? start_kernel+0x2a2/0x2a7
> [    0.037994]  [<c1663080>] ? __init_begin+0x80/0x88
> [    0.037994] Code: 00 8b 43 04 8d 56 04 89 50 04 89 46 04 8d 43 04
> 89 46 08 89 53 04 fe 03 5b 5e 5d c3 55 b9 00 0a 00 00 8
> 9 e5 31 d2 b8 e3 31 66 c1 <c6> 05 00 00 00 00 00 e8 00 df c4 ff b9 00
> 06 00 00 31 d2 b8 af
> [    0.037994] EIP: [<c13b448b>] rest_init+0xf/0x57 SS:ESP 0068:c165dfd0
> [    0.038004] ---[ end trace 4eaa2a86a8e2da22 ]---
> [    0.038998] Kernel panic - not syncing: Attempted to kill the idle task!
>
> And now the unexpected part:
>
> [    0.039999] Rebooting in 10 seconds..<1>BUG: unable to handle
> kernel NULL pointer dereference at 0000004c
> [    0.040993] IP: [<c13b41dc>] klist_next+0x10/0x8d
> [    0.040993] *pde = 00000000
> [    0.040993] Oops: 0000 [#2] SMP
> [    0.040993] last sysfs file:
> [    0.040993] Modules linked in:
> [    0.040993]
> [    0.040993] Pid: 0, comm: swapper Tainted: G      D    (2.6.28-rc7
> #181) 945P-A
> [    0.040993] EIP: 0060:[<c13b41dc>] EFLAGS: 00010286 CPU: 0
> [    0.040993] EIP is at klist_next+0x10/0x8d
> [    0.040993] EAX: 0000003c EBX: c165dd60 ECX: 00000000 EDX: c165dd60
> [    0.040993] ESI: c165dd60 EDI: 00000000 EBP: c165dd58 ESP: c165dd48
> [    0.040993]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [    0.040993] Process swapper (pid: 0, ti=c165c000 task=c156e334
> task.ti=c165c000)
> [    0.040993] Stack:
> [    0.040993]  c1026a97 c165dd60 c165dd60 00000000 c165dd74 c11be196
> 0000003c 00000000
> [    0.040993]  c13dbde0 00001078 00000100 c165dd84 c114ed26 c114e458
> c13dbde0 c165dd9c
> [    0.040993]  c1152546 ffffffff c13dbde0 00002710 0000000b c165ddac
> c115259a ffffffff
> [    0.040993] Call Trace:
> [    0.040993]  [<c1026a97>] ? release_console_sem+0x16c/0x199
> [    0.040993]  [<c11be196>] ? bus_find_device+0x4e/0x6e
> [    0.040993]  [<c114ed26>] ? no_pci_devices+0x17/0x2d
> [    0.040993]  [<c114e458>] ? find_anything+0x0/0xa
> [    0.040993]  [<c1152546>] ? pci_get_subsys+0x15/0x5b
> [    0.040993]  [<c115259a>] ? pci_get_device+0xe/0x10
> [    0.040993]  [<c1013342>] ? mach_reboot_fixups+0x27/0x3c
> [    0.040993]  [<c100fdfb>] ? native_machine_emergency_restart+0x3e/0xd7
> [    0.040993]  [<c100fc60>] ? machine_emergency_restart+0x9/0xb
> [    0.040993]  [<c1032b4b>] ? emergency_restart+0x8/0xa
> [    0.040993]  [<c13cf607>] ? panic+0xb9/0xd6
> [    0.040993]  [<c1028ee0>] ? do_exit+0x5b/0x740
> [    0.040993]  [<c13cf633>] ? printk+0xf/0x11
> [    0.040993]  [<c10263df>] ? print_oops_end_marker+0x1e/0x23
> [    0.040993]  [<c13d1c1e>] ? oops_end+0x7f/0x87
> [    0.040993]  [<c1005a08>] ? die+0x5b/0x63
> [    0.040993]  [<c13d3061>] ? do_page_fault+0x581/0x66f
> [    0.040993]  [<c103a537>] ? sched_clock_cpu+0x136/0x142
> [    0.040993]  [<c103a537>] ? sched_clock_cpu+0x136/0x142
> [    0.040993]  [<c1039165>] ? ktime_get+0x13/0x2f
> [    0.040993]  [<c103a562>] ? sched_clock_idle_sleep_event+0xe/0x10
> [    0.040993]  [<c102a86f>] ? __do_softirq+0x119/0x121
> [    0.040993]  [<c116e282>] ? acpi_hw_low_level_read+0x3b/0x68
> [    0.040993]  [<c116e34f>] ? acpi_hw_register_read+0xa0/0x112
> [    0.040993]  [<c116e4f0>] ? acpi_get_register_unlocked+0x2c/0x48
> [    0.040993]  [<c1161aa7>] ? acpi_os_release_lock+0x8/0xa
> [    0.040993]  [<c116e67b>] ? acpi_get_register+0x2d/0x34
> [    0.040993]  [<c13d2ae0>] ? do_page_fault+0x0/0x66f
> [    0.040993]  [<c13d13da>] ? error_code+0x72/0x78
> [    0.040993]  [<c16631e3>] ? kernel_init+0x0/0x148
> [    0.040993]  [<c13b448b>] ? rest_init+0xf/0x57
> [    0.040993]  [<c16637af>] ? start_kernel+0x2a2/0x2a7
> [    0.040993]  [<c1663080>] ? __init_begin+0x80/0x88
> [    0.040993] Code: 89 4a 04 74 08 8d 41 0c e8 fa 04 d9 ff 5d c3 55
> 31 c9 89 e5 e8 e0 ff ff ff 5d c3 55 89 e5 57 56 89 c6 5
> 3 83 ec 04 8b 00 8b 7e 04 <8b> 50 10 89 55 f0 e8 7b cf 01 00 85 ff 74
> 23 8b 47 04 ba ec 42
> [    0.040993] EIP: [<c13b41dc>] klist_next+0x10/0x8d SS:ESP 0068:c165dd48
>
> I know this is not much to fuzz about since it was artificially
> induced with the NULL pointer dereference, but what if such an error
> (a real one) made it into the kernel, it could scroll away the real
> oops. Anyway -- to reproduce, apply the patch and boot with panic=10
> (1 also works). Thanks for the attention,

Reverting:

commit 70308923d317f2ad4973c30d90bb48ae38761317
Author: Greg Kroah-Hartman <gregkh@suse.de>
Date:   Wed Feb 13 22:30:39 2008 -0800

    PCI: make no_pci_devices() use the pci_bus_type list

    no_pci_devices() should use the driver core list of PCI devices, not our
    "separate" one.

    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

fixes it.


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: v2.6.28-rc7: error in panic code? (NULL pointer dereference at 0000004c)
  2008-12-19 17:19 ` Vegard Nossum
@ 2008-12-19 17:39   ` Greg KH
  2008-12-19 20:35     ` Pekka Enberg
  0 siblings, 1 reply; 15+ messages in thread
From: Greg KH @ 2008-12-19 17:39 UTC (permalink / raw)
  To: Vegard Nossum; +Cc: Linux Kernel Mailing List

On Fri, Dec 19, 2008 at 06:19:43PM +0100, Vegard Nossum wrote:
> On Thu, Dec 18, 2008 at 11:06 PM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
> > Hi,
> >
> > With such a patch:
> >
> > diff --git a/init/main.c b/init/main.c
> > index 7e117a2..2f93119 100644
> > --- a/init/main.c
> > +++ b/init/main.c
> > @@ -465,6 +465,8 @@ static void noinline __init_refok rest_init(void)
> >  {
> >        int pid;
> >
> > +       *(char *) NULL = 0;
> > +
> >        kernel_thread(kernel_init, NULL, CLONE_FS | CLONE_SIGHAND);
> >        numa_default_policy();
> >        pid = kernel_thread(kthreadd, NULL, CLONE_FS | CLONE_FILES);
> >
> > ...I would expect a page fault and that's that. So panic() is called,
> > but it causes a new page fault somewhere. Here is the log:
> >
> > (this part is correct and expected)
> >
> > [    0.031003] BUG: unable to handle kernel NULL pointer dereference at 00000000
> > [    0.033997] IP: [<c13b448b>] rest_init+0xf/0x57
> > [    0.035997] *pde = 00000000
> > [    0.037289] Oops: 0002 [#1] SMP
> > [    0.037994] last sysfs file:
> > [    0.037994] Modules linked in:
> > [    0.037994]
> > [    0.037994] Pid: 0, comm: swapper Not tainted (2.6.28-rc7 #181) 945P-A
> > [    0.037994] EIP: 0060:[<c13b448b>] EFLAGS: 00010246 CPU: 0
> > [    0.037994] EIP is at rest_init+0xf/0x57
> > [    0.037994] EAX: c16631e3 EBX: 00000040 ECX: 00000a00 EDX: 00000000
> > [    0.037994] ESI: 00099800 EDI: c160a000 EBP: c165dfd0 ESP: c165dfd0
> > [    0.037994]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> > [    0.037994] Process swapper (pid: 0, ti=c165c000 task=c156e334
> > task.ti=c165c000)
> > [    0.037994] Stack:
> > [    0.037994]  c165dfe0 c16637af c1691768 00000000 c165dff8 c1663080
> > 0175a000 00000000
> > [    0.037994]  c14ceaae 00020800 01b7e003 00000000
> > [    0.037994] Call Trace:
> > [    0.037994]  [<c16637af>] ? start_kernel+0x2a2/0x2a7
> > [    0.037994]  [<c1663080>] ? __init_begin+0x80/0x88
> > [    0.037994] Code: 00 8b 43 04 8d 56 04 89 50 04 89 46 04 8d 43 04
> > 89 46 08 89 53 04 fe 03 5b 5e 5d c3 55 b9 00 0a 00 00 8
> > 9 e5 31 d2 b8 e3 31 66 c1 <c6> 05 00 00 00 00 00 e8 00 df c4 ff b9 00
> > 06 00 00 31 d2 b8 af
> > [    0.037994] EIP: [<c13b448b>] rest_init+0xf/0x57 SS:ESP 0068:c165dfd0
> > [    0.038004] ---[ end trace 4eaa2a86a8e2da22 ]---
> > [    0.038998] Kernel panic - not syncing: Attempted to kill the idle task!
> >
> > And now the unexpected part:
> >
> > [    0.039999] Rebooting in 10 seconds..<1>BUG: unable to handle
> > kernel NULL pointer dereference at 0000004c
> > [    0.040993] IP: [<c13b41dc>] klist_next+0x10/0x8d
> > [    0.040993] *pde = 00000000
> > [    0.040993] Oops: 0000 [#2] SMP
> > [    0.040993] last sysfs file:
> > [    0.040993] Modules linked in:
> > [    0.040993]
> > [    0.040993] Pid: 0, comm: swapper Tainted: G      D    (2.6.28-rc7
> > #181) 945P-A
> > [    0.040993] EIP: 0060:[<c13b41dc>] EFLAGS: 00010286 CPU: 0
> > [    0.040993] EIP is at klist_next+0x10/0x8d
> > [    0.040993] EAX: 0000003c EBX: c165dd60 ECX: 00000000 EDX: c165dd60
> > [    0.040993] ESI: c165dd60 EDI: 00000000 EBP: c165dd58 ESP: c165dd48
> > [    0.040993]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> > [    0.040993] Process swapper (pid: 0, ti=c165c000 task=c156e334
> > task.ti=c165c000)
> > [    0.040993] Stack:
> > [    0.040993]  c1026a97 c165dd60 c165dd60 00000000 c165dd74 c11be196
> > 0000003c 00000000
> > [    0.040993]  c13dbde0 00001078 00000100 c165dd84 c114ed26 c114e458
> > c13dbde0 c165dd9c
> > [    0.040993]  c1152546 ffffffff c13dbde0 00002710 0000000b c165ddac
> > c115259a ffffffff
> > [    0.040993] Call Trace:
> > [    0.040993]  [<c1026a97>] ? release_console_sem+0x16c/0x199
> > [    0.040993]  [<c11be196>] ? bus_find_device+0x4e/0x6e
> > [    0.040993]  [<c114ed26>] ? no_pci_devices+0x17/0x2d
> > [    0.040993]  [<c114e458>] ? find_anything+0x0/0xa
> > [    0.040993]  [<c1152546>] ? pci_get_subsys+0x15/0x5b
> > [    0.040993]  [<c115259a>] ? pci_get_device+0xe/0x10
> > [    0.040993]  [<c1013342>] ? mach_reboot_fixups+0x27/0x3c
> > [    0.040993]  [<c100fdfb>] ? native_machine_emergency_restart+0x3e/0xd7
> > [    0.040993]  [<c100fc60>] ? machine_emergency_restart+0x9/0xb
> > [    0.040993]  [<c1032b4b>] ? emergency_restart+0x8/0xa
> > [    0.040993]  [<c13cf607>] ? panic+0xb9/0xd6
> > [    0.040993]  [<c1028ee0>] ? do_exit+0x5b/0x740
> > [    0.040993]  [<c13cf633>] ? printk+0xf/0x11
> > [    0.040993]  [<c10263df>] ? print_oops_end_marker+0x1e/0x23
> > [    0.040993]  [<c13d1c1e>] ? oops_end+0x7f/0x87
> > [    0.040993]  [<c1005a08>] ? die+0x5b/0x63
> > [    0.040993]  [<c13d3061>] ? do_page_fault+0x581/0x66f
> > [    0.040993]  [<c103a537>] ? sched_clock_cpu+0x136/0x142
> > [    0.040993]  [<c103a537>] ? sched_clock_cpu+0x136/0x142
> > [    0.040993]  [<c1039165>] ? ktime_get+0x13/0x2f
> > [    0.040993]  [<c103a562>] ? sched_clock_idle_sleep_event+0xe/0x10
> > [    0.040993]  [<c102a86f>] ? __do_softirq+0x119/0x121
> > [    0.040993]  [<c116e282>] ? acpi_hw_low_level_read+0x3b/0x68
> > [    0.040993]  [<c116e34f>] ? acpi_hw_register_read+0xa0/0x112
> > [    0.040993]  [<c116e4f0>] ? acpi_get_register_unlocked+0x2c/0x48
> > [    0.040993]  [<c1161aa7>] ? acpi_os_release_lock+0x8/0xa
> > [    0.040993]  [<c116e67b>] ? acpi_get_register+0x2d/0x34
> > [    0.040993]  [<c13d2ae0>] ? do_page_fault+0x0/0x66f
> > [    0.040993]  [<c13d13da>] ? error_code+0x72/0x78
> > [    0.040993]  [<c16631e3>] ? kernel_init+0x0/0x148
> > [    0.040993]  [<c13b448b>] ? rest_init+0xf/0x57
> > [    0.040993]  [<c16637af>] ? start_kernel+0x2a2/0x2a7
> > [    0.040993]  [<c1663080>] ? __init_begin+0x80/0x88
> > [    0.040993] Code: 89 4a 04 74 08 8d 41 0c e8 fa 04 d9 ff 5d c3 55
> > 31 c9 89 e5 e8 e0 ff ff ff 5d c3 55 89 e5 57 56 89 c6 5
> > 3 83 ec 04 8b 00 8b 7e 04 <8b> 50 10 89 55 f0 e8 7b cf 01 00 85 ff 74
> > 23 8b 47 04 ba ec 42
> > [    0.040993] EIP: [<c13b41dc>] klist_next+0x10/0x8d SS:ESP 0068:c165dd48
> >
> > I know this is not much to fuzz about since it was artificially
> > induced with the NULL pointer dereference, but what if such an error
> > (a real one) made it into the kernel, it could scroll away the real
> > oops. Anyway -- to reproduce, apply the patch and boot with panic=10
> > (1 also works). Thanks for the attention,
> 
> Reverting:
> 
> commit 70308923d317f2ad4973c30d90bb48ae38761317
> Author: Greg Kroah-Hartman <gregkh@suse.de>
> Date:   Wed Feb 13 22:30:39 2008 -0800
> 
>     PCI: make no_pci_devices() use the pci_bus_type list
> 
>     no_pci_devices() should use the driver core list of PCI devices, not our
>     "separate" one.
> 
>     Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
> 
> fixes it.

Fixes what?  It might be quite difficult to revert that patch now, as
the infrastructure is no longer in place to use a private pci device
list, that code is long gone.

totally confused,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: v2.6.28-rc7: error in panic code? (NULL pointer dereference at 0000004c)
  2008-12-19 17:39   ` Greg KH
@ 2008-12-19 20:35     ` Pekka Enberg
  2008-12-20  0:46       ` Greg KH
  0 siblings, 1 reply; 15+ messages in thread
From: Pekka Enberg @ 2008-12-19 20:35 UTC (permalink / raw)
  To: Greg KH; +Cc: Vegard Nossum, Linux Kernel Mailing List

Hi Greg,

On Fri, Dec 19, 2008 at 06:19:43PM +0100, Vegard Nossum wrote:
> Fixes what?  It might be quite difficult to revert that patch now, as
> the infrastructure is no longer in place to use a private pci device
> list, that code is long gone.

Vegard forced one oops but got two! The first one is expected and but
the second one shouldn't probably be there:

>> > And now the unexpected part:
>> >
>> > [    0.039999] Rebooting in 10 seconds..<1>BUG: unable to handle
>> > kernel NULL pointer dereference at 0000004c
>> > [    0.040993] IP: [<c13b41dc>] klist_next+0x10/0x8d
>> > [    0.040993] *pde = 00000000
>> > [    0.040993] Oops: 0000 [#2] SMP
>> > [    0.040993] last sysfs file:
>> > [    0.040993] Modules linked in:
>> > [    0.040993]
>> > [    0.040993] Pid: 0, comm: swapper Tainted: G      D    (2.6.28-rc7
>> > #181) 945P-A
>> > [    0.040993] EIP: 0060:[<c13b41dc>] EFLAGS: 00010286 CPU: 0
>> > [    0.040993] EIP is at klist_next+0x10/0x8d
>> > [    0.040993] EAX: 0000003c EBX: c165dd60 ECX: 00000000 EDX: c165dd60
>> > [    0.040993] ESI: c165dd60 EDI: 00000000 EBP: c165dd58 ESP: c165dd48
>> > [    0.040993]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
>> > [    0.040993] Process swapper (pid: 0, ti=c165c000 task=c156e334
>> > task.ti=c165c000)
>> > [    0.040993] Stack:
>> > [    0.040993]  c1026a97 c165dd60 c165dd60 00000000 c165dd74 c11be196
>> > 0000003c 00000000
>> > [    0.040993]  c13dbde0 00001078 00000100 c165dd84 c114ed26 c114e458
>> > c13dbde0 c165dd9c
>> > [    0.040993]  c1152546 ffffffff c13dbde0 00002710 0000000b c165ddac
>> > c115259a ffffffff
>> > [    0.040993] Call Trace:
>> > [    0.040993]  [<c1026a97>] ? release_console_sem+0x16c/0x199
>> > [    0.040993]  [<c11be196>] ? bus_find_device+0x4e/0x6e
>> > [    0.040993]  [<c114ed26>] ? no_pci_devices+0x17/0x2d
>> > [    0.040993]  [<c114e458>] ? find_anything+0x0/0xa
>> > [    0.040993]  [<c1152546>] ? pci_get_subsys+0x15/0x5b
>> > [    0.040993]  [<c115259a>] ? pci_get_device+0xe/0x10
>> > [    0.040993]  [<c1013342>] ? mach_reboot_fixups+0x27/0x3c
>> > [    0.040993]  [<c100fdfb>] ? native_machine_emergency_restart+0x3e/0xd7
>> > [    0.040993]  [<c100fc60>] ? machine_emergency_restart+0x9/0xb
>> > [    0.040993]  [<c1032b4b>] ? emergency_restart+0x8/0xa
>> > [    0.040993]  [<c13cf607>] ? panic+0xb9/0xd6
>> > [    0.040993]  [<c1028ee0>] ? do_exit+0x5b/0x740
>> > [    0.040993]  [<c13cf633>] ? printk+0xf/0x11
>> > [    0.040993]  [<c10263df>] ? print_oops_end_marker+0x1e/0x23
>> > [    0.040993]  [<c13d1c1e>] ? oops_end+0x7f/0x87
>> > [    0.040993]  [<c1005a08>] ? die+0x5b/0x63
>> > [    0.040993]  [<c13d3061>] ? do_page_fault+0x581/0x66f
>> > [    0.040993]  [<c103a537>] ? sched_clock_cpu+0x136/0x142
>> > [    0.040993]  [<c103a537>] ? sched_clock_cpu+0x136/0x142
>> > [    0.040993]  [<c1039165>] ? ktime_get+0x13/0x2f
>> > [    0.040993]  [<c103a562>] ? sched_clock_idle_sleep_event+0xe/0x10
>> > [    0.040993]  [<c102a86f>] ? __do_softirq+0x119/0x121
>> > [    0.040993]  [<c116e282>] ? acpi_hw_low_level_read+0x3b/0x68
>> > [    0.040993]  [<c116e34f>] ? acpi_hw_register_read+0xa0/0x112
>> > [    0.040993]  [<c116e4f0>] ? acpi_get_register_unlocked+0x2c/0x48
>> > [    0.040993]  [<c1161aa7>] ? acpi_os_release_lock+0x8/0xa
>> > [    0.040993]  [<c116e67b>] ? acpi_get_register+0x2d/0x34
>> > [    0.040993]  [<c13d2ae0>] ? do_page_fault+0x0/0x66f
>> > [    0.040993]  [<c13d13da>] ? error_code+0x72/0x78
>> > [    0.040993]  [<c16631e3>] ? kernel_init+0x0/0x148
>> > [    0.040993]  [<c13b448b>] ? rest_init+0xf/0x57
>> > [    0.040993]  [<c16637af>] ? start_kernel+0x2a2/0x2a7
>> > [    0.040993]  [<c1663080>] ? __init_begin+0x80/0x88
>> > [    0.040993] Code: 89 4a 04 74 08 8d 41 0c e8 fa 04 d9 ff 5d c3 55
>> > 31 c9 89 e5 e8 e0 ff ff ff 5d c3 55 89 e5 57 56 89 c6 5
>> > 3 83 ec 04 8b 00 8b 7e 04 <8b> 50 10 89 55 f0 e8 7b cf 01 00 85 ff 74
>> > 23 8b 47 04 ba ec 42
>> > [    0.040993] EIP: [<c13b41dc>] klist_next+0x10/0x8d SS:ESP 0068:c165dd48

Looks like the patch Vegard identified breaks something in the oops path?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: v2.6.28-rc7: error in panic code? (NULL pointer dereference at 0000004c)
  2008-12-19 20:35     ` Pekka Enberg
@ 2008-12-20  0:46       ` Greg KH
  2008-12-20  8:52         ` Vegard Nossum
  0 siblings, 1 reply; 15+ messages in thread
From: Greg KH @ 2008-12-20  0:46 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: Vegard Nossum, Linux Kernel Mailing List

On Fri, Dec 19, 2008 at 10:35:57PM +0200, Pekka Enberg wrote:
> Hi Greg,
> 
> On Fri, Dec 19, 2008 at 06:19:43PM +0100, Vegard Nossum wrote:
> > Fixes what?  It might be quite difficult to revert that patch now, as
> > the infrastructure is no longer in place to use a private pci device
> > list, that code is long gone.
> 
> Vegard forced one oops but got two! The first one is expected and but
> the second one shouldn't probably be there:

"Second" oopses are known to not be reliable, I wouldn't count it as a
real problem unless it happens on its own.

> >> > [    0.040993] EIP: [<c13b41dc>] klist_next+0x10/0x8d SS:ESP 0068:c165dd48
> 
> Looks like the patch Vegard identified breaks something in the oops path?

Very wierd, I also don't understand how reverting the specific patch
would even make a buildable system.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: v2.6.28-rc7: error in panic code? (NULL pointer dereference at 0000004c)
  2008-12-20  0:46       ` Greg KH
@ 2008-12-20  8:52         ` Vegard Nossum
  2008-12-20  8:58           ` Greg KH
  0 siblings, 1 reply; 15+ messages in thread
From: Vegard Nossum @ 2008-12-20  8:52 UTC (permalink / raw)
  To: Greg KH; +Cc: Pekka Enberg, Linux Kernel Mailing List

On Sat, Dec 20, 2008 at 1:46 AM, Greg KH <gregkh@suse.de> wrote:
> On Fri, Dec 19, 2008 at 10:35:57PM +0200, Pekka Enberg wrote:
>> Hi Greg,
>>
>> On Fri, Dec 19, 2008 at 06:19:43PM +0100, Vegard Nossum wrote:
>> > Fixes what?  It might be quite difficult to revert that patch now, as
>> > the infrastructure is no longer in place to use a private pci device
>> > list, that code is long gone.
>>
>> Vegard forced one oops but got two! The first one is expected and but
>> the second one shouldn't probably be there:
>
> "Second" oopses are known to not be reliable, I wouldn't count it as a
> real problem unless it happens on its own.

Yes, because usually it's a process that BUGed and was killed --
perhaps with locks held or in the middle of some transaction that will
never complete. But this one happens in the panic code itself...

>
>> >> > [    0.040993] EIP: [<c13b41dc>] klist_next+0x10/0x8d SS:ESP 0068:c165dd48
>>
>> Looks like the patch Vegard identified breaks something in the oops path?
>
> Very wierd, I also don't understand how reverting the specific patch
> would even make a buildable system.

It was an unclean revert, here's the relevant resultant hunk:

diff --cc drivers/pci/probe.c
index 003a9b3,2db2e4b..0000000
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@@ -32,16 -29,55 +28,11 @@@ LIST_HEAD(pci_devices)
   */
  int no_pci_devices(void)
  {
-       struct device *dev;
-       int no_devices;
-
-       dev = bus_find_device(&pci_bus_type, NULL, NULL, find_anything);
-       no_devices = (dev == NULL);
-       put_device(dev);
-       return no_devices;
+       return list_empty(&pci_devices);
  }
+
  EXPORT_SYMBOL(no_pci_devices);


...and this builds.

The problem seems to be that pci_bus_type.p->klist_devices is NULL. Because:

Oops happens here:

struct klist_node *klist_next(struct klist_iter *i)
{
        void (*put)(struct klist_node *) = i->i_klist->put;

So either i == NULL or i->i_klist == NULL. But i->i_klist was just
before set here:

void klist_iter_init_node(struct klist *k, struct klist_iter *i,
                          struct klist_node *n)
{
        i->i_klist = k;
        i->i_cur = n;
        if (n)
                kref_get(&n->n_ref);
}

so the klist passed must have been NULL, it came from bus_find_device():

        klist_iter_init_node(&bus->p->klist_devices, &i,
                             (start ? &start->knode_bus : NULL));

...and indeed, printing bus->p here yields 00000000. This function was
called from no_pci_devices(), so the bus variable was initialized from
&pci_bus_type. So pci_bus_type.p == NULL.

This should be initialized in bus_register() called from
pci_driver_init(). Aha, this never gets called because initcalls did
not yet run.

A summary of the bug:

1. Sending panic=1 wants to reboot on panic.
2. If panic occurs before initcalls ran, pci_bus_type.p is not initialized.
3. mach_reboot_fixups() in x86 code calls pci_get_device()
4. New oops

Maybe mach_reboot_fixups() should check to see if PCI bus is
initialized before calling pci_get_device(), since obviously it can be
called before it has been initialized too.

The funny thing is that no_pci_devices() is what _used_ to guard
against using pci_bus_type too early:

/*
 * Some device drivers need know if pci is initiated.
 * Basically, we think pci is not initiated when there
 * is no device to be found on the pci_bus_type.
 */
int no_pci_devices(void)

...and now it uses pci_bus_type itself. That is what makes commit
70308923d317f2ad4973c30d90bb48ae38761317 wrong, because there might be
other users of no_pci_devices() too, which would now almost certainly
result in an Oops if the pci bus hasn't been initialized.

Please tell if any of the above is unclear, and I will try to explain
more. Thanks,


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: v2.6.28-rc7: error in panic code? (NULL pointer dereference at 0000004c)
  2008-12-20  8:52         ` Vegard Nossum
@ 2008-12-20  8:58           ` Greg KH
  2008-12-20 10:56             ` [PATCH] pci: fix no_pci_devices() Vegard Nossum
  0 siblings, 1 reply; 15+ messages in thread
From: Greg KH @ 2008-12-20  8:58 UTC (permalink / raw)
  To: Vegard Nossum; +Cc: Pekka Enberg, Linux Kernel Mailing List

On Sat, Dec 20, 2008 at 09:52:10AM +0100, Vegard Nossum wrote:
> On Sat, Dec 20, 2008 at 1:46 AM, Greg KH <gregkh@suse.de> wrote:
> > On Fri, Dec 19, 2008 at 10:35:57PM +0200, Pekka Enberg wrote:
> >> Hi Greg,
> >>
> >> On Fri, Dec 19, 2008 at 06:19:43PM +0100, Vegard Nossum wrote:
> >> > Fixes what?  It might be quite difficult to revert that patch now, as
> >> > the infrastructure is no longer in place to use a private pci device
> >> > list, that code is long gone.
> >>
> >> Vegard forced one oops but got two! The first one is expected and but
> >> the second one shouldn't probably be there:
> >
> > "Second" oopses are known to not be reliable, I wouldn't count it as a
> > real problem unless it happens on its own.
> 
> Yes, because usually it's a process that BUGed and was killed --
> perhaps with locks held or in the middle of some transaction that will
> never complete. But this one happens in the panic code itself...
> 
> >
> >> >> > [    0.040993] EIP: [<c13b41dc>] klist_next+0x10/0x8d SS:ESP 0068:c165dd48
> >>
> >> Looks like the patch Vegard identified breaks something in the oops path?
> >
> > Very wierd, I also don't understand how reverting the specific patch
> > would even make a buildable system.
> 
> It was an unclean revert, here's the relevant resultant hunk:
> 
> diff --cc drivers/pci/probe.c
> index 003a9b3,2db2e4b..0000000
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@@ -32,16 -29,55 +28,11 @@@ LIST_HEAD(pci_devices)
>    */
>   int no_pci_devices(void)
>   {
> -       struct device *dev;
> -       int no_devices;
> -
> -       dev = bus_find_device(&pci_bus_type, NULL, NULL, find_anything);
> -       no_devices = (dev == NULL);
> -       put_device(dev);
> -       return no_devices;
> +       return list_empty(&pci_devices);
>   }
> +
>   EXPORT_SYMBOL(no_pci_devices);
> 
> 
> ...and this builds.
> 
> The problem seems to be that pci_bus_type.p->klist_devices is NULL. Because:
> 
> Oops happens here:
> 
> struct klist_node *klist_next(struct klist_iter *i)
> {
>         void (*put)(struct klist_node *) = i->i_klist->put;
> 
> So either i == NULL or i->i_klist == NULL. But i->i_klist was just
> before set here:
> 
> void klist_iter_init_node(struct klist *k, struct klist_iter *i,
>                           struct klist_node *n)
> {
>         i->i_klist = k;
>         i->i_cur = n;
>         if (n)
>                 kref_get(&n->n_ref);
> }
> 
> so the klist passed must have been NULL, it came from bus_find_device():
> 
>         klist_iter_init_node(&bus->p->klist_devices, &i,
>                              (start ? &start->knode_bus : NULL));
> 
> ...and indeed, printing bus->p here yields 00000000. This function was
> called from no_pci_devices(), so the bus variable was initialized from
> &pci_bus_type. So pci_bus_type.p == NULL.
> 
> This should be initialized in bus_register() called from
> pci_driver_init(). Aha, this never gets called because initcalls did
> not yet run.
> 
> A summary of the bug:
> 
> 1. Sending panic=1 wants to reboot on panic.
> 2. If panic occurs before initcalls ran, pci_bus_type.p is not initialized.
> 3. mach_reboot_fixups() in x86 code calls pci_get_device()
> 4. New oops
> 
> Maybe mach_reboot_fixups() should check to see if PCI bus is
> initialized before calling pci_get_device(), since obviously it can be
> called before it has been initialized too.
> 
> The funny thing is that no_pci_devices() is what _used_ to guard
> against using pci_bus_type too early:
> 
> /*
>  * Some device drivers need know if pci is initiated.
>  * Basically, we think pci is not initiated when there
>  * is no device to be found on the pci_bus_type.
>  */
> int no_pci_devices(void)
> 
> ...and now it uses pci_bus_type itself. That is what makes commit
> 70308923d317f2ad4973c30d90bb48ae38761317 wrong, because there might be
> other users of no_pci_devices() too, which would now almost certainly
> result in an Oops if the pci bus hasn't been initialized.
> 
> Please tell if any of the above is unclear, and I will try to explain
> more. Thanks,

No, that makes perfect sense.

Care to make a patch for no_pci_devices() to work properly in this kind
of situation?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH] pci: fix no_pci_devices()
  2008-12-20  8:58           ` Greg KH
@ 2008-12-20 10:56             ` Vegard Nossum
  2008-12-20 11:14               ` [PATCH] pci: fix no_pci_devices() #2 Vegard Nossum
  0 siblings, 1 reply; 15+ messages in thread
From: Vegard Nossum @ 2008-12-20 10:56 UTC (permalink / raw)
  To: Greg KH, Jesse Barnes; +Cc: Pekka Enberg, Linux Kernel Mailing List

On Sat, Dec 20, 2008 at 9:58 AM, Greg KH <gregkh@suse.de> wrote:
> Care to make a patch for no_pci_devices() to work properly in this kind
> of situation?

How does this look?

I have introduced a variable pci_is_initiated, which is set after the bus
has been registered.

Should the variable be atomic? My assumption is that initcalls are
synchronous and that the variable therefore cannot be accessed concurrently
by two or more CPUs.


Vegard


>From 409132338c9a6122dc7da08d2764492dbf351f32 Mon Sep 17 00:00:00 2001
From: Vegard Nossum <vegard.nossum@gmail.com>
Date: Sat, 20 Dec 2008 11:37:16 +0100
Subject: [PATCH] pci: fix no_pci_devices()

In short, no_pci_devices() should not use bus_find_device() before
initcalls have run, because the pci bus structure has not been
initialized yet.

Reference: http://lkml.org/lkml/2008/12/20/21

Cc: Greg KH <gregkh@suse.de>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
---
 drivers/pci/pci-driver.c |   10 +++++++++-
 drivers/pci/probe.c      |    3 +++
 include/linux/pci.h      |    1 +
 3 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index b4cdd69..9e0215e 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -825,6 +825,8 @@ int pci_uevent(struct device *dev, struct kobj_uevent_env *env)
 }
 #endif
 
+int pci_is_initiated __read_mostly;
+
 struct bus_type pci_bus_type = {
 	.name		= "pci",
 	.match		= pci_bus_match,
@@ -838,7 +840,13 @@ struct bus_type pci_bus_type = {
 
 static int __init pci_driver_init(void)
 {
-	return bus_register(&pci_bus_type);
+	int ret;
+
+	ret = bus_register(&pci_bus_type);
+	if (ret == 0)
+		pci_is_initiated = 1;
+
+	return ret;
 }
 
 postcore_initcall(pci_driver_init);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 003a9b3..1320a81 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -35,6 +35,9 @@ int no_pci_devices(void)
 	struct device *dev;
 	int no_devices;
 
+	if (!pci_is_initiated)
+		return 1;
+
 	dev = bus_find_device(&pci_bus_type, NULL, NULL, find_anything);
 	no_devices = (dev == NULL);
 	put_device(dev);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index feb4657..9e1b30f 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -484,6 +484,7 @@ extern struct bus_type pci_bus_type;
  * code, or pci core code. */
 extern struct list_head pci_root_buses;	/* list of all known PCI buses */
 /* Some device drivers need know if pci is initiated */
+extern int pci_is_initiated;
 extern int no_pci_devices(void);
 
 void pcibios_fixup_bus(struct pci_bus *);
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH] pci: fix no_pci_devices() #2
  2008-12-20 10:56             ` [PATCH] pci: fix no_pci_devices() Vegard Nossum
@ 2008-12-20 11:14               ` Vegard Nossum
  2008-12-20 23:31                 ` Greg KH
  2009-01-05 19:09                 ` Jesse Barnes
  0 siblings, 2 replies; 15+ messages in thread
From: Vegard Nossum @ 2008-12-20 11:14 UTC (permalink / raw)
  To: Greg KH, Jesse Barnes; +Cc: Pekka Enberg, Linux Kernel Mailing List

On Sat, Dec 20, 2008 at 11:56 AM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
> On Sat, Dec 20, 2008 at 9:58 AM, Greg KH <gregkh@suse.de> wrote:
>> Care to make a patch for no_pci_devices() to work properly in this kind
>> of situation?
>
> How does this look?
>
> I have introduced a variable pci_is_initiated, which is set after the bus
> has been registered.

This patch is simpler and also works for me. But I am not too fond of it
either...


Vegard


>From 1f047c86fc7a831d85174452da92344a3582a158 Mon Sep 17 00:00:00 2001
From: Vegard Nossum <vegard.nossum@gmail.com>
Date: Sat, 20 Dec 2008 12:08:18 +0100
Subject: [PATCH] pci: fix no_pci_devices() #2

In short, no_pci_devices() should not use bus_find_device() before
initcalls have run, because the pci bus structure has not been
initialized yet.

Reference: http://lkml.org/lkml/2008/12/20/21

Cc: Greg KH <gregkh@suse.de>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
---
 drivers/base/bus.c  |    1 +
 drivers/pci/probe.c |    3 +++
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index 5aee1c0..5e83faf 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -931,6 +931,7 @@ bus_devices_fail:
 bus_uevent_fail:
 	kset_unregister(&bus->p->subsys);
 	kfree(bus->p);
+	bus->p = NULL;
 out:
 	return retval;
 }
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 003a9b3..d561be7 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -35,6 +35,9 @@ int no_pci_devices(void)
 	struct device *dev;
 	int no_devices;
 
+	if (!pci_bus_type.p)
+		return 1;
+
 	dev = bus_find_device(&pci_bus_type, NULL, NULL, find_anything);
 	no_devices = (dev == NULL);
 	put_device(dev);
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] pci: fix no_pci_devices() #2
  2008-12-20 11:14               ` [PATCH] pci: fix no_pci_devices() #2 Vegard Nossum
@ 2008-12-20 23:31                 ` Greg KH
  2009-01-05 19:09                 ` Jesse Barnes
  1 sibling, 0 replies; 15+ messages in thread
From: Greg KH @ 2008-12-20 23:31 UTC (permalink / raw)
  To: Vegard Nossum; +Cc: Jesse Barnes, Pekka Enberg, Linux Kernel Mailing List

On Sat, Dec 20, 2008 at 12:14:19PM +0100, Vegard Nossum wrote:
> On Sat, Dec 20, 2008 at 11:56 AM, Vegard Nossum <vegard.nossum@gmail.com> wrote:
> > On Sat, Dec 20, 2008 at 9:58 AM, Greg KH <gregkh@suse.de> wrote:
> >> Care to make a patch for no_pci_devices() to work properly in this kind
> >> of situation?
> >
> > How does this look?
> >
> > I have introduced a variable pci_is_initiated, which is set after the bus
> > has been registered.
> 
> This patch is simpler and also works for me. But I am not too fond of it
> either...
> 
> 
> Vegard
> 
> 
> >From 1f047c86fc7a831d85174452da92344a3582a158 Mon Sep 17 00:00:00 2001
> From: Vegard Nossum <vegard.nossum@gmail.com>
> Date: Sat, 20 Dec 2008 12:08:18 +0100
> Subject: [PATCH] pci: fix no_pci_devices() #2
> 
> In short, no_pci_devices() should not use bus_find_device() before
> initcalls have run, because the pci bus structure has not been
> initialized yet.
> 
> Reference: http://lkml.org/lkml/2008/12/20/21
> 
> Cc: Greg KH <gregkh@suse.de>
> Cc: Pekka Enberg <penberg@cs.helsinki.fi>
> Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
> ---
>  drivers/base/bus.c  |    1 +
>  drivers/pci/probe.c |    3 +++
>  2 files changed, 4 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/base/bus.c b/drivers/base/bus.c
> index 5aee1c0..5e83faf 100644
> --- a/drivers/base/bus.c
> +++ b/drivers/base/bus.c
> @@ -931,6 +931,7 @@ bus_devices_fail:
>  bus_uevent_fail:
>  	kset_unregister(&bus->p->subsys);
>  	kfree(bus->p);
> +	bus->p = NULL;
>  out:
>  	return retval;

I like this portion anyway, care to break this out into a separate
patch and send it to me?

The first one also looks good to me, Jesse, feel free to add my:
	Acked-by: Greg Kroah-Hartman <gregkh@suse.de>

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] pci: fix no_pci_devices() #2
  2008-12-20 11:14               ` [PATCH] pci: fix no_pci_devices() #2 Vegard Nossum
  2008-12-20 23:31                 ` Greg KH
@ 2009-01-05 19:09                 ` Jesse Barnes
  2009-01-07  0:42                   ` Greg KH
  1 sibling, 1 reply; 15+ messages in thread
From: Jesse Barnes @ 2009-01-05 19:09 UTC (permalink / raw)
  To: Vegard Nossum; +Cc: Greg KH, Pekka Enberg, Linux Kernel Mailing List

On Saturday, December 20, 2008 3:14 am Vegard Nossum wrote:
> On Sat, Dec 20, 2008 at 11:56 AM, Vegard Nossum <vegard.nossum@gmail.com> 
wrote:
> > On Sat, Dec 20, 2008 at 9:58 AM, Greg KH <gregkh@suse.de> wrote:
> >> Care to make a patch for no_pci_devices() to work properly in this kind
> >> of situation?
> >
> > How does this look?
> >
> > I have introduced a variable pci_is_initiated, which is set after the bus
> > has been registered.
>
> This patch is simpler and also works for me. But I am not too fond of it
> either...
>
>
> Vegard
>
>
> From 1f047c86fc7a831d85174452da92344a3582a158 Mon Sep 17 00:00:00 2001
> From: Vegard Nossum <vegard.nossum@gmail.com>
> Date: Sat, 20 Dec 2008 12:08:18 +0100
> Subject: [PATCH] pci: fix no_pci_devices() #2
>
> In short, no_pci_devices() should not use bus_find_device() before
> initcalls have run, because the pci bus structure has not been
> initialized yet.
>
> Reference: http://lkml.org/lkml/2008/12/20/21
>
> Cc: Greg KH <gregkh@suse.de>
> Cc: Pekka Enberg <penberg@cs.helsinki.fi>
> Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
> ---
>  drivers/base/bus.c  |    1 +
>  drivers/pci/probe.c |    3 +++
>  2 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/base/bus.c b/drivers/base/bus.c
> index 5aee1c0..5e83faf 100644
> --- a/drivers/base/bus.c
> +++ b/drivers/base/bus.c
> @@ -931,6 +931,7 @@ bus_devices_fail:
>  bus_uevent_fail:
>  	kset_unregister(&bus->p->subsys);
>  	kfree(bus->p);
> +	bus->p = NULL;
>  out:
>  	return retval;
>  }
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 003a9b3..d561be7 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -35,6 +35,9 @@ int no_pci_devices(void)
>  	struct device *dev;
>  	int no_devices;
>
> +	if (!pci_bus_type.p)
> +		return 1;
> +
>  	dev = bus_find_device(&pci_bus_type, NULL, NULL, find_anything);
>  	no_devices = (dev == NULL);
>  	put_device(dev);

Assuming Greg already took the generic part, can you resend the PCI part to 
the linux-pci@vger.kernel.org list for review just in case anyone has a better 
idea of how to do it?

Thanks,
-- 
Jesse Barnes, Intel Open Source Technology Center


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] pci: fix no_pci_devices() #2
  2009-01-05 19:09                 ` Jesse Barnes
@ 2009-01-07  0:42                   ` Greg KH
  2009-01-16 18:41                     ` Jesse Barnes
  0 siblings, 1 reply; 15+ messages in thread
From: Greg KH @ 2009-01-07  0:42 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Vegard Nossum, Pekka Enberg, Linux Kernel Mailing List

On Mon, Jan 05, 2009 at 11:09:43AM -0800, Jesse Barnes wrote:
> On Saturday, December 20, 2008 3:14 am Vegard Nossum wrote:
> > On Sat, Dec 20, 2008 at 11:56 AM, Vegard Nossum <vegard.nossum@gmail.com> 
> wrote:
> > > On Sat, Dec 20, 2008 at 9:58 AM, Greg KH <gregkh@suse.de> wrote:
> > >> Care to make a patch for no_pci_devices() to work properly in this kind
> > >> of situation?
> > >
> > > How does this look?
> > >
> > > I have introduced a variable pci_is_initiated, which is set after the bus
> > > has been registered.
> >
> > This patch is simpler and also works for me. But I am not too fond of it
> > either...
> >
> >
> > Vegard
> >
> >
> > From 1f047c86fc7a831d85174452da92344a3582a158 Mon Sep 17 00:00:00 2001
> > From: Vegard Nossum <vegard.nossum@gmail.com>
> > Date: Sat, 20 Dec 2008 12:08:18 +0100
> > Subject: [PATCH] pci: fix no_pci_devices() #2
> >
> > In short, no_pci_devices() should not use bus_find_device() before
> > initcalls have run, because the pci bus structure has not been
> > initialized yet.
> >
> > Reference: http://lkml.org/lkml/2008/12/20/21
> >
> > Cc: Greg KH <gregkh@suse.de>
> > Cc: Pekka Enberg <penberg@cs.helsinki.fi>
> > Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
> > ---
> >  drivers/base/bus.c  |    1 +
> >  drivers/pci/probe.c |    3 +++
> >  2 files changed, 4 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/base/bus.c b/drivers/base/bus.c
> > index 5aee1c0..5e83faf 100644
> > --- a/drivers/base/bus.c
> > +++ b/drivers/base/bus.c
> > @@ -931,6 +931,7 @@ bus_devices_fail:
> >  bus_uevent_fail:
> >  	kset_unregister(&bus->p->subsys);
> >  	kfree(bus->p);
> > +	bus->p = NULL;
> >  out:
> >  	return retval;
> >  }
> > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > index 003a9b3..d561be7 100644
> > --- a/drivers/pci/probe.c
> > +++ b/drivers/pci/probe.c
> > @@ -35,6 +35,9 @@ int no_pci_devices(void)
> >  	struct device *dev;
> >  	int no_devices;
> >
> > +	if (!pci_bus_type.p)
> > +		return 1;
> > +
> >  	dev = bus_find_device(&pci_bus_type, NULL, NULL, find_anything);
> >  	no_devices = (dev == NULL);
> >  	put_device(dev);
> 
> Assuming Greg already took the generic part, can you resend the PCI part to 
> the linux-pci@vger.kernel.org list for review just in case anyone has a better 
> idea of how to do it?

Did I take the generic part?  I can't remember...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] pci: fix no_pci_devices() #2
  2009-01-07  0:42                   ` Greg KH
@ 2009-01-16 18:41                     ` Jesse Barnes
  2009-01-16 19:21                       ` Vegard Nossum
  0 siblings, 1 reply; 15+ messages in thread
From: Jesse Barnes @ 2009-01-16 18:41 UTC (permalink / raw)
  To: Greg KH; +Cc: Vegard Nossum, Pekka Enberg, Linux Kernel Mailing List

On Tuesday, January 6, 2009 4:42 pm Greg KH wrote:
> On Mon, Jan 05, 2009 at 11:09:43AM -0800, Jesse Barnes wrote:
> > On Saturday, December 20, 2008 3:14 am Vegard Nossum wrote:
> > > On Sat, Dec 20, 2008 at 11:56 AM, Vegard Nossum
> > > <vegard.nossum@gmail.com>
> >
> > wrote:
> > > > On Sat, Dec 20, 2008 at 9:58 AM, Greg KH <gregkh@suse.de> wrote:
> > > >> Care to make a patch for no_pci_devices() to work properly in this
> > > >> kind of situation?
> > > >
> > > > How does this look?
> > > >
> > > > I have introduced a variable pci_is_initiated, which is set after the
> > > > bus has been registered.
> > >
> > > This patch is simpler and also works for me. But I am not too fond of
> > > it either...
> > >
> > >
> > > Vegard
> > >
> > >
> > > From 1f047c86fc7a831d85174452da92344a3582a158 Mon Sep 17 00:00:00 2001
> > > From: Vegard Nossum <vegard.nossum@gmail.com>
> > > Date: Sat, 20 Dec 2008 12:08:18 +0100
> > > Subject: [PATCH] pci: fix no_pci_devices() #2
> > >
> > > In short, no_pci_devices() should not use bus_find_device() before
> > > initcalls have run, because the pci bus structure has not been
> > > initialized yet.
> > >
> > > Reference: http://lkml.org/lkml/2008/12/20/21
> > >
> > > Cc: Greg KH <gregkh@suse.de>
> > > Cc: Pekka Enberg <penberg@cs.helsinki.fi>
> > > Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
> > > ---
> > >  drivers/base/bus.c  |    1 +
> > >  drivers/pci/probe.c |    3 +++
> > >  2 files changed, 4 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/drivers/base/bus.c b/drivers/base/bus.c
> > > index 5aee1c0..5e83faf 100644
> > > --- a/drivers/base/bus.c
> > > +++ b/drivers/base/bus.c
> > > @@ -931,6 +931,7 @@ bus_devices_fail:
> > >  bus_uevent_fail:
> > >  	kset_unregister(&bus->p->subsys);
> > >  	kfree(bus->p);
> > > +	bus->p = NULL;
> > >  out:
> > >  	return retval;
> > >  }
> > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > > index 003a9b3..d561be7 100644
> > > --- a/drivers/pci/probe.c
> > > +++ b/drivers/pci/probe.c
> > > @@ -35,6 +35,9 @@ int no_pci_devices(void)
> > >  	struct device *dev;
> > >  	int no_devices;
> > >
> > > +	if (!pci_bus_type.p)
> > > +		return 1;
> > > +
> > >  	dev = bus_find_device(&pci_bus_type, NULL, NULL, find_anything);
> > >  	no_devices = (dev == NULL);
> > >  	put_device(dev);
> >
> > Assuming Greg already took the generic part, can you resend the PCI part
> > to the linux-pci@vger.kernel.org list for review just in case anyone has
> > a better idea of how to do it?
>
> Did I take the generic part?  I can't remember...

Doesn't look like it.  Vergard can you send out an updated patch set?

Thanks,
-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] pci: fix no_pci_devices() #2
  2009-01-16 18:41                     ` Jesse Barnes
@ 2009-01-16 19:21                       ` Vegard Nossum
  2009-01-27 18:41                         ` Jesse Barnes
  0 siblings, 1 reply; 15+ messages in thread
From: Vegard Nossum @ 2009-01-16 19:21 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Greg KH, Pekka Enberg, Linux Kernel Mailing List

On Fri, Jan 16, 2009 at 7:41 PM, Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
>> > Assuming Greg already took the generic part, can you resend the PCI part
>> > to the linux-pci@vger.kernel.org list for review just in case anyone has
>> > a better idea of how to do it?
>>
>> Did I take the generic part?  I can't remember...
>
> Doesn't look like it.  Vergard can you send out an updated patch set?

Actually, my patch is still just a hack, since pci_bus_type.p is still
set before the pci_bus_type is really usable. So if the kernel crashes
(or, in general, no_pci_devices() is called) at some point between the
pci_bus_type.p = <something> and pci_bus_type.p = NULL (which I
inserted), we will still see the same type of fault.

So I would prefer to solve this in a different way, like a dedicated
flag which is only set after we know that pci_bus_type initialisation
succeeded. I think that was the approach of my first patch? I don't
remember. In any case, such a patch could not be split in generic/pci
parts, I think. Also, should we anticipate concurrent access to
pci_bus_type.p or such a dedicated "no_pci_devices" flag?

Here is the first patch, but I wonder if it should be turned into
atomic_t instead: http://lkml.org/lkml/2008/12/20/33


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] pci: fix no_pci_devices() #2
  2009-01-16 19:21                       ` Vegard Nossum
@ 2009-01-27 18:41                         ` Jesse Barnes
  0 siblings, 0 replies; 15+ messages in thread
From: Jesse Barnes @ 2009-01-27 18:41 UTC (permalink / raw)
  To: Vegard Nossum; +Cc: Greg KH, Pekka Enberg, Linux Kernel Mailing List

On Friday, January 16, 2009 11:21 am Vegard Nossum wrote:
> On Fri, Jan 16, 2009 at 7:41 PM, Jesse Barnes <jbarnes@virtuousgeek.org> 
wrote:
> >> > Assuming Greg already took the generic part, can you resend the PCI
> >> > part to the linux-pci@vger.kernel.org list for review just in case
> >> > anyone has a better idea of how to do it?
> >>
> >> Did I take the generic part?  I can't remember...
> >
> > Doesn't look like it.  Vergard can you send out an updated patch set?
>
> Actually, my patch is still just a hack, since pci_bus_type.p is still
> set before the pci_bus_type is really usable. So if the kernel crashes
> (or, in general, no_pci_devices() is called) at some point between the
> pci_bus_type.p = <something> and pci_bus_type.p = NULL (which I
> inserted), we will still see the same type of fault.
>
> So I would prefer to solve this in a different way, like a dedicated
> flag which is only set after we know that pci_bus_type initialisation
> succeeded. I think that was the approach of my first patch? I don't
> remember. In any case, such a patch could not be split in generic/pci
> parts, I think. Also, should we anticipate concurrent access to
> pci_bus_type.p or such a dedicated "no_pci_devices" flag?
>
> Here is the first patch, but I wonder if it should be turned into
> atomic_t instead: http://lkml.org/lkml/2008/12/20/33

It seems like you could do this with a driver core call?  Isn't there a way to 
check whether a given bus type is registered?  If so, we could just use that 
from no_pci_devices instead of a new flag.

-- 
Jesse Barnes, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2009-01-27 18:42 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-12-18 22:06 v2.6.28-rc7: error in panic code? (NULL pointer dereference at 0000004c) Vegard Nossum
2008-12-19 17:19 ` Vegard Nossum
2008-12-19 17:39   ` Greg KH
2008-12-19 20:35     ` Pekka Enberg
2008-12-20  0:46       ` Greg KH
2008-12-20  8:52         ` Vegard Nossum
2008-12-20  8:58           ` Greg KH
2008-12-20 10:56             ` [PATCH] pci: fix no_pci_devices() Vegard Nossum
2008-12-20 11:14               ` [PATCH] pci: fix no_pci_devices() #2 Vegard Nossum
2008-12-20 23:31                 ` Greg KH
2009-01-05 19:09                 ` Jesse Barnes
2009-01-07  0:42                   ` Greg KH
2009-01-16 18:41                     ` Jesse Barnes
2009-01-16 19:21                       ` Vegard Nossum
2009-01-27 18:41                         ` Jesse Barnes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).