All of lore.kernel.org
 help / color / mirror / Atom feed
* 584c5c422f6c ("x86/ioapic: Support hot-removal of IOAPICs present during boot")
@ 2016-09-09 22:07 Borislav Petkov
  2016-09-10  0:14 ` Rui Wang
  0 siblings, 1 reply; 7+ messages in thread
From: Borislav Petkov @ 2016-09-09 22:07 UTC (permalink / raw)
  To: Rui Wang
  Cc: Bjorn Helgaas, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
	helgaas, linux-acpi, linux-pci, rjw, tony.luck, Ingo Molnar,
	x86-ml, lkml

Hi,

commit in $Subject from tip breaks booting on my SNB test box,
it freezes somewhere very early during boot. When I revert the
aforementioned commit ontop of Linus' master + tip/master from today, it
boots fine.

Additional info is available upon request.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* 584c5c422f6c ("x86/ioapic: Support hot-removal of IOAPICs present during boot")
  2016-09-09 22:07 584c5c422f6c ("x86/ioapic: Support hot-removal of IOAPICs present during boot") Borislav Petkov
@ 2016-09-10  0:14 ` Rui Wang
  2016-09-10 11:27   ` Borislav Petkov
  0 siblings, 1 reply; 7+ messages in thread
From: Rui Wang @ 2016-09-10  0:14 UTC (permalink / raw)
  To: bp
  Cc: bhelgaas, torvalds, peterz, tglx, helgaas, linux-acpi, linux-pci,
	rjw, tony.luck, mingo, x86, linux-kernel, Rui Wang

On Saturday, September 10, 2016 6:07 AM, Borislav Petkov wrote:
> 
> Hi,
> 
> commit in $Subject from tip breaks booting on my SNB test box, it freezes
> somewhere very early during boot. When I revert the aforementioned
> commit ontop of Linus' master + tip/master from today, it boots fine.
> 
> Additional info is available upon request.
> 

Hi Boris,

Is it a Romley-EP ? What kind of SNB box is it? I'll try to find one
to reproduce the problem.

Thanks
Rui

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: 584c5c422f6c ("x86/ioapic: Support hot-removal of IOAPICs present during boot")
  2016-09-10  0:14 ` Rui Wang
@ 2016-09-10 11:27   ` Borislav Petkov
  2016-09-10 13:11     ` Rui Wang
  0 siblings, 1 reply; 7+ messages in thread
From: Borislav Petkov @ 2016-09-10 11:27 UTC (permalink / raw)
  To: Rui Wang
  Cc: bhelgaas, torvalds, peterz, tglx, helgaas, linux-acpi, linux-pci,
	rjw, tony.luck, mingo, x86, linux-kernel

On Sat, Sep 10, 2016 at 08:14:04AM +0800, Rui Wang wrote:
> Is it a Romley-EP ? What kind of SNB box is it? I'll try to find one
> to reproduce the problem.

You don't have to - I went and analyzed it for you but it is ACPI
crap and I can't be bothered to think of a solution. It looks like
a chicken-and-an-egg issue where you call acpi_ioapic_add() on the
root_bus and something's not initialized there in the chain of parent
and children nodes.

But I could very well be wrong so let me dump the details, I'll let
someone else make sense of it all. Provided I've not made a mistake
somewhere, which is not uncommon.

So the splat is:

[    4.396872] pci_bus 0000:3f: resource 5 [mem 0x00000000-0x3fffffffffff]
[    4.403493] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[    4.411331] IP: [<ffffffff8143ef8d>] acpi_ns_walk_namespace+0x3d/0x18f
[    4.417878] PGD 0 
[    4.419900] Oops: 0000 [#1] PREEMPT SMP
[    4.423731] Modules linked in:
[    4.426791] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.8.0-rc5+ #13
[    4.433122] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A13 05/11/2014
[    4.440506] task: ffff88043be70000 task.stack: ffffc90000020000
[    4.446405] RIP: 0010:[<ffffffff8143ef8d>]  [<ffffffff8143ef8d>] acpi_ns_walk_namespace+0x3d/0x18f
[    4.455348] RSP: 0000:ffffc90000023d98  EFLAGS: 00010202
[    4.460641] RAX: ffffffff82dd6ae0 RBX: 0000000000000001 RCX: 0000000000000001
[    4.467752] RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 0000000000000006
[    4.474854] RBP: ffffc90000023de8 R08: ffffffff8144bfaa R09: 0000000000000000
[    4.481962] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[    4.489071] R13: ffffffff8144bfaa R14: 0000000000000000 R15: 0000000000000000
[    4.496187] FS:  0000000000000000(0000) GS:ffff88043de00000(0000) knlGS:0000000000000000
[    4.504242] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    4.509967] CR2: 0000000000000018 CR3: 0000000003e05000 CR4: 00000000000406f0
[    4.517078] Stack:
[    4.519079]  0000000000000018 ffffffff82dd6c58 ffffc90000023dc0 00000001ffffffff
[    4.526476]  0000000000000006 0000000000000000 0000000000001001 ffffffff8144bfaa
[    4.533876]  0000000000000000 0000000000000000 ffffc90000023e38 ffffffff8143f51d
[    4.541273] Call Trace:
[    4.543714]  [<ffffffff8144bfaa>] ? setup_res+0x94/0x94
[    4.548915]  [<ffffffff8143f51d>] acpi_walk_namespace+0x9b/0xcf
[    4.554838]  [<ffffffff81fc44c1>] ? ras_debugfs_init+0x1b/0x1b
[    4.560645]  [<ffffffff8144c376>] acpi_ioapic_add+0x33/0x4c
[    4.566221]  [<ffffffff81faf09a>] pci_assign_unassigned_resources+0x4e/0x56
[    4.573151]  [<ffffffff81fc44df>] pcibios_assign_resources+0x1e/0xb4
[    4.579476]  [<ffffffff81fc44c1>] ? ras_debugfs_init+0x1b/0x1b
[    4.585306]  [<ffffffff81000440>] do_one_initcall+0x50/0x190
[    4.590969]  [<ffffffff81f73091>] kernel_init_freeable+0x11e/0x1a6
[    4.597130]  [<ffffffff81854d60>] ? rest_init+0x140/0x140
[    4.602506]  [<ffffffff81854d6e>] kernel_init+0xe/0x110
[    4.607735]  [<ffffffff8185f11a>] ret_from_fork+0x2a/0x40
[    4.613110] Code: 41 54 41 89 cc 53 bb 01 00 00 00 48 83 ec 28 48 83 fe ff 89 7d d0 4c 0f 44 35 58 7b 99 01 89 55 c8 45 31 ff 41 83 e4 01 89 4d cc <4d> 8b 6e 18 4c 89 45 c0 4c 89 4d b8 c7 45 d4 00 00 00 00 4d 85 
[    4.632554] RIP  [<ffffffff8143ef8d>] acpi_ns_walk_namespace+0x3d/0x18f
[    4.639148]  RSP <ffffc90000023d98>
[    4.642619] CR2: 0000000000000018
[    4.645944] ---[ end trace 2fbd1c53d9becc2a ]---
[    4.650562] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[    4.650562] 
[    4.659694] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009

You can see from the call trace how it happens:

pci_assign_unassigned_resources-> acpi_ioapic_add -> ... which you added.

rIP is this:

ffffffff8143ef50 <acpi_ns_walk_namespace>:
ffffffff8143ef50:       e8 6b 16 42 00          callq  ffffffff818605c0 <__fentry__>
ffffffff8143ef55:       55                      push   %rbp
ffffffff8143ef56:       48 89 e5                mov    %rsp,%rbp
ffffffff8143ef59:       41 57                   push   %r15
ffffffff8143ef5b:       41 56                   push   %r14
ffffffff8143ef5d:       49 89 f6                mov    %rsi,%r14
ffffffff8143ef60:       41 55                   push   %r13
ffffffff8143ef62:       41 54                   push   %r12
ffffffff8143ef64:       41 89 cc                mov    %ecx,%r12d
ffffffff8143ef67:       53                      push   %rbx
ffffffff8143ef68:       bb 01 00 00 00          mov    $0x1,%ebx
ffffffff8143ef6d:       48 83 ec 28             sub    $0x28,%rsp
ffffffff8143ef71:       48 83 fe ff             cmp    $0xffffffffffffffff,%rsi
ffffffff8143ef75:       89 7d d0                mov    %edi,-0x30(%rbp)
ffffffff8143ef78:       4c 0f 44 35 58 7b 99    cmove  0x1997b58(%rip),%r14        # ffffffff82dd6ad8 <acpi_gbl_root_node>
ffffffff8143ef7f:       01 
ffffffff8143ef80:       89 55 c8                mov    %edx,-0x38(%rbp)
ffffffff8143ef83:       45 31 ff                xor    %r15d,%r15d
ffffffff8143ef86:       41 83 e4 01             and    $0x1,%r12d
ffffffff8143ef8a:       89 4d cc                mov    %ecx,-0x34(%rbp)
ffffffff8143ef8d:       4d 8b 6e 18             mov    0x18(%r14),%r13		<--- faulting insn.
ffffffff8143ef91:       4c 89 45 c0             mov    %r8,-0x40(%rbp)
ffffffff8143ef95:       4c 89 4d b8             mov    %r9,-0x48(%rbp)
ffffffff8143ef99:       c7 45 d4 00 00 00 00    movl   $0x0,-0x2c(%rbp)
ffffffff8143efa0:       4d 85 ed                test   %r13,%r13

In the annotated asm, this is:

        .globl  acpi_ns_walk_namespace
        .type   acpi_ns_walk_namespace, @function
acpi_ns_walk_namespace:
.LFB1454:
        .loc 1 191 0
        .cfi_startproc
.LVL5:
        call    __fentry__
        pushq   %rbp    #
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp      #,
        .cfi_def_cfa_register 6
        pushq   %r15    #
        pushq   %r14    #
        .cfi_offset 15, -24
        .cfi_offset 14, -32
        movq    %rsi, %r14      # start_node, start_node
        pushq   %r13    #
        pushq   %r12    #
        .cfi_offset 13, -40
        .cfi_offset 12, -48
        .loc 1 249 0
        movl    %ecx, %r12d     # flags, D.24037
        .loc 1 191 0
        pushq   %rbx    #
        .cfi_offset 3, -56
        .loc 1 213 0
        movl    $1, %ebx        #, level
.LVL6:
        .loc 1 191 0
        subq    $40, %rsp       #,
        .loc 1 205 0
        cmpq    $-1, %rsi       #, start_node
        .loc 1 191 0
        movl    %edi, -48(%rbp) # type, %sfp
        .loc 1 205 0
        cmove   acpi_gbl_root_node(%rip), %r14  # acpi_gbl_root_node,, start_node
.LVL7:
        .loc 1 191 0
        movl    %edx, -56(%rbp) # max_depth, %sfp
        .loc 1 198 0
        xorl    %r15d, %r15d    # node_previously_visited
.LVL8:
        .loc 1 249 0
        andl    $1, %r12d       #, D.24037
        .loc 1 191 0
        movl    %ecx, -52(%rbp) # flags, %sfp
.LBB14:
.LBB15:
        .loc 1 79 0
        movq    24(%r14), %r13  # MEM[(struct acpi_namespace_node *)start_node_1].child, start_node		<--- faulting insn

and the relevant piece of C code is:

        if (start_node == ACPI_ROOT_OBJECT) {
                start_node = acpi_gbl_root_node;
        }

        /* Null child means "get first node" */

        parent_node = start_node;
        child_node = acpi_ns_get_next_node(parent_node, NULL);

And parent_node, i.e., start_node is NULL here and
acpi_ns_get_next_node() tries to deref it to get the child:

	movq    24(%r14), %r13

HTH.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: 584c5c422f6c ("x86/ioapic: Support hot-removal of IOAPICs present during boot")
  2016-09-10 11:27   ` Borislav Petkov
@ 2016-09-10 13:11     ` Rui Wang
  2016-09-10 14:30       ` Borislav Petkov
  0 siblings, 1 reply; 7+ messages in thread
From: Rui Wang @ 2016-09-10 13:11 UTC (permalink / raw)
  To: bp
  Cc: bhelgaas, torvalds, peterz, tglx, helgaas, linux-acpi, linux-pci,
	rjw, tony.luck, mingo, x86, linux-kernel, rui.y.wang

On Sat, Sep 10, 2016 7:28 PM, Borislav Petkov wrote:
> 
> ffffffff8143ef50 <acpi_ns_walk_namespace>:
> ffffffff8143ef50:       e8 6b 16 42 00          callq  ffffffff818605c0 <__fentry__>
> ffffffff8143ef55:       55                      push   %rbp
> ffffffff8143ef56:       48 89 e5                mov    %rsp,%rbp
> ffffffff8143ef59:       41 57                   push   %r15
> ffffffff8143ef5b:       41 56                   push   %r14
> ffffffff8143ef5d:       49 89 f6                mov    %rsi,%r14
> ffffffff8143ef60:       41 55                   push   %r13
> ffffffff8143ef62:       41 54                   push   %r12
> ffffffff8143ef64:       41 89 cc                mov    %ecx,%r12d
> ffffffff8143ef67:       53                      push   %rbx
> ffffffff8143ef68:       bb 01 00 00 00          mov    $0x1,%ebx
> ffffffff8143ef6d:       48 83 ec 28             sub    $0x28,%rsp
> ffffffff8143ef71:       48 83 fe ff             cmp    $0xffffffffffffffff,%rsi
> ffffffff8143ef75:       89 7d d0                mov    %edi,-0x30(%rbp)
> ffffffff8143ef78:       4c 0f 44 35 58 7b 99    cmove  0x1997b58(%rip),%r14  # ffffffff82dd6ad8 <acpi_gbl_root_node>
> ffffffff8143ef7f:       01
> ffffffff8143ef80:       89 55 c8                mov    %edx,-0x38(%rbp)
> ffffffff8143ef83:       45 31 ff                xor    %r15d,%r15d
> ffffffff8143ef86:       41 83 e4 01             and    $0x1,%r12d
> ffffffff8143ef8a:       89 4d cc                mov    %ecx,-0x34(%rbp)
> ffffffff8143ef8d:       4d 8b 6e 18             mov    0x18(%r14),%r13    <--- > faulting insn.
> ffffffff8143ef91:       4c 89 45 c0             mov    %r8,-0x40(%rbp)
> ffffffff8143ef95:       4c 89 4d b8             mov    %r9,-0x48(%rbp)
> ffffffff8143ef99:       c7 45 d4 00 00 00 00    movl   $0x0,-0x2c(%rbp)
> ffffffff8143efa0:       4d 85 ed                test   %r13,%r13

Thanks for the analysis. Looks like acpi_gbl_root_node == 0. It is
initialized earlier than where we call acpi_ioapic_add().
start_kernel()
	->acpi_early_init()
		->acpi_initialize_subsystem()
			 ->acpi_ns_root_initialize()
				-> acpi_gbl_root_node = ...


But acpi_gbl_root_node can be 0 if acpi_disabled == 1.
Do you somehow have acpi turned off (!CONFIG_ACPI or booting with acpi=off)?

Thanks
Rui

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: 584c5c422f6c ("x86/ioapic: Support hot-removal of IOAPICs present during boot")
  2016-09-10 13:11     ` Rui Wang
@ 2016-09-10 14:30       ` Borislav Petkov
  2016-09-10 15:03         ` Rui Wang
  0 siblings, 1 reply; 7+ messages in thread
From: Borislav Petkov @ 2016-09-10 14:30 UTC (permalink / raw)
  To: Rui Wang
  Cc: bhelgaas, torvalds, peterz, tglx, helgaas, linux-acpi, linux-pci,
	rjw, tony.luck, mingo, x86, linux-kernel

On Sat, Sep 10, 2016 at 09:11:30PM +0800, Rui Wang wrote:
> Thanks for the analysis. Looks like acpi_gbl_root_node == 0. It is

Is ACPI_HANDLE(root_bus->bridge) == ACPI_ROOT_OBJECT?

Because this is what the code does but I can't see it from the code that
the handle is of type ACPI_ROOT_OBJECT...

> initialized earlier than where we call acpi_ioapic_add().
> start_kernel()
> 	->acpi_early_init()
> 		->acpi_initialize_subsystem()
> 			 ->acpi_ns_root_initialize()
> 				-> acpi_gbl_root_node = ...
> 
> 
> But acpi_gbl_root_node can be 0 if acpi_disabled == 1.
> Do you somehow have acpi turned off (!CONFIG_ACPI or booting with acpi=off)?

Nope.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: 584c5c422f6c ("x86/ioapic: Support hot-removal of IOAPICs present during boot")
  2016-09-10 14:30       ` Borislav Petkov
@ 2016-09-10 15:03         ` Rui Wang
  2016-09-10 15:23           ` Borislav Petkov
  0 siblings, 1 reply; 7+ messages in thread
From: Rui Wang @ 2016-09-10 15:03 UTC (permalink / raw)
  To: bp
  Cc: bhelgaas, torvalds, peterz, tglx, helgaas, linux-acpi, linux-pci,
	rjw, tony.luck, mingo, x86, linux-kernel, rui.y.wang

On Sat, Sep 10, 2016 10:31 PM, Borislav Petkov wrote:
> 
> On Sat, Sep 10, 2016 at 09:11:30PM +0800, Rui Wang wrote:
> > Thanks for the analysis. Looks like acpi_gbl_root_node == 0. It is
> 
> Is ACPI_HANDLE(root_bus->bridge) == ACPI_ROOT_OBJECT?
> 
> Because this is what the code does but I can't see it from the code that the
> handle is of type ACPI_ROOT_OBJECT...
> 

I seem to have the clue. It is because ACPI_HANDLE(root_bus->bridge) == NULL.
Then this root bridge doesn't have a corresponding ACPI device. In this case
we shouldn't call acpi_ioapic_add(). This can be true for some old platforms.

If that's the case then the patch below can fix it.

Thanks
Rui

>From b37d806c78e5eea70dbb890c1e3dbb8b7e5038e6 Mon Sep 17 00:00:00 2001
From: Rui Wang <rui.y.wang@intel.com>
Date: Sat, 10 Sep 2016 10:56:34 -0400
Subject: [PATCH] x86/ioapic: Ignore root bridges without a companion ACPI device

Some PCI root bridges don't have a corresponding ACPI device.
Don't call acpi_ioapic_add() on these bridges because they can't
support ioapic hotplug.

Signed-off-by: Rui Wang <rui.y.wang@intel.com>
---
 drivers/pci/setup-bus.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index ec538d3..2b05ac6 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1852,10 +1852,15 @@ dump:
 void __init pci_assign_unassigned_resources(void)
 {
 	struct pci_bus *root_bus;
+	acpi_handle root_handle;
 
 	list_for_each_entry(root_bus, &pci_root_buses, node) {
 		pci_assign_unassigned_root_bus_resources(root_bus);
-		acpi_ioapic_add(ACPI_HANDLE(root_bus->bridge));
+		root_handle = ACPI_HANDLE(root_bus->bridge);
+
+		/* make sure the root bridge has a companion ACPI device */
+		if (root_handle)
+			acpi_ioapic_add(root_handle);
 	}
 }
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: 584c5c422f6c ("x86/ioapic: Support hot-removal of IOAPICs present during boot")
  2016-09-10 15:03         ` Rui Wang
@ 2016-09-10 15:23           ` Borislav Petkov
  0 siblings, 0 replies; 7+ messages in thread
From: Borislav Petkov @ 2016-09-10 15:23 UTC (permalink / raw)
  To: Rui Wang
  Cc: bhelgaas, torvalds, peterz, tglx, helgaas, linux-acpi, linux-pci,
	rjw, tony.luck, mingo, x86, linux-kernel

On Sat, Sep 10, 2016 at 11:03:29PM +0800, Rui Wang wrote:
> From b37d806c78e5eea70dbb890c1e3dbb8b7e5038e6 Mon Sep 17 00:00:00 2001
> From: Rui Wang <rui.y.wang@intel.com>
> Date: Sat, 10 Sep 2016 10:56:34 -0400
> Subject: [PATCH] x86/ioapic: Ignore root bridges without a companion ACPI device
> 
> Some PCI root bridges don't have a corresponding ACPI device.

Please say here that this is can be the case on some old platforms. Mine
is a couple of years old dell workstation box.

> Don't call acpi_ioapic_add() on these bridges because they can't
> support ioapic hotplug.

Right, exactly.

> Signed-off-by: Rui Wang <rui.y.wang@intel.com>
> ---
>  drivers/pci/setup-bus.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index ec538d3..2b05ac6 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1852,10 +1852,15 @@ dump:
>  void __init pci_assign_unassigned_resources(void)
>  {
>  	struct pci_bus *root_bus;
> +	acpi_handle root_handle;
>  
>  	list_for_each_entry(root_bus, &pci_root_buses, node) {
>  		pci_assign_unassigned_root_bus_resources(root_bus);
> -		acpi_ioapic_add(ACPI_HANDLE(root_bus->bridge));
> +		root_handle = ACPI_HANDLE(root_bus->bridge);
> +
> +		/* make sure the root bridge has a companion ACPI device */
> +		if (root_handle)
> +			acpi_ioapic_add(root_handle);

Just do:

		if (ACPI_HANDLE(root_bus->bridge))
			acpi_ioapic_add(ACPI_HANDLE(root_bus->bridge));

no need for the local variable - gcc is smart enough to fetch it only
once :)

Anyway, I'll run it later to verify.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-09-10 15:23 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-09 22:07 584c5c422f6c ("x86/ioapic: Support hot-removal of IOAPICs present during boot") Borislav Petkov
2016-09-10  0:14 ` Rui Wang
2016-09-10 11:27   ` Borislav Petkov
2016-09-10 13:11     ` Rui Wang
2016-09-10 14:30       ` Borislav Petkov
2016-09-10 15:03         ` Rui Wang
2016-09-10 15:23           ` Borislav Petkov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.