* 584c5c422f6c ("x86/ioapic: Support hot-removal of IOAPICs present during boot")
@ 2016-09-09 22:07 Borislav Petkov
2016-09-10 0:14 ` Rui Wang
0 siblings, 1 reply; 7+ messages in thread
From: Borislav Petkov @ 2016-09-09 22:07 UTC (permalink / raw)
To: Rui Wang
Cc: Bjorn Helgaas, Linus Torvalds, Peter Zijlstra, Thomas Gleixner,
helgaas, linux-acpi, linux-pci, rjw, tony.luck, Ingo Molnar,
x86-ml, lkml
Hi,
commit in $Subject from tip breaks booting on my SNB test box,
it freezes somewhere very early during boot. When I revert the
aforementioned commit ontop of Linus' master + tip/master from today, it
boots fine.
Additional info is available upon request.
Thanks.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
^ permalink raw reply [flat|nested] 7+ messages in thread
* 584c5c422f6c ("x86/ioapic: Support hot-removal of IOAPICs present during boot")
2016-09-09 22:07 584c5c422f6c ("x86/ioapic: Support hot-removal of IOAPICs present during boot") Borislav Petkov
@ 2016-09-10 0:14 ` Rui Wang
2016-09-10 11:27 ` Borislav Petkov
0 siblings, 1 reply; 7+ messages in thread
From: Rui Wang @ 2016-09-10 0:14 UTC (permalink / raw)
To: bp
Cc: bhelgaas, torvalds, peterz, tglx, helgaas, linux-acpi, linux-pci,
rjw, tony.luck, mingo, x86, linux-kernel, Rui Wang
On Saturday, September 10, 2016 6:07 AM, Borislav Petkov wrote:
>
> Hi,
>
> commit in $Subject from tip breaks booting on my SNB test box, it freezes
> somewhere very early during boot. When I revert the aforementioned
> commit ontop of Linus' master + tip/master from today, it boots fine.
>
> Additional info is available upon request.
>
Hi Boris,
Is it a Romley-EP ? What kind of SNB box is it? I'll try to find one
to reproduce the problem.
Thanks
Rui
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: 584c5c422f6c ("x86/ioapic: Support hot-removal of IOAPICs present during boot")
2016-09-10 0:14 ` Rui Wang
@ 2016-09-10 11:27 ` Borislav Petkov
2016-09-10 13:11 ` Rui Wang
0 siblings, 1 reply; 7+ messages in thread
From: Borislav Petkov @ 2016-09-10 11:27 UTC (permalink / raw)
To: Rui Wang
Cc: bhelgaas, torvalds, peterz, tglx, helgaas, linux-acpi, linux-pci,
rjw, tony.luck, mingo, x86, linux-kernel
On Sat, Sep 10, 2016 at 08:14:04AM +0800, Rui Wang wrote:
> Is it a Romley-EP ? What kind of SNB box is it? I'll try to find one
> to reproduce the problem.
You don't have to - I went and analyzed it for you but it is ACPI
crap and I can't be bothered to think of a solution. It looks like
a chicken-and-an-egg issue where you call acpi_ioapic_add() on the
root_bus and something's not initialized there in the chain of parent
and children nodes.
But I could very well be wrong so let me dump the details, I'll let
someone else make sense of it all. Provided I've not made a mistake
somewhere, which is not uncommon.
So the splat is:
[ 4.396872] pci_bus 0000:3f: resource 5 [mem 0x00000000-0x3fffffffffff]
[ 4.403493] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
[ 4.411331] IP: [<ffffffff8143ef8d>] acpi_ns_walk_namespace+0x3d/0x18f
[ 4.417878] PGD 0
[ 4.419900] Oops: 0000 [#1] PREEMPT SMP
[ 4.423731] Modules linked in:
[ 4.426791] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.8.0-rc5+ #13
[ 4.433122] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A13 05/11/2014
[ 4.440506] task: ffff88043be70000 task.stack: ffffc90000020000
[ 4.446405] RIP: 0010:[<ffffffff8143ef8d>] [<ffffffff8143ef8d>] acpi_ns_walk_namespace+0x3d/0x18f
[ 4.455348] RSP: 0000:ffffc90000023d98 EFLAGS: 00010202
[ 4.460641] RAX: ffffffff82dd6ae0 RBX: 0000000000000001 RCX: 0000000000000001
[ 4.467752] RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 0000000000000006
[ 4.474854] RBP: ffffc90000023de8 R08: ffffffff8144bfaa R09: 0000000000000000
[ 4.481962] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[ 4.489071] R13: ffffffff8144bfaa R14: 0000000000000000 R15: 0000000000000000
[ 4.496187] FS: 0000000000000000(0000) GS:ffff88043de00000(0000) knlGS:0000000000000000
[ 4.504242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4.509967] CR2: 0000000000000018 CR3: 0000000003e05000 CR4: 00000000000406f0
[ 4.517078] Stack:
[ 4.519079] 0000000000000018 ffffffff82dd6c58 ffffc90000023dc0 00000001ffffffff
[ 4.526476] 0000000000000006 0000000000000000 0000000000001001 ffffffff8144bfaa
[ 4.533876] 0000000000000000 0000000000000000 ffffc90000023e38 ffffffff8143f51d
[ 4.541273] Call Trace:
[ 4.543714] [<ffffffff8144bfaa>] ? setup_res+0x94/0x94
[ 4.548915] [<ffffffff8143f51d>] acpi_walk_namespace+0x9b/0xcf
[ 4.554838] [<ffffffff81fc44c1>] ? ras_debugfs_init+0x1b/0x1b
[ 4.560645] [<ffffffff8144c376>] acpi_ioapic_add+0x33/0x4c
[ 4.566221] [<ffffffff81faf09a>] pci_assign_unassigned_resources+0x4e/0x56
[ 4.573151] [<ffffffff81fc44df>] pcibios_assign_resources+0x1e/0xb4
[ 4.579476] [<ffffffff81fc44c1>] ? ras_debugfs_init+0x1b/0x1b
[ 4.585306] [<ffffffff81000440>] do_one_initcall+0x50/0x190
[ 4.590969] [<ffffffff81f73091>] kernel_init_freeable+0x11e/0x1a6
[ 4.597130] [<ffffffff81854d60>] ? rest_init+0x140/0x140
[ 4.602506] [<ffffffff81854d6e>] kernel_init+0xe/0x110
[ 4.607735] [<ffffffff8185f11a>] ret_from_fork+0x2a/0x40
[ 4.613110] Code: 41 54 41 89 cc 53 bb 01 00 00 00 48 83 ec 28 48 83 fe ff 89 7d d0 4c 0f 44 35 58 7b 99 01 89 55 c8 45 31 ff 41 83 e4 01 89 4d cc <4d> 8b 6e 18 4c 89 45 c0 4c 89 4d b8 c7 45 d4 00 00 00 00 4d 85
[ 4.632554] RIP [<ffffffff8143ef8d>] acpi_ns_walk_namespace+0x3d/0x18f
[ 4.639148] RSP <ffffc90000023d98>
[ 4.642619] CR2: 0000000000000018
[ 4.645944] ---[ end trace 2fbd1c53d9becc2a ]---
[ 4.650562] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[ 4.650562]
[ 4.659694] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
You can see from the call trace how it happens:
pci_assign_unassigned_resources-> acpi_ioapic_add -> ... which you added.
rIP is this:
ffffffff8143ef50 <acpi_ns_walk_namespace>:
ffffffff8143ef50: e8 6b 16 42 00 callq ffffffff818605c0 <__fentry__>
ffffffff8143ef55: 55 push %rbp
ffffffff8143ef56: 48 89 e5 mov %rsp,%rbp
ffffffff8143ef59: 41 57 push %r15
ffffffff8143ef5b: 41 56 push %r14
ffffffff8143ef5d: 49 89 f6 mov %rsi,%r14
ffffffff8143ef60: 41 55 push %r13
ffffffff8143ef62: 41 54 push %r12
ffffffff8143ef64: 41 89 cc mov %ecx,%r12d
ffffffff8143ef67: 53 push %rbx
ffffffff8143ef68: bb 01 00 00 00 mov $0x1,%ebx
ffffffff8143ef6d: 48 83 ec 28 sub $0x28,%rsp
ffffffff8143ef71: 48 83 fe ff cmp $0xffffffffffffffff,%rsi
ffffffff8143ef75: 89 7d d0 mov %edi,-0x30(%rbp)
ffffffff8143ef78: 4c 0f 44 35 58 7b 99 cmove 0x1997b58(%rip),%r14 # ffffffff82dd6ad8 <acpi_gbl_root_node>
ffffffff8143ef7f: 01
ffffffff8143ef80: 89 55 c8 mov %edx,-0x38(%rbp)
ffffffff8143ef83: 45 31 ff xor %r15d,%r15d
ffffffff8143ef86: 41 83 e4 01 and $0x1,%r12d
ffffffff8143ef8a: 89 4d cc mov %ecx,-0x34(%rbp)
ffffffff8143ef8d: 4d 8b 6e 18 mov 0x18(%r14),%r13 <--- faulting insn.
ffffffff8143ef91: 4c 89 45 c0 mov %r8,-0x40(%rbp)
ffffffff8143ef95: 4c 89 4d b8 mov %r9,-0x48(%rbp)
ffffffff8143ef99: c7 45 d4 00 00 00 00 movl $0x0,-0x2c(%rbp)
ffffffff8143efa0: 4d 85 ed test %r13,%r13
In the annotated asm, this is:
.globl acpi_ns_walk_namespace
.type acpi_ns_walk_namespace, @function
acpi_ns_walk_namespace:
.LFB1454:
.loc 1 191 0
.cfi_startproc
.LVL5:
call __fentry__
pushq %rbp #
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp #,
.cfi_def_cfa_register 6
pushq %r15 #
pushq %r14 #
.cfi_offset 15, -24
.cfi_offset 14, -32
movq %rsi, %r14 # start_node, start_node
pushq %r13 #
pushq %r12 #
.cfi_offset 13, -40
.cfi_offset 12, -48
.loc 1 249 0
movl %ecx, %r12d # flags, D.24037
.loc 1 191 0
pushq %rbx #
.cfi_offset 3, -56
.loc 1 213 0
movl $1, %ebx #, level
.LVL6:
.loc 1 191 0
subq $40, %rsp #,
.loc 1 205 0
cmpq $-1, %rsi #, start_node
.loc 1 191 0
movl %edi, -48(%rbp) # type, %sfp
.loc 1 205 0
cmove acpi_gbl_root_node(%rip), %r14 # acpi_gbl_root_node,, start_node
.LVL7:
.loc 1 191 0
movl %edx, -56(%rbp) # max_depth, %sfp
.loc 1 198 0
xorl %r15d, %r15d # node_previously_visited
.LVL8:
.loc 1 249 0
andl $1, %r12d #, D.24037
.loc 1 191 0
movl %ecx, -52(%rbp) # flags, %sfp
.LBB14:
.LBB15:
.loc 1 79 0
movq 24(%r14), %r13 # MEM[(struct acpi_namespace_node *)start_node_1].child, start_node <--- faulting insn
and the relevant piece of C code is:
if (start_node == ACPI_ROOT_OBJECT) {
start_node = acpi_gbl_root_node;
}
/* Null child means "get first node" */
parent_node = start_node;
child_node = acpi_ns_get_next_node(parent_node, NULL);
And parent_node, i.e., start_node is NULL here and
acpi_ns_get_next_node() tries to deref it to get the child:
movq 24(%r14), %r13
HTH.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: 584c5c422f6c ("x86/ioapic: Support hot-removal of IOAPICs present during boot")
2016-09-10 11:27 ` Borislav Petkov
@ 2016-09-10 13:11 ` Rui Wang
2016-09-10 14:30 ` Borislav Petkov
0 siblings, 1 reply; 7+ messages in thread
From: Rui Wang @ 2016-09-10 13:11 UTC (permalink / raw)
To: bp
Cc: bhelgaas, torvalds, peterz, tglx, helgaas, linux-acpi, linux-pci,
rjw, tony.luck, mingo, x86, linux-kernel, rui.y.wang
On Sat, Sep 10, 2016 7:28 PM, Borislav Petkov wrote:
>
> ffffffff8143ef50 <acpi_ns_walk_namespace>:
> ffffffff8143ef50: e8 6b 16 42 00 callq ffffffff818605c0 <__fentry__>
> ffffffff8143ef55: 55 push %rbp
> ffffffff8143ef56: 48 89 e5 mov %rsp,%rbp
> ffffffff8143ef59: 41 57 push %r15
> ffffffff8143ef5b: 41 56 push %r14
> ffffffff8143ef5d: 49 89 f6 mov %rsi,%r14
> ffffffff8143ef60: 41 55 push %r13
> ffffffff8143ef62: 41 54 push %r12
> ffffffff8143ef64: 41 89 cc mov %ecx,%r12d
> ffffffff8143ef67: 53 push %rbx
> ffffffff8143ef68: bb 01 00 00 00 mov $0x1,%ebx
> ffffffff8143ef6d: 48 83 ec 28 sub $0x28,%rsp
> ffffffff8143ef71: 48 83 fe ff cmp $0xffffffffffffffff,%rsi
> ffffffff8143ef75: 89 7d d0 mov %edi,-0x30(%rbp)
> ffffffff8143ef78: 4c 0f 44 35 58 7b 99 cmove 0x1997b58(%rip),%r14 # ffffffff82dd6ad8 <acpi_gbl_root_node>
> ffffffff8143ef7f: 01
> ffffffff8143ef80: 89 55 c8 mov %edx,-0x38(%rbp)
> ffffffff8143ef83: 45 31 ff xor %r15d,%r15d
> ffffffff8143ef86: 41 83 e4 01 and $0x1,%r12d
> ffffffff8143ef8a: 89 4d cc mov %ecx,-0x34(%rbp)
> ffffffff8143ef8d: 4d 8b 6e 18 mov 0x18(%r14),%r13 <--- > faulting insn.
> ffffffff8143ef91: 4c 89 45 c0 mov %r8,-0x40(%rbp)
> ffffffff8143ef95: 4c 89 4d b8 mov %r9,-0x48(%rbp)
> ffffffff8143ef99: c7 45 d4 00 00 00 00 movl $0x0,-0x2c(%rbp)
> ffffffff8143efa0: 4d 85 ed test %r13,%r13
Thanks for the analysis. Looks like acpi_gbl_root_node == 0. It is
initialized earlier than where we call acpi_ioapic_add().
start_kernel()
->acpi_early_init()
->acpi_initialize_subsystem()
->acpi_ns_root_initialize()
-> acpi_gbl_root_node = ...
But acpi_gbl_root_node can be 0 if acpi_disabled == 1.
Do you somehow have acpi turned off (!CONFIG_ACPI or booting with acpi=off)?
Thanks
Rui
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: 584c5c422f6c ("x86/ioapic: Support hot-removal of IOAPICs present during boot")
2016-09-10 13:11 ` Rui Wang
@ 2016-09-10 14:30 ` Borislav Petkov
2016-09-10 15:03 ` Rui Wang
0 siblings, 1 reply; 7+ messages in thread
From: Borislav Petkov @ 2016-09-10 14:30 UTC (permalink / raw)
To: Rui Wang
Cc: bhelgaas, torvalds, peterz, tglx, helgaas, linux-acpi, linux-pci,
rjw, tony.luck, mingo, x86, linux-kernel
On Sat, Sep 10, 2016 at 09:11:30PM +0800, Rui Wang wrote:
> Thanks for the analysis. Looks like acpi_gbl_root_node == 0. It is
Is ACPI_HANDLE(root_bus->bridge) == ACPI_ROOT_OBJECT?
Because this is what the code does but I can't see it from the code that
the handle is of type ACPI_ROOT_OBJECT...
> initialized earlier than where we call acpi_ioapic_add().
> start_kernel()
> ->acpi_early_init()
> ->acpi_initialize_subsystem()
> ->acpi_ns_root_initialize()
> -> acpi_gbl_root_node = ...
>
>
> But acpi_gbl_root_node can be 0 if acpi_disabled == 1.
> Do you somehow have acpi turned off (!CONFIG_ACPI or booting with acpi=off)?
Nope.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: 584c5c422f6c ("x86/ioapic: Support hot-removal of IOAPICs present during boot")
2016-09-10 14:30 ` Borislav Petkov
@ 2016-09-10 15:03 ` Rui Wang
2016-09-10 15:23 ` Borislav Petkov
0 siblings, 1 reply; 7+ messages in thread
From: Rui Wang @ 2016-09-10 15:03 UTC (permalink / raw)
To: bp
Cc: bhelgaas, torvalds, peterz, tglx, helgaas, linux-acpi, linux-pci,
rjw, tony.luck, mingo, x86, linux-kernel, rui.y.wang
On Sat, Sep 10, 2016 10:31 PM, Borislav Petkov wrote:
>
> On Sat, Sep 10, 2016 at 09:11:30PM +0800, Rui Wang wrote:
> > Thanks for the analysis. Looks like acpi_gbl_root_node == 0. It is
>
> Is ACPI_HANDLE(root_bus->bridge) == ACPI_ROOT_OBJECT?
>
> Because this is what the code does but I can't see it from the code that the
> handle is of type ACPI_ROOT_OBJECT...
>
I seem to have the clue. It is because ACPI_HANDLE(root_bus->bridge) == NULL.
Then this root bridge doesn't have a corresponding ACPI device. In this case
we shouldn't call acpi_ioapic_add(). This can be true for some old platforms.
If that's the case then the patch below can fix it.
Thanks
Rui
>From b37d806c78e5eea70dbb890c1e3dbb8b7e5038e6 Mon Sep 17 00:00:00 2001
From: Rui Wang <rui.y.wang@intel.com>
Date: Sat, 10 Sep 2016 10:56:34 -0400
Subject: [PATCH] x86/ioapic: Ignore root bridges without a companion ACPI device
Some PCI root bridges don't have a corresponding ACPI device.
Don't call acpi_ioapic_add() on these bridges because they can't
support ioapic hotplug.
Signed-off-by: Rui Wang <rui.y.wang@intel.com>
---
drivers/pci/setup-bus.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index ec538d3..2b05ac6 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1852,10 +1852,15 @@ dump:
void __init pci_assign_unassigned_resources(void)
{
struct pci_bus *root_bus;
+ acpi_handle root_handle;
list_for_each_entry(root_bus, &pci_root_buses, node) {
pci_assign_unassigned_root_bus_resources(root_bus);
- acpi_ioapic_add(ACPI_HANDLE(root_bus->bridge));
+ root_handle = ACPI_HANDLE(root_bus->bridge);
+
+ /* make sure the root bridge has a companion ACPI device */
+ if (root_handle)
+ acpi_ioapic_add(root_handle);
}
}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: 584c5c422f6c ("x86/ioapic: Support hot-removal of IOAPICs present during boot")
2016-09-10 15:03 ` Rui Wang
@ 2016-09-10 15:23 ` Borislav Petkov
0 siblings, 0 replies; 7+ messages in thread
From: Borislav Petkov @ 2016-09-10 15:23 UTC (permalink / raw)
To: Rui Wang
Cc: bhelgaas, torvalds, peterz, tglx, helgaas, linux-acpi, linux-pci,
rjw, tony.luck, mingo, x86, linux-kernel
On Sat, Sep 10, 2016 at 11:03:29PM +0800, Rui Wang wrote:
> From b37d806c78e5eea70dbb890c1e3dbb8b7e5038e6 Mon Sep 17 00:00:00 2001
> From: Rui Wang <rui.y.wang@intel.com>
> Date: Sat, 10 Sep 2016 10:56:34 -0400
> Subject: [PATCH] x86/ioapic: Ignore root bridges without a companion ACPI device
>
> Some PCI root bridges don't have a corresponding ACPI device.
Please say here that this is can be the case on some old platforms. Mine
is a couple of years old dell workstation box.
> Don't call acpi_ioapic_add() on these bridges because they can't
> support ioapic hotplug.
Right, exactly.
> Signed-off-by: Rui Wang <rui.y.wang@intel.com>
> ---
> drivers/pci/setup-bus.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> index ec538d3..2b05ac6 100644
> --- a/drivers/pci/setup-bus.c
> +++ b/drivers/pci/setup-bus.c
> @@ -1852,10 +1852,15 @@ dump:
> void __init pci_assign_unassigned_resources(void)
> {
> struct pci_bus *root_bus;
> + acpi_handle root_handle;
>
> list_for_each_entry(root_bus, &pci_root_buses, node) {
> pci_assign_unassigned_root_bus_resources(root_bus);
> - acpi_ioapic_add(ACPI_HANDLE(root_bus->bridge));
> + root_handle = ACPI_HANDLE(root_bus->bridge);
> +
> + /* make sure the root bridge has a companion ACPI device */
> + if (root_handle)
> + acpi_ioapic_add(root_handle);
Just do:
if (ACPI_HANDLE(root_bus->bridge))
acpi_ioapic_add(ACPI_HANDLE(root_bus->bridge));
no need for the local variable - gcc is smart enough to fetch it only
once :)
Anyway, I'll run it later to verify.
Thanks.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2016-09-10 15:23 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-09 22:07 584c5c422f6c ("x86/ioapic: Support hot-removal of IOAPICs present during boot") Borislav Petkov
2016-09-10 0:14 ` Rui Wang
2016-09-10 11:27 ` Borislav Petkov
2016-09-10 13:11 ` Rui Wang
2016-09-10 14:30 ` Borislav Petkov
2016-09-10 15:03 ` Rui Wang
2016-09-10 15:23 ` Borislav Petkov
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.