* NULL Pointer in 3.x during PCI bus enumeration
@ 2015-02-23 19:38 Robert White
  2015-02-28  0:34 ` Fwd: " Robert White
  0 siblings, 1 reply; 6+ messages in thread
From: Robert White @ 2015-02-23 19:38 UTC (permalink / raw)
  To: Linux Kernel

The below BUG event happens during PCI bus enumeration on some of my
gear. In particular the Advanced Telecommunications Architecture (ATCA)
has carrier cards that contain Field Replaceable Units (FRUs). FRUs
are all attached by PCI-to-PCI bridges and some may be empty.

So architecturally the main card is just an array of eight bridges
and the CPU/computer is just in one slot.

carrier |--- adapter 1
PCI     |--- (empty)
bus     |--- CPU (fru)
        |--- adapter 4
       ... etc.

The CPU module sees this as a PCI bus with all the normal things
on the local PCI bus within its FRU and then a bridge to a
tree of bridges, and some of those bridges go nowhere.

CPU -|--- memory controller
     |--- whatever
     |--- PCI bridge(#) -|--- PCI bridge -|--- adapter 1 item 1
                         |                |--- adapter 1 item 2
                         |
                         |--- PCI bridge -|--- adapter 4 item 1
                                          |--- adapter 4 item 2

(#)Actually I think there is another layer of bridges in there
but I am running out of ASCII art space.

The longest link is something like
CPU to local bus
local bus to plug bus
plug bus to backplane
backplane to other plug bus
other plug bus to target local bus
target local bus to device.

Anyway, I am taking a system that is working under 2.x where this
bridge to bridge (to bridge?) thing worked and it's bugging out
on 3.x (at least 3.18 and 3.19, I have no knowledge of 3.x for
x less than 18).

I got as far as seeing that it's a composite pointer deref that's
going bad in pcie_aspm_init_link_state according to gdb:

parent = pdev->bus->parent->self->link_state;

but the sequencing dependency (e.g. when "self", "parent"
and "bus" are really set for each item) is making my brain hurt.



[    1.590865] BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
[    1.606588] IP: [<ffffffff81550324>] pcie_aspm_init_link_state+0x744/0x850
[    1.620375] PGD 0 
[    1.624436] Oops: 0000 [#1] PREEMPT SMP 
[    1.632387] Modules linked in:
[    1.638536] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.19.0-gentoo #9
[    1.651590] Hardware name: Kontron B3001/B3001, BIOS 4.6.3 08/07/2012
[    1.664472] task: ffff880116b20000 ti: ffff880116b28000 task.ti: ffff880116b28000
[    1.679436] RIP: 0010:[<ffffffff81550324>]  [<ffffffff81550324>] pcie_aspm_init_link_state+0x744/0x850
[    1.698084] RSP: 0000:ffff880116b2b958  EFLAGS: 00010246
[    1.708707] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8801165aae78
[    1.722978] RDX: ffff8801165aae58 RSI: 0000000000000000 RDI: ffff8801165aaf00
[    1.737250] RBP: ffff880116b2b9c8 R08: 0000000000015b80 R09: ffff8801165aae40
[    1.751520] R10: ffff8801165aae40 R11: 000000000000000f R12: ffff8801165aae40
[    1.765791] R13: ffff8801165e8000 R14: 0000000000000000 R15: ffff88011643fc00
[    1.780063] FS:  0000000000000000(0000) GS:ffff88011bc00000(0000) knlGS:0000000000000000
[    1.796243] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    1.807738] CR2: 0000000000000088 CR3: 0000000002412000 CR4: 00000000000007f0
[    1.822007] Stack:
[    1.826036]  ffff880116b2b988 ffffffff8153b682 ffff8801165e9000 ffff8801165e9000
[    1.840966]  ffff880117038400 0000000000000000 ffff880116b2b9c8 ffffffff8153b761
[    1.855896]  ffff880116b2b9b8 ffff880117038400 0000000000000001 0000000000000000
[    1.870828] Call Trace:
[    1.875727]  [<ffffffff8153b682>] ? pci_device_add+0x122/0x170
[    1.887392]  [<ffffffff8153b761>] ? pci_scan_single_device+0x91/0xc0
[    1.900099]  [<ffffffff8153b865>] pci_scan_slot+0xd5/0x120
[    1.911071]  [<ffffffff8153ca1d>] pci_scan_child_bus+0x2d/0xd0
[    1.922738]  [<ffffffff8153c733>] pci_scan_bridge+0x383/0x640
[    1.934233]  [<ffffffff8153ca75>] pci_scan_child_bus+0x85/0xd0
[    1.945900]  [<ffffffff8153c733>] pci_scan_bridge+0x383/0x640
[    1.957391]  [<ffffffff8153b724>] ? pci_scan_single_device+0x54/0xc0
[    1.970101]  [<ffffffff8153ca75>] pci_scan_child_bus+0x85/0xd0
[    1.981770]  [<ffffffff81b26357>] pci_acpi_scan_root+0x317/0x520
[    1.993784]  [<ffffffff8158c8a3>] acpi_pci_root_add+0x3c9/0x4db
[    2.005623]  [<ffffffff8158e44e>] ? acpi_pnp_match+0x2c/0xa4
[    2.016943]  [<ffffffff825bb3c6>] ? acpi_sleep_proc_init+0x2a/0x2a
[    2.029303]  [<ffffffff81588f15>] acpi_bus_attach+0xcf/0x1bf
[    2.040621]  [<ffffffff825bb3c6>] ? acpi_sleep_proc_init+0x2a/0x2a
[    2.052985]  [<ffffffff817d1f85>] ? device_attach+0x45/0xb0
[    2.064128]  [<ffffffff81588f8f>] acpi_bus_attach+0x149/0x1bf
[    2.075622]  [<ffffffff825bb3c6>] ? acpi_sleep_proc_init+0x2a/0x2a
[    2.087984]  [<ffffffff817d1f85>] ? device_attach+0x45/0xb0
[    2.099130]  [<ffffffff81588f8f>] acpi_bus_attach+0x149/0x1bf
[    2.110623]  [<ffffffff825bb3c6>] ? acpi_sleep_proc_init+0x2a/0x2a
[    2.122983]  [<ffffffff815890f4>] acpi_bus_scan+0x5c/0x67
[    2.133782]  [<ffffffff825bb7e6>] acpi_scan_init+0x6b/0x1a1
[    2.144929]  [<ffffffff825bb617>] acpi_init+0x251/0x26e
[    2.155379]  [<ffffffff825bb3c6>] ? acpi_sleep_proc_init+0x2a/0x2a
[    2.167741]  [<ffffffff810002d8>] do_one_initcall+0x98/0x1e0
[    2.179063]  [<ffffffff810e6900>] ? parse_args+0x150/0x430
[    2.190036]  [<ffffffff8257907c>] kernel_init_freeable+0x17e/0x20b
[    2.202394]  [<ffffffff81d884f0>] ? rest_init+0x90/0x90
[    2.212846]  [<ffffffff81d884f9>] kernel_init+0x9/0xf0
[    2.223125]  [<ffffffff81d9b4ac>] ret_from_fork+0x7c/0xb0
[    2.233922]  [<ffffffff81d884f0>] ? rest_init+0x90/0x90
[    2.244372] Code: 0f 85 e2 fa ff ff 41 80 4c 24 4a 03 b8 01 00 00 00 41 0f b6 54 24 49 e9 4b fb ff ff 0f 1f 00 49 8b 45 10 48 8b 40 10 48 8b 40 38 <48> 8b 80 88 00 00 00 48 85 c0 0f 
[    2.284338] RIP  [<ffffffff81550324>] pcie_aspm_init_link_state+0x744/0x850
[    2.298296]  RSP <ffff880116b2b958>
[    2.305276] CR2: 0000000000000088
[    2.311913] ---[ end trace 153b3907ad1e19ba ]---


(gdb) list *0xffffffff815502ba
0xffffffff815502ba is in pcie_aspm_init_link_state
(drivers/pci/pcie/aspm.c:530).
525             INIT_LIST_HEAD(&link->children);
526             INIT_LIST_HEAD(&link->link);
527             link->pdev = pdev;
528             if (pci_pcie_type(pdev) == PCI_EXP_TYPE_DOWNSTREAM) {
529                     struct pcie_link_state *parent;
530                     parent = pdev->bus->parent->self->link_state;
531                     if (!parent) {
532                             kfree(link);
533                             return NULL;
534                     }



* Fwd: NULL Pointer in 3.x during PCI bus enumeration
  2015-02-23 19:38 NULL Pointer in 3.x during PCI bus enumeration Robert White
@ 2015-02-28  0:34 ` Robert White
  2015-02-28 22:33   ` Bjorn Helgaas
  0 siblings, 1 reply; 6+ messages in thread
From: Robert White @ 2015-02-28  0:34 UTC (permalink / raw)
  To: linux-pci

Eh, wrong mailing list the first time...?


-------- Forwarded Message --------
Subject: NULL Pointer in 3.x during PCI bus enumeration
Date: Mon, 23 Feb 2015 11:38:26 -0800
From: Robert White <rwhite@pobox.com>
To: Linux Kernel <linux-kernel@vger.kernel.org>


* Re: NULL Pointer in 3.x during PCI bus enumeration
  2015-02-28  0:34 ` Fwd: " Robert White
@ 2015-02-28 22:33   ` Bjorn Helgaas
  2015-03-05 16:28     ` Bjorn Helgaas
  2015-03-06  2:27     ` Robert White
  0 siblings, 2 replies; 6+ messages in thread
From: Bjorn Helgaas @ 2015-02-28 22:33 UTC (permalink / raw)
  To: Robert White; +Cc: linux-pci

On Fri, Feb 27, 2015 at 4:34 PM, Robert White <rwhite@pobox.com> wrote:
> Eh, wrong mailing list the first time...?

Yep, I browse linux-kernel sometimes, but not enough to catch
everything.  Anyway, thanks a lot for the problem report.

Would you mind opening a bug report at http://bugzilla.kernel.org,
drivers/pci component, and attaching

  - a complete dmesg log from your most recent kernel (it probably
doesn't boot, which makes it hard to get an actual dmesg log; a
complete console log with "ignore_loglevel" is fine, too).
  - complete "lspci -vv" output from a working system (v2.x is fine).

Thanks,
  Bjorn


* Re: NULL Pointer in 3.x during PCI bus enumeration
  2015-02-28 22:33   ` Bjorn Helgaas
@ 2015-03-05 16:28     ` Bjorn Helgaas
  2015-03-06  2:27     ` Robert White
  1 sibling, 0 replies; 6+ messages in thread
From: Bjorn Helgaas @ 2015-03-05 16:28 UTC (permalink / raw)
  To: Robert White; +Cc: linux-pci

On Sat, Feb 28, 2015 at 4:33 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Fri, Feb 27, 2015 at 4:34 PM, Robert White <rwhite@pobox.com> wrote:
>> Eh, wrong mailing list the first time...?
>
> Yep, I browse linux-kernel sometimes, but not enough to catch
> everything.  Anyway, thanks a lot for the problem report.
>
> Would you mind opening a bug report at http://bugzilla.kernel.org,
> drivers/pci component, and attaching
>
>   - a complete dmesg log from your most recent kernel (it probably
> doesn't boot, which makes it hard to get an actual dmesg log; a
> complete console log with "ignore_loglevel" is fine, too).
>   - complete "lspci -vv" output from a working system (v2.x is fine).

Ping, I'd like to debug this, but I'd like to start with a little more
information.

Bjorn


* Re: NULL Pointer in 3.x during PCI bus enumeration
  2015-02-28 22:33   ` Bjorn Helgaas
  2015-03-05 16:28     ` Bjorn Helgaas
@ 2015-03-06  2:27     ` Robert White
  2015-04-18  0:03       ` Bjorn Helgaas
  1 sibling, 1 reply; 6+ messages in thread
From: Robert White @ 2015-03-06  2:27 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci

It took me a while to get back into the lab and scrape together a 
working subsystem to collect the data you wanted.

Link to bug...

https://bugzilla.kernel.org/show_bug.cgi?id=94361

--Rob.


On 02/28/2015 02:33 PM, Bjorn Helgaas wrote:
> On Fri, Feb 27, 2015 at 4:34 PM, Robert White <rwhite@pobox.com> wrote:
>> Eh, wrong mailing list the first time...?
>
> Yep, I browse linux-kernel sometimes, but not enough to catch
> everything.  Anyway, thanks a lot for the problem report.
>
> Would you mind opening a bug report at http://bugzilla.kernel.org,
> drivers/pci component, and attaching
>
>    - a complete dmesg log from your most recent kernel (it probably
> doesn't boot, which makes it hard to get an actual dmesg log; a
> complete console log with "ignore_loglevel" is fine, too).
>    - complete "lspci -vv" output from a working system (v2.x is fine).
>
> Thanks,
>    Bjorn
>



* Re: NULL Pointer in 3.x during PCI bus enumeration
  2015-03-06  2:27     ` Robert White
@ 2015-04-18  0:03       ` Bjorn Helgaas
  0 siblings, 0 replies; 6+ messages in thread
From: Bjorn Helgaas @ 2015-04-18  0:03 UTC (permalink / raw)
  To: Robert White; +Cc: linux-pci, Yijing Wang, Matthew Garrett

[+cc Yijing, Matthew]

On Thu, Mar 05, 2015 at 06:27:09PM -0800, Robert White wrote:
> It took me a while to get back into the lab and scrape together a
> working subsystem to collect the data you wanted.
> 
> Link to bug...
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=94361

Thanks!  Yijing added useful analysis to the bugzilla, but I'm going to
continue the email thread because there's interesting stuff here that
shouldn't be buried off in bugzilla.

Yijing noticed that you have an interesting system topology:

  pci 0000:00:1c.0: PCI bridge to [bus 02-0a]   Root Port (Slot+)
  pci 0000:02:00.0: PCI bridge to [bus 03-0a]   Downstream Port (Slot-)
  pci 0000:03:00.0: PCI bridge to [bus 04]      Upstream Port
  pci 0000:03:01.0: PCI bridge to [bus 05]      Downstream Port (Slot+)
  pci 0000:03:02.0: PCI bridge to [bus 06]      Downstream Port (Slot+)
  ...
  pci 0000:03:0a.0: PCI bridge to [bus 0a]      Downstream Port (Slot+)

It would be more typical for the Root Port to connect to an Upstream
Port of a switch, and Downstream Ports of the switch would connect to
endpoint devices.  But your system has the Root Port connected to a
Downstream Port.
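
To make the unusual part of this topology concrete, here is a minimal
userspace sketch (not kernel code) that flags a Downstream Port whose
parent bridge is not an Upstream Port.  The port-type values match
include/uapi/linux/pci_regs.h; struct model_port and
downstream_port_without_switch() are hypothetical names invented for
this illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* PCIe device/port type values, as defined in pci_regs.h. */
enum {
	PCI_EXP_TYPE_ROOT_PORT  = 0x4,
	PCI_EXP_TYPE_UPSTREAM   = 0x5,
	PCI_EXP_TYPE_DOWNSTREAM = 0x6,
};

/* Hypothetical model of one bridge in the hierarchy (not a kernel struct). */
struct model_port {
	const char *name;                  /* e.g. "02:00.0" */
	int type;                          /* one of PCI_EXP_TYPE_* above */
	const struct model_port *upstream; /* bridge above us, NULL on the root bus */
};

/*
 * Return 1 if p is a Downstream Port that does not sit directly below an
 * Upstream Port, i.e. the arrangement described above for 02:00.0.
 * Return 0 for every other port.
 */
static int downstream_port_without_switch(const struct model_port *p)
{
	assert(p != NULL);
	if (p->type != PCI_EXP_TYPE_DOWNSTREAM)
		return 0;
	return p->upstream == NULL ||
	       p->upstream->type != PCI_EXP_TYPE_UPSTREAM;
}
```

Modeling 00:1c.0 as a Root Port with 02:00.0 directly below it makes the
check fire for 02:00.0, while a Downstream Port below a conventional
switch Upstream Port passes.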

This might be a legal topology, but I'm confused about how we should
interpret it.  ASPM configuration involves both ends of a link, and the
code allocates pcie_link_state structures for the device at the upstream
end of the link.  Normally the upstream end is a Root Port or a Downstream
Port.  Then it configures the other end by iterating over the list of
devices on the secondary bus ("pci_dev.subordinate" in the code).

In your system, there's a link from 00:1c.0 to 02:00.0, but if we're
looking at 02:00.0, the link is on the *upstream* side, not the downstream
side where we expect it.  So Linux allocates a pcie_link_state for 02:00.0
and treats 03:00.0 as the other end of that link, which I think is
wrong.
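
The oops itself looks consistent with this picture: the faulting
instruction loads from offset 0x88 of a NULL pointer (CR2 is 0x88), which
is what you would get if pdev->bus->parent->self were NULL.  For 02:00.0
the parent bus would be root bus 00, which has no bridge device above it.
The following is a minimal userspace sketch of that dereference chain
with explicit NULL checks.  The struct layouts are cut-down mock-ups that
only borrow the kernel's field names, parent_link_state() is a
hypothetical helper, and none of this is a proposed fix.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down mock-ups; the real kernel structs have many more fields. */
struct pcie_link_state {
	int dummy;
};

struct pci_bus;

struct pci_dev {
	struct pci_bus *bus;                /* bus this device sits on */
	struct pcie_link_state *link_state; /* ASPM link state, may be NULL */
};

struct pci_bus {
	struct pci_bus *parent; /* NULL for a root bus */
	struct pci_dev *self;   /* bridge leading to this bus, NULL for a root bus */
};

/*
 * Hypothetical guarded form of
 *     pdev->bus->parent->self->link_state
 * that returns NULL instead of faulting when a link in the chain is absent.
 */
static struct pcie_link_state *parent_link_state(const struct pci_dev *pdev)
{
	const struct pci_bus *bus;

	assert(pdev != NULL);
	bus = pdev->bus;
	if (!bus || !bus->parent || !bus->parent->self)
		return NULL;
	return bus->parent->self->link_state;
}
```

With a Downstream Port placed directly below a Root Port, the guarded
helper returns NULL, which the existing "if (!parent)" check in the
listing above would then handle by freeing the link and bailing out.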

It might be possible for us to figure out where the other end of the link
is in a different way, without relying on the assumption that a Downstream
Port's link goes to its secondary bus.  But that would definitely require
some changes.

(I know your FADT told us not to touch ASPM anyway; that's another issue
that we also need to sort out.)

You said:

> Yes. It is part of the Advanced Telecommunications Computing
> Architecture (ATCA)
> http://en.wikipedia.org/wiki/Advanced_Telecommunications_Computing_Architecture

> The deal is that the "Field Replaceable Units" (FRUs) each fit
> into a carrier card that is little more than a power control matrix and a
> PCIe bus. The CPU module itself is just another FRU just like the
> targets. So the computing module doesn't own the top level bus. The
> data trip to the final target devices, if they aren't co-resident on
> the CPU module, is up to the backplane and then back down to the
> target controller.

> So in this setup the CPU is a peer of all the other adapters.

Where do the devices listed above physically live?  Is the switch (devices
02:00.0, 03:00.0, 03:01.0, etc.) physically on the backplane, and the FRUs
(including the CPU) in slots connected to Downstream Ports of the switch?

Who assigns bus numbers to this fabric?  Would anything break if Linux
reassigned them?

Bjorn


end of thread, other threads:[~2015-04-18  0:03 UTC | newest]

Thread overview: 6+ messages
2015-02-23 19:38 NULL Pointer in 3.x during PCI bus enumeration Robert White
2015-02-28  0:34 ` Fwd: " Robert White
2015-02-28 22:33   ` Bjorn Helgaas
2015-03-05 16:28     ` Bjorn Helgaas
2015-03-06  2:27     ` Robert White
2015-04-18  0:03       ` Bjorn Helgaas
