* Problems with commit "cb4bfda62afa nvme-pci: fix hot removal during error handling"
@ 2018-12-18 22:28 Alex_Gagniuc
  2018-12-19 19:23 ` [REGRESSION] Deadlock in nvme drive removal (Was: Problems with commit "cb4bfda62afa nvme-pci: fix hot removal during error handling") Alex_Gagniuc
  0 siblings, 1 reply; 2+ messages in thread
From: Alex_Gagniuc @ 2018-12-18 22:28 UTC (permalink / raw)
  To: linux-nvme, linux-pci; +Cc: keith.busch

Hi,

I've been seeing two issues with this commit. One was a NULL dereference
that appears to have been fixed in 4.20-rc7.

For the second issue, I've included a system log in Appendix B with an 
explanation of what events are being generated -- all done with a 
hardware yanker, not ACPI or fake injection techniques.

After roughly the second removal, the kernel thinks the device is 
gone, and fails to recognize the hotplug interrupts that would bring 
the device back online. The hardware keeps sending hotplug interrupts, 
but the handler never acts on them -- look for the "pciehp_isr hit" 
messages in the logs.

I'm seeing this issue starting with commit
cb4bfda62afa ("nvme-pci: fix hot removal during error handling");
I can't reproduce it with the commit immediately before it.

I'm not sure whether this is a PCI or NVMe bug, and it is baffling what 
the heck is going on. Even if I revert the commit from v4.20-rc7, I 
continue to see the hotplug interrupts misfire. I suspect it's the 
combination of several changes that breaks things.

Who let the bugs out?

Alex


Appendix B: System log with explanations

[  133.772812] IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready

*** Lanes 0-2 are disconnected here

[  599.231003] pcieport 0000:b0:04.0: pciehp_isr hit
[  599.231067] pciehp 0000:b0:04.0:pcie204: Slot(178): Link Down
[  599.272506] pcieport 0000:b0:04.0: pciehp_isr hit
[  599.299943] pciehp 0000:b0:04.0:pcie204: Slot(178): Card present
[  599.299947] pciehp 0000:b0:04.0:pcie204: Slot(178): Link Up
[  599.605523] pci 0000:b1:00.0: [8086:0a55] type 00 class 0x010802
[  599.605565] pci 0000:b1:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit]
[  599.605612] pci 0000:b1:00.0: Max Payload Size set to 256 (was 128, max 512)
[  599.605618] pci 0000:b1:00.0: enabling Extended Tags
[  599.605845] acpi_run_hpx: Type 3 record not supported
[  599.605984] pci 0000:b1:00.0: 7.876 Gb/s available PCIe bandwidth, limited by 8 GT/s x1 link at 0000:b0:04.0 (capable of 31.504 Gb/s with 8 GT/s x4 link)
[  599.606356] pcieport 0000:b0:04.0: bridge window [io  0x1000-0x0fff] to [bus b1] add_size 1000
[  599.606364] pcieport 0000:b0:04.0: BAR 13: no space for [io  size 0x1000]
[  599.606368] pcieport 0000:b0:04.0: BAR 13: failed to assign [io  size 0x1000]
[  599.606371] pcieport 0000:b0:04.0: BAR 13: no space for [io  size 0x1000]
[  599.606375] pcieport 0000:b0:04.0: BAR 13: failed to assign [io  size 0x1000]
[  599.606382] pci 0000:b1:00.0: BAR 0: assigned [mem 0xe1500000-0xe1503fff 64bit]
[  599.606426] pcieport 0000:b0:04.0: PCI bridge to [bus b1]
[  599.606434] pcieport 0000:b0:04.0:   bridge window [mem 0xe1500000-0xe15fffff]
[  599.606440] pcieport 0000:b0:04.0:   bridge window [mem 0x386000000000-0x3860001fffff 64bit pref]
[  599.608379] nvme nvme4: pci function 0000:b1:00.0
[  599.608680] nvme 0000:b1:00.0: enabling device (0000 -> 0002)
[  603.386826]  nvme1n2: p1
[  603.475185] md: super_written gets error=10
[  603.475192] md/raid1:mdX: Disk failure on dm-13, disabling device.
                md/raid1:mdX: Operation continuing on 2 devices.
[  603.519511] device-mapper: raid: Faulty raid1 device #1 has readable super block.  Attempting to revive it.

*** Lanes 0-2 are reconnected

[  603.548495] md: recovery of RAID array mdX

*** All PCIe lanes are disconnected here

[  623.162103] pcieport 0000:b0:04.0: pciehp_isr hit
[  623.162169] pciehp 0000:b0:04.0:pcie204: Slot(178): Link Down
[  623.168534] pcieport 0000:b0:04.0: pciehp_isr hit
[  623.185774] print_req_error: I/O error, dev nvme1n2, sector 7860864
[  623.185812] print_req_error: I/O error, dev nvme1n2, sector 7861248
[  623.185817] print_req_error: I/O error, dev nvme1n2, sector 7857920
[  623.185822] print_req_error: I/O error, dev nvme1n2, sector 7858432
[  623.185826] print_req_error: I/O error, dev nvme1n2, sector 7858560
[  623.185830] print_req_error: I/O error, dev nvme1n2, sector 7862144
[  623.185835] print_req_error: I/O error, dev nvme1n2, sector 7858944
[  623.185870] md/raid1:mdX: Disk failure on dm-13, disabling device.
                md/raid1:mdX: Operation continuing on 2 devices.
[  623.185873] md: mdX: recovery interrupted.
[  623.185881] print_req_error: I/O error, dev nvme1n2, sector 7861376
[  623.197723] print_req_error: I/O error, dev nvme1n2, sector 7859072
[  623.197727] print_req_error: I/O error, dev nvme1n2, sector 7862528

*** All PCIe lanes are re-connected

[  628.842766] pcieport 0000:b0:04.0: pciehp_isr hit
[  628.842829] pciehp 0000:b0:04.0:pcie204: Slot(178): Link Up
[  629.147089] pci 0000:b1:00.0: [8086:0a55] type 00 class 0x010802
[  629.147130] pci 0000:b1:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit]
[  629.147177] pci 0000:b1:00.0: Max Payload Size set to 256 (was 128, max 512)
[  629.147182] pci 0000:b1:00.0: enabling Extended Tags
[  629.147401] acpi_run_hpx: Type 3 record not supported
[  629.147888] pcieport 0000:b0:04.0: bridge window [io  0x1000-0x0fff] to [bus b1] add_size 1000
[  629.147896] pcieport 0000:b0:04.0: BAR 13: no space for [io  size 0x1000]
[  629.147899] pcieport 0000:b0:04.0: BAR 13: failed to assign [io  size 0x1000]
[  629.147903] pcieport 0000:b0:04.0: BAR 13: no space for [io  size 0x1000]
[  629.147906] pcieport 0000:b0:04.0: BAR 13: failed to assign [io  size 0x1000]
[  629.147913] pci 0000:b1:00.0: BAR 0: assigned [mem 0xe1500000-0xe1503fff 64bit]
[  629.147926] pcieport 0000:b0:04.0: PCI bridge to [bus b1]
[  629.147933] pcieport 0000:b0:04.0:   bridge window [mem 0xe1500000-0xe15fffff]
[  629.147939] pcieport 0000:b0:04.0:   bridge window [mem 0x386000000000-0x3860001fffff 64bit pref]
[  629.148891] nvme nvme2: pci function 0000:b1:00.0
[  629.148932] nvme 0000:b1:00.0: enabling device (0000 -> 0002)
[  629.239386] pcieport 0000:b0:04.0: pciehp_isr hit
[  629.239420] pciehp 0000:b0:04.0:pcie204: Slot(178): Card not present

*** Lanes not disconnected again until here

[  635.951171] pcieport 0000:b0:04.0: pciehp_isr hit
[  635.992718] pcieport 0000:b0:04.0: pciehp_isr hit
[  659.866526] pcieport 0000:b0:04.0: pciehp_isr hit
[  659.877189] pcieport 0000:b0:04.0: pciehp_isr hit
[  665.555242] pcieport 0000:b0:04.0: pciehp_isr hit


* [REGRESSION] Deadlock in nvme drive removal (Was: Problems with commit "cb4bfda62afa nvme-pci: fix hot removal during error handling")
  2018-12-18 22:28 Problems with commit "cb4bfda62afa nvme-pci: fix hot removal during error handling" Alex_Gagniuc
@ 2018-12-19 19:23 ` Alex_Gagniuc
  0 siblings, 0 replies; 2+ messages in thread
From: Alex_Gagniuc @ 2018-12-19 19:23 UTC (permalink / raw)
  To: keith.busch; +Cc: linux-nvme, linux-pci, Austin.Bolen

I've tracked the problem down to pciehp_ist() deadlocking during nvme 
removal. I blame this on nvme bugs.

# ls /sys/bus/pci/devices/0000\:b0\:04.0/msi_irqs/
44

# ps aux |grep pcie
root       529  0.0  0.0      0     0 ?        D    11:04   0:00 [irq/44-pciehp]

# cat /proc/529/stack
[<0>] __flush_work+0x11a/0x1d0
[<0>] __cancel_work_timer+0x10a/0x190
[<0>] nvme_remove+0x38/0x170 [nvme]
[<0>] pci_device_remove+0x3b/0xc0
[<0>] device_release_driver_internal+0x180/0x240
[<0>] pci_stop_bus_device+0x69/0x90
[<0>] pci_stop_and_remove_bus_device+0xe/0x20
[<0>] pciehp_unconfigure_device+0x84/0x140
[<0>] pciehp_disable_slot+0x67/0x110
[<0>] pciehp_handle_presence_or_link_change+0x25f/0x400
[<0>] pciehp_ist+0x22b/0x230
[<0>] irq_thread_fn+0x1f/0x50
[<0>] irq_thread+0xf3/0x190
[<0>] kthread+0x112/0x130
[<0>] ret_from_fork+0x35/0x40
[<0>] 0xffffffffffffffff

This is about two hours after the drive was removed, give or take a 
timezone.

<abridged_obligatory_frustrated_developer_rant>
This took about two damn weeks to track down. Where do I have to go to 
collect the coffee rations?
</abridged_obligatory_frustrated_developer_rant>

Alex

On 12/18/2018 04:28 PM, Gagniuc, Alexandru - Dell Team wrote:
> Hi,
> 
> I've been seeing two issues with this commit. One was a NULL dereference
> that appears to have been fixed in 4.20-rc7.
> 
> [remainder of the original message and Appendix B log snipped -- see above]


