* Possible regression between 4.9 and 4.13

From: Mason @ 2017-08-22 17:34 UTC
To: linux-pci, linux-usb, Linux ARM
Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, Mathias Nyman

Hello,

The driver for my system's PCIe host bridge landed recently
(in 4.13) but it was developed on 4.9.

I tested the PCIe host bridge by plugging a 4-port USB3 adapter
into the PCIe slot (system at rest) and plugging a USB3 Flash
drive into the USB3 adapter (at run-time).

On 4.9, the setup works (almost perfectly, see below).
On 4.13, once I unplug the Flash drive, the controller port
remains unresponsive.

On 4.9, I said *almost* perfectly, because the pcieport driver
does report a few non-fatal errors when I unplug:

[ 193.838504] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd
[ 193.878081] usb-storage 2-2:1.0: USB Mass Storage device detected
[ 193.884547] scsi host0: usb-storage 2-2:1.0
[ 194.907936] scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PQ: 0 ANSI: 6
[ 194.920296] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
[ 194.928666] sd 0:0:0:0: [sda] Write Protect is off
[ 194.933755] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 194.946074] sda: sda1
[ 194.953608] sd 0:0:0:0: [sda] Attached SCSI removable disk

[ 208.930260] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 208.938342] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 208.950163] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 208.958577] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 208.965432] pcieport 0000:00:00.0: AER: Device recovery failed
[ 209.663733] xhci_hcd 0000:01:00.0: Cannot set link state.
[ 209.669194] usb usb2-port2: cannot disable (err = -32)
[ 209.674376] usb 2-2: USB disconnect, device number 2
[ 209.680481] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 209.688689] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 209.700555] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 209.708978] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 209.715845] pcieport 0000:00:00.0: AER: Device recovery failed
[ 209.721722] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 209.729785] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 209.741602] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 209.750027] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 209.756866] pcieport 0000:00:00.0: AER: Device recovery failed

After that, I can still plug the drive into the same port.
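As an aside, the `error status/mask=00004000/00000000` field in these AER reports is the root port's Uncorrectable Error Status register and its mask. 0x00004000 has only bit 14 set, which the PCIe specification assigns to Completion Timeout, matching the `[14] Completion Timeout (First)` line: a non-posted request issued through the port (for example a CPU read of the device's MMIO space) did not receive its completion in time. The C sketch that follows decodes such a value in userspace. The bit table mirrors the `PCI_ERR_UNC_*` constants in Linux's `include/uapi/linux/pci_regs.h`; `aer_unc_name()` and `aer_decode_unc()` are hypothetical helpers, and treating `status & ~mask` as the active set is a simplification.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bit positions in the PCIe AER Uncorrectable Error Status register
 * (same values as PCI_ERR_UNC_* in Linux's pci_regs.h). */
static const struct {
	unsigned int bit;
	const char *name;
} aer_unc_bits[] = {
	{  4, "Data Link Protocol Error" },
	{  5, "Surprise Down Error" },
	{ 12, "Poisoned TLP" },
	{ 13, "Flow Control Protocol Error" },
	{ 14, "Completion Timeout" },
	{ 15, "Completer Abort" },
	{ 16, "Unexpected Completion" },
	{ 17, "Receiver Overflow" },
	{ 18, "Malformed TLP" },
	{ 19, "ECRC Error" },
	{ 20, "Unsupported Request" },
	{ 21, "ACS Violation" },
};

/* Hypothetical helper: name of one uncorrectable-error bit, or NULL. */
static const char *aer_unc_name(unsigned int bit)
{
	for (size_t i = 0; i < sizeof(aer_unc_bits) / sizeof(aer_unc_bits[0]); i++)
		if (aer_unc_bits[i].bit == bit)
			return aer_unc_bits[i].name;
	return NULL;
}

/* Hypothetical helper: print every set, unmasked bit in the style of the
 * kernel's "[14] Completion Timeout" lines; return how many were printed. */
static int aer_decode_unc(uint32_t status, uint32_t mask)
{
	uint32_t active = status & ~mask;   /* simplification, see lead-in */
	int found = 0;

	for (unsigned int bit = 0; bit < 32; bit++) {
		if (!(active & (UINT32_C(1) << bit)))
			continue;
		const char *name = aer_unc_name(bit);
		printf("[%2u] %s\n", bit, name ? name : "(unknown/reserved)");
		found++;
	}
	return found;
}
```

Calling `aer_decode_unc(0x00004000, 0x00000000)` prints `[14] Completion Timeout` and returns 1.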
But on 4.13, I get

[ 27.330378] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd
[ 27.369383] usb-storage 2-2:1.0: USB Mass Storage device detected
[ 27.375840] scsi host0: usb-storage 2-2:1.0
[ 28.403035] scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PQ: 0 ANSI: 6
[ 28.413326] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
[ 28.423653] sd 0:0:0:0: [sda] Write Protect is off
[ 28.429139] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 28.441529] sda: sda1
[ 28.449431] sd 0:0:0:0: [sda] Attached SCSI removable disk

[ 90.592134] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead
[ 90.599857] xhci_hcd 0000:01:00.0: HC died; cleaning up
[ 90.605336] usb 2-2: USB disconnect, device number 2
[ 90.630414] udevd[955]: inotify_add_watch(6, /dev/sda, 10) failed: No such file or directory

Trying to replug into the same port = nothing happens
(Linux did say "assume dead")

Any idea what could have changed between 4.9 and 4.13?

Regards.
* Re: Possible regression between 4.9 and 4.13

From: Felipe Balbi @ 2017-08-23 6:07 UTC
To: Mason, linux-pci, linux-usb, Linux ARM
Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, Mathias Nyman

Hi,

Mason <slash.tmp@free.fr> writes:
> Hello,
>
> The driver for my system's PCIe host bridge landed recently
> (in 4.13) but it was developed on 4.9.
>
> I tested the PCIe host bridge by plugging a 4-port USB3 adapter
> into the PCIe slot (system at rest) and plugging a USB3 Flash
> drive into the USB3 adapter (at run-time).
>
> On 4.9, the setup works (almost perfectly, see below).
> On 4.13, once I unplug the Flash drive, the controller port
> remains unresponsive.
>
>
> On 4.9, I said *almost* perfectly, because the pcieport driver
> does report a few non-fatal errors when I unplug:
>
> [ 193.838504] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd
> [ 193.878081] usb-storage 2-2:1.0: USB Mass Storage device detected
> [ 193.884547] scsi host0: usb-storage 2-2:1.0
> [ 194.907936] scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PQ: 0 ANSI: 6
> [ 194.920296] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
> [ 194.928666] sd 0:0:0:0: [sda] Write Protect is off
> [ 194.933755] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
> [ 194.946074] sda: sda1
> [ 194.953608] sd 0:0:0:0: [sda] Attached SCSI removable disk
>
> [ 208.930260] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
> [ 208.938342] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
> [ 208.950163] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
> [ 208.958577] pcieport 0000:00:00.0: [14] Completion Timeout (First)
> [ 208.965432] pcieport 0000:00:00.0: AER: Device recovery failed
> [ 209.663733] xhci_hcd 0000:01:00.0: Cannot set link state.
> [ 209.669194] usb usb2-port2: cannot disable (err = -32)
> [ 209.674376] usb 2-2: USB disconnect, device number 2
> [ 209.680481] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
> [ 209.688689] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
> [ 209.700555] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
> [ 209.708978] pcieport 0000:00:00.0: [14] Completion Timeout (First)
> [ 209.715845] pcieport 0000:00:00.0: AER: Device recovery failed
> [ 209.721722] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
> [ 209.729785] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
> [ 209.741602] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
> [ 209.750027] pcieport 0000:00:00.0: [14] Completion Timeout (First)
> [ 209.756866] pcieport 0000:00:00.0: AER: Device recovery failed
>
> After that, I can still plug the drive into the same port.
>
> But on 4.13, I get
>
> [ 27.330378] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd
> [ 27.369383] usb-storage 2-2:1.0: USB Mass Storage device detected
> [ 27.375840] scsi host0: usb-storage 2-2:1.0
> [ 28.403035] scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PQ: 0 ANSI: 6
> [ 28.413326] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
> [ 28.423653] sd 0:0:0:0: [sda] Write Protect is off
> [ 28.429139] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
> [ 28.441529] sda: sda1
> [ 28.449431] sd 0:0:0:0: [sda] Attached SCSI removable disk
>
> [ 90.592134] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead
> [ 90.599857] xhci_hcd 0000:01:00.0: HC died; cleaning up
> [ 90.605336] usb 2-2: USB disconnect, device number 2
> [ 90.630414] udevd[955]: inotify_add_watch(6, /dev/sda, 10) failed: No such file or directory
>
> Trying to replug into the same port = nothing happens
> (Linux did say "assume dead")
>
> Any idea what could have changed between 4.9 and 4.13?
>

Quite a bit:

$ git rev-list --no-merges --count v4.13-rc6 ^v4.9 -- drivers/usb/host/xhci drivers/usb/core/
58

Any chance you can bisect to figure out the offending commit?

--
balbi
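Bisecting, as suggested here, is a binary search for the first bad commit in an ordered history, assuming everything before it is good and everything from it onward is bad; `git bisect` automates this over the commit graph, and `git bisect start <bad> <good> -- <paths>` can restrict the candidates to commits touching given paths. The C sketch that follows shows only the search logic over a hypothetical linear list of revisions, with a caller-supplied `is_bad` test standing in for "build, boot, plug and unplug the drive"; it is not how git itself walks the history.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test callback: true if revision `idx` shows the bug. */
typedef bool (*rev_is_bad_fn)(size_t idx, void *ctx);

/*
 * Return the index of the first bad revision in an ordered list of n
 * revisions (n >= 2), assuming revision 0 is known good, revision n - 1
 * is known bad, and the bug stays once introduced. Each probe halves
 * the remaining range, so about log2(n) tests are needed.
 */
static size_t first_bad_revision(size_t n, rev_is_bad_fn is_bad, void *ctx)
{
	size_t good = 0;     /* invariant: revision `good` is good */
	size_t bad = n - 1;  /* invariant: revision `bad` is bad */

	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid, ctx))
			bad = mid;
		else
			good = mid;
	}
	return bad;
}
```

For a range on the order of the 58 commits counted above, that is roughly six build-and-test cycles, which is why bisection is usually cheaper than reading every commit.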
* Re: Possible regression between 4.9 and 4.13

From: Mathias Nyman @ 2017-08-23 7:51 UTC
To: Felipe Balbi, Mason, linux-pci, linux-usb, Linux ARM
Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman

On 23.08.2017 09:07, Felipe Balbi wrote:
>
> Hi,
>
> Mason <slash.tmp@free.fr> writes:
>> Hello,
>>
>> The driver for my system's PCIe host bridge landed recently
>> (in 4.13) but it was developed on 4.9.
>>
>> I tested the PCIe host bridge by plugging a 4-port USB3 adapter
>> into the PCIe slot (system at rest) and plugging a USB3 Flash
>> drive into the USB3 adapter (at run-time).
>>
>> On 4.9, the setup works (almost perfectly, see below).
>> On 4.13, once I unplug the Flash drive, the controller port
>> remains unresponsive.
>>
>>
>> On 4.9, I said *almost* perfectly, because the pcieport driver
>> does report a few non-fatal errors when I unplug:
>>
>> [ 193.838504] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd
>> [ 193.878081] usb-storage 2-2:1.0: USB Mass Storage device detected
>> [ 193.884547] scsi host0: usb-storage 2-2:1.0
>> [ 194.907936] scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PQ: 0 ANSI: 6
>> [ 194.920296] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
>> [ 194.928666] sd 0:0:0:0: [sda] Write Protect is off
>> [ 194.933755] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
>> [ 194.946074] sda: sda1
>> [ 194.953608] sd 0:0:0:0: [sda] Attached SCSI removable disk
>>
>> [ 208.930260] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
>> [ 208.938342] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
>> [ 208.950163] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
>> [ 208.958577] pcieport 0000:00:00.0: [14] Completion Timeout (First)
>> [ 208.965432] pcieport 0000:00:00.0: AER: Device recovery failed
>> [ 209.663733] xhci_hcd 0000:01:00.0: Cannot set link state.
>> [ 209.669194] usb usb2-port2: cannot disable (err = -32)
>> [ 209.674376] usb 2-2: USB disconnect, device number 2
>> [ 209.680481] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
>> [ 209.688689] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
>> [ 209.700555] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
>> [ 209.708978] pcieport 0000:00:00.0: [14] Completion Timeout (First)
>> [ 209.715845] pcieport 0000:00:00.0: AER: Device recovery failed
>> [ 209.721722] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
>> [ 209.729785] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
>> [ 209.741602] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
>> [ 209.750027] pcieport 0000:00:00.0: [14] Completion Timeout (First)
>> [ 209.756866] pcieport 0000:00:00.0: AER: Device recovery failed
>>
>> After that, I can still plug the drive into the same port.
>>
>> But on 4.13, I get
>>
>> [ 27.330378] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd
>> [ 27.369383] usb-storage 2-2:1.0: USB Mass Storage device detected
>> [ 27.375840] scsi host0: usb-storage 2-2:1.0
>> [ 28.403035] scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PQ: 0 ANSI: 6
>> [ 28.413326] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
>> [ 28.423653] sd 0:0:0:0: [sda] Write Protect is off
>> [ 28.429139] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
>> [ 28.441529] sda: sda1
>> [ 28.449431] sd 0:0:0:0: [sda] Attached SCSI removable disk
>>
>> [ 90.592134] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead
>> [ 90.599857] xhci_hcd 0000:01:00.0: HC died; cleaning up
>> [ 90.605336] usb 2-2: USB disconnect, device number 2
>> [ 90.630414] udevd[955]: inotify_add_watch(6, /dev/sda, 10) failed: No such file or directory
>>
>> Trying to replug into the same port = nothing happens
>> (Linux did say "assume dead")
>>
>> Any idea what could have changed between 4.9 and 4.13?
>>
>
> Quite a bit:
>
> $ git rev-list --no-merges --count v4.13-rc6 ^v4.9 -- drivers/usb/host/xhci drivers/usb/core/
> 58
>

A very likely cause is the more aggressive detection of PCI-removed xHCI hosts.

See commit d9f11ba9f107aa335091ab8d7ba5eea714e46e8b
xhci: Rework how we handle unresponsive or hoptlug removed hosts

It checks if an xHCI register read returns 0xffffffff and assumes the xHC
died in that case.

Could you add something like the below to check what is killing the host?
Or add a BUG()/WARN() in xhci_hc_died() to get a backtrace of who called it.
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 51cd4b8..ade2ad6 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -922,7 +922,8 @@ void xhci_hc_died(struct xhci_hcd *xhci)
 	if (xhci->xhc_state & XHCI_STATE_DYING)
 		return;
 
-	xhci_err(xhci, "xHCI host controller not responding, assume dead\n");
+	xhci_err(xhci, "xHC not responding in %pf, assume controller is dead\n",
+		 __builtin_return_address(0));
 	xhci->xhc_state |= XHCI_STATE_DYING;
 
 	xhci_cleanup_command_queue(xhci);

Thanks
Mathias
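The detection described in this reply rests on a PCIe convention: on many platforms, a read from a device that has dropped off the bus, or whose request times out, completes with all bits set, so a 32-bit register reading back as 0xffffffff is taken to mean the controller is gone. The userspace C sketch that follows simulates that check and, like the debug patch, reports who called the "died" handler through the GCC/Clang builtin `__builtin_return_address(0)`. `struct fake_hc`, `fake_readl()`, `hc_died()`, `read_status()` and `FAKE_STATE_DYING` are hypothetical stand-ins, not the xHCI driver's code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define FAKE_STATE_DYING 0x1u   /* hypothetical stand-in for XHCI_STATE_DYING */

struct fake_hc {
	volatile uint32_t *regs;    /* simulated MMIO register window */
	unsigned int state;
};

/* Stand-in for readl(): here just a plain memory load. */
static uint32_t fake_readl(const volatile uint32_t *addr)
{
	return *addr;
}

/* noinline so that the return address points into the real caller. */
static __attribute__((noinline)) void hc_died(struct fake_hc *hc)
{
	if (hc->state & FAKE_STATE_DYING)
		return;             /* already reported once */
	fprintf(stderr, "HC not responding (called from %p), assume dead\n",
		__builtin_return_address(0));
	hc->state |= FAKE_STATE_DYING;
}

/* Read register `reg`; an all-ones value is taken to mean the device
 * has gone away. Returns false (and marks the HC dying) in that case. */
static bool read_status(struct fake_hc *hc, size_t reg, uint32_t *out)
{
	uint32_t val = fake_readl(&hc->regs[reg]);

	if (val == 0xffffffffu) {
		hc_died(hc);
		return false;
	}
	*out = val;
	return true;
}
```

In userspace `%p` prints only a raw address; the kernel's `%pf` resolves it to a symbol name, which is what makes the debug patch useful. A tool such as `addr2line` can help map a userspace address back to a function.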
* Re: Possible regression between 4.9 and 4.13

From: Mason @ 2017-08-23 9:18 UTC
To: Mathias Nyman, Felipe Balbi, linux-pci, linux-usb, Linux ARM
Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman

On 23/08/2017 09:51, Mathias Nyman wrote:
> On 23.08.2017 09:07, Felipe Balbi wrote:
>
>> Mason writes:
>>
>>> Any idea what could have changed between 4.9 and 4.13?
>>
>> Quite a bit:
>>
>> $ git rev-list --no-merges --count v4.13-rc6 ^v4.9 -- drivers/usb/host/xhci drivers/usb/core/
>> 58
>
> A very likely cause is the more aggressive detection of PCI-removed xHCI hosts.
>
> See commit d9f11ba9f107aa335091ab8d7ba5eea714e46e8b
> xhci: Rework how we handle unresponsive or hoptlug removed hosts
>
> It checks if an xHCI register read returns 0xffffffff and assumes the xHC
> died in that case.
>
> Could you add something like the below to check what is killing the host?
> Or add a BUG()/WARN() in xhci_hc_died() to get a backtrace of who called it.
>
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index 51cd4b8..ade2ad6 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -922,7 +922,8 @@ void xhci_hc_died(struct xhci_hcd *xhci)
>  	if (xhci->xhc_state & XHCI_STATE_DYING)
>  		return;
>  
> -	xhci_err(xhci, "xHCI host controller not responding, assume dead\n");
> +	xhci_err(xhci, "xHC not responding in %pf, assume controller is dead\n",
> +		 __builtin_return_address(0));
>  	xhci->xhc_state |= XHCI_STATE_DYING;
>  
>  	xhci_cleanup_command_queue(xhci);

I'll try some coarse bisection to narrow it down.

$ git describe --contains d9f11ba9f107aa335091ab8d7ba5eea714e46e8b
v4.12-rc1~97^2~39

I'll check 4.11 first.

I wanted to mention that the XHCI setup on 4.9 and 4.13 prints
slightly different things (at the beginning).
On 4.9

[ 1.240322] xhci_hcd 0000:01:00.0: xHCI Host Controller
[ 1.245617] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 1
[ 1.258691] xhci_hcd 0000:01:00.0: hcc params 0x014051cf hci version 0x100 quirks 0x00000010
[ 1.268090] hub 1-0:1.0: USB hub found
[ 1.271905] hub 1-0:1.0: 4 ports detected
[ 1.276372] xhci_hcd 0000:01:00.0: xHCI Host Controller
[ 1.281645] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 2
[ 1.289173] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
[ 1.297775] hub 2-0:1.0: USB hub found
[ 1.301577] hub 2-0:1.0: 4 ports detected
[ 1.306194] usbcore: registered new interface driver usb-storage

On 4.13

[ 1.222471] pcieport 0000:00:00.0: of_irq_parse_pci: failed with rc=-22
[ 1.229156] xhci_hcd 0000:01:00.0: Resetting
[ 2.268836] xhci_hcd 0000:01:00.0: xHCI Host Controller
[ 2.274126] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 1
[ 2.287222] xhci_hcd 0000:01:00.0: hcc params 0x014051cf hci version 0x100 quirks 0x00000010
[ 2.296653] hub 1-0:1.0: USB hub found
[ 2.300478] hub 1-0:1.0: 4 ports detected
[ 2.304962] xhci_hcd 0000:01:00.0: xHCI Host Controller
[ 2.310246] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 2
[ 2.317776] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
[ 2.326419] hub 2-0:1.0: USB hub found
[ 2.330229] hub 2-0:1.0: 4 ports detected
[ 2.334869] usbcore: registered new interface driver usb-storage

FWIW, "of_irq_parse_pci: failed with rc=-22" seems to come from:

[ 1.257411] [<c03d80c8>] (of_irq_parse_pci) from [<c03d8270>] (of_irq_parse_and_map_pci+0x10/0x2c)
[ 1.266420] [<c03d8270>] (of_irq_parse_and_map_pci) from [<c03100a8>] (pci_assign_irq+0x78/0xb0)
[ 1.275254] [<c03100a8>] (pci_assign_irq) from [<c030a1c8>] (pci_device_probe+0x18/0x128)
[ 1.283476] [<c030a1c8>] (pci_device_probe) from [<c0357864>] (driver_probe_device+0x244/0x2c8)

The error logging was added by f1aa54840657f.
No, that just turned one specific error into a warning.
Need to dig a bit more.

Regards.
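By convention, kernel functions report failures as negative errno values. On Linux, `rc=-22` in the 4.13 boot log is `-EINVAL`, and the `err = -32` printed by the hub driver at unplug on 4.9 is `-EPIPE`, which USB code generally uses to signal a stalled or rejected request. The C sketch that follows maps such return codes back to names in userspace; `kerr_name()` and `print_kerr()` are hypothetical helpers (not a kernel API), only a handful of codes are listed, and the numeric values assume Linux's errno numbering.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: errno macro name for a negative kernel-style
 * return code, or NULL if it is not one of the few listed here. */
static const char *kerr_name(int rc)
{
	switch (rc) {
	case -EPERM:     return "EPERM";
	case -ENOENT:    return "ENOENT";
	case -EIO:       return "EIO";
	case -ENOMEM:    return "ENOMEM";
	case -ENODEV:    return "ENODEV";
	case -EINVAL:    return "EINVAL";
	case -EPIPE:     return "EPIPE";
	case -ETIMEDOUT: return "ETIMEDOUT";
	default:         return NULL;
	}
}

/* Print a return code with its name and the C library's description.
 * The kernel reserves -1..-4095 for errno values (MAX_ERRNO). */
static void print_kerr(int rc)
{
	const char *name = kerr_name(rc);
	const char *desc = (rc < 0 && rc >= -4095) ? strerror(-rc)
						   : "not an errno value";

	printf("rc=%d: %s (%s)\n", rc, name ? name : "?", desc);
}
```

With glibc, `print_kerr(-22)` prints `rc=-22: EINVAL (Invalid argument)`.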
* Re: Possible regression between 4.9 and 4.13
  2017-08-23 7:51 ` Mathias Nyman
@ 2017-08-23 9:31 ` Mason
  -1 siblings, 0 replies; 60+ messages in thread
From: Mason @ 2017-08-23 9:31 UTC (permalink / raw)
  To: Mathias Nyman, Felipe Balbi, linux-pci, linux-usb, Linux ARM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman

On 23/08/2017 09:51, Mathias Nyman wrote:

> very likely cause is the more aggressive detection of pci removed xhci hosts
>
> See commit d9f11ba9f107aa335091ab8d7ba5eea714e46e8b
> xhci: Rework how we handle unresponsive or hoptlug removed hosts
>
> It checks if a xhci register reads returns 0xffffffff and assumes xhci
> died in that case.
>
> Could you add something like the below to check which what is killing the host?
> Or a BUG()/WARN() in xhci_hc_died() to get a backtrace of who called it.

[ 46.525247] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd
[ 46.565496] usb-storage 2-2:1.0: USB Mass Storage device detected
[ 46.571934] scsi host0: usb-storage 2-2:1.0
[ 47.601227] scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PQ: 0 ANSI: 6
[ 47.611340] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
[ 47.621624] sd 0:0:0:0: [sda] Write Protect is off
[ 47.627131] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 47.639637] sda: sda1
[ 47.648091] sd 0:0:0:0: [sda] Attached SCSI removable disk
[ 58.100306] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead
[ 58.108021] CPU: 0 PID: 939 Comm: kworker/0:2 Tainted: G C 4.13.0-rc6 #11
[ 58.115976] Hardware name: Sigma Tango DT
[ 58.120016] Workqueue: usb_hub_wq hub_event
[ 58.124241] [<c010f288>] (unwind_backtrace) from [<c010af58>] (show_stack+0x10/0x14)
[ 58.132033] [<c010af58>] (show_stack) from [<c049d714>] (dump_stack+0x84/0x98)
[ 58.139302] [<c049d714>] (dump_stack) from [<c03b090c>] (xhci_hc_died.part.9+0x50/0x23c)
[ 58.147438] [<c03b090c>] (xhci_hc_died.part.9) from [<c03b5d80>] (xhci_hub_control+0xf3c/0x175c)
[ 58.156273] [<c03b5d80>] (xhci_hub_control) from [<c03934a4>] (usb_hcd_submit_urb+0x264/0x814)
[ 58.164932] [<c03934a4>] (usb_hcd_submit_urb) from [<c0394fa4>] (usb_start_wait_urb+0x4c/0xbc)
[ 58.173591] [<c0394fa4>] (usb_start_wait_urb) from [<c03950b4>] (usb_control_msg+0xa0/0xcc)
[ 58.181985] [<c03950b4>] (usb_control_msg) from [<c038bf54>] (usb_clear_port_feature+0x44/0x4c)
[ 58.190730] [<c038bf54>] (usb_clear_port_feature) from [<c038c320>] (hub_port_reset+0x228/0x51c)
[ 58.199561] [<c038c320>] (hub_port_reset) from [<c038fd68>] (hub_event+0x87c/0x108c)
[ 58.207349] [<c038fd68>] (hub_event) from [<c012ecc4>] (process_one_work+0x1d8/0x3f0)
[ 58.215220] [<c012ecc4>] (process_one_work) from [<c012f8d8>] (worker_thread+0x38/0x554)
[ 58.223354] [<c012f8d8>] (worker_thread) from [<c01347d0>] (kthread+0x108/0x138)
[ 58.230789] [<c01347d0>] (kthread) from [<c01076d8>] (ret_from_fork+0x14/0x3c)
[ 58.238056] xhci_hcd 0000:01:00.0: HC died; cleaning up
[ 58.243391] usb 2-2: USB disconnect, device number 2

^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: Possible regression between 4.9 and 4.13
  2017-08-23 9:31 ` Mason
@ 2017-08-23 11:11 ` Mathias Nyman
  -1 siblings, 0 replies; 60+ messages in thread
From: Mathias Nyman @ 2017-08-23 11:11 UTC (permalink / raw)
  To: Mason, Mathias Nyman, Felipe Balbi, linux-pci, linux-usb, Linux ARM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman

On 23.08.2017 12:31, Mason wrote:
> On 23/08/2017 09:51, Mathias Nyman wrote:
>
>> very likely cause is the more aggressive detection of pci removed xhci hosts
>>
>> See commit d9f11ba9f107aa335091ab8d7ba5eea714e46e8b
>> xhci: Rework how we handle unresponsive or hoptlug removed hosts
>>
>> It checks if a xhci register reads returns 0xffffffff and assumes xhci
>> died in that case.
>>
>> Could you add something like the below to check which what is killing the host?
>> Or a BUG()/WARN() in xhci_hc_died() to get a backtrace of who called it.
>
> [ 46.525247] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd
> [ 46.565496] usb-storage 2-2:1.0: USB Mass Storage device detected
> [ 46.571934] scsi host0: usb-storage 2-2:1.0
> [ 47.601227] scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PQ: 0 ANSI: 6
> [ 47.611340] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
> [ 47.621624] sd 0:0:0:0: [sda] Write Protect is off
> [ 47.627131] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
> [ 47.639637] sda: sda1
> [ 47.648091] sd 0:0:0:0: [sda] Attached SCSI removable disk
> [ 58.100306] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead
> [ 58.108021] CPU: 0 PID: 939 Comm: kworker/0:2 Tainted: G C 4.13.0-rc6 #11
> [ 58.115976] Hardware name: Sigma Tango DT
> [ 58.120016] Workqueue: usb_hub_wq hub_event
> [ 58.124241] [<c010f288>] (unwind_backtrace) from [<c010af58>] (show_stack+0x10/0x14)
> [ 58.132033] [<c010af58>] (show_stack) from [<c049d714>] (dump_stack+0x84/0x98)
> [ 58.139302] [<c049d714>] (dump_stack) from [<c03b090c>] (xhci_hc_died.part.9+0x50/0x23c)
> [ 58.147438] [<c03b090c>] (xhci_hc_died.part.9) from [<c03b5d80>] (xhci_hub_control+0xf3c/0x175c)
> [ 58.156273] [<c03b5d80>] (xhci_hub_control) from [<c03934a4>] (usb_hcd_submit_urb+0x264/0x814)
> [ 58.164932] [<c03934a4>] (usb_hcd_submit_urb) from [<c0394fa4>] (usb_start_wait_urb+0x4c/0xbc)
> [ 58.173591] [<c0394fa4>] (usb_start_wait_urb) from [<c03950b4>] (usb_control_msg+0xa0/0xcc)
> [ 58.181985] [<c03950b4>] (usb_control_msg) from [<c038bf54>] (usb_clear_port_feature+0x44/0x4c)
> [ 58.190730] [<c038bf54>] (usb_clear_port_feature) from [<c038c320>] (hub_port_reset+0x228/0x51c)
> [ 58.199561] [<c038c320>] (hub_port_reset) from [<c038fd68>] (hub_event+0x87c/0x108c)
> [ 58.207349] [<c038fd68>] (hub_event) from [<c012ecc4>] (process_one_work+0x1d8/0x3f0)
> [ 58.215220] [<c012ecc4>] (process_one_work) from [<c012f8d8>] (worker_thread+0x38/0x554)
> [ 58.223354] [<c012f8d8>] (worker_thread) from [<c01347d0>] (kthread+0x108/0x138)
> [ 58.230789] [<c01347d0>] (kthread) from [<c01076d8>] (ret_from_fork+0x14/0x3c)
> [ 58.238056] xhci_hcd 0000:01:00.0: HC died; cleaning up
> [ 58.243391] usb 2-2: USB disconnect, device number 2
> --

The xhci driver reads 0xffffffff from an MMIO-mapped xhci PORTSC register and bails out in
xhci-hub.c:

	temp = readl(port_array[wIndex]);
	if (temp == ~(u32)0) {
		xhci_hc_died(xhci);
		retval = -ENODEV;
		break;
	}

In this case we read the register when the hub thread asks to clear a port feature.

Why PORTSC returns 0xffffffff is another question. Could the hub thread be running while
the xhci controller is in D3? Was xhci runtime suspended?

There were some pcieport errors in another log you showed; maybe PCI devices are not
properly recovered and the registers return 0xffffffff?

-Mathias

^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: Possible regression between 4.9 and 4.13
  2017-08-23 11:11 ` Mathias Nyman
@ 2017-08-23 11:54 ` Mason
  -1 siblings, 0 replies; 60+ messages in thread
From: Mason @ 2017-08-23 11:54 UTC (permalink / raw)
  To: Mathias Nyman, Felipe Balbi, linux-pci, linux-usb, Linux ARM
  Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman

On 23/08/2017 13:11, Mathias Nyman wrote:

> On 23.08.2017 12:31, Mason wrote:
>
>> [ 46.525247] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd
>> [ 46.565496] usb-storage 2-2:1.0: USB Mass Storage device detected
>> [ 46.571934] scsi host0: usb-storage 2-2:1.0
>> [ 47.601227] scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PQ: 0 ANSI: 6
>> [ 47.611340] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
>> [ 47.621624] sd 0:0:0:0: [sda] Write Protect is off
>> [ 47.627131] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
>> [ 47.639637] sda: sda1
>> [ 47.648091] sd 0:0:0:0: [sda] Attached SCSI removable disk
>> [ 58.100306] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead
>> [ 58.108021] CPU: 0 PID: 939 Comm: kworker/0:2 Tainted: G C 4.13.0-rc6 #11
>> [ 58.115976] Hardware name: Sigma Tango DT
>> [ 58.120016] Workqueue: usb_hub_wq hub_event
>> [ 58.124241] [<c010f288>] (unwind_backtrace) from [<c010af58>] (show_stack+0x10/0x14)
>> [ 58.132033] [<c010af58>] (show_stack) from [<c049d714>] (dump_stack+0x84/0x98)
>> [ 58.139302] [<c049d714>] (dump_stack) from [<c03b090c>] (xhci_hc_died.part.9+0x50/0x23c)
>> [ 58.147438] [<c03b090c>] (xhci_hc_died.part.9) from [<c03b5d80>] (xhci_hub_control+0xf3c/0x175c)
>> [ 58.156273] [<c03b5d80>] (xhci_hub_control) from [<c03934a4>] (usb_hcd_submit_urb+0x264/0x814)
>> [ 58.164932] [<c03934a4>] (usb_hcd_submit_urb) from [<c0394fa4>] (usb_start_wait_urb+0x4c/0xbc)
>> [ 58.173591] [<c0394fa4>] (usb_start_wait_urb) from [<c03950b4>] (usb_control_msg+0xa0/0xcc)
>> [ 58.181985] [<c03950b4>] (usb_control_msg) from [<c038bf54>] (usb_clear_port_feature+0x44/0x4c)
>> [ 58.190730] [<c038bf54>] (usb_clear_port_feature) from [<c038c320>] (hub_port_reset+0x228/0x51c)
>> [ 58.199561] [<c038c320>] (hub_port_reset) from [<c038fd68>] (hub_event+0x87c/0x108c)
>> [ 58.207349] [<c038fd68>] (hub_event) from [<c012ecc4>] (process_one_work+0x1d8/0x3f0)
>> [ 58.215220] [<c012ecc4>] (process_one_work) from [<c012f8d8>] (worker_thread+0x38/0x554)
>> [ 58.223354] [<c012f8d8>] (worker_thread) from [<c01347d0>] (kthread+0x108/0x138)
>> [ 58.230789] [<c01347d0>] (kthread) from [<c01076d8>] (ret_from_fork+0x14/0x3c)
>> [ 58.238056] xhci_hcd 0000:01:00.0: HC died; cleaning up
>> [ 58.243391] usb 2-2: USB disconnect, device number 2
>
> xhci driver reads 0xffffffff from a mmio mapped xhci portsc register and bails out in:
> xhci-hub.c:
> 	temp = readl(port_array[wIndex]);
> 	if (temp == ~(u32)0) {
> 		xhci_hc_died(xhci);
> 		retval = -ENODEV;
> 		break;
> 	}
>
> In this case we read the register when hub thread asks to clear port feature.
>
> why portsc returns 0xffffffff is another question, could the hub thread be running while xhci controller is (in D3)?
> Was xhci runtime suspended?

How do I tell?
Should I disable SUSPEND support and all kinds of power management?

> There were some pcieport errors in another log you showed, maybe PCI devices are not properly recovered
> and the registers return 0xffffffff?

FWIW, I just compiled v4.12-rc1 and I do get the broken behavior.

v4.11.12 = OK
v4.12-rc1 = KO

PLUG
[ 17.226953] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd
[ 17.267195] usb-storage 2-2:1.0: USB Mass Storage device detected
[ 17.273612] scsi host0: usb-storage 2-2:1.0
[ 18.296369] scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PQ: 0 ANSI: 6
[ 18.307772] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
[ 18.316991] sd 0:0:0:0: [sda] Write Protect is off
[ 18.322588] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 18.334828] sda: sda1
[ 18.339507] sd 0:0:0:0: [sda] Attached SCSI removable disk
[ 18.366202] random: fast init done

UNPLUG
[ 21.314111] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 21.322219] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 21.334039] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 21.342453] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 21.349306] pcieport 0000:00:00.0: AER: Device recovery failed
[ 22.055471] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead
[ 22.063187] xhci_hcd 0000:01:00.0: HC died; cleaning up
[ 22.068523] usb 2-2: USB disconnect, device number 2
[ 22.073774] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 22.085369] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 22.098823] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 22.107245] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 22.114130] pcieport 0000:00:00.0: AER: Device recovery failed
[ 22.120026] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 22.128096] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 22.139916] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 22.148320] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 22.155162] pcieport 0000:00:00.0: AER: Device recovery failed

The defconfig I used for testing:

# CONFIG_SWAP is not set
CONFIG_SYSVIPC=y
CONFIG_NO_HZ_IDLE=y
CONFIG_HIGH_RES_TIMERS=y
# CONFIG_COMPAT_BRK is not set
CONFIG_SLAB=y
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODVERSIONS=y
CONFIG_ARCH_TANGO=y
# CONFIG_ARM_ERRATA_643719 is not set
CONFIG_PCI=y
CONFIG_PCIEPORTBUS=y
CONFIG_PCI_MSI=y
CONFIG_PCIE_TANGO_SMP8759=y
CONFIG_SMP=y
CONFIG_PREEMPT=y
CONFIG_HZ_300=y
CONFIG_AEABI=y
CONFIG_HIGHMEM=y
# CONFIG_ATAGS is not set
CONFIG_ARM_APPENDED_DTB=y
CONFIG_ARM_ATAG_DTB_COMPAT=y
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_CPUFREQ_DT=y
CONFIG_VFP=y
CONFIG_NEON=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
# CONFIG_INET_XFRM_MODE_TRANSPORT is not set
# CONFIG_INET_XFRM_MODE_TUNNEL is not set
# CONFIG_INET_XFRM_MODE_BEET is not set
# CONFIG_IPV6 is not set
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_BLK_DEV_LOOP=y
CONFIG_SCSI=y
CONFIG_BLK_DEV_SD=y
CONFIG_NETDEVICES=y
CONFIG_NET_VENDOR_AURORA=y
CONFIG_AURORA_NB8800=y
CONFIG_AT803X_PHY=y
# CONFIG_WLAN is not set
# CONFIG_INPUT_KEYBOARD is not set
# CONFIG_INPUT_MOUSE is not set
# CONFIG_SERIO is not set
CONFIG_SERIAL_8250=y
# CONFIG_SERIAL_8250_DEPRECATED_OPTIONS is not set
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_RT288X=y
CONFIG_SERIAL_OF_PLATFORM=y
# CONFIG_HW_RANDOM is not set
CONFIG_I2C=y
CONFIG_I2C_XLR=y
CONFIG_GPIOLIB=y
CONFIG_THERMAL=y
CONFIG_CPU_THERMAL=y
CONFIG_TANGO_THERMAL=y
CONFIG_WATCHDOG=y
CONFIG_TANGOX_WATCHDOG=y
CONFIG_FB=y
# CONFIG_HID is not set
# CONFIG_USB_HID is not set
CONFIG_USB=y
CONFIG_USB_XHCI_HCD=y
CONFIG_USB_STORAGE=y
CONFIG_EXT4_FS=y
CONFIG_FUSE_FS=m
CONFIG_VFAT_FS=m
CONFIG_TMPFS=y
CONFIG_NFS_FS=y
# CONFIG_NFS_V2 is not set
CONFIG_ROOT_NFS=y
CONFIG_NLS_CODEPAGE_437=m
CONFIG_NLS_ISO8859_1=m
CONFIG_NLS_UTF8=m
CONFIG_PRINTK_TIME=y
# CONFIG_CRYPTO_ECHAINIV is not set

^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: Possible regression between 4.9 and 4.13
From: Mason @ 2017-08-23 12:41 UTC
To: Mathias Nyman, Felipe Balbi, linux-pci, linux-usb, Linux ARM
Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman

On 23/08/2017 13:54, Mason wrote:

> On 23/08/2017 13:11, Mathias Nyman wrote:
>
>> In this case we read the register when hub thread asks to clear port feature.
>>
>> Why portsc returns 0xffffffff is another question, could the hub thread be running while xhci controller is (in D3)?
>> Was xhci runtime suspended?
>
> How do I tell?
> Should I disable SUSPEND support and all kinds of power management?

I compiled a minimal kernel, with lots of irrelevant drivers and frameworks left out, including power management. I still get the "xHCI host controller not responding, assume dead" issue.

PLUG

[ 59.803499] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd
[ 59.836902] usb 2-2: New USB device found, idVendor=0951, idProduct=1666
[ 59.843653] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 59.850900] usb 2-2: Product: DataTraveler 3.0
[ 59.855417] usb 2-2: Manufacturer: Kingston
[ 59.859661] usb 2-2: SerialNumber: 002618887865F0C0F8646BFA
[ 59.868249] usb-storage 2-2:1.0: USB Mass Storage device detected
[ 59.874691] scsi host0: usb-storage 2-2:1.0
[ 60.882801] scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PQ: 0 ANSI: 6
[ 60.891640] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
[ 60.899662] sd 0:0:0:0: [sda] Write Protect is off
[ 60.904763] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 60.916154] sda: sda1
[ 60.919798] sd 0:0:0:0: [sda] Attached SCSI removable disk

UNPLUG

[ 70.545087] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 70.553169] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 70.565084] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 70.573528] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 70.580402] pcieport 0000:00:00.0: AER: Device recovery failed
[ 71.275253] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead
[ 71.282956] xhci_hcd 0000:01:00.0: HC died; cleaning up
[ 71.288304] usb 2-2: USB disconnect, device number 2
[ 71.293445] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 71.301851] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 71.313785] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 71.322240] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 71.329115] pcieport 0000:00:00.0: AER: Device recovery failed
[ 71.335042] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 71.343137] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 71.354984] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 71.363443] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 71.370289] pcieport 0000:00:00.0: AER: Device recovery failed

defconfig for reference

# CONFIG_SWAP is not set
CONFIG_SYSVIPC=y
CONFIG_NO_HZ_IDLE=y
CONFIG_HIGH_RES_TIMERS=y
# CONFIG_COMPAT_BRK is not set
CONFIG_SLAB=y
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODVERSIONS=y
CONFIG_ARCH_TANGO=y
# CONFIG_ARM_ERRATA_643719 is not set
CONFIG_PCI=y
CONFIG_PCIEPORTBUS=y
CONFIG_PCI_MSI=y
CONFIG_PCIE_TANGO_SMP8759=y
CONFIG_SMP=y
CONFIG_PREEMPT=y
CONFIG_HZ_300=y
CONFIG_AEABI=y
CONFIG_HIGHMEM=y
# CONFIG_ATAGS is not set
CONFIG_ARM_APPENDED_DTB=y
CONFIG_ARM_ATAG_DTB_COMPAT=y
CONFIG_VFP=y
CONFIG_NEON=y
# CONFIG_SUSPEND is not set
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_BLK_DEV_LOOP=y
CONFIG_SCSI=y
CONFIG_BLK_DEV_SD=y
# CONFIG_INPUT_KEYBOARD is not set
# CONFIG_INPUT_MOUSE is not set
# CONFIG_SERIO is not set
CONFIG_SERIAL_8250=y
# CONFIG_SERIAL_8250_DEPRECATED_OPTIONS is not set
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_RT288X=y
CONFIG_SERIAL_OF_PLATFORM=y
# CONFIG_HW_RANDOM is not set
# CONFIG_HWMON is not set
# CONFIG_HID is not set
# CONFIG_USB_HID is not set
CONFIG_USB=y
CONFIG_USB_ANNOUNCE_NEW_DEVICES=y
CONFIG_USB_XHCI_HCD=y
CONFIG_USB_STORAGE=y
CONFIG_VFAT_FS=m
CONFIG_TMPFS=y
CONFIG_NLS_CODEPAGE_437=m
CONFIG_NLS_ISO8859_1=m
CONFIG_NLS_UTF8=m
CONFIG_PRINTK_TIME=y
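Every AER report in these logs carries `error status/mask=00004000/00000000`. The first word is the root port's AER Uncorrectable Error Status register and the second its Mask register; with nothing masked, the only set bit is bit 14 (0x4000 = 1 << 14), which the PCIe specification assigns to Completion Timeout, matching the `[14] Completion Timeout` line. The following is a minimal C sketch of that decoding, assuming only the bit layout from the PCIe Base Specification; the table covers bits 4 to 21 only, and the function names are hypothetical helpers, not the kernel's AER API.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Names of bits in the PCIe AER Uncorrectable Error Status register.
 * Only bits 4..21 are filled in here; anything else prints as "Unknown".
 */
static const char *const uncor_names[32] = {
    [4]  = "Data Link Protocol Error",
    [5]  = "Surprise Down Error",
    [12] = "Poisoned TLP",
    [13] = "Flow Control Protocol Error",
    [14] = "Completion Timeout",
    [15] = "Completer Abort",
    [16] = "Unexpected Completion",
    [17] = "Receiver Overflow",
    [18] = "Malformed TLP",
    [19] = "ECRC Error",
    [20] = "Unsupported Request",
    [21] = "ACS Violation",
};

/* Name of one status bit, or NULL if this sketch does not know it. */
const char *aer_uncor_name(unsigned bit)
{
    return bit < 32 ? uncor_names[bit] : NULL;
}

/* Lowest bit that is set in status and clear in mask, or -1 if none. */
int first_unmasked_bit(uint32_t status, uint32_t mask)
{
    uint32_t active = status & ~mask;

    for (int bit = 0; bit < 32; bit++)
        if (active & (UINT32_C(1) << bit))
            return bit;
    return -1;
}

/* Print each unmasked error bit, roughly like the "[14] ..." log lines. */
void print_uncor(uint32_t status, uint32_t mask)
{
    uint32_t active = status & ~mask;

    for (unsigned bit = 0; bit < 32; bit++) {
        if (!(active & (UINT32_C(1) << bit)))
            continue;
        const char *name = aer_uncor_name(bit);
        printf("[%2u] %s\n", bit, name ? name : "Unknown");
    }
}
```

For example, `print_uncor(0x00004000, 0x00000000)` prints `[14] Completion Timeout`. A Completion Timeout logged at the root port means a non-posted request it forwarded (such as a read of an xHCI register) never received a completion.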
* Re: Possible regression between 4.9 and 4.13
From: Mason @ 2017-08-23 14:30 UTC
To: Mathias Nyman, Felipe Balbi, linux-pci, linux-usb, Linux ARM
Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman

On 23/08/2017 14:41, Mason wrote:

> I compiled a minimal kernel, with lots of irrelevant drivers and
> frameworks left out, including power management. I still get the
> "xHCI host controller not responding, assume dead" issue.

The problem seems to have a timing-related aspect. I added a bunch of logs (to a slow serial console) and the HC was not killed. I was able to plug the Flash drive a second time.

(I am logging config space reads and writes.)

[ 1.098314] READ: bus=1 devfn=0 where=84 size=2 val=0x8
[ 1.103779] READ: bus=1 devfn=0 where=4 size=2 val=0x142
[ 1.109315] READ: bus=1 devfn=0 where=61 size=1 val=0x1
[ 1.114746] READ: bus=1 devfn=0 where=4 size=2 val=0x142
[ 1.120311] READ: bus=1 devfn=0 where=4 size=2 val=0x142
[ 1.125841] WRITE: bus=1 devfn=0 where=4 size=2 val=0x146

NB: I added msleep(2500) in usb_add_hcd()

[ 3.681867] xhci_hcd 0000:01:00.0: xHCI Host Controller
[ 3.687154] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 1
[ 3.694656] READ: bus=1 devfn=0 where=96 size=1 val=0x30
[ 3.705736] xhci_hcd 0000:01:00.0: hcc params 0x014051cf hci version 0x100 quirks 0x00000010
[ 3.714233] READ: bus=1 devfn=0 where=12 size=1 val=0x10
[ 3.719752] READ: bus=1 devfn=0 where=4 size=2 val=0x146
[ 3.725269] WRITE: bus=1 devfn=0 where=4 size=2 val=0x156
[ 3.730794] READ: bus=1 devfn=0 where=146 size=2 val=0x7
[ 3.736314] READ: bus=1 devfn=0 where=146 size=2 val=0x7
[ 3.741835] WRITE: bus=1 devfn=0 where=146 size=2 val=0x7
[ 3.747354] READ: bus=1 devfn=0 where=146 size=2 val=0x7
[ 3.752871] READ: bus=1 devfn=0 where=148 size=4 val=0x1000
[ 3.758775] READ: bus=1 devfn=0 where=146 size=2 val=0x7
[ 3.764297] WRITE: bus=1 devfn=0 where=146 size=2 val=0xc007
[ 3.770108] READ: bus=1 devfn=0 where=4 size=2 val=0x146
[ 3.775626] WRITE: bus=1 devfn=0 where=4 size=2 val=0x546
[ 3.781146] READ: bus=1 devfn=0 where=146 size=2 val=0xc007
[ 3.786925] WRITE: bus=1 devfn=0 where=146 size=2 val=0x8007
[ 3.792919] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
[ 3.799756] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 3.807021] usb usb1: Product: xHCI Host Controller
[ 3.811933] usb usb1: Manufacturer: Linux 4.12.0-rc1 xhci-hcd
[ 3.817713] usb usb1: SerialNumber: 0000:01:00.0
[ 3.822773] hub 1-0:1.0: USB hub found
[ 3.826598] hub 1-0:1.0: 4 ports detected

NB: I added msleep(2500) in usb_add_hcd()

[ 6.455246] xhci_hcd 0000:01:00.0: xHCI Host Controller
[ 6.460520] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 2
[ 6.468028] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
[ 6.476236] usb usb2: New USB device found, idVendor=1d6b, idProduct=0003
[ 6.483068] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 6.490334] usb usb2: Product: xHCI Host Controller
[ 6.495240] usb usb2: Manufacturer: Linux 4.12.0-rc1 xhci-hcd
[ 6.501020] usb usb2: SerialNumber: 0000:01:00.0
[ 6.505994] hub 2-0:1.0: USB hub found
[ 6.509806] hub 2-0:1.0: 4 ports detected
[ 6.514215] usbcore: registered new interface driver usb-storage
[ 6.520313] Registering SWP/SWPB emulation handler
[ 6.525541] READ: bus=0 devfn=0 where=132 size=4 val=0x8001
[ 6.531334] READ: bus=0 devfn=0 where=6 size=2 val=0x4010
[ 6.536955] READ: bus=0 devfn=0 where=52 size=1 val=0x50
[ 6.542484] READ: bus=0 devfn=0 where=80 size=2 val=0x7805
[ 6.548180] READ: bus=0 devfn=0 where=120 size=2 val=0x8001
[ 6.553969] READ: bus=0 devfn=0 where=128 size=2 val=0x10
[ 6.559584] READ: bus=0 devfn=0 where=124 size=2 val=0x6008
[ 6.565387] READ: bus=1 devfn=0 where=164 size=4 val=0x8fc0
[ 6.571167] READ: bus=1 devfn=0 where=6 size=2 val=0x10
[ 6.576609] READ: bus=1 devfn=0 where=52 size=1 val=0x50
[ 6.582129] READ: bus=1 devfn=0 where=80 size=2 val=0x7001
[ 6.587821] READ: bus=1 devfn=0 where=112 size=2 val=0x9005
[ 6.593601] READ: bus=1 devfn=0 where=144 size=2 val=0xa011
[ 6.599381] READ: bus=1 devfn=0 where=160 size=2 val=0x10
[ 6.604985] READ: bus=1 devfn=0 where=84 size=2 val=0x8
[ 6.623665] Freeing unused kernel memory: 9216K

PLUG #1

[ 66.783559] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd
[ 66.816910] usb 2-2: New USB device found, idVendor=0951, idProduct=1666
[ 66.823661] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 66.830909] usb 2-2: Product: DataTraveler 3.0
[ 66.835417] usb 2-2: Manufacturer: Kingston
[ 66.839660] usb 2-2: SerialNumber: 002618887865F0C0F8646BFA
[ 66.848131] usb-storage 2-2:1.0: USB Mass Storage device detected
[ 66.854584] scsi host0: usb-storage 2-2:1.0
[ 67.869446] scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PQ: 0 ANSI: 6
[ 67.878270] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
[ 67.886248] sd 0:0:0:0: [sda] Write Protect is off
[ 67.891347] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 67.902708] sda: sda1
[ 67.906372] sd 0:0:0:0: [sda] Attached SCSI removable disk

UNPLUG #1

[ 71.697358] READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 71.703572] READ: bus=0 devfn=0 where=2100 size=4 val=0x0
[ 71.709170] WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 71.715569] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 71.723632] READ: bus=0 devfn=0 where=136 size=2 val=0x281f
[ 71.729470] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 71.735373] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 71.741013] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 71.746914] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 71.752552] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 71.758194] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 71.770008] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 71.778494] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 71.785358] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 71.791259] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 71.796897] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 71.802524] pcieport 0000:00:00.0: AER: Device recovery failed
[ 72.451908] READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 72.458120] READ: bus=0 devfn=0 where=2100 size=4 val=0x0
[ 72.463717] WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 72.470012] READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 72.476221] READ: bus=0 devfn=0 where=2100 size=4 val=0x0
[ 72.481819] WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 72.488109] READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 72.494319] READ: bus=0 devfn=0 where=2100 size=4 val=0x0
[ 72.499916] WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 72.506205] READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 72.512415] READ: bus=0 devfn=0 where=2100 size=4 val=0x0
[ 72.518011] WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 72.524263] xhci_hcd 0000:01:00.0: Cannot set link state.
[ 72.529711] usb usb2-port2: cannot disable (err = -32)
[ 72.534883] usb 2-2: USB disconnect, device number 2
[ 72.540042] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 72.548365] READ: bus=0 devfn=0 where=136 size=2 val=0x281f
[ 72.554264] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 72.560157] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 72.565778] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 72.571654] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 72.577273] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 72.582891] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 72.594705] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 72.603122] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 72.609955] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 72.615833] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 72.621441] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 72.627061] pcieport 0000:00:00.0: AER: Device recovery failed
[ 72.632931] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 72.640984] READ: bus=0 devfn=0 where=136 size=2 val=0x281f
[ 72.646769] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 72.652636] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 72.658245] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 72.664114] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 72.669722] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 72.675330] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 72.687142] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 72.695545] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 72.702376] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 72.708244] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 72.713856] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 72.719473] pcieport 0000:00:00.0: AER: Device recovery failed
[ 72.725342] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 72.733394] READ: bus=0 devfn=0 where=136 size=2 val=0x281f
[ 72.739178] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 72.745044] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 72.750653] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 72.756520] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 72.762128] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 72.767734] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 72.779548] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 72.787950] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 72.794781] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 72.800649] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 72.806258] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 72.811873] pcieport 0000:00:00.0: AER: Device recovery failed
[ 72.817741] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 72.825793] READ: bus=0 devfn=0 where=136 size=2 val=0x281f
[ 72.831574] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 72.837442] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 72.843054] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 72.848922] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 72.854529] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 72.860137] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 72.871951] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 72.880353] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 72.887184] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 72.893051] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 72.898660] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 72.904273] pcieport 0000:00:00.0: AER: Device recovery failed

PLUG #2

[ 165.860193] usb 2-2: new SuperSpeed USB device number 3 using xhci_hcd
[ 165.893583] usb 2-2: New USB device found, idVendor=0951, idProduct=1666
[ 165.900333] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 165.907515] usb 2-2: Product: DataTraveler 3.0
[ 165.911989] usb 2-2: Manufacturer: Kingston
[ 165.916198] usb 2-2: SerialNumber: 002618887865F0C0F8646BFA
[ 165.924547] usb-storage 2-2:1.0: USB Mass Storage device detected
[ 165.930970] scsi host0: usb-storage 2-2:1.0
[ 166.962705] scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PQ: 0 ANSI: 6
[ 166.971494] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
[ 166.979556] sd 0:0:0:0: [sda] Write Protect is off
[ 166.984591] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 166.995847] random: fast init done
[ 166.999430] sda: sda1
[ 167.003039] sd 0:0:0:0: [sda] Attached SCSI removable disk

UNPLUG #2

[ 171.918834] READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 171.925046] READ: bus=0 devfn=0 where=2100 size=4 val=0x0
[ 171.930645] WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 171.936941] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 171.945000] READ: bus=0 devfn=0 where=136 size=2 val=0x281f
[ 171.950784] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 171.956656] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 171.962263] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 171.968134] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 171.973741] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 171.979354] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 171.991164] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 171.999597] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 172.006429] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 172.012300] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 172.017908] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 172.023529] pcieport 0000:00:00.0: AER: Device recovery failed
[ 172.675221] READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 172.681432] READ: bus=0 devfn=0 where=2100 size=4 val=0x0
[ 172.687030] WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 172.693325] READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 172.699536] READ: bus=0 devfn=0 where=2100 size=4 val=0x0
[ 172.705133] WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 172.711424] READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 172.717633] READ: bus=0 devfn=0 where=2100 size=4 val=0x0
[ 172.723230] WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 172.729517] READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 172.735726] READ: bus=0 devfn=0 where=2100 size=4 val=0x0
[ 172.741322] WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 172.747574] xhci_hcd 0000:01:00.0: Cannot set link state.
[ 172.753021] usb usb2-port2: cannot disable (err = -32)
[ 172.758193] usb 2-2: USB disconnect, device number 3
[ 172.763340] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 172.771627] READ: bus=0 devfn=0 where=136 size=2 val=0x281f
[ 172.777515] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 172.783408] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 172.789030] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 172.794907] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 172.800526] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 172.806146] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 172.817960] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 172.826375] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 172.833208] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 172.839078] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 172.844685] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 172.850305] pcieport 0000:00:00.0: AER: Device recovery failed
[ 172.856183] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 172.864236] READ: bus=0 devfn=0 where=136 size=2 val=0x281f
[ 172.870020] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 172.875889] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 172.881497] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 172.887365] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 172.892974] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 172.898582] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 172.910393] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 172.918796] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 172.925627] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 172.931494] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 172.937107] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 172.942724] pcieport 0000:00:00.0: AER: Device recovery failed
[ 172.948593] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 172.956644] READ: bus=0 devfn=0 where=136 size=2 val=0x281f
[ 172.962428] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 172.968295] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 172.973903] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 172.979771] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 172.985379] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 172.990985] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 173.002799] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 173.011202] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 173.018033] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 173.023901] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 173.029510] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 173.035123] pcieport 0000:00:00.0: AER: Device recovery failed
[ 173.040990] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 173.049042] READ: bus=0 devfn=0 where=136 size=2 val=0x281f
[ 173.054825] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 173.060693] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 173.066305] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 173.072173] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 173.077780] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 173.083388] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 173.095202] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 173.103605] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 173.110435] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 173.116303] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 173.121911] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 173.127524] pcieport 0000:00:00.0: AER: Device recovery failed

NOTA BENE: these issues do not occur at all with a USB2 Flash drive.

[ 2093.564771] usb 1-2: new high-speed USB device number 2 using xhci_hcd
[ 2093.790646] usb 1-2: New USB device found, idVendor=058f, idProduct=6387
[ 2093.797397] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 2093.804583] usb 1-2: Product: Mass Storage
[ 2093.808707] usb 1-2: Manufacturer: Generic
[ 2093.812829] usb 1-2: SerialNumber: 31A69E70
[ 2093.819244] usb-storage 1-2:1.0: USB Mass Storage device detected
[ 2093.825624] scsi host0: usb-storage 1-2:1.0
[ 2094.856918] scsi 0:0:0:0: Direct-Access Generic Flash Disk 8.07 PQ: 0 ANSI: 2
[ 2094.866196] sd 0:0:0:0: [sda] 4106240 512-byte logical blocks: (2.10 GB/1.96 GiB)
[ 2094.874232] sd 0:0:0:0: [sda] Write Protect is off
[ 2094.879350] sd 0:0:0:0: [sda] No Caching mode page found
[ 2094.884816] sd 0:0:0:0: [sda] Assuming drive cache: write through
[ 2094.909111] sda: sda1
[ 2094.912935] sd 0:0:0:0: [sda] Attached SCSI removable disk
[ 2100.516396] usb 1-2: USB disconnect, device number 2

Regards.
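The config-space trace at about 6.57 s walks the xHCI controller's (bus=1 devfn=0) standard capability list: offset 52 (0x34) holds the first pointer, 0x50, and each 16-bit read at a capability offset returns the capability ID in the low byte and the next pointer in the high byte. The following is a minimal C sketch that replays only those logged values. The struct and function names (`cfg_read`, `walk_caps`, `msix_table_size`) are hypothetical helpers for illustration, not kernel APIs. The chain comes out as Power Management (0x50), MSI (0x70), MSI-X (0x90), PCI Express (0xa0). That would place where=146 (0x92) at the MSI-X Message Control register, so the logged writes 0x7, then 0xc007, then 0x8007 are consistent with MSI-X being enabled with the function masked and then unmasked, for an 8-entry table.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One logged config-space read: register offset ("where") and value. */
struct cfg_read {
    uint16_t where;
    uint32_t val;
};

/* Values logged for bus=1 devfn=0 (the xHCI controller) around 6.57 s. */
static const struct cfg_read xhci_reads[] = {
    { 0x06, 0x0010 }, /* Status (where=6): bit 4 = capabilities list */
    { 0x34, 0x50 },   /* Capabilities Pointer (where=52) */
    { 0x50, 0x7001 }, /* where=80 */
    { 0x70, 0x9005 }, /* where=112 */
    { 0x90, 0xa011 }, /* where=144 */
    { 0xa0, 0x0010 }, /* where=160 */
};

/* Find a logged read at offset `where`; returns 0 on success, -1 if absent. */
static int lookup(const struct cfg_read *log, size_t n, uint16_t where,
                  uint32_t *val)
{
    for (size_t i = 0; i < n; i++) {
        if (log[i].where == where) {
            *val = log[i].val;
            return 0;
        }
    }
    return -1;
}

/* Names for the capability IDs that appear in this trace. */
const char *cap_name(uint8_t id)
{
    switch (id) {
    case 0x01: return "Power Management";
    case 0x05: return "MSI";
    case 0x10: return "PCI Express";
    case 0x11: return "MSI-X";
    default:   return "other";
    }
}

/*
 * Follow the standard capability list using only logged values.
 * Stores up to `max` capability IDs in `ids`; returns how many were found,
 * or -1 if the walk reaches an offset that was not logged.
 */
int walk_caps(const struct cfg_read *log, size_t n, uint8_t *ids, int max)
{
    uint32_t ptr, hdr;
    int count = 0;

    if (lookup(log, n, 0x34, &ptr) != 0)
        return -1;
    ptr &= 0xfc; /* the two low bits of a capability pointer are reserved */

    while (ptr != 0 && count < max) {
        if (lookup(log, n, (uint16_t)ptr, &hdr) != 0)
            return -1;
        ids[count++] = (uint8_t)(hdr & 0xff); /* byte 0: capability ID */
        ptr = (hdr >> 8) & 0xfc;              /* byte 1: next pointer */
    }
    return count;
}

/* MSI-X Message Control bits 10:0 encode (table size - 1). */
unsigned msix_table_size(uint16_t ctrl)
{
    return (ctrl & 0x7ffu) + 1;
}

/* Bit 15 = MSI-X Enable, bit 14 = Function Mask. */
void print_msix_ctrl(uint16_t ctrl)
{
    printf("ctrl=0x%04x table=%u enable=%d mask=%d\n", (unsigned)ctrl,
           msix_table_size(ctrl), (ctrl >> 15) & 1, (ctrl >> 14) & 1);
}
```

Running `walk_caps` over `xhci_reads` yields the IDs 0x01, 0x05, 0x11 and 0x10 in that order. The same walk over the bus=0 reads (0x34 = 0x50, 0x50 = 0x7805, 0x78 = 0x8001, 0x80 = 0x10) would give MSI, Power Management and PCI Express for the root port.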
* Possible regression between 4.9 and 4.13 @ 2017-08-23 14:30 ` Mason 0 siblings, 0 replies; 60+ messages in thread From: Mason @ 2017-08-23 14:30 UTC (permalink / raw) To: linux-arm-kernel On 23/08/2017 14:41, Mason wrote: > I compiled a minimal kernel, with lots of irrelevant drivers and > frameworks left out, including power management. I still get the > "xHCI host controller not responding, assume dead" issue. The problem seems to have a timing-related aspect. I added a bunch of logs (to a slow serial console) and the HC was not killed. I was able to plug the Flash drive a second time. (I am logging config space reads and writes.) [ 1.098314] READ: bus=1 devfn=0 where=84 size=2 val=0x8 [ 1.103779] READ: bus=1 devfn=0 where=4 size=2 val=0x142 [ 1.109315] READ: bus=1 devfn=0 where=61 size=1 val=0x1 [ 1.114746] READ: bus=1 devfn=0 where=4 size=2 val=0x142 [ 1.120311] READ: bus=1 devfn=0 where=4 size=2 val=0x142 [ 1.125841] WRITE: bus=1 devfn=0 where=4 size=2 val=0x146 NB: I added msleep(2500) in usb_add_hcd() [ 3.681867] xhci_hcd 0000:01:00.0: xHCI Host Controller [ 3.687154] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 1 [ 3.694656] READ: bus=1 devfn=0 where=96 size=1 val=0x30 [ 3.705736] xhci_hcd 0000:01:00.0: hcc params 0x014051cf hci version 0x100 quirks 0x00000010 [ 3.714233] READ: bus=1 devfn=0 where=12 size=1 val=0x10 [ 3.719752] READ: bus=1 devfn=0 where=4 size=2 val=0x146 [ 3.725269] WRITE: bus=1 devfn=0 where=4 size=2 val=0x156 [ 3.730794] READ: bus=1 devfn=0 where=146 size=2 val=0x7 [ 3.736314] READ: bus=1 devfn=0 where=146 size=2 val=0x7 [ 3.741835] WRITE: bus=1 devfn=0 where=146 size=2 val=0x7 [ 3.747354] READ: bus=1 devfn=0 where=146 size=2 val=0x7 [ 3.752871] READ: bus=1 devfn=0 where=148 size=4 val=0x1000 [ 3.758775] READ: bus=1 devfn=0 where=146 size=2 val=0x7 [ 3.764297] WRITE: bus=1 devfn=0 where=146 size=2 val=0xc007 [ 3.770108] READ: bus=1 devfn=0 where=4 size=2 val=0x146 [ 3.775626] WRITE: bus=1 devfn=0 where=4 size=2 
val=0x546
[ 3.781146] READ: bus=1 devfn=0 where=146 size=2 val=0xc007
[ 3.786925] WRITE: bus=1 devfn=0 where=146 size=2 val=0x8007
[ 3.792919] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
[ 3.799756] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 3.807021] usb usb1: Product: xHCI Host Controller
[ 3.811933] usb usb1: Manufacturer: Linux 4.12.0-rc1 xhci-hcd
[ 3.817713] usb usb1: SerialNumber: 0000:01:00.0
[ 3.822773] hub 1-0:1.0: USB hub found
[ 3.826598] hub 1-0:1.0: 4 ports detected

NB: I added msleep(2500) in usb_add_hcd()

[ 6.455246] xhci_hcd 0000:01:00.0: xHCI Host Controller
[ 6.460520] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 2
[ 6.468028] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
[ 6.476236] usb usb2: New USB device found, idVendor=1d6b, idProduct=0003
[ 6.483068] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 6.490334] usb usb2: Product: xHCI Host Controller
[ 6.495240] usb usb2: Manufacturer: Linux 4.12.0-rc1 xhci-hcd
[ 6.501020] usb usb2: SerialNumber: 0000:01:00.0
[ 6.505994] hub 2-0:1.0: USB hub found
[ 6.509806] hub 2-0:1.0: 4 ports detected
[ 6.514215] usbcore: registered new interface driver usb-storage
[ 6.520313] Registering SWP/SWPB emulation handler
[ 6.525541] READ: bus=0 devfn=0 where=132 size=4 val=0x8001
[ 6.531334] READ: bus=0 devfn=0 where=6 size=2 val=0x4010
[ 6.536955] READ: bus=0 devfn=0 where=52 size=1 val=0x50
[ 6.542484] READ: bus=0 devfn=0 where=80 size=2 val=0x7805
[ 6.548180] READ: bus=0 devfn=0 where=120 size=2 val=0x8001
[ 6.553969] READ: bus=0 devfn=0 where=128 size=2 val=0x10
[ 6.559584] READ: bus=0 devfn=0 where=124 size=2 val=0x6008
[ 6.565387] READ: bus=1 devfn=0 where=164 size=4 val=0x8fc0
[ 6.571167] READ: bus=1 devfn=0 where=6 size=2 val=0x10
[ 6.576609] READ: bus=1 devfn=0 where=52 size=1 val=0x50
[ 6.582129] READ: bus=1 devfn=0 where=80 size=2 val=0x7001
[ 6.587821] READ: bus=1 devfn=0 where=112 size=2 val=0x9005
[ 6.593601] READ: bus=1 devfn=0 where=144 size=2 val=0xa011
[ 6.599381] READ: bus=1 devfn=0 where=160 size=2 val=0x10
[ 6.604985] READ: bus=1 devfn=0 where=84 size=2 val=0x8
[ 6.623665] Freeing unused kernel memory: 9216K

PLUG #1

[ 66.783559] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd
[ 66.816910] usb 2-2: New USB device found, idVendor=0951, idProduct=1666
[ 66.823661] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 66.830909] usb 2-2: Product: DataTraveler 3.0
[ 66.835417] usb 2-2: Manufacturer: Kingston
[ 66.839660] usb 2-2: SerialNumber: 002618887865F0C0F8646BFA
[ 66.848131] usb-storage 2-2:1.0: USB Mass Storage device detected
[ 66.854584] scsi host0: usb-storage 2-2:1.0
[ 67.869446] scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PQ: 0 ANSI: 6
[ 67.878270] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
[ 67.886248] sd 0:0:0:0: [sda] Write Protect is off
[ 67.891347] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 67.902708] sda: sda1
[ 67.906372] sd 0:0:0:0: [sda] Attached SCSI removable disk

UNPLUG #1

[ 71.697358] READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 71.703572] READ: bus=0 devfn=0 where=2100 size=4 val=0x0
[ 71.709170] WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 71.715569] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 71.723632] READ: bus=0 devfn=0 where=136 size=2 val=0x281f
[ 71.729470] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 71.735373] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 71.741013] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 71.746914] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 71.752552] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 71.758194] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 71.770008] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 71.778494] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 71.785358] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 71.791259] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 71.796897] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 71.802524] pcieport 0000:00:00.0: AER: Device recovery failed
[ 72.451908] READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 72.458120] READ: bus=0 devfn=0 where=2100 size=4 val=0x0
[ 72.463717] WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 72.470012] READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 72.476221] READ: bus=0 devfn=0 where=2100 size=4 val=0x0
[ 72.481819] WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 72.488109] READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 72.494319] READ: bus=0 devfn=0 where=2100 size=4 val=0x0
[ 72.499916] WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 72.506205] READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 72.512415] READ: bus=0 devfn=0 where=2100 size=4 val=0x0
[ 72.518011] WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 72.524263] xhci_hcd 0000:01:00.0: Cannot set link state.
[ 72.529711] usb usb2-port2: cannot disable (err = -32)
[ 72.534883] usb 2-2: USB disconnect, device number 2
[ 72.540042] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 72.548365] READ: bus=0 devfn=0 where=136 size=2 val=0x281f
[ 72.554264] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 72.560157] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 72.565778] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 72.571654] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 72.577273] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 72.582891] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 72.594705] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 72.603122] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 72.609955] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 72.615833] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 72.621441] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 72.627061] pcieport 0000:00:00.0: AER: Device recovery failed
[ 72.632931] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 72.640984] READ: bus=0 devfn=0 where=136 size=2 val=0x281f
[ 72.646769] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 72.652636] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 72.658245] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 72.664114] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 72.669722] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 72.675330] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 72.687142] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 72.695545] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 72.702376] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 72.708244] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 72.713856] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 72.719473] pcieport 0000:00:00.0: AER: Device recovery failed
[ 72.725342] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 72.733394] READ: bus=0 devfn=0 where=136 size=2 val=0x281f
[ 72.739178] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 72.745044] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 72.750653] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 72.756520] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 72.762128] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 72.767734] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 72.779548] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 72.787950] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 72.794781] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 72.800649] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 72.806258] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 72.811873] pcieport 0000:00:00.0: AER: Device recovery failed
[ 72.817741] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 72.825793] READ: bus=0 devfn=0 where=136 size=2 val=0x281f
[ 72.831574] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 72.837442] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 72.843054] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 72.848922] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 72.854529] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 72.860137] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 72.871951] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 72.880353] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 72.887184] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 72.893051] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 72.898660] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 72.904273] pcieport 0000:00:00.0: AER: Device recovery failed

PLUG #2

[ 165.860193] usb 2-2: new SuperSpeed USB device number 3 using xhci_hcd
[ 165.893583] usb 2-2: New USB device found, idVendor=0951, idProduct=1666
[ 165.900333] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 165.907515] usb 2-2: Product: DataTraveler 3.0
[ 165.911989] usb 2-2: Manufacturer: Kingston
[ 165.916198] usb 2-2: SerialNumber: 002618887865F0C0F8646BFA
[ 165.924547] usb-storage 2-2:1.0: USB Mass Storage device detected
[ 165.930970] scsi host0: usb-storage 2-2:1.0
[ 166.962705] scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PQ: 0 ANSI: 6
[ 166.971494] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
[ 166.979556] sd 0:0:0:0: [sda] Write Protect is off
[ 166.984591] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 166.995847] random: fast init done
[ 166.999430] sda: sda1
[ 167.003039] sd 0:0:0:0: [sda] Attached SCSI removable disk

UNPLUG #2

[ 171.918834] READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 171.925046] READ: bus=0 devfn=0 where=2100 size=4 val=0x0
[ 171.930645] WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 171.936941] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 171.945000] READ: bus=0 devfn=0 where=136 size=2 val=0x281f
[ 171.950784] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 171.956656] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 171.962263] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 171.968134] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 171.973741] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 171.979354] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 171.991164] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 171.999597] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 172.006429] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 172.012300] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 172.017908] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 172.023529] pcieport 0000:00:00.0: AER: Device recovery failed
[ 172.675221] READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 172.681432] READ: bus=0 devfn=0 where=2100 size=4 val=0x0
[ 172.687030] WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 172.693325] READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 172.699536] READ: bus=0 devfn=0 where=2100 size=4 val=0x0
[ 172.705133] WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 172.711424] READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 172.717633] READ: bus=0 devfn=0 where=2100 size=4 val=0x0
[ 172.723230] WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 172.729517] READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 172.735726] READ: bus=0 devfn=0 where=2100 size=4 val=0x0
[ 172.741322] WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[ 172.747574] xhci_hcd 0000:01:00.0: Cannot set link state.
[ 172.753021] usb usb2-port2: cannot disable (err = -32)
[ 172.758193] usb 2-2: USB disconnect, device number 3
[ 172.763340] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 172.771627] READ: bus=0 devfn=0 where=136 size=2 val=0x281f
[ 172.777515] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 172.783408] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 172.789030] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 172.794907] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 172.800526] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 172.806146] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 172.817960] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 172.826375] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 172.833208] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 172.839078] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 172.844685] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 172.850305] pcieport 0000:00:00.0: AER: Device recovery failed
[ 172.856183] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 172.864236] READ: bus=0 devfn=0 where=136 size=2 val=0x281f
[ 172.870020] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 172.875889] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 172.881497] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 172.887365] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 172.892974] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 172.898582] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 172.910393] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 172.918796] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 172.925627] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 172.931494] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 172.937107] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 172.942724] pcieport 0000:00:00.0: AER: Device recovery failed
[ 172.948593] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 172.956644] READ: bus=0 devfn=0 where=136 size=2 val=0x281f
[ 172.962428] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 172.968295] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 172.973903] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 172.979771] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 172.985379] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 172.990985] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 173.002799] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 173.011202] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 173.018033] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 173.023901] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 173.029510] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 173.035123] pcieport 0000:00:00.0: AER: Device recovery failed
[ 173.040990] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 173.049042] READ: bus=0 devfn=0 where=136 size=2 val=0x281f
[ 173.054825] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 173.060693] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 173.066305] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 173.072173] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 173.077780] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 173.083388] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 173.095202] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 173.103605] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 173.110435] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[ 173.116303] READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[ 173.121911] READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[ 173.127524] pcieport 0000:00:00.0: AER: Device recovery failed

NOTA BENE: these issues do not occur at all with a USB2 Flash drive.

[ 2093.564771] usb 1-2: new high-speed USB device number 2 using xhci_hcd
[ 2093.790646] usb 1-2: New USB device found, idVendor=058f, idProduct=6387
[ 2093.797397] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 2093.804583] usb 1-2: Product: Mass Storage
[ 2093.808707] usb 1-2: Manufacturer: Generic
[ 2093.812829] usb 1-2: SerialNumber: 31A69E70
[ 2093.819244] usb-storage 1-2:1.0: USB Mass Storage device detected
[ 2093.825624] scsi host0: usb-storage 1-2:1.0
[ 2094.856918] scsi 0:0:0:0: Direct-Access Generic Flash Disk 8.07 PQ: 0 ANSI: 2
[ 2094.866196] sd 0:0:0:0: [sda] 4106240 512-byte logical blocks: (2.10 GB/1.96 GiB)
[ 2094.874232] sd 0:0:0:0: [sda] Write Protect is off
[ 2094.879350] sd 0:0:0:0: [sda] No Caching mode page found
[ 2094.884816] sd 0:0:0:0: [sda] Assuming drive cache: write through
[ 2094.909111] sda: sda1
[ 2094.912935] sd 0:0:0:0: [sda] Attached SCSI removable disk
[ 2100.516396] usb 1-2: USB disconnect, device number 2

Regards.
* Re: Possible regression between 4.9 and 4.13
From: Mathias Nyman @ 2017-08-28 8:39 UTC
To: Mason, Mathias Nyman, Felipe Balbi, linux-pci, linux-usb, Linux ARM
Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman

On 23.08.2017 17:30, Mason wrote:
> On 23/08/2017 14:41, Mason wrote:
>
>> I compiled a minimal kernel, with lots of irrelevant drivers and
>> frameworks left out, including power management. I still get the
>> "xHCI host controller not responding, assume dead" issue.
>
> The problem seems to have a timing-related aspect.
>
> I added a bunch of logs (to a slow serial console) and the HC was
> not killed. I was able to plug the Flash drive a second time.
> (I am logging config space reads and writes.)

Could you take a log with the following added debug, without
your extra delays? It should show a bit more about the state
of the controller when we read 0xffffffff.

diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index 4bc6f42..a124c3d 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -23,6 +23,7 @@
 #include <linux/slab.h>
 #include <asm/unaligned.h>
 
+#include <linux/pci.h>
 #include "xhci.h"
 #include "xhci-trace.h"
 
@@ -1280,7 +1281,11 @@ int xhci_hub_control(struct usb_hcd *hcd, u16 typeReq, u16 wValue,
 		wIndex--;
 		temp = readl(port_array[wIndex]);
 		if (temp == ~(u32)0) {
-			xhci_hc_died(xhci);
+			struct pci_dev *pdev = to_pci_dev(hcd->self.controller);
+			xhci_err(xhci, "ClearPortFeat port%d @%p=%x, hcd->state:0x%x hcd->flags:0x%x, pci_state 0x%x\n",
+				wIndex, port_array[wIndex], temp, hcd->state, hcd->flags, pdev->current_state);
+
+			WARN_ON(1);
 			retval = -ENODEV;
 			break;
 		}

Thanks
-Mathias
* Re: Possible regression between 4.9 and 4.13
From: Mason @ 2017-08-28 14:40 UTC
To: Mathias Nyman, Felipe Balbi, linux-pci, linux-usb, Linux ARM
Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman

On 28/08/2017 10:39, Mathias Nyman wrote:

> Could you take a log with the following added debug, without
> your extra delays? It should show a bit more about the state
> of the controller when we read 0xffffffff.

I applied the following patch on top of v4.12-rc1:

diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index 5e3e9d4c6956..c7ea7d4c801f 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -23,6 +23,7 @@
 #include <linux/slab.h>
 #include <asm/unaligned.h>
 
+#include <linux/pci.h>
 #include "xhci.h"
 #include "xhci-trace.h"
 
@@ -1268,7 +1269,10 @@ int xhci_hub_control(struct usb_hcd *hcd, u16 typeReq, u16 wValue,
 		wIndex--;
 		temp = readl(port_array[wIndex]);
 		if (temp == ~(u32)0) {
-			xhci_hc_died(xhci);
+			struct pci_dev *pdev = to_pci_dev(hcd->self.controller);
+			xhci_err(xhci, "ClearPortFeat port%d @%p=%x, hcd->state:0x%x hcd->flags:0x%x, pci_state 0x%x\n",
+				wIndex, port_array[wIndex], temp, hcd->state, hcd->flags, pdev->current_state);
+			WARN_ON(1);
 			retval = -ENODEV;
 			break;
 		}

And here are the logs I get when I plug/unplug my USB3 device:

[ 14.970148] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd
[ 15.003487] usb 2-2: New USB device found, idVendor=0951, idProduct=1666
[ 15.010237] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 15.017483] usb 2-2: Product: DataTraveler 3.0
[ 15.021990] usb 2-2: Manufacturer: Kingston
[ 15.026234] usb 2-2: SerialNumber: 002618887865F0C0F8646BFA
[ 15.034830] usb-storage 2-2:1.0: USB Mass Storage device detected
[ 15.041269] scsi host0: usb-storage 2-2:1.0
[ 16.056140] scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PQ: 0 ANSI: 6
[ 16.064979] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
[ 16.072978] sd 0:0:0:0: [sda] Write Protect is off
[ 16.078076] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 16.089417] sda: sda1
[ 16.093050] sd 0:0:0:0: [sda] Attached SCSI removable disk
[ 22.152078] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 22.160157] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 22.172051] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 22.180493] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 22.187368] pcieport 0000:00:00.0: AER: Device recovery failed
[ 22.885269] xhci_hcd 0000:01:00.0: ClearPortFeat port1 @e0852430=ffffffff, hcd->state:0x1 hcd->flags:0x1a5, pci_state 0x0
[ 22.896284] ------------[ cut here ]------------
[ 22.900938] WARNING: CPU: 0 PID: 127 at drivers/usb/host/xhci-hub.c:1275 xhci_hub_control+0x10f4/0x1778
[ 22.910377] Modules linked in:
[ 22.913447] CPU: 0 PID: 127 Comm: kworker/0:1 Tainted: G C 4.12.0-rc1 #4
[ 22.921314] Hardware name: Sigma Tango DT
[ 22.925342] Workqueue: usb_hub_wq hub_event
[ 22.929564] [<c010e8b4>] (unwind_backtrace) from [<c010ac00>] (show_stack+0x10/0x14)
[ 22.937353] [<c010ac00>] (show_stack) from [<c0257a30>] (dump_stack+0x84/0x98)
[ 22.944617] [<c0257a30>] (dump_stack) from [<c01183d0>] (__warn+0xe8/0x100)
[ 22.951616] [<c01183d0>] (__warn) from [<c0118498>] (warn_slowpath_null+0x20/0x28)
[ 22.959227] [<c0118498>] (warn_slowpath_null) from [<c031ad90>] (xhci_hub_control+0x10f4/0x1778)
[ 22.968062] [<c031ad90>] (xhci_hub_control) from [<c02fbb4c>] (usb_hcd_submit_urb+0x264/0x810)
[ 22.976719] [<c02fbb4c>] (usb_hcd_submit_urb) from [<c02fccec>] (usb_submit_urb+0x2b0/0x4b4)
[ 22.985201] [<c02fccec>] (usb_submit_urb) from [<c02fd3c4>] (usb_start_wait_urb+0x4c/0xbc)
[ 22.993509] [<c02fd3c4>] (usb_start_wait_urb) from [<c02fd4d4>] (usb_control_msg+0xa0/0xcc)
[ 23.001904] [<c02fd4d4>] (usb_control_msg) from [<c02f5718>] (usb_clear_port_feature+0x44/0x4c)
[ 23.010648] [<c02f5718>] (usb_clear_port_feature) from [<c02f60fc>] (hub_port_reset+0x228/0x51c)
[ 23.019479] [<c02f60fc>] (hub_port_reset) from [<c02f82f0>] (hub_event+0x1f4/0xe64)
[ 23.027177] [<c02f82f0>] (hub_event) from [<c012d398>] (process_one_work+0x1d4/0x3ec)
[ 23.035049] [<c012d398>] (process_one_work) from [<c012dfb4>] (worker_thread+0x38/0x554)
[ 23.043185] [<c012dfb4>] (worker_thread) from [<c0132c84>] (kthread+0x108/0x138)
[ 23.050620] [<c0132c84>] (kthread) from [<c01076f8>] (ret_from_fork+0x14/0x3c)
[ 23.057877] ---[ end trace 5e4494cf1f6e3761 ]---
[ 23.062691] xhci_hcd 0000:01:00.0: ClearPortFeat port1 @e0852430=ffffffff, hcd->state:0x1 hcd->flags:0x1a5, pci_state 0x0
[ 23.073707] ------------[ cut here ]------------
[ 23.078349] WARNING: CPU: 0 PID: 127 at drivers/usb/host/xhci-hub.c:1275 xhci_hub_control+0x10f4/0x1778
[ 23.087787] Modules linked in:
[ 23.090854] CPU: 0 PID: 127 Comm: kworker/0:1 Tainted: G WC 4.12.0-rc1 #4
[ 23.098720] Hardware name: Sigma Tango DT
[ 23.102745] Workqueue: usb_hub_wq hub_event
[ 23.106953] [<c010e8b4>] (unwind_backtrace) from [<c010ac00>] (show_stack+0x10/0x14)
[ 23.114737] [<c010ac00>] (show_stack) from [<c0257a30>] (dump_stack+0x84/0x98)
[ 23.121998] [<c0257a30>] (dump_stack) from [<c01183d0>] (__warn+0xe8/0x100)
[ 23.128996] [<c01183d0>] (__warn) from [<c0118498>] (warn_slowpath_null+0x20/0x28)
[ 23.136606] [<c0118498>] (warn_slowpath_null) from [<c031ad90>] (xhci_hub_control+0x10f4/0x1778)
[ 23.145439] [<c031ad90>] (xhci_hub_control) from [<c02fbb4c>] (usb_hcd_submit_urb+0x264/0x810)
[ 23.154095] [<c02fbb4c>] (usb_hcd_submit_urb) from [<c02fccec>] (usb_submit_urb+0x2b0/0x4b4)
[ 23.162577] [<c02fccec>] (usb_submit_urb) from [<c02fd3c4>] (usb_start_wait_urb+0x4c/0xbc)
[ 23.170884] [<c02fd3c4>] (usb_start_wait_urb) from [<c02fd4d4>] (usb_control_msg+0xa0/0xcc)
[ 23.179278] [<c02fd4d4>] (usb_control_msg) from [<c02f5718>] (usb_clear_port_feature+0x44/0x4c)
[ 23.188021] [<c02f5718>] (usb_clear_port_feature) from [<c02f611c>] (hub_port_reset+0x248/0x51c)
[ 23.196851] [<c02f611c>] (hub_port_reset) from [<c02f82f0>] (hub_event+0x1f4/0xe64)
[ 23.204547] [<c02f82f0>] (hub_event) from [<c012d398>] (process_one_work+0x1d4/0x3ec)
[ 23.212418] [<c012d398>] (process_one_work) from [<c012dfb4>] (worker_thread+0x38/0x554)
[ 23.220551] [<c012dfb4>] (worker_thread) from [<c0132c84>] (kthread+0x108/0x138)
[ 23.227986] [<c0132c84>] (kthread) from [<c01076f8>] (ret_from_fork+0x14/0x3c)
[ 23.235242] ---[ end trace 5e4494cf1f6e3762 ]---
[ 23.239953] xhci_hcd 0000:01:00.0: Cannot set link state.
[ 23.245403] usb usb2-port2: cannot disable (err = -32)
[ 23.250575] usb 2-2: USB disconnect, device number 2
[ 23.255724] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 23.264048] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 23.275985] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 23.284417] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 23.291256] pcieport 0000:00:00.0: AER: Device recovery failed
[ 23.297144] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 23.305218] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 23.317047] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 23.325467] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 23.332309] pcieport 0000:00:00.0: AER: Device recovery failed
[ 23.338188] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 23.346273] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 23.358093] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 23.366518] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 23.373357] pcieport 0000:00:00.0: AER: Device recovery failed
[ 23.379229] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 23.387287] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 23.399101] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 23.407504] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 23.414344] pcieport 0000:00:00.0: AER: Device recovery failed
[ 23.434143] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 23.442263] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 23.454100] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 23.462542] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 23.469427] pcieport 0000:00:00.0: AER: Device recovery failed

Regards.
* Possible regression between 4.9 and 4.13 @ 2017-08-28 14:40 ` Mason 0 siblings, 0 replies; 60+ messages in thread From: Mason @ 2017-08-28 14:40 UTC (permalink / raw) To: linux-arm-kernel On 28/08/2017 10:39, Mathias Nyman wrote: > Could you take a log with the following added debug, without > your extra delays, It should show a bit more about the state > of the controller when we read 0xffffffff I applied the following patch on top of v4.12-rc1 diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c index 5e3e9d4c6956..c7ea7d4c801f 100644 --- a/drivers/usb/host/xhci-hub.c +++ b/drivers/usb/host/xhci-hub.c @@ -23,6 +23,7 @@ #include <linux/slab.h> #include <asm/unaligned.h> +#include <linux/pci.h> #include "xhci.h" #include "xhci-trace.h" @@ -1268,7 +1269,10 @@ int xhci_hub_control(struct usb_hcd *hcd, u16 typeReq, u16 wValue, wIndex--; temp = readl(port_array[wIndex]); if (temp == ~(u32)0) { - xhci_hc_died(xhci); + struct pci_dev *pdev = to_pci_dev(hcd->self.controller); + xhci_err(xhci, "ClearPortFeat port%d @%p=%x, hcd->state:0x%x hcd->flags:0x%x, pci_state 0x%x\n", + wIndex, port_array[wIndex], temp, hcd->state, hcd->flags, pdev->current_state); + WARN_ON(1); retval = -ENODEV; break; } And here are logs I get when I plug/unplug my USB3 device. 
[ 14.970148] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd [ 15.003487] usb 2-2: New USB device found, idVendor=0951, idProduct=1666 [ 15.010237] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3 [ 15.017483] usb 2-2: Product: DataTraveler 3.0 [ 15.021990] usb 2-2: Manufacturer: Kingston [ 15.026234] usb 2-2: SerialNumber: 002618887865F0C0F8646BFA [ 15.034830] usb-storage 2-2:1.0: USB Mass Storage device detected [ 15.041269] scsi host0: usb-storage 2-2:1.0 [ 16.056140] scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PQ: 0 ANSI: 6 [ 16.064979] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB) [ 16.072978] sd 0:0:0:0: [sda] Write Protect is off [ 16.078076] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA [ 16.089417] sda: sda1 [ 16.093050] sd 0:0:0:0: [sda] Attached SCSI removable disk [ 22.152078] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000 [ 22.160157] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID) [ 22.172051] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000 [ 22.180493] pcieport 0000:00:00.0: [14] Completion Timeout (First) [ 22.187368] pcieport 0000:00:00.0: AER: Device recovery failed [ 22.885269] xhci_hcd 0000:01:00.0: ClearPortFeat port1 @e0852430=ffffffff, hcd->state:0x1 hcd->flags:0x1a5, pci_state 0x0 [ 22.896284] ------------[ cut here ]------------ [ 22.900938] WARNING: CPU: 0 PID: 127 at drivers/usb/host/xhci-hub.c:1275 xhci_hub_control+0x10f4/0x1778 [ 22.910377] Modules linked in: [ 22.913447] CPU: 0 PID: 127 Comm: kworker/0:1 Tainted: G C 4.12.0-rc1 #4 [ 22.921314] Hardware name: Sigma Tango DT [ 22.925342] Workqueue: usb_hub_wq hub_event [ 22.929564] [<c010e8b4>] (unwind_backtrace) from [<c010ac00>] (show_stack+0x10/0x14) [ 22.937353] [<c010ac00>] (show_stack) from [<c0257a30>] (dump_stack+0x84/0x98) [ 22.944617] 
[<c0257a30>] (dump_stack) from [<c01183d0>] (__warn+0xe8/0x100) [ 22.951616] [<c01183d0>] (__warn) from [<c0118498>] (warn_slowpath_null+0x20/0x28) [ 22.959227] [<c0118498>] (warn_slowpath_null) from [<c031ad90>] (xhci_hub_control+0x10f4/0x1778) [ 22.968062] [<c031ad90>] (xhci_hub_control) from [<c02fbb4c>] (usb_hcd_submit_urb+0x264/0x810) [ 22.976719] [<c02fbb4c>] (usb_hcd_submit_urb) from [<c02fccec>] (usb_submit_urb+0x2b0/0x4b4) [ 22.985201] [<c02fccec>] (usb_submit_urb) from [<c02fd3c4>] (usb_start_wait_urb+0x4c/0xbc) [ 22.993509] [<c02fd3c4>] (usb_start_wait_urb) from [<c02fd4d4>] (usb_control_msg+0xa0/0xcc) [ 23.001904] [<c02fd4d4>] (usb_control_msg) from [<c02f5718>] (usb_clear_port_feature+0x44/0x4c) [ 23.010648] [<c02f5718>] (usb_clear_port_feature) from [<c02f60fc>] (hub_port_reset+0x228/0x51c) [ 23.019479] [<c02f60fc>] (hub_port_reset) from [<c02f82f0>] (hub_event+0x1f4/0xe64) [ 23.027177] [<c02f82f0>] (hub_event) from [<c012d398>] (process_one_work+0x1d4/0x3ec) [ 23.035049] [<c012d398>] (process_one_work) from [<c012dfb4>] (worker_thread+0x38/0x554) [ 23.043185] [<c012dfb4>] (worker_thread) from [<c0132c84>] (kthread+0x108/0x138) [ 23.050620] [<c0132c84>] (kthread) from [<c01076f8>] (ret_from_fork+0x14/0x3c) [ 23.057877] ---[ end trace 5e4494cf1f6e3761 ]--- [ 23.062691] xhci_hcd 0000:01:00.0: ClearPortFeat port1 @e0852430=ffffffff, hcd->state:0x1 hcd->flags:0x1a5, pci_state 0x0 [ 23.073707] ------------[ cut here ]------------ [ 23.078349] WARNING: CPU: 0 PID: 127 at drivers/usb/host/xhci-hub.c:1275 xhci_hub_control+0x10f4/0x1778 [ 23.087787] Modules linked in: [ 23.090854] CPU: 0 PID: 127 Comm: kworker/0:1 Tainted: G WC 4.12.0-rc1 #4 [ 23.098720] Hardware name: Sigma Tango DT [ 23.102745] Workqueue: usb_hub_wq hub_event [ 23.106953] [<c010e8b4>] (unwind_backtrace) from [<c010ac00>] (show_stack+0x10/0x14) [ 23.114737] [<c010ac00>] (show_stack) from [<c0257a30>] (dump_stack+0x84/0x98) [ 23.121998] [<c0257a30>] (dump_stack) from [<c01183d0>] 
(__warn+0xe8/0x100) [ 23.128996] [<c01183d0>] (__warn) from [<c0118498>] (warn_slowpath_null+0x20/0x28) [ 23.136606] [<c0118498>] (warn_slowpath_null) from [<c031ad90>] (xhci_hub_control+0x10f4/0x1778) [ 23.145439] [<c031ad90>] (xhci_hub_control) from [<c02fbb4c>] (usb_hcd_submit_urb+0x264/0x810) [ 23.154095] [<c02fbb4c>] (usb_hcd_submit_urb) from [<c02fccec>] (usb_submit_urb+0x2b0/0x4b4) [ 23.162577] [<c02fccec>] (usb_submit_urb) from [<c02fd3c4>] (usb_start_wait_urb+0x4c/0xbc) [ 23.170884] [<c02fd3c4>] (usb_start_wait_urb) from [<c02fd4d4>] (usb_control_msg+0xa0/0xcc) [ 23.179278] [<c02fd4d4>] (usb_control_msg) from [<c02f5718>] (usb_clear_port_feature+0x44/0x4c) [ 23.188021] [<c02f5718>] (usb_clear_port_feature) from [<c02f611c>] (hub_port_reset+0x248/0x51c) [ 23.196851] [<c02f611c>] (hub_port_reset) from [<c02f82f0>] (hub_event+0x1f4/0xe64) [ 23.204547] [<c02f82f0>] (hub_event) from [<c012d398>] (process_one_work+0x1d4/0x3ec) [ 23.212418] [<c012d398>] (process_one_work) from [<c012dfb4>] (worker_thread+0x38/0x554) [ 23.220551] [<c012dfb4>] (worker_thread) from [<c0132c84>] (kthread+0x108/0x138) [ 23.227986] [<c0132c84>] (kthread) from [<c01076f8>] (ret_from_fork+0x14/0x3c) [ 23.235242] ---[ end trace 5e4494cf1f6e3762 ]--- [ 23.239953] xhci_hcd 0000:01:00.0: Cannot set link state. 
[ 23.245403] usb usb2-port2: cannot disable (err = -32)
[ 23.250575] usb 2-2: USB disconnect, device number 2
[ 23.255724] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 23.264048] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 23.275985] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 23.284417] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 23.291256] pcieport 0000:00:00.0: AER: Device recovery failed
[ 23.297144] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 23.305218] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 23.317047] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 23.325467] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 23.332309] pcieport 0000:00:00.0: AER: Device recovery failed
[ 23.338188] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 23.346273] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 23.358093] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 23.366518] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 23.373357] pcieport 0000:00:00.0: AER: Device recovery failed
[ 23.379229] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 23.387287] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 23.399101] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 23.407504] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 23.414344] pcieport 0000:00:00.0: AER: Device recovery failed
[ 23.434143] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 23.442263] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 23.454100] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 23.462542] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 23.469427] pcieport 0000:00:00.0: AER: Device recovery failed

Regards.
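Each AER report above carries "error status/mask=00004000/00000000", which the kernel prints as "[14] Completion Timeout": 0x00004000 is bit 14 of the PCIe Uncorrectable Error Status register. The user-space C sketch below shows that decoding. It is illustrative only. The table is a partial list of uncorrectable error bits from the PCIe specification, `decode_unc_status()` is a hypothetical helper rather than the kernel's AER code, and treating masked bits as inactive is a simplification chosen for this sketch.

```c
#include <stdint.h>
#include <stdio.h>

/* Partial table of PCIe Uncorrectable Error Status bits (AER capability). */
static const struct {
    unsigned bit;
    const char *name;
} unc_err_bits[] = {
    { 4,  "Data Link Protocol Error" },
    { 5,  "Surprise Down Error" },
    { 12, "Poisoned TLP" },
    { 13, "Flow Control Protocol Error" },
    { 14, "Completion Timeout" },
    { 15, "Completer Abort" },
    { 16, "Unexpected Completion" },
    { 17, "Receiver Overflow" },
    { 18, "Malformed TLP" },
    { 19, "ECRC Error" },
    { 20, "Unsupported Request" },
};

/* Hypothetical helper: print every known error bit that is set in 'status'
 * and not masked, and return how many such bits were found. */
static int decode_unc_status(uint32_t status, uint32_t mask)
{
    uint32_t active = status & ~mask;
    int found = 0;

    for (size_t i = 0; i < sizeof(unc_err_bits) / sizeof(unc_err_bits[0]); i++) {
        if (active & (UINT32_C(1) << unc_err_bits[i].bit)) {
            printf("[%2u] %s\n", unc_err_bits[i].bit, unc_err_bits[i].name);
            found++;
        }
    }
    return found;
}
```

Calling `decode_unc_status(0x00004000, 0x00000000)` with the values from the log prints the single entry "[14] Completion Timeout".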
* Re: Possible regression between 4.9 and 4.13
From: Mathias Nyman @ 2017-08-29 13:28 UTC
To: Mason, Felipe Balbi, linux-pci, linux-usb, Linux ARM
Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman

On 28.08.2017 17:40, Mason wrote:
> On 28/08/2017 10:39, Mathias Nyman wrote:
>
>> Could you take a log with the following added debug, without
>> your extra delays, It should show a bit more about the state
>> of the controller when we read 0xffffffff
>
> I applied the following patch on top of v4.12-rc1
>
> diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
> index 5e3e9d4c6956..c7ea7d4c801f 100644
> --- a/drivers/usb/host/xhci-hub.c
> +++ b/drivers/usb/host/xhci-hub.c
> @@ -23,6 +23,7 @@
>
>  #include <linux/slab.h>
>  #include <asm/unaligned.h>
> +#include <linux/pci.h>
>
>  #include "xhci.h"
>  #include "xhci-trace.h"
> @@ -1268,7 +1269,10 @@ int xhci_hub_control(struct usb_hcd *hcd, u16 typeReq, u16 wValue,
>  		wIndex--;
>  		temp = readl(port_array[wIndex]);
>  		if (temp == ~(u32)0) {
> -			xhci_hc_died(xhci);
> +			struct pci_dev *pdev = to_pci_dev(hcd->self.controller);
> +			xhci_err(xhci, "ClearPortFeat port%d @%p=%x, hcd->state:0x%x hcd->flags:0x%x, pci_state 0x%x\n",
> +				wIndex, port_array[wIndex], temp, hcd->state, hcd->flags, pdev->current_state);
> +			WARN_ON(1);
>  			retval = -ENODEV;
>  			break;
>  		}
>
>
> And here are logs I get when I plug/unplug my USB3 device.
>
> [ 14.970148] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd
> [ 15.003487] usb 2-2: New USB device found, idVendor=0951, idProduct=1666
> [ 15.010237] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
> [ 15.017483] usb 2-2: Product: DataTraveler 3.0
> [ 15.021990] usb 2-2: Manufacturer: Kingston
> [ 15.026234] usb 2-2: SerialNumber: 002618887865F0C0F8646BFA
> [ 15.034830] usb-storage 2-2:1.0: USB Mass Storage device detected
> [ 15.041269] scsi host0: usb-storage 2-2:1.0
> [ 16.056140] scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PQ: 0 ANSI: 6
> [ 16.064979] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
> [ 16.072978] sd 0:0:0:0: [sda] Write Protect is off
> [ 16.078076] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
> [ 16.089417] sda: sda1
> [ 16.093050] sd 0:0:0:0: [sda] Attached SCSI removable disk
>
>
> [ 22.152078] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
> [ 22.160157] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
> [ 22.172051] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
> [ 22.180493] pcieport 0000:00:00.0: [14] Completion Timeout (First)
> [ 22.187368] pcieport 0000:00:00.0: AER: Device recovery failed
> [ 22.885269] xhci_hcd 0000:01:00.0: ClearPortFeat port1 @e0852430=ffffffff, hcd->state:0x1 hcd->flags:0x1a5, pci_state 0x0

State is HC_STATE_RUNNING, Flags bits 0,2,5,7,8 set:

#define HCD_FLAG_HW_ACCESSIBLE          0       /* at full power */
#define HCD_FLAG_POLL_RH                2       /* poll for rh status? */
#define HCD_FLAG_RH_RUNNING             5       /* root hub is running? */
#define HCD_FLAG_INTF_AUTHORIZED        7       /* authorize interfaces? */
#define HCD_FLAG_DEV_AUTHORIZED         8       /* authorize devices? */

And pci state seems to be D0 (according to driver, pdev->current_state)

I can't see anything wrong from xhci/usb point of view.
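The flag decoding can be checked by hand: 0x1a5 = 0x100 + 0x80 + 0x20 + 0x4 + 0x1, so bits 8, 7, 5, 2 and 0 are set. The user-space C sketch below does the same decoding. It uses only the bit numbers quoted above, which come from include/linux/usb/hcd.h. The table and `decode_hcd_flags()` are illustrative and are not kernel code.

```c
#include <stdio.h>

/* Bit numbers as quoted in the message; only these five flags are listed. */
static const struct {
    unsigned bit;
    const char *name;
} hcd_flag_names[] = {
    { 0, "HCD_FLAG_HW_ACCESSIBLE" },
    { 2, "HCD_FLAG_POLL_RH" },
    { 5, "HCD_FLAG_RH_RUNNING" },
    { 7, "HCD_FLAG_INTF_AUTHORIZED" },
    { 8, "HCD_FLAG_DEV_AUTHORIZED" },
};

/* Print the listed flags that are set in 'flags' and return a mask of
 * the bits that were recognized. */
static unsigned long decode_hcd_flags(unsigned long flags)
{
    unsigned long seen = 0;

    for (size_t i = 0; i < sizeof(hcd_flag_names) / sizeof(hcd_flag_names[0]); i++) {
        unsigned long bit = 1UL << hcd_flag_names[i].bit;
        if (flags & bit) {
            printf("bit %u: %s\n", hcd_flag_names[i].bit, hcd_flag_names[i].name);
            seen |= bit;
        }
    }
    return seen;
}
```

For the logged value, every set bit is accounted for, so `decode_hcd_flags(0x1a5)` returns 0x1a5 again.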
I'd focus more on the PCI errors in the logs as the cause for reading
0xffffffff from xhci mmio.

Then again it might be a bit too drastic to kill xhci just because
we read 0xffffffff once from a mmio xhci register. Maybe we should
return an error a couple times before actually tearing down xhci.

This tight check was originally done to detect pci hotplug removed
hosts as soon as possible.

-Mathias
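Mathias suggests returning an error a few times before tearing down xhci. A rough user-space model of that idea follows. It is a sketch only. The struct, the threshold of 3 and `mmio_read_says_dead()` are hypothetical and do not exist in the xhci driver. In the real driver, the final step would be the `xhci_hc_died()` call that the debug patch above replaced.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold: consecutive all-ones reads tolerated before
 * the host controller is declared dead. */
#define ALL_ONES_LIMIT 3

struct mmio_health {
    unsigned consecutive_all_ones;
    bool dead;
};

/* Feed one register value into the model. Returns true once the host
 * should be torn down. Any read that is not all ones resets the counter,
 * so an isolated glitch is tolerated. */
static bool mmio_read_says_dead(struct mmio_health *h, uint32_t value)
{
    if (h->dead)
        return true;

    if (value != ~(uint32_t)0) {
        h->consecutive_all_ones = 0;
        return false;
    }

    if (++h->consecutive_all_ones >= ALL_ONES_LIMIT)
        h->dead = true;

    return h->dead;
}
```

Under this model, a caller would still return an error such as -ENODEV for each all-ones read, as the current code does. Only the decision to kill the host is deferred. Whether any fixed count is meaningful is what the rest of the thread disputes.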
* Re: Possible regression between 4.9 and 4.13
From: Lukas Wunner @ 2017-08-29 13:38 UTC
To: Mathias Nyman
Cc: Mason, Felipe Balbi, linux-pci, linux-usb, Linux ARM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman

On Tue, Aug 29, 2017 at 04:28:53PM +0300, Mathias Nyman wrote:
> Then again it might be a bit too drastic to kill xhci just because
> we read 0xffffffff once from a mmio xhci register. Maybe we should
> return an error a couple times before actually tearing down xhci.
>
> This tight check was originally done to detect pci hotplug removed
> hosts as soon as possible.

Just make pci_dev_is_disconnected() public to detect PCI hot removal.
We *know* when the device was hot removed, so I think there's no need
to guess that based on reading "all ones" from mmio (which may happen
for entirely legitimate reasons unrelated to hot removal).

Thanks,

Lukas
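Lukas proposes consulting the disconnect state that the PCI core already tracks, rather than inferring removal from register contents. At the time, pci_dev_is_disconnected() was internal to the PCI core and not exported to drivers, so the sketch below models the idea in user space. `struct model_pci_dev`, its `disconnected` field and `classify_read()` are hypothetical stand-ins. The field plays the role of the state the hotplug driver sets after processing a removal.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pci_dev: 'disconnected' models the
 * state set once a hot-removal notification has been processed. */
struct model_pci_dev {
    bool disconnected;
};

enum read_verdict {
    READ_OK,        /* value can be used */
    READ_DEV_GONE,  /* removal confirmed by the hotplug notification */
    READ_SUSPECT,   /* all ones, but no removal has been signalled (yet) */
};

/* Check the definitive indication first. The all-ones heuristic only
 * marks the read as suspicious; it does not declare the device gone. */
static enum read_verdict classify_read(const struct model_pci_dev *dev, uint32_t value)
{
    if (dev->disconnected)
        return READ_DEV_GONE;
    if (value == ~(uint32_t)0)
        return READ_SUSPECT;
    return READ_OK;
}
```

How a driver should handle READ_SUSPECT is the open question. Later in the thread Greg points out that not every platform delivers such a notification, and that the driver may read all ones before the notification has been processed.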
* Re: Possible regression between 4.9 and 4.13
From: Greg Kroah-Hartman @ 2017-08-29 14:47 UTC
To: Lukas Wunner
Cc: Mathias Nyman, Mason, Felipe Balbi, linux-pci, linux-usb, Linux ARM, Bjorn Helgaas, Alan Stern

On Tue, Aug 29, 2017 at 03:38:52PM +0200, Lukas Wunner wrote:
> On Tue, Aug 29, 2017 at 04:28:53PM +0300, Mathias Nyman wrote:
> > Then again it might be a bit too drastic to kill xhci just because
> > we read 0xffffffff once from a mmio xhci register. Maybe we should
> > return an error a couple times before actually tearing down xhci.
> >
> > This tight check was originally done to detect pci hotplug removed
> > hosts as soon as possible.
>
> Just make pci_dev_is_disconnected() public to detect PCI hot removal.
> We *know* when the device was hot removed, so I think there's no need
> to guess that based on reading "all ones" from mmio (which may happen
> for entirely legitimate reasons unrelated to hot removal).

No, you don't always "know" when a device is removed, don't rely on it,
not all platforms support that. One more reason why I hate that
function and I'm glad it's not exported for others to think it somehow
actually works for their system...

Reading all ff shows the device is removed, that's all the PCI spec
guarantees. What other legitimate reason could that happen for?

thanks,

greg k-h
* Re: Possible regression between 4.9 and 4.13
From: Lukas Wunner @ 2017-08-29 15:34 UTC
To: Greg Kroah-Hartman
Cc: Mathias Nyman, Mason, Felipe Balbi, linux-pci, linux-usb, Linux ARM, Bjorn Helgaas, Alan Stern

On Tue, Aug 29, 2017 at 04:47:25PM +0200, Greg Kroah-Hartman wrote:
> On Tue, Aug 29, 2017 at 03:38:52PM +0200, Lukas Wunner wrote:
> > On Tue, Aug 29, 2017 at 04:28:53PM +0300, Mathias Nyman wrote:
> > > Then again it might be a bit too drastic to kill xhci just because
> > > we read 0xffffffff once from a mmio xhci register. Maybe we should
> > > return an error a couple times before actually tearing down xhci.
> > >
> > > This tight check was originally done to detect pci hotplug removed
> > > hosts as soon as possible.
> >
> > Just make pci_dev_is_disconnected() public to detect PCI hot removal.
> > We *know* when the device was hot removed, so I think there's no need
> > to guess that based on reading "all ones" from mmio (which may happen
> > for entirely legitimate reasons unrelated to hot removal).
>
> No, you don't always "know" when a device is removed, don't rely on it,
> not all platforms support that.

Please explain, which platforms don't support that? They wouldn't be
compliant with the spec it seems.

PCIe r3.1, section 6.7.3:

   "A Downstream Port with hot-plug capabilities supports the
    following hot-plug events:

      Presence Detect Changed

    A Downstream Port with hot-plug capabilities monitors the slot
    it controls for the slot events listed above. [...]
    If enabled through the associated enable field, slot events
    must generate a software notification."

And pciehp sets the flag on all downstream devices that they're removed
once the software notification has been received and processed.

> Reading all ff shows the device is removed, that's all the PCI spec
> guarantees. What other legitimate reason could that happen for?

Is 0xffffffff not a valid value to be stored in and read from mmio space?

Best regards,

Lukas
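The hot-plug event Lukas cites is reported in the Slot Status register of the Downstream Port's PCI Express Capability. The sketch below shows how a handler might interpret that register. It assumes the bit values from the kernel's include/uapi/linux/pci_regs.h: PCI_EXP_SLTSTA_PDC (0x0008, Presence Detect Changed) and PCI_EXP_SLTSTA_PDS (0x0040, Presence Detect State). `slot_event_from_status()` is a hypothetical helper, not pciehp's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Slot Status bits, values as in include/uapi/linux/pci_regs.h */
#define PCI_EXP_SLTSTA_PDC 0x0008 /* Presence Detect Changed */
#define PCI_EXP_SLTSTA_PDS 0x0040 /* Presence Detect State */

enum slot_event {
    SLOT_NO_CHANGE,
    SLOT_CARD_ADDED,
    SLOT_CARD_REMOVED,
};

/* Hypothetical helper: interpret a Slot Status value read after a
 * hot-plug interrupt. A removal is a presence change with the
 * presence-detect state now clear.
 * Caveat: if the port itself is unreachable the read returns 0xffff,
 * which this helper would decode as "added"; a real handler has to
 * rule that out first. */
static enum slot_event slot_event_from_status(uint16_t sltsta)
{
    if (!(sltsta & PCI_EXP_SLTSTA_PDC))
        return SLOT_NO_CHANGE;
    return (sltsta & PCI_EXP_SLTSTA_PDS) ? SLOT_CARD_ADDED : SLOT_CARD_REMOVED;
}
```

PDC is a write-1-to-clear bit, so a real handler also acknowledges it. According to Lukas, pciehp then marks the devices below the port as removed once it has processed the notification.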
* Re: Possible regression between 4.9 and 4.13
From: Greg Kroah-Hartman @ 2017-08-29 15:51 UTC
To: Lukas Wunner
Cc: Mathias Nyman, Mason, Felipe Balbi, linux-pci, linux-usb, Linux ARM, Bjorn Helgaas, Alan Stern

On Tue, Aug 29, 2017 at 05:34:56PM +0200, Lukas Wunner wrote:
> On Tue, Aug 29, 2017 at 04:47:25PM +0200, Greg Kroah-Hartman wrote:
> > On Tue, Aug 29, 2017 at 03:38:52PM +0200, Lukas Wunner wrote:
> > > On Tue, Aug 29, 2017 at 04:28:53PM +0300, Mathias Nyman wrote:
> > > > Then again it might be a bit too drastic to kill xhci just because
> > > > we read 0xffffffff once from a mmio xhci register. Maybe we should
> > > > return an error a couple times before actually tearing down xhci.
> > > >
> > > > This tight check was originally done to detect pci hotplug removed
> > > > hosts as soon as possible.
> > >
> > > Just make pci_dev_is_disconnected() public to detect PCI hot removal.
> > > We *know* when the device was hot removed, so I think there's no need
> > > to guess that based on reading "all ones" from mmio (which may happen
> > > for entirely legitimate reasons unrelated to hot removal).
> >
> > No, you don't always "know" when a device is removed, don't rely on it,
> > not all platforms support that.
>
> Please explain, which platforms don't support that? They wouldn't be
> compliant with the spec it seems.
>
> PCIe r3.1, section 6.7.3:
>
>    "A Downstream Port with hot-plug capabilities supports the
>     following hot-plug events:
>
>       Presence Detect Changed
>
>     A Downstream Port with hot-plug capabilities monitors the slot
>     it controls for the slot events listed above. [...]
>     If enabled through the associated enable field, slot events
>     must generate a software notification."
>
> And pciehp sets the flag on all downstream devices that they're removed
> once the software notification has been received and processed.

What about all of the non-pciehp platforms? :)

Also, there is always a race between when that notification has been
sent and processed on the PCIe channel and the reading of all 1s on the
PCI bus by the driver.

For fun, try disconnecting a USB device from one of the more modern
laptops with a USB 3.1 connection on it. The bios treats those as a pci
hotpluggable xhci controller on the PCI bus. When the device is
disconnected, the BIOS rips out the PCI device as well, but all that
time, the xhci driver is thinking the device is still present as it
takes a while for the BIOS to do all of the needed housekeeping.

It's a really long time for everything to shut down and to help
prevent the driver from going crazy, it has to detect ffff reads as
"disconnection happened". All PCI drivers have had to do this for
decades now, it's nothing new here, PCIe just gave us a chance to be
notified that the device really is gone now, PCI hotplug has always
been out-of-band like this.

> > Reading all ff shows the device is removed, that's all the PCI spec
> > guarantees. What other legitimate reason could that happen for?
>
> Is 0xffffffff not a valid value to be stored in and read from mmio space?

For a specific register, doubtful, which is why the code errors out,
right? If it is a valid value, then it shouldn't be exiting, and move
on to the next read.

I don't understand what we are arguing about here anymore...

thanks,

greg k-h
* Re: Possible regression between 4.9 and 4.13
From: Lukas Wunner @ 2017-08-30 6:36 UTC
To: Greg Kroah-Hartman
Cc: Mathias Nyman, Mason, Felipe Balbi, linux-pci, linux-usb, Linux ARM, Bjorn Helgaas, Alan Stern

On Tue, Aug 29, 2017 at 05:51:38PM +0200, Greg Kroah-Hartman wrote:
> On Tue, Aug 29, 2017 at 05:34:56PM +0200, Lukas Wunner wrote:
> > On Tue, Aug 29, 2017 at 04:47:25PM +0200, Greg Kroah-Hartman wrote:
> > > On Tue, Aug 29, 2017 at 03:38:52PM +0200, Lukas Wunner wrote:
> > Is 0xffffffff not a valid value to be stored in and read from mmio space?
>
> For a specific register, doubtful

Well, "doubtful" means you don't know for sure.

It's fine to check for "all ones" as a heuristic if that's not a valid
value for the register read, however a hotplug notification is a
*definitive* indication the hardware is gone.

You seem to prefer forgoing a *definitive* indication for a mere
heuristic, and that doesn't make sense from my point of view.

> It's a really long time for everything to shut down and to help
> prevent the driver from going crazy, [...]
> Also, there is always a race between when that notification has been
> sent and processed on the PCIe channel and the reading of all 1s on the
> PCI bus by the driver.

Yes I know that. In practice the interrupt signaling hot removal comes
fast enough to prevent drivers from "going crazy", as I've mentioned in
this patch:
https://patchwork.kernel.org/patch/9405255/

> For fun, try disconnecting a USB device from one of the more modern
> laptops with a USB 3.1 connection on it. The bios treats those as a pci
> hotpluggable xhci controller on the PCI bus. When the device is
> disconnected, the BIOS rips out the PCI device as well, but all that
> time, the xhci driver is thinking the device is still present as it
> takes a while for the BIOS to do all of the needed housekeeping.

Yes, that is the case with Thunderbolt 3 controllers on non-Macs:
The XHCI controller appears below downstream bridge 2 of the Thunderbolt
controller's PCIe switch. Once the last device is removed, the PCIe
switch and all devices below it disappear and the controller is powered
down. The controller is thus only visible if powered up.

On Mac this is all native instead: Native pciehp, native tunnel setup,
native PM.

> > > > Just make pci_dev_is_disconnected() public to detect PCI hot removal.
> > > > We *know* when the device was hot removed, so I think there's no need
> > > > to guess that based on reading "all ones" from mmio (which may happen
> > > > for entirely legitimate reasons unrelated to hot removal).
> > >
> > > No, you don't always "know" when a device is removed, don't rely on it,
> > > not all platforms support that.
> >
> > Please explain, which platforms don't support that? They wouldn't be
> > compliant with the spec it seems.
>
> What about all of the non-pciehp platforms? :)

Fair enough, those should be extended to set PCI_DEV_DISCONNECTED as well.

Thanks,

Lukas
* Re: Possible regression between 4.9 and 4.13 2017-08-30 6:36 ` Lukas Wunner @ 2017-08-30 6:45 ` Greg Kroah-Hartman -1 siblings, 0 replies; 60+ messages in thread
From: Greg Kroah-Hartman @ 2017-08-30 6:45 UTC
To: Lukas Wunner
Cc: Mathias Nyman, Mason, Felipe Balbi, linux-pci, linux-usb, Linux ARM, Bjorn Helgaas, Alan Stern

On Wed, Aug 30, 2017 at 08:36:23AM +0200, Lukas Wunner wrote:
> On Tue, Aug 29, 2017 at 05:51:38PM +0200, Greg Kroah-Hartman wrote:
> > On Tue, Aug 29, 2017 at 05:34:56PM +0200, Lukas Wunner wrote:
> > > On Tue, Aug 29, 2017 at 04:47:25PM +0200, Greg Kroah-Hartman wrote:
> > > > On Tue, Aug 29, 2017 at 03:38:52PM +0200, Lukas Wunner wrote:
> > > Is 0xffffffff not a valid value to be stored in and read from mmio space?
> >
> > For a specific register, doubtful
>
> Well, "doubtful" means you don't know for sure.
>
> It's fine to check for "all ones" as a heuristic if that's not a valid
> value for the register read, however a hotplug notification is a
> *definitive* indication the hardware is gone.
>
> You seem to prefer forgoing a *definitive* indication for a mere
> heuristic; that doesn't make sense from my point of view.

I still don't know what you are arguing about here. The _driver_ knows
if a specific read allows all ones as a valid return value. If it
doesn't, then the driver knows the device is now gone. It's that simple,
don't do that type of check if all ones is a valid read.

And that's not what is happening here anyway, so again, what is this
discussion about?

Unless there's something specific we can do here for the xhci driver, I
think this thread is dead until someone determines what is going wrong
with the hardware the original reporter posted about.

greg k-h
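Greg's point that the driver knows, register by register, whether all ones is a legal value could be captured in a descriptor table. The sketch below is illustrative only: `struct reg_desc`, the register names, offsets and `all_ones_valid` flags are invented and do not describe the xHCI register map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register descriptor; names and offsets are made up. */
struct reg_desc {
	const char *name;
	uint32_t offset;
	bool all_ones_valid;	/* can this register legitimately read 0xffffffff? */
};

static const struct reg_desc regs[] = {
	{ "STATUS",  0x04, false },	/* e.g. has reserved bits that read as 0 */
	{ "SCRATCH", 0x20, true  },	/* e.g. fully software-writable */
};

/* All ones means "gone" only for registers that can never return it. */
static bool read_indicates_removal(const struct reg_desc *r, uint32_t val)
{
	assert(r);
	return val == UINT32_MAX && !r->all_ones_valid;
}
```

With such a table, the all-ones check is simply skipped for registers where 0xffffffff is a legitimate value, which is the rule Greg states.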
* Re: Possible regression between 4.9 and 4.13 2017-08-29 13:28 ` Mathias Nyman @ 2017-08-29 23:53 ` Lukas Wunner -1 siblings, 0 replies; 60+ messages in thread
From: Lukas Wunner @ 2017-08-29 23:53 UTC
To: Mathias Nyman
Cc: Mason, Felipe Balbi, linux-pci, linux-usb, Linux ARM, Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman

On Tue, Aug 29, 2017 at 04:28:53PM +0300, Mathias Nyman wrote:
> This tight check was originally done to detect pci hotplug removed
> hosts as soon as possible.

In Mason's case, the parent of the XHCI controller isn't a hotplug port,
see this lspci output:

https://www.spinics.net/lists/linux-usb/msg160010.html

Please check is_hotplug_bridge in the parent's struct pci_dev before
assuming that the XHCI controller was unplugged.

Thanks,

Lukas
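Lukas's suggestion amounts to gating the "unplugged" conclusion on the upstream bridge. The sketch below is a simplified userspace model: `struct fake_pci_dev` and `removal_plausible()` are hypothetical, and only the `is_hotplug_bridge` name mirrors the field he refers to in the kernel's `struct pci_dev`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device and its upstream bridge. */
struct fake_pci_dev {
	struct fake_pci_dev *parent;	/* upstream bridge, NULL at the top */
	bool is_hotplug_bridge;		/* mirrors the kernel field's name */
};

/* Only treat all-ones reads as a possible unplug below a hotplug bridge. */
static bool removal_plausible(const struct fake_pci_dev *dev)
{
	assert(dev);
	return dev->parent != NULL && dev->parent->is_hotplug_bridge;
}
```

Greg objects that this flag may not be set on every platform where a device can disappear (cardbus, systems without pciehp), so in practice a `false` here would mean "no evidence of a hotplug slot", not "cannot have been removed".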
* Re: Possible regression between 4.9 and 4.13 2017-08-29 23:53 ` Lukas Wunner @ 2017-08-30 6:02 ` Greg Kroah-Hartman -1 siblings, 0 replies; 60+ messages in thread
From: Greg Kroah-Hartman @ 2017-08-30 6:02 UTC
To: Lukas Wunner
Cc: Mathias Nyman, Mason, Felipe Balbi, linux-pci, linux-usb, Linux ARM, Bjorn Helgaas, Alan Stern

On Wed, Aug 30, 2017 at 01:53:10AM +0200, Lukas Wunner wrote:
> On Tue, Aug 29, 2017 at 04:28:53PM +0300, Mathias Nyman wrote:
> > This tight check was originally done to detect pci hotplug removed
> > hosts as soon as possible.
>
> In Mason's case, the parent of the XHCI controller isn't a hotplug port,
> see this lspci output:
>
> https://www.spinics.net/lists/linux-usb/msg160010.html
>
> Please check is_hotplug_bridge in the parent's struct pci_dev before
> assuming that the XHCI controller was unplugged.

How can you guarantee that this is set on some systems? Will it be set
on cardbus devices? What about on a "normal" system where I can just go
and yank out a PCI card at will?

I don't think this is a valid thing to check, and again, why are we
arguing this point? It's been this way since the 1990's, this isn't a
new thing...

To get back to the original issue here, the hardware seems to have died,
the driver stops talking to it, and all is good. The "regression" here
is that we now properly can determine that the hardware is crap.

So, how do you think we should proceed, delay a bit longer before saying
the device is gone? How long is "long enough"? How many bus errors are
we allowed to tolerate (hint, the PCI spec says none...)

Maybe someone wants to get to the root problem here, why is the hardware
suddenly reporting all 1s?

thanks,

greg k-h
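Greg's question of how many all-ones reads to tolerate before declaring the device gone can be made concrete with a consecutive-read counter. This only illustrates the trade-off he raises and is not a proposed fix: `struct allones_tracker` and its `threshold` are hypothetical, and, as Greg notes, the PCI spec tolerates no bus errors at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policy: declare removal only after N all-ones reads in a row. */
struct allones_tracker {
	unsigned int consecutive;	/* all-ones reads seen back to back */
	unsigned int threshold;		/* hypothetical tunable, must be > 0 */
};

/* Feed one read result; returns true once the device should be treated as gone. */
static bool tracker_update(struct allones_tracker *t, uint32_t val)
{
	assert(t && t->threshold > 0);
	if (val == UINT32_MAX)
		t->consecutive++;
	else
		t->consecutive = 0;	/* any sane value resets the streak */
	return t->consecutive >= t->threshold;
}
```

A threshold of 1 reproduces "give up on the first all-ones read"; larger values trade slower detection of a real removal for tolerance of transient errors, which is exactly the "how long is long enough" question left open in the thread.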
* Re: Possible regression between 4.9 and 4.13 2017-08-30 6:02 ` Greg Kroah-Hartman @ 2017-08-30 8:55 ` Mason -1 siblings, 0 replies; 60+ messages in thread
From: Mason @ 2017-08-30 8:55 UTC
To: Greg Kroah-Hartman, Lukas Wunner
Cc: Mathias Nyman, Felipe Balbi, linux-pci, linux-usb, Linux ARM, Bjorn Helgaas, Alan Stern

On 30/08/2017 08:02, Greg Kroah-Hartman wrote:

> To get back to the original issue here, the hardware seems to have died,
> the driver stops talking to it, and all is good. The "regression" here
> is that we now properly can determine that the hardware is crap.

Before 4.12, when I unplugged my USB3 Flash drive, Linux would
detect a few "Uncorrected Non-Fatal errors" via AER, but it was
still possible to plug the drive back in.

Since 4.12, once I unplug the drive, the whole USB3 card is marked
as dead (all 4 ports), and I can no longer plug anything in (not even
the USB2 drive that didn't have any issues, IIRC).

It seems a bit premature to "mark as dead" something that remains
functional, doesn't it?

Disclaimer, there are many variables in this setup, and I've only
tested a small fraction of the problem space: only one system,
only one USB3 board, only one USB3 Flash drive.

> So, how do you think we should proceed, delay a bit longer before saying
> the device is gone? How long is "long enough"? How many bus errors are
> we allowed to tolerate (hint, the PCI spec says none...)
>
> Maybe someone wants to get to the root problem here, why is the hardware
> suddenly reporting all 1s?

I'm afraid I won't be able to make any progress on this front,
unless I can get my hands on a PCIe packet analyzer.

Regards.
* Re: Possible regression between 4.9 and 4.13 2017-08-30 8:55 ` Mason @ 2017-08-30 9:06 ` Greg Kroah-Hartman -1 siblings, 0 replies; 60+ messages in thread
From: Greg Kroah-Hartman @ 2017-08-30 9:06 UTC
To: Mason
Cc: Lukas Wunner, Mathias Nyman, Felipe Balbi, linux-pci, linux-usb, Linux ARM, Bjorn Helgaas, Alan Stern

On Wed, Aug 30, 2017 at 10:55:37AM +0200, Mason wrote:
> On 30/08/2017 08:02, Greg Kroah-Hartman wrote:
>
> > To get back to the original issue here, the hardware seems to have died,
> > the driver stops talking to it, and all is good. The "regression" here
> > is that we now properly can determine that the hardware is crap.
>
> Before 4.12, when I unplugged my USB3 Flash drive, Linux would
> detect a few "Uncorrected Non-Fatal errors" via AER, but it was
> still possible to plug the drive back in.
>
> Since 4.12, once I unplug the drive, the whole USB3 card is marked
> as dead (all 4 ports), and I can no longer plug anything in (not even
> the USB2 drive that didn't have any issues, IIRC).
>
> It seems a bit premature to "mark as dead" something that remains
> functional, doesn't it?

I agree, but if the device sends all ones, it's a good indication it is
really dead, right? Or something is wrong with it.

> Disclaimer, there are many variables in this setup, and I've only
> tested a small fraction of the problem space: only one system,
> only one USB3 board, only one USB3 Flash drive.

Did you ever happen to narrow this down to a single git commit using
'git bisect'? I can't remember what happened in the beginning of this
thread...

> > So, how do you think we should proceed, delay a bit longer before saying
> > the device is gone? How long is "long enough"? How many bus errors are
> > we allowed to tolerate (hint, the PCI spec says none...)
> >
> > Maybe someone wants to get to the root problem here, why is the hardware
> > suddenly reporting all 1s?
>
> I'm afraid I won't be able to make any progress on this front,
> unless I can get my hands on a PCIe packet analyzer.

Odds of that happening are pretty rare, right? I've never even seen one
of those...

thanks,

greg k-h
* Re: Possible regression between 4.9 and 4.13 2017-08-30 9:06 ` Greg Kroah-Hartman @ 2017-08-31 9:39 ` Mason -1 siblings, 0 replies; 60+ messages in thread
From: Mason @ 2017-08-31 9:39 UTC
To: Greg Kroah-Hartman
Cc: Lukas Wunner, Mathias Nyman, Felipe Balbi, linux-pci, linux-usb, Linux ARM, Bjorn Helgaas, Alan Stern

On 30/08/2017 11:06, Greg Kroah-Hartman wrote:

> On Wed, Aug 30, 2017 at 10:55:37AM +0200, Mason wrote:
>
>> On 30/08/2017 08:02, Greg Kroah-Hartman wrote:
>>
>>> To get back to the original issue here, the hardware seems to have died,
>>> the driver stops talking to it, and all is good. The "regression" here
>>> is that we now properly can determine that the hardware is crap.
>>
>> Before 4.12, when I unplugged my USB3 Flash drive, Linux would
>> detect a few "Uncorrected Non-Fatal errors" via AER, but it was
>> still possible to plug the drive back in.
>>
>> Since 4.12, once I unplug the drive, the whole USB3 card is marked
>> as dead (all 4 ports), and I can no longer plug anything in (not even
>> the USB2 drive that didn't have any issues, IIRC).
>>
>> It seems a bit premature to "mark as dead" something that remains
>> functional, doesn't it?
>
> I agree, but if the device sends all ones, it's a good indication it is
> really dead, right? Or something is wrong with it.

I wouldn't call it dead if I can plug the drive back in, and have
it working... But I agree that something fishy is happening...

>> Disclaimer, there are many variables in this setup, and I've only
>> tested a small fraction of the problem space: only one system,
>> only one USB3 board, only one USB3 Flash drive.
>
> Did you ever happen to narrow this down to a single git commit using
> 'git bisect'? I can't remember what happened in the beginning of this
> thread...

Mathias pointed out d9f11ba9f107aa335091ab8d7ba5eea714e46e8b

>>> So, how do you think we should proceed, delay a bit longer before saying
>>> the device is gone? How long is "long enough"? How many bus errors are
>>> we allowed to tolerate (hint, the PCI spec says none...)
>>>
>>> Maybe someone wants to get to the root problem here, why is the hardware
>>> suddenly reporting all 1s?
>>
>> I'm afraid I won't be able to make any progress on this front,
>> unless I can get my hands on a PCIe packet analyzer.
>
> Odds of that happening are pretty rare, right? I've never even seen one
> of those...

I had a "Summit T24 Analyzer" on my desk a few months ago, but I was
getting strange results, and the knowledgeable people in my company
were not available at the time.

http://teledynelecroy.com/protocolanalyzer/protocoloverview.aspx?seriesid=445

Regards.
* Re: Possible regression between 4.9 and 4.13 2017-08-31 9:39 ` Mason @ 2017-08-31 11:40 ` Mathias Nyman -1 siblings, 0 replies; 60+ messages in thread
From: Mathias Nyman @ 2017-08-31 11:40 UTC
To: Mason, Greg Kroah-Hartman
Cc: Lukas Wunner, Felipe Balbi, linux-pci, linux-usb, Linux ARM, Bjorn Helgaas, Alan Stern

On 31.08.2017 12:39, Mason wrote:
> On 30/08/2017 11:06, Greg Kroah-Hartman wrote:
>
>> On Wed, Aug 30, 2017 at 10:55:37AM +0200, Mason wrote:
>>
>>> On 30/08/2017 08:02, Greg Kroah-Hartman wrote:
>>>
>>>> To get back to the original issue here, the hardware seems to have died,
>>>> the driver stops talking to it, and all is good. The "regression" here
>>>> is that we now properly can determine that the hardware is crap.
>>>
>>> Before 4.12, when I unplugged my USB3 Flash drive, Linux would
>>> detect a few "Uncorrected Non-Fatal errors" via AER, but it was
>>> still possible to plug the drive back in.
>>>
>>> Since 4.12, once I unplug the drive, the whole USB3 card is marked
>>> as dead (all 4 ports), and I can no longer plug anything in (not even
>>> the USB2 drive that didn't have any issues, IIRC).
>>>
>>> It seems a bit premature to "mark as dead" something that remains
>>> functional, doesn't it?
>>
>> I agree, but if the device sends all ones, it's a good indication it is
>> really dead, right? Or something is wrong with it.
>
> I wouldn't call it dead if I can plug the drive back in, and have
> it working... But I agree that something fishy is happening...
>
>>> Disclaimer, there are many variables in this setup, and I've only
>>> tested a small fraction of the problem space: only one system,
>>> only one USB3 board, only one USB3 Flash drive.
>>
>> Did you ever happen to narrow this down to a single git commit using
>> 'git bisect'? I can't remember what happened in the beginning of this
>> thread...
>
> Mathias pointed out d9f11ba9f107aa335091ab8d7ba5eea714e46e8b
>

That patch only changes how xhci reacts to reading 0xffffffff.
We used to just return -ENODEV, but after the patch we assume the
hardware is broken or removed.

-Mathias
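Mathias's summary of d9f11ba9f107 can be modelled as two versions of a single read check. This is a deliberately simplified sketch based only on his description above, not on the patch itself: `struct fake_hc`, `check_read_old()` and `check_read_new()` are hypothetical, and the real xHCI handshake polls a masked register over time rather than inspecting one value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified host state; not the real struct xhci_hcd. */
struct fake_hc {
	bool dead;	/* once set, every later operation is refused */
};

/* Old behaviour as described: an all-ones read fails only this call. */
static int check_read_old(struct fake_hc *hc, uint32_t val)
{
	assert(hc);
	return val == UINT32_MAX ? -ENODEV : 0;
}

/* New behaviour as described: an all-ones read also marks the host dead. */
static int check_read_new(struct fake_hc *hc, uint32_t val)
{
	assert(hc);
	if (hc->dead)
		return -ENODEV;
	if (val == UINT32_MAX) {
		hc->dead = true;	/* assume broken or removed hardware */
		return -ENODEV;
	}
	return 0;
}
```

In the new model, one all-ones read makes every later call fail even when the register reads back normally again, which matches Mason's report that all four ports stay unusable after the first unplug.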
* Re: Possible regression between 4.9 and 4.13 2017-08-30 8:55 ` Mason @ 2017-08-30 9:07 ` Ard Biesheuvel -1 siblings, 0 replies; 60+ messages in thread
From: Ard Biesheuvel @ 2017-08-30 9:07 UTC
To: Mason
Cc: Greg Kroah-Hartman, Lukas Wunner, Mathias Nyman, Felipe Balbi, linux-pci, linux-usb, Bjorn Helgaas, Alan Stern, Linux ARM

On 30 August 2017 at 09:55, Mason <slash.tmp@free.fr> wrote:
> On 30/08/2017 08:02, Greg Kroah-Hartman wrote:
>
>> To get back to the original issue here, the hardware seems to have died,
>> the driver stops talking to it, and all is good. The "regression" here
>> is that we now properly can determine that the hardware is crap.
>
> Before 4.12, when I unplugged my USB3 Flash drive, Linux would
> detect a few "Uncorrected Non-Fatal errors" via AER, but it was
> still possible to plug the drive back in.
>
> Since 4.12, once I unplug the drive, the whole USB3 card is marked
> as dead (all 4 ports), and I can no longer plug anything in (not even
> the USB2 drive that didn't have any issues, IIRC).
>
> It seems a bit premature to "mark as dead" something that remains
> functional, doesn't it?
>
> Disclaimer, there are many variables in this setup, and I've only
> tested a small fraction of the problem space: only one system,
> only one USB3 board, only one USB3 Flash drive.
>

Please don't forget to mention that this is quirky hardware that
depends on BROKEN because it multiplexes MMIO and config space
accesses in the same memory window without any locking whatsoever
(which would be difficult to do in the first place because we don't
use accessors for MMIO in the kernel).

So how likely is it that you are attempting to read from the xhci BAR
window while a config space access is in progress? Any way to
instrument this in your driver?

>> So, how do you think we should proceed, delay a bit longer before saying
>> the device is gone? How long is "long enough"? How many bus errors are
>> we allowed to tolerate (hint, the PCI spec says none...)
>>
>> Maybe someone wants to get to the root problem here, why is the hardware
>> suddenly reporting all 1s?
>
> I'm afraid I won't be able to make any progress on this front,
> unless I can get my hands on a PCIe packet analyzer.
>
> Regards.
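One way to approach Ard's question is to count MMIO reads issued while a config-space access is in flight. The sketch below uses C11 atomics in userspace; `cfg_access_begin()`, `cfg_access_end()` and `instrumented_read()` are hypothetical hooks that would have to be placed by hand around the host bridge driver's config accessors and at the xhci register-read call sites. The check is not synchronized with the config path, so the counters give a lower bound on overlaps rather than an exact figure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical instrumentation state; all names are illustrative only. */
static atomic_int cfg_in_progress;		/* > 0 while a config access is active */
static atomic_uint overlap_count;		/* MMIO reads seen during a config access */
static atomic_uint allones_during_overlap;	/* ...of which returned 0xffffffff */

static void cfg_access_begin(void)
{
	atomic_fetch_add(&cfg_in_progress, 1);
}

static void cfg_access_end(void)
{
	int prev = atomic_fetch_sub(&cfg_in_progress, 1);

	assert(prev > 0);	/* begin/end must be balanced */
}

/* Pass an MMIO read result through, recording whether it raced a config access. */
static uint32_t instrumented_read(uint32_t raw)
{
	if (atomic_load(&cfg_in_progress) > 0) {
		atomic_fetch_add(&overlap_count, 1);
		if (raw == UINT32_MAX)
			atomic_fetch_add(&allones_during_overlap, 1);
	}
	return raw;
}
```

If `allones_during_overlap` tracks the all-ones reads that trigger the xhci failure, that would point at the shared window rather than at the USB card itself.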
* Re: Possible regression between 4.9 and 4.13
From: Greg Kroah-Hartman @ 2017-08-30 9:22 UTC
To: Ard Biesheuvel
Cc: Mason, Lukas Wunner, Mathias Nyman, Felipe Balbi, linux-pci,
    linux-usb, Bjorn Helgaas, Alan Stern, Linux ARM

On Wed, Aug 30, 2017 at 10:07:59AM +0100, Ard Biesheuvel wrote:
> On 30 August 2017 at 09:55, Mason <slash.tmp@free.fr> wrote:
> > On 30/08/2017 08:02, Greg Kroah-Hartman wrote:
> >
> >> To get back to the original issue here, the hardware seems to have died,
> >> the driver stops talking to it, and all is good.  The "regression" here
> >> is that we now properly can determine that the hardware is crap.
> >
> > Before 4.12, when I unplugged my USB3 Flash drive, Linux would
> > detect a few "Uncorrected Non-Fatal errors" via AER, but it was
> > still possible to plug the drive back in.
> >
> > Since 4.12, once I unplug the drive, the whole USB3 card is marked
> > as dead (all 4 ports), and I can no longer plug anything in (not even
> > the USB2 drive that didn't have any issues, IIRC).
> >
> > It seems a bit premature to "mark as dead" something that remains
> > functional, doesn't it?
> >
> > Disclaimer, there are many variables in this setup, and I've only
> > tested a small fraction of the problem space: only one system,
> > only one USB3 board, only one USB3 Flash drive.
> >
>
> Please don't forget to mention that this is quirky hardware that
> depends on BROKEN because it multiplexes MMIO and config space
> accesses in the same memory window without any locking whatsoever
> (which would be difficult to do in the first place because we don't
> use accessors for MMIO in the kernel).
>
> So how likely is it that you are attempting to read from the xhci BAR
> window while a config space access is in progress? Any way to
> instrument this in your driver?

Seriously?  Ok, that's crap hardware, sorry, I don't feel bad at all
here.  You are going to have worse problems than just a single USB
controller issue if that's your hardware design, go kick some hardware
engineers for me please.

good luck, you are on your own :(

greg k-h
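A toy model may make Ard's concern concrete. The C sketch below assumes a host bridge with a single address window that is routed either to config space or to the endpoint's MMIO BAR: a config accessor switches the window, reads, and switches back, and nothing stops an MMIO read from another context from landing in between. Everything here (`struct shared_window`, `cfg_read`, `racing_mmio_read`, the register values) is hypothetical and not taken from the pcie_tango driver; only 0x00241105 is meaningful, as the vendor/device dword of the 1105:0024 root port.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model of a muxed config/MMIO window; not real driver code. */
enum window_target { WIN_MMIO, WIN_CONFIG };

struct shared_window {
	enum window_target target; /* where the window currently routes accesses */
	uint32_t bar_value;        /* what an MMIO read of the xHCI BAR should see */
	uint32_t cfg_value;        /* what config space holds at the same offset */
};

/* Any read through the window returns whatever the current routing selects. */
static uint32_t window_read(const struct shared_window *w)
{
	return w->target == WIN_CONFIG ? w->cfg_value : w->bar_value;
}

/* Config accessor: switch to config space, read, switch back.
 * 'interloper' models an unrelated MMIO read that lands in between;
 * with no lock shared by both paths, nothing prevents it. */
static uint32_t cfg_read(struct shared_window *w,
			 void (*interloper)(struct shared_window *))
{
	assert(w->target == WIN_MMIO); /* window must start in MMIO mode */
	w->target = WIN_CONFIG;
	if (interloper)
		interloper(w);
	uint32_t val = window_read(w);
	w->target = WIN_MMIO;
	return val;
}

/* What the xHCI driver got back from its (supposed) BAR read. */
static uint32_t last_mmio_value;

static void racing_mmio_read(struct shared_window *w)
{
	last_mmio_value = window_read(w);
}
```

For example, with `struct shared_window w = { WIN_MMIO, 0x01000020, 0x00241105 };` (the BAR value is arbitrary), `window_read(&w)` returns the BAR value, while `cfg_read(&w, racing_mmio_read)` leaves `last_mmio_value` holding the config dword instead, and the xHCI side gets no error at all. Putting a lock in `cfg_read` alone would not help, because the MMIO path would also have to take it, which is the accessor problem Ard mentions.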
* Re: Possible regression between 4.9 and 4.13
From: Mason @ 2017-08-30 9:37 UTC
To: Ard Biesheuvel
Cc: Greg Kroah-Hartman, Lukas Wunner, Mathias Nyman, Felipe Balbi,
    linux-pci, linux-usb, Bjorn Helgaas, Alan Stern, Linux ARM

On 30/08/2017 11:07, Ard Biesheuvel wrote:

> Please don't forget to mention that this is quirky hardware that
> depends on BROKEN because it multiplexes MMIO and config space
> accesses in the same memory window without any locking whatsoever
> (which would be difficult to do in the first place because we don't
> use accessors for MMIO in the kernel).

You're right, it was in the back of my mind, but I didn't state
it explicitly for the benefit of linux-usb readers.

> So how likely is it that you are attempting to read from the xhci BAR
> window while a config space access is in progress? Any way to
> instrument this in your driver?

I logged config space accesses here:

https://www.spinics.net/lists/arm-kernel/msg602832.html

IIRC, the config space accesses are generated by the AER ISR.
So disabling the AER driver should guarantee that no config space
accesses are occurring when the drive is unplugged.

Regards.
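This argument rests on there being zero config accesses while the drive is unplugged. A cheaper check than a printk() plus dump_stack() per access is to count accesses and compare snapshots taken before and after the event. The userspace C sketch below assumes the increment would live in the host bridge's config read/write accessors; `cfg_access_count`, `counted_cfg_read`, `cfg_snapshot`, `cfg_accesses_since` and `backend_read` are hypothetical names, and `backend_read` merely stands in for the real hardware access.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical instrumentation: bumped on every config space access. */
static atomic_ulong cfg_access_count;

/* Stand-in for the real hardware access; returns a dummy value here. */
static uint32_t backend_read(int bus, unsigned devfn, int where)
{
	(void)bus;
	(void)devfn;
	(void)where;
	return 0;
}

static uint32_t counted_cfg_read(int bus, unsigned devfn, int where)
{
	/* PCIe extended config space is 4 KiB per function. */
	assert(where >= 0 && where < 4096);
	atomic_fetch_add_explicit(&cfg_access_count, 1, memory_order_relaxed);
	return backend_read(bus, devfn, where);
}

/* Take a snapshot just before the event of interest... */
static unsigned long cfg_snapshot(void)
{
	return atomic_load(&cfg_access_count);
}

/* ...and report how many config accesses happened since then.
 * Unsigned subtraction stays correct even if the counter wraps. */
static unsigned long cfg_accesses_since(unsigned long snap)
{
	return atomic_load(&cfg_access_count) - snap;
}
```

Taking `s = cfg_snapshot()` just before pulling the drive and reading `cfg_accesses_since(s)` afterwards would give the number of config accesses in that window. The relaxed atomic increment is cheap enough to leave enabled, unlike a stack dump per access.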
* Re: Possible regression between 4.9 and 4.13
From: Mason @ 2017-08-31 9:17 UTC
To: Ard Biesheuvel, Greg Kroah-Hartman
Cc: Lukas Wunner, Mathias Nyman, Felipe Balbi, linux-pci, linux-usb,
    Bjorn Helgaas, Alan Stern, Linux ARM

On 30/08/2017 11:37, Mason wrote:

> On 30/08/2017 11:07, Ard Biesheuvel wrote:
>
>> Please don't forget to mention that this is quirky hardware that
>> depends on BROKEN because it multiplexes MMIO and config space
>> accesses in the same memory window without any locking whatsoever
>> (which would be difficult to do in the first place because we don't
>> use accessors for MMIO in the kernel).
>
> You're right, it was in the back of my mind, but I didn't state
> it explicitly for the benefit of linux-usb readers.
>
>> So how likely is it that you are attempting to read from the xhci
>> BAR window while a config space access is in progress? Any way to
>> instrument this in your driver?
>
> I logged config space accesses here:
>
> https://www.spinics.net/lists/arm-kernel/msg602832.html
>
> IIRC, the config space accesses are generated by the AER ISR.
> So disabling the AER driver should guarantee that no config space
> accesses are occurring when the drive is unplugged.

I checked, and I *did* remember correctly.

Disabling the AER driver results in zero config space accesses occurring
when the USB3 drive is unplugged. This confirms that the controller's
broken design (muxing config and mem space) is not responsible for
the glitches occurring on unplug events.

Furthermore, I confirm that once the controller has been deemed "dead",
even USB2 drives are no longer detected, and all USB ports on the PCIe
board are disabled.

Regards.
For reads/writes in config space, I have:

	if (do_debug) {
		printk("\t READ: bus=%d devfn=%u where=%d size=%d val=0x%x\n",
				bus->number, devfn, where, size, *val);
		dump_stack();
	}

	if (do_debug) {
		printk("\tWRITE: bus=%d devfn=%u where=%d size=%d val=0x%x\n",
				bus->number, devfn, where, size, val);
		dump_stack();
	}

During setup I do get, e.g.

[ 7.621417] READ: bus=1 devfn=0 where=84 size=2 val=0x8
[ 7.626840] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G C 4.12.0-rc1 #2
[ 7.634358] Hardware name: Sigma Tango DT
[ 7.638387] [<c010e8b4>] (unwind_backtrace) from [<c010ac00>] (show_stack+0x10/0x14)
[ 7.646171] [<c010ac00>] (show_stack) from [<c0257a30>] (dump_stack+0x84/0x98)
[ 7.653429] [<c0257a30>] (dump_stack) from [<c029cb34>] (smp8759_config_read+0xa0/0xa4)
[ 7.661474] [<c029cb34>] (smp8759_config_read) from [<c0282908>] (pci_bus_read_config_word+0x6c/0x94)
[ 7.670742] [<c0282908>] (pci_bus_read_config_word) from [<c0282cfc>] (pci_read_config_word+0x24/0x38)
[ 7.680097] [<c0282cfc>] (pci_read_config_word) from [<c028c5c0>] (__pci_dev_reset+0x11c/0x2fc)
[ 7.688841] [<c028c5c0>] (__pci_dev_reset) from [<c028c9c4>] (pci_probe_reset_function+0xc/0x10)
[ 7.697673] [<c028c9c4>] (pci_probe_reset_function) from [<c028f720>] (pci_create_sysfs_dev_files+0x2a8/0x374)
[ 7.707728] [<c028f720>] (pci_create_sysfs_dev_files) from [<c0515cf8>] (pci_sysfs_init+0x34/0x54)
[ 7.716734] [<c0515cf8>] (pci_sysfs_init) from [<c010175c>] (do_one_initcall+0x44/0x168)
[ 7.724867] [<c010175c>] (do_one_initcall) from [<c0500dd8>] (kernel_init_freeable+0x15c/0x1e8)
[ 7.733611] [<c0500dd8>] (kernel_init_freeable) from [<c0332348>] (kernel_init+0x8/0x108)
[ 7.741831] [<c0332348>] (kernel_init) from [<c01076f8>] (ret_from_fork+0x14/0x3c)

On plug/unplug events, there are no config space accesses:

[ 88.006750] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd
[ 88.040179] usb 2-2: New USB device found, idVendor=0951, idProduct=1666
[ 88.046930] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 88.054177] usb 2-2: Product: DataTraveler 3.0
[ 88.058684] usb 2-2: Manufacturer: Kingston
[ 88.062927] usb 2-2: SerialNumber: 002618887865F0C0F8646BFA
[ 88.071523] usb-storage 2-2:1.0: USB Mass Storage device detected
[ 88.081334] scsi host0: usb-storage 2-2:1.0
[ 89.096074] scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PQ: 0 ANSI: 6
[ 89.104828] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
[ 89.112996] sd 0:0:0:0: [sda] Write Protect is off
[ 89.118060] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 89.129463] sda: sda1
[ 89.133104] sd 0:0:0:0: [sda] Attached SCSI removable disk
[ 103.375210] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead
[ 103.382917] xhci_hcd 0000:01:00.0: HC died; cleaning up
[ 103.388281] usb 2-2: USB disconnect, device number 2
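The `xHCI host controller not responding, assume dead` line is consistent with the heuristic Mathias pointed to elsewhere in this thread (commit d9f11ba9f107): an xHCI register read returning 0xffffffff is taken to mean the host is gone. On PCIe, many root complexes turn a failed read, such as one ending in a Completion Timeout, into an all-ones value, so a single failed read can be enough. The C sketch below models that policy under those assumptions; `struct hc`, `hc_read_status`, `healthy_read` and `glitchy_read` are hypothetical and are not the xhci driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Value a failed PCIe read is commonly turned into by the root complex. */
#define PCI_READ_FAILED 0xffffffffu

/* Hypothetical register-read hook, so the sketch runs without hardware. */
typedef uint32_t (*reg_read_fn)(void *ctx, unsigned offset);

struct hc {
	reg_read_fn read;
	void *ctx;
	bool dead; /* once set, the host is never touched again */
};

/* Read a register; an all-ones result is treated as "controller gone". */
static bool hc_read_status(struct hc *hc, unsigned offset, uint32_t *out)
{
	assert(out != NULL);
	if (hc->dead)
		return false;
	uint32_t v = hc->read(hc->ctx, offset);
	if (v == PCI_READ_FAILED) {
		hc->dead = true; /* same spirit as "HC died; cleaning up" */
		return false;
	}
	*out = v;
	return true;
}

/* Fake backend that always answers with an arbitrary normal value. */
static uint32_t healthy_read(void *ctx, unsigned offset)
{
	(void)ctx;
	(void)offset;
	return 0x00000001u;
}

/* Fake backend that fails only on its first call (a transient glitch). */
static uint32_t glitchy_read(void *ctx, unsigned offset)
{
	unsigned *calls = ctx;

	(void)offset;
	return (*calls)++ == 0 ? PCI_READ_FAILED : 0x00000001u;
}
```

In this model, one glitched read marks the host dead for good, and later reads never reach the hardware even though it would now answer normally. That is the behaviour being objected to here; telling a transient failure apart from a removed device would need more information than a single read value provides.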
* Re: Possible regression between 4.9 and 4.13
From: Mathias Nyman @ 2017-08-31 11:38 UTC
To: Mason, Ard Biesheuvel, Greg Kroah-Hartman
Cc: Lukas Wunner, Felipe Balbi, linux-pci, linux-usb, Bjorn Helgaas,
    Alan Stern, Linux ARM

On 31.08.2017 12:17, Mason wrote:
> On 30/08/2017 11:37, Mason wrote:
>
>> On 30/08/2017 11:07, Ard Biesheuvel wrote:
>>
>>> Please don't forget to mention that this is quirky hardware that
>>> depends on BROKEN because it multiplexes MMIO and config space
>>> accesses in the same memory window without any locking whatsoever
>>> (which would be difficult to do in the first place because we don't
>>> use accessors for MMIO in the kernel).
>>
>> You're right, it was in the back of my mind, but I didn't state
>> it explicitly for the benefit of linux-usb readers.
>>
>>> So how likely is it that you are attempting to read from the xhci
>>> BAR window while a config space access is in progress? Any way to
>>> instrument this in your driver?
>>
>> I logged config space accesses here:
>>
>> https://www.spinics.net/lists/arm-kernel/msg602832.html
>>
>> IIRC, the config space accesses are generated by the AER ISR.
>> So disabling the AER driver should guarantee that no config space
>> accesses are occurring when the drive is unplugged.
>
> I checked, and I *did* remember correctly.
>
> Disabling the AER driver results in zero config space accesses occurring
> when the USB3 drive is unplugged. This confirms that the controller's
> broken design (muxing config and mem space) is not responsible for
> the glitches occurring on unplug events.
>
> Furthermore, I confirm that once the controller has been deemed "dead",
> even USB2 drives are no longer detected, and all USB ports on the PCIe
> board are disabled.

xhci handles both USB3 and USB2. If there is only an xhci in use, then
all USB ports will be disabled.

Many systems have both ehci and xhci, where ehci handles the USB2 side.
I'm guessing yours only has the xhci.

-Mathias
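Mathias's point shows up in Mason's 4.11 boot log elsewhere in this thread: xhci_hcd 0000:01:00.0 registers two USB buses (1 and 2), each with a 4-port root hub, and the SuperSpeed drive enumerates on bus 2 as `usb 2-2`. A minimal C model of why a dead controller takes every port with it follows; all names are hypothetical, and the point is only that the per-port check consults the shared controller's state, with no separate liveness flag per root hub.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not the xhci driver's data structures. */
enum roothub_id { USB2_ROOTHUB, USB3_ROOTHUB };

struct roothub {
	int busnum; /* USB bus number the root hub was registered as */
	int nports; /* ports on this root hub */
};

struct xhci_ctrl {
	bool dead;              /* set once the controller is declared dead */
	struct roothub hubs[2]; /* indexed by enum roothub_id */
};

/* A port is usable only while the shared controller is alive. */
static bool port_usable(const struct xhci_ctrl *c, enum roothub_id hub, int port)
{
	assert(hub == USB2_ROOTHUB || hub == USB3_ROOTHUB);
	return !c->dead && port >= 1 && port <= c->hubs[hub].nports;
}
```

With `{ false, { { 1, 4 }, { 2, 4 } } }` mirroring the log, both hubs report usable ports until `dead` is set, after which a USB2 port is refused just like a USB3 one. On a system that also has an EHCI controller, USB2 devices may be served by that separate controller instead, which is the distinction Mathias draws.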
* Re: Possible regression between 4.9 and 4.13
From: Mason @ 2017-08-23 10:19 UTC
To: Mathias Nyman, Felipe Balbi, linux-pci, linux-usb, Linux ARM
Cc: Bjorn Helgaas, Alan Stern, Greg Kroah-Hartman, Jon Derrick, Keith Busch

On 23/08/2017 09:51, Mathias Nyman wrote:

> very likely cause is the more aggressive detection of pci removed xhci hosts
>
> See commit d9f11ba9f107aa335091ab8d7ba5eea714e46e8b
> xhci: Rework how we handle unresponsive or hoptlug removed hosts
>
> It checks if an xhci register read returns 0xffffffff and assumes xhci
> died in that case.

I've just tested 4.11.12 + a few local patches to back-port PCIe host
bridge support. It "works" as well as 4.9 (i.e. modulo the
"AER: Uncorrected (Non-Fatal) error received" messages).

[ 0.508533] pcie_tango 50000000.pcie: simultaneous PCI config and MMIO accesses may cause data corruption
[ 0.519622] OF: PCI: host bridge /soc/pcie@2e000 ranges:
[ 0.519645] OF: PCI: MEM 0x50400000..0x53ffffff -> 0x00400000
[ 0.519725] pcie_tango 50000000.pcie: ECAM at [mem 0x50000000-0x503fffff] for [bus 00-03]
[ 0.519872] pcie_tango 50000000.pcie: PCI host bridge to bus 0000:00
[ 0.519886] pci_bus 0000:00: root bus resource [bus 00-03]
[ 0.519898] pci_bus 0000:00: root bus resource [mem 0x50400000-0x53ffffff] (bus address [0x00400000-0x03ffffff])
[ 0.520201] PCI: bus0: Fast back to back transfers disabled
[ 0.520213] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 0.520922] PCI: bus1: Fast back to back transfers disabled
[ 0.520964] pci 0000:00:00.0: of_irq_parse_pci: failed with rc=-22
[ 0.520993] pci 0000:00:00.0: BAR 8: assigned [mem 0x50400000-0x504fffff]
[ 0.521004] pci 0000:01:00.0: BAR 0: assigned [mem 0x50400000-0x50401fff 64bit]
[ 0.521025] pci 0000:00:00.0: PCI bridge to [bus 01]
[ 0.521033] pci 0000:00:00.0: bridge window [mem 0x50400000-0x504fffff]
[ 0.521085] pcieport 0000:00:00.0: enabling device (0140 -> 0142)
[ 0.521282] pcieport 0000:00:00.0: Signaling PME with IRQ 30
[ 0.521402] pcieport 0000:00:00.0: AER enabled with IRQ 30
[ 0.521526] pci 0000:01:00.0: enabling device (0140 -> 0142)
...
[ 1.239706] xhci_hcd 0000:01:00.0: xHCI Host Controller
[ 1.244998] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 1
[ 1.258048] xhci_hcd 0000:01:00.0: hcc params 0x014051cf hci version 0x100 quirks 0x00000010
[ 1.267467] hub 1-0:1.0: USB hub found
[ 1.271287] hub 1-0:1.0: 4 ports detected
[ 1.275761] xhci_hcd 0000:01:00.0: xHCI Host Controller
[ 1.281048] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 2
[ 1.288578] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
[ 1.297234] hub 2-0:1.0: USB hub found
[ 1.301042] hub 2-0:1.0: 4 ports detected
[ 1.305681] usbcore: registered new interface driver usb-storage

PLUG #1

[ 26.104607] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd
[ 26.143799] usb-storage 2-2:1.0: USB Mass Storage device detected
[ 26.150253] scsi host0: usb-storage 2-2:1.0
[ 27.177298] scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PQ: 0 ANSI: 6
[ 27.187586] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
[ 27.199000] sd 0:0:0:0: [sda] Write Protect is off
[ 27.204186] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 27.216322] sda: sda1
[ 27.220584] sd 0:0:0:0: [sda] Attached SCSI removable disk
[ 27.252046] random: fast init done

UNPLUG #1

[ 37.334040] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 37.342135] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 37.353970] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 37.362589] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 37.369485] pcieport 0000:00:00.0: AER: Device recovery failed
[ 38.066538] xhci_hcd 0000:01:00.0: Cannot set link state.
[ 38.072039] usb usb2-port2: cannot disable (err = -32)
[ 38.077348] usb 2-2: USB disconnect, device number 2
[ 38.082711] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 38.094279] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 38.108006] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 38.116878] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 38.123954] pcieport 0000:00:00.0: AER: Device recovery failed

PLUG #2

[ 55.097922] usb 2-2: new SuperSpeed USB device number 3 using xhci_hcd
[ 55.137590] usb-storage 2-2:1.0: USB Mass Storage device detected
[ 55.144016] scsi host0: usb-storage 2-2:1.0
[ 56.163907] scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PQ: 0 ANSI: 6
[ 56.174851] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
[ 56.184218] sd 0:0:0:0: [sda] Write Protect is off
[ 56.190162] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 56.202117] sda: sda1
[ 56.207112] sd 0:0:0:0: [sda] Attached SCSI removable disk

UNPLUG #2

[ 63.228310] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 63.236403] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 63.248220] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 63.256653] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 63.263523] pcieport 0000:00:00.0: AER: Device recovery failed
[ 63.959768] xhci_hcd 0000:01:00.0: Cannot set link state.
[ 63.965227] usb usb2-port2: cannot disable (err = -32)
[ 63.970409] usb 2-2: USB disconnect, device number 3
[ 63.975664] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 63.987356] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 64.000021] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 64.008655] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 64.015553] pcieport 0000:00:00.0: AER: Device recovery failed
[ 64.021449] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 64.029580] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 64.041410] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 64.049818] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 64.056658] pcieport 0000:00:00.0: AER: Device recovery failed

Bjorn,

What do you make of the AER logs?
What can I do to debug this issue?

Regards.

FWIW, verbose lspci output below.

# lspci -vv
00:00.0 PCI bridge: Sigma Designs, Inc. Device 0024 (rev 01) (prog-if 00 [Normal decode])
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR+ <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin ? routed to IRQ 30
	Region 0: Memory at <ignored> (64-bit, non-prefetchable)
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	I/O behind bridge: 00000000-00000fff
	Memory behind bridge: 00400000-004fffff
	Prefetchable memory behind bridge: 00000000-000fffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity+ SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [50] MSI: Enable+ Count=1/4 Maskable- 64bit+
		Address: 00000000a002e07c  Data: 0000
	Capabilities: [78] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=3 PME-
	Capabilities: [80] Express (v2) Root Port (Slot-), MSI 03
		DevCap:	MaxPayload 256 bytes, PhantFunc 0
			ExtTag- RBE+
		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ UncorrErr+ FatalErr- UnsuppReq- AuxPwr- TransPend+
		LnkCap:	Port #1, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <2us, L1 <4us
			ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp-
		LnkCtl:	ASPM Disabled; RCB 128 bytes Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x1, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
		RootCap: CRSVisible-
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Range B, TimeoutDis-, LTR-, OBFF Not Supported ARIFwd-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status:	NegoPending- InProgress-
	Capabilities: [800 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO+ CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 0e, GenCap- CGenEn- ChkCap- ChkEn-
	Kernel driver in use: pcieport

01:00.0 USB controller: Renesas Technology Corp. uPD720201 USB 3.0 Host Controller (rev 03) (prog-if 30 [XHCI])
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 0
	Region 0: Memory at 50400000 (64-bit, non-prefetchable) [size=8K]
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [70] MSI: Enable- Count=1/8 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [90] MSI-X: Enable+ Count=8 Masked-
		Vector table: BAR=0 offset=00001000
		PBA: BAR=0 offset=00001080
	Capabilities: [a0] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <4us, L1 unlimited
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR+, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
	Capabilities: [150 v1] Latency Tolerance Reporting
		Max snoop latency: 0ns
		Max no snoop latency: 0ns
	Kernel driver in use: xhci_hcd
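To help read the AER lines in this message: `error status/mask=00004000/00000000` is the root port's Uncorrectable Error Status and Mask pair. Bit 14 (0x4000) is Completion Timeout, which matches the `[14] Completion Timeout` line, `CmpltTO+` in the root port's UESta row, and `First Error Pointer: 0e` (0x0e = 14). `CmpltTO-` in UESvrt means this error is classed as non-fatal, hence "Uncorrected (Non-Fatal)". The bit positions below follow the PCI Express Base Specification (and the `PCI_ERR_UNC_*` definitions in Linux's `include/uapi/linux/pci_regs.h`), with names spelled as lspci prints them; `decode_unc_status` itself is a hypothetical helper, not the kernel's AER printing code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Uncorrectable Error Status bits of the AER capability, named as lspci
 * prints them in its UESta/UEMsk/UESvrt rows. */
static const struct {
	unsigned bit;
	const char *name;
} unc_err_bits[] = {
	{ 4, "DLP" },        /* Data Link Protocol Error */
	{ 5, "SDES" },       /* Surprise Down Error */
	{ 12, "TLP" },       /* Poisoned TLP */
	{ 13, "FCP" },       /* Flow Control Protocol Error */
	{ 14, "CmpltTO" },   /* Completion Timeout */
	{ 15, "CmpltAbrt" }, /* Completer Abort */
	{ 16, "UnxCmplt" },  /* Unexpected Completion */
	{ 17, "RxOF" },      /* Receiver Overflow */
	{ 18, "MalfTLP" },   /* Malformed TLP */
	{ 19, "ECRC" },      /* ECRC Error */
	{ 20, "UnsupReq" },  /* Unsupported Request */
	{ 21, "ACSViol" },   /* ACS Violation */
};

/* Write the names of the set, unmasked bits into 'buf' (space-separated)
 * and return how many were written. Stops early if 'buf' is too small. */
static int decode_unc_status(uint32_t status, uint32_t mask, char *buf, size_t len)
{
	int n = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < sizeof(unc_err_bits) / sizeof(unc_err_bits[0]); i++) {
		uint32_t m = 1u << unc_err_bits[i].bit;

		/* Skip clear bits, and masked ones (masked errors are not reported). */
		if (!(status & m) || (mask & m))
			continue;
		/* Room for an optional separator, the name and the final NUL. */
		if (strlen(buf) + (n ? 1 : 0) + strlen(unc_err_bits[i].name) + 1 > len)
			break;
		if (n)
			strcat(buf, " ");
		strcat(buf, unc_err_bits[i].name);
		n++;
	}
	return n;
}
```

Fed the values from these logs, `decode_unc_status(0x00004000, 0x00000000, buf, sizeof buf)` yields a single entry, `CmpltTO`: the root port issued a request toward the endpoint and no completion came back within the configured timeout window (DevCtl2 shows 50us to 50ms).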
* Possible regression between 4.9 and 4.13 @ 2017-08-23 10:19 ` Mason
From: Mason @ 2017-08-23 10:19 UTC
To: linux-arm-kernel

On 23/08/2017 09:51, Mathias Nyman wrote:

> very likely cause is the more aggressive detection of pci removed xhci hosts
>
> See commit d9f11ba9f107aa335091ab8d7ba5eea714e46e8b
> xhci: Rework how we handle unresponsive or hoptlug removed hosts
>
> It checks if a xhci register reads returns 0xffffffff and assumes xhci
> died in that case.

I've just tested 4.11.12 + a few local patches to back-port PCIe host bridge
support. It "works" as well as 4.9 (i.e. modulo the "AER: Uncorrected
(Non-Fatal) error received").

[ 0.508533] pcie_tango 50000000.pcie: simultaneous PCI config and MMIO accesses may cause data corruption
[ 0.519622] OF: PCI: host bridge /soc/pcie@2e000 ranges:
[ 0.519645] OF: PCI: MEM 0x50400000..0x53ffffff -> 0x00400000
[ 0.519725] pcie_tango 50000000.pcie: ECAM at [mem 0x50000000-0x503fffff] for [bus 00-03]
[ 0.519872] pcie_tango 50000000.pcie: PCI host bridge to bus 0000:00
[ 0.519886] pci_bus 0000:00: root bus resource [bus 00-03]
[ 0.519898] pci_bus 0000:00: root bus resource [mem 0x50400000-0x53ffffff] (bus address [0x00400000-0x03ffffff])
[ 0.520201] PCI: bus0: Fast back to back transfers disabled
[ 0.520213] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 0.520922] PCI: bus1: Fast back to back transfers disabled
[ 0.520964] pci 0000:00:00.0: of_irq_parse_pci: failed with rc=-22
[ 0.520993] pci 0000:00:00.0: BAR 8: assigned [mem 0x50400000-0x504fffff]
[ 0.521004] pci 0000:01:00.0: BAR 0: assigned [mem 0x50400000-0x50401fff 64bit]
[ 0.521025] pci 0000:00:00.0: PCI bridge to [bus 01]
[ 0.521033] pci 0000:00:00.0: bridge window [mem 0x50400000-0x504fffff]
[ 0.521085] pcieport 0000:00:00.0: enabling device (0140 -> 0142)
[ 0.521282] pcieport 0000:00:00.0: Signaling PME with IRQ 30
[ 0.521402] pcieport 0000:00:00.0: AER enabled with IRQ 30
[ 0.521526] pci 0000:01:00.0: enabling device (0140 -> 0142)
...
[ 1.239706] xhci_hcd 0000:01:00.0: xHCI Host Controller
[ 1.244998] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 1
[ 1.258048] xhci_hcd 0000:01:00.0: hcc params 0x014051cf hci version 0x100 quirks 0x00000010
[ 1.267467] hub 1-0:1.0: USB hub found
[ 1.271287] hub 1-0:1.0: 4 ports detected
[ 1.275761] xhci_hcd 0000:01:00.0: xHCI Host Controller
[ 1.281048] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 2
[ 1.288578] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
[ 1.297234] hub 2-0:1.0: USB hub found
[ 1.301042] hub 2-0:1.0: 4 ports detected
[ 1.305681] usbcore: registered new interface driver usb-storage

PLUG #1
[ 26.104607] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd
[ 26.143799] usb-storage 2-2:1.0: USB Mass Storage device detected
[ 26.150253] scsi host0: usb-storage 2-2:1.0
[ 27.177298] scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PQ: 0 ANSI: 6
[ 27.187586] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
[ 27.199000] sd 0:0:0:0: [sda] Write Protect is off
[ 27.204186] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 27.216322] sda: sda1
[ 27.220584] sd 0:0:0:0: [sda] Attached SCSI removable disk
[ 27.252046] random: fast init done

UNPLUG #1
[ 37.334040] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 37.342135] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 37.353970] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 37.362589] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 37.369485] pcieport 0000:00:00.0: AER: Device recovery failed
[ 38.066538] xhci_hcd 0000:01:00.0: Cannot set link state.
[ 38.072039] usb usb2-port2: cannot disable (err = -32)
[ 38.077348] usb 2-2: USB disconnect, device number 2
[ 38.082711] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 38.094279] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 38.108006] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 38.116878] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 38.123954] pcieport 0000:00:00.0: AER: Device recovery failed

PLUG #2
[ 55.097922] usb 2-2: new SuperSpeed USB device number 3 using xhci_hcd
[ 55.137590] usb-storage 2-2:1.0: USB Mass Storage device detected
[ 55.144016] scsi host0: usb-storage 2-2:1.0
[ 56.163907] scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PQ: 0 ANSI: 6
[ 56.174851] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
[ 56.184218] sd 0:0:0:0: [sda] Write Protect is off
[ 56.190162] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 56.202117] sda: sda1
[ 56.207112] sd 0:0:0:0: [sda] Attached SCSI removable disk

UNPLUG #2
[ 63.228310] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 63.236403] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 63.248220] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 63.256653] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 63.263523] pcieport 0000:00:00.0: AER: Device recovery failed
[ 63.959768] xhci_hcd 0000:01:00.0: Cannot set link state.
[ 63.965227] usb usb2-port2: cannot disable (err = -32)
[ 63.970409] usb 2-2: USB disconnect, device number 3
[ 63.975664] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 63.987356] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 64.000021] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 64.008655] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 64.015553] pcieport 0000:00:00.0: AER: Device recovery failed
[ 64.021449] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[ 64.029580] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[ 64.041410] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000
[ 64.049818] pcieport 0000:00:00.0: [14] Completion Timeout (First)
[ 64.056658] pcieport 0000:00:00.0: AER: Device recovery failed

Bjorn,

What do you make of the AER logs?
What can I do to debug this issue?

Regards.

FWIW, verbose lspci output below.

# lspci -vv
00:00.0 PCI bridge: Sigma Designs, Inc. Device 0024 (rev 01) (prog-if 00 [Normal decode])
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR+ <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin ? routed to IRQ 30
    Region 0: Memory at <ignored> (64-bit, non-prefetchable)
    Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
    I/O behind bridge: 00000000-00000fff
    Memory behind bridge: 00400000-004fffff
    Prefetchable memory behind bridge: 00000000-000fffff
    Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
    BridgeCtl: Parity+ SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
        PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
    Capabilities: [50] MSI: Enable+ Count=1/4 Maskable- 64bit+
        Address: 00000000a002e07c  Data: 0000
    Capabilities: [78] Power Management version 3
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=3 PME-
    Capabilities: [80] Express (v2) Root Port (Slot-), MSI 03
        DevCap: MaxPayload 256 bytes, PhantFunc 0
            ExtTag- RBE+
        DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
            RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr+ UncorrErr+ FatalErr- UnsuppReq- AuxPwr- TransPend+
        LnkCap: Port #1, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <2us, L1 <4us
            ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp-
        LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
        RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
        RootCap: CRSVisible-
        RootSta: PME ReqID 0000, PMEStatus- PMEPending-
        DevCap2: Completion Timeout: Range B, TimeoutDis-, LTR-, OBFF Not Supported ARIFwd-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
            EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [100 v1] Virtual Channel
        Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
        Arb: Fixed- WRR32- WRR64- WRR128-
        Ctrl: ArbSelect=Fixed
        Status: InProgress-
        VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
            Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
            Status: NegoPending- InProgress-
    Capabilities: [800 v1] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO+ CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap: First Error Pointer: 0e, GenCap- CGenEn- ChkCap- ChkEn-
    Kernel driver in use: pcieport

01:00.0 USB controller: Renesas Technology Corp. uPD720201 USB 3.0 Host Controller (rev 03) (prog-if 30 [XHCI])
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 0
    Region 0: Memory at 50400000 (64-bit, non-prefetchable) [size=8K]
    Capabilities: [50] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [70] MSI: Enable- Count=1/8 Maskable- 64bit+
        Address: 0000000000000000  Data: 0000
    Capabilities: [90] MSI-X: Enable+ Count=8 Masked-
        Vector table: BAR=0 offset=00001000
        PBA: BAR=0 offset=00001080
    Capabilities: [a0] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
            SlotPowerLimit 0.000W
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
            MaxPayload 128 bytes, MaxReadReq 512 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
        LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <4us, L1 unlimited
            ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR+, OBFF Not Supported
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
            EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [100 v1] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
    Capabilities: [150 v1] Latency Tolerance Reporting
        Max snoop latency: 0ns
        Max no snoop latency: 0ns
    Kernel driver in use: xhci_hcd
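The check Mathias describes for commit d9f11ba9f107 (a register read that returns 0xffffffff is taken to mean the xHCI host is gone) can be illustrated with a minimal Python sketch. It is not the kernel's code: `handshake`, `simulated_reads` and the polling limit are hypothetical names chosen for illustration, and register reads are simulated from a list. It also assumes, as is a common convention on PCI/PCIe but is not confirmed here for this host bridge, that a read the device never completes is returned to the CPU as all ones. Under that assumption, a read that ends in the Completion Timeout seen in the AER logs looks exactly like a removed controller.

```python
import errno
from typing import Callable, Iterable

# Assumption: a 32-bit MMIO read that nobody answers is completed with all ones.
ALL_ONES = 0xFFFFFFFF


def handshake(read_reg: Callable[[], int], mask: int, done: int, max_polls: int) -> int:
    """Poll a (simulated) register until (value & mask) == done.

    Returns 0 on success, -errno.ENODEV as soon as a read returns all ones
    (interpreted as "host removed or dead"), or -errno.ETIMEDOUT if the
    condition is not met within max_polls reads.
    """
    for _ in range(max_polls):
        value = read_reg()
        if value == ALL_ONES:
            # A dead or removed host and a failed read look the same here.
            return -errno.ENODEV
        if (value & mask) == done:
            return 0
    return -errno.ETIMEDOUT


def simulated_reads(values: Iterable[int]) -> Callable[[], int]:
    """Return a read_reg callable that yields the given values in order."""
    it = iter(values)
    return lambda: next(it)


if __name__ == "__main__":
    # Bit 0 clears on the third read: the handshake succeeds.
    print(handshake(simulated_reads([0x1, 0x1, 0x0]), mask=0x1, done=0x0, max_polls=10))
    # The second read returns all ones: the host is declared gone.
    print(handshake(simulated_reads([0x1, ALL_ONES]), mask=0x1, done=0x0, max_polls=10) == -errno.ENODEV)
    # Bit 0 never clears: the poll times out.
    print(handshake(simulated_reads([0x1] * 5), mask=0x1, done=0x0, max_polls=5) == -errno.ETIMEDOUT)
```

The all-ones test runs before the mask comparison on purpose: otherwise an all-ones read would satisfy any handshake that waits for the masked bits to become set (`done == mask`).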
end of thread, newest message 2017-08-31 11:40 UTC

Thread overview:
2017-08-22 17:34 Possible regression between 4.9 and 4.13 Mason
2017-08-23  6:07 ` Felipe Balbi
2017-08-23  7:51 ` Mathias Nyman
2017-08-23  9:18 ` Mason
2017-08-23  9:31 ` Mason
2017-08-23 11:11 ` Mathias Nyman
2017-08-23 11:54 ` Mason
2017-08-23 12:41 ` Mason
2017-08-23 14:30 ` Mason
2017-08-28  8:39 ` Mathias Nyman
2017-08-28 14:40 ` Mason
2017-08-29 13:28 ` Mathias Nyman
2017-08-29 13:38 ` Lukas Wunner
2017-08-29 14:47 ` Greg Kroah-Hartman
2017-08-29 15:34 ` Lukas Wunner
2017-08-29 15:51 ` Greg Kroah-Hartman
2017-08-30  6:36 ` Lukas Wunner
2017-08-30  6:45 ` Greg Kroah-Hartman
2017-08-29 23:53 ` Lukas Wunner
2017-08-30  6:02 ` Greg Kroah-Hartman
2017-08-30  8:55 ` Mason
2017-08-30  9:06 ` Greg Kroah-Hartman
2017-08-31  9:39 ` Mason
2017-08-31 11:40 ` Mathias Nyman
2017-08-30  9:07 ` Ard Biesheuvel
2017-08-30  9:22 ` Greg Kroah-Hartman
2017-08-30  9:37 ` Mason
2017-08-31  9:17 ` Mason
2017-08-31 11:38 ` Mathias Nyman
2017-08-23 10:19 ` Mason