* [BUG] FL1009: xHCI host not responding to stop endpoint command.
@ 2014-01-12 20:13 Arnaud Ebalard
  2014-01-12 21:36 ` Arnaud Ebalard
  0 siblings, 1 reply; 28+ messages in thread
From: Arnaud Ebalard @ 2014-01-12 20:13 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

Both with a LaCie 2.5" USB 3.0 Rugged Mini disk (powered by the port)
and a 3.5" SATA disk connected via an ICY DOCK MB981U3S-1S dock station
(external power), I can transfer huge files w/o issue but I get the
following when transferring small files (e.g. copy of some debian
directory to a folder on the disk):

# cp -rf /lib/ /bin/ /sbin/ /opt /var/ /usr/ test/
[  327.130045] xhci_hcd 0000:02:00.0: xHCI host not responding to stop endpoint command.
[  327.137899] xhci_hcd 0000:02:00.0: Assuming host is dying, halting host.
[  327.144644] xhci_hcd 0000:02:00.0: HC died; cleaning up
[  327.150065] usb 3-2: USB disconnect, device number 2
[  327.155190] sd 4:0:0:0: Device offlined - not ready after error recovery
[  327.170110] sd 4:0:0:0: [sdc] Unhandled error code
[  327.174921] sd 4:0:0:0: [sdc]  
[  327.178075] sd 4:0:0:0: [sdc] CDB: 
[  327.181601] end_request: I/O error, dev sdc, sector 4949712
[  327.187188] EXT4-fs warning (device sdc1): ext4_end_bio:317: I/O error writing to inode 7342793 (offset 318767104 size 8388608 starting block 618744)
[  327.200604] Buffer I/O error on device sdc1, logical block 618458
[  327.206711] Buffer I/O error on device sdc1, logical block 618459
[  327.212818] Buffer I/O error on device sdc1, logical block 618460
[  327.218922] Buffer I/O error on device sdc1, logical block 618461
[  327.225030] Buffer I/O error on device sdc1, logical block 618462
[  327.231137] Buffer I/O error on device sdc1, logical block 618463
[  327.237240] Buffer I/O error on device sdc1, logical block 618464
[  327.243347] Buffer I/O error on device sdc1, logical block 618465
[  327.249451] Buffer I/O error on device sdc1, logical block 618466
[  327.255557] Buffer I/O error on device sdc1, logical block 618467
[  327.261717] EXT4-fs warning (device sdc1): ext4_end_bio:317: I/O error writing to inode 7342793 (offset 327155712 size 8388608 starting block 618782)
[  327.275162] EXT4-fs warning (device sdc1): ext4_end_bio:317: I/O error writing to inode 7342793 (offset 327155712 size 8388608 starting block 618812)
[  327.288603] EXT4-fs warning (device sdc1): ext4_end_bio:317: I/O error writing to inode 7342793 (offset 327155712 size 8388608 starting block 618842)
[  327.302043] EXT4-fs warning (device sdc1): ext4_end_bio:317: I/O error writing to inode 7342793 (offset 327155712 size 8388608 starting block 618872)
[  327.315483] EXT4-fs warning (device sdc1): ext4_end_bio:317: I/O error writing to inode 7342793 (offset 327155712 size 8388608 starting block 618902)
[  327.328922] EXT4-fs warning (device sdc1): ext4_end_bio:317: I/O error writing to inode 7342793 (offset 327155712 size 8388608 starting block 618932)
[  327.342362] EXT4-fs warning (device sdc1): ext4_end_bio:317: I/O error writing to inode 7342793 (offset 327155712 size 8388608 starting block 618962)
[  327.355803] EXT4-fs warning (device sdc1): ext4_end_bio:317: I/O error writing to inode 7342793 (offset 327155712 size 8388608 starting block 618992)
[  327.369243] EXT4-fs warning (device sdc1): ext4_end_bio:317: I/O error writing to inode 7342793 (offset 327155712 size 8388608 starting block 619022)
[  327.390294] sd 4:0:0:0: [sdc] Unhandled error code
[  327.395102] sd 4:0:0:0: [sdc]  
[  327.398254] sd 4:0:0:0: [sdc] CDB: 
[  327.401775] end_request: I/O error, dev sdc, sector 4949952
[  327.413642] Aborting journal on device sdc1-8.
[  327.418381] EXT4-fs error (device sdc1): ext4_journal_check_start:56: Detected aborted journal
[  327.427057] EXT4-fs (sdc1): Remounting filesystem read-only
[  327.432645] EXT4-fs (sdc1): previous I/O error to superblock detected
[  327.439119] EXT4-fs (sdc1): ext4_writepages: jbd2_start: 0 pages, ino 7342793; err -30
[  327.447223] JBD2: Error -5 detected when updating journal superblock for sdc1-8.
[  327.465498] sd 4:0:0:0: [sdc] Synchronizing SCSI cache
[  327.470799] sd 4:0:0:0: [sdc]  
[  327.476202] EXT4-fs (sdc1): I/O error while writing superblock


The trace above was obtained on a NETGEAR ReadyNAS 102 (Marvell Armada 370
based NAS) using current 3.13.0-rc7. The NAS has a Fresco Logic FL1009
XHCI controller: 

$ uname -a
Linux proof 3.13.0-rc7.rn102-00126-g228fdc083b01-dirty #43 Sun Jan 12 15:31:22 CET 2014 armv7l GNU/Linux

$ lspci
00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 7846
00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 7846
01:00.0 SATA controller: Marvell Technology Group Ltd. Device 9170 (rev 12)
02:00.0 USB controller: Fresco Logic FL1009 USB 3.0 Host Controller (rev 02)

$ lspci -vvvv
...
02:00.0 USB controller: Fresco Logic FL1009 USB 3.0 Host Controller (rev 02) (prog-if 30 [XHCI])
        Subsystem: Fresco Logic Device 0000
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 104
        Region 0: Memory at e0100000 (64-bit, non-prefetchable) [size=64K]
        Region 2: Memory at e0110000 (64-bit, non-prefetchable) [size=4K]
        Region 4: Memory at e0111000 (64-bit, non-prefetchable) [size=4K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 unlimited, L1 unlimited
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis+
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [b0] MSI-X: Enable+ Count=8 Masked-
                Vector table: BAR=2 offset=00000000
                PBA: BAR=4 offset=00000000
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Kernel driver in use: xhci_hcd

I decided to do the same tests with the disk connected to a NETGEAR
ReadyNAS Duo v2 (Kirkwood 88F6282 based one) with the same kernel
version. The main difference is the XHCI controller, i.e. a NEC
Corporation uPD720200 as shown below. On the Duo v2, I cannot reproduce
the issue (did the test twice), which *may* indicate the bug is somewhat
related to FL1009 controller (missing quirk, some feature the uPD720200
does not have, etc). 

$ uname -a
Linux mood 3.13.0-rc7.duov2-00126-g228fdc083b01-dirty #10 Sat Jan 11 15:58:11 CET 2014 armv5tel GNU/Linux

$ lspci
00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 7846
01:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 04)

$ lspci -vvvv
...
01:00.0 USB controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 04) (prog-if 30 [XHCI])
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 26
        Region 0: Memory at e0000000 (64-bit, non-prefetchable) [size=8K]
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [70] MSI: Enable- Count=1/8 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [90] MSI-X: Enable- Count=8 Masked-
                Vector table: BAR=0 offset=00001000
                PBA: BAR=0 offset=00001080
        Capabilities: [a0] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <4us, L1 unlimited
                        ClockPM+ Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR+, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
        Capabilities: [140 v1] Device Serial Number ff-ff-ff-ff-ff-ff-ff-ff
        Capabilities: [150 v1] Latency Tolerance Reporting
                Max snoop latency: 0ns
                Max no snoop latency: 0ns
        Kernel driver in use: xhci_hcd

If you have any idea, or patches, I can test those.

Cheers,

a+


* [BUG] FL1009: xHCI host not responding to stop endpoint command.
  2014-01-12 20:13 [BUG] FL1009: xHCI host not responding to stop endpoint command Arnaud Ebalard
@ 2014-01-12 21:36 ` Arnaud Ebalard
  2014-01-14 17:07   ` Sarah Sharp
  0 siblings, 1 reply; 28+ messages in thread
From: Arnaud Ebalard @ 2014-01-12 21:36 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

arno at natisbad.org (Arnaud Ebalard) writes:

> Both with a LaCie 2.5" USB 3.0 Rugged Mini disk (powered by the port)
> and a 3.5" SATA disk connected via an ICY DOCK MB981U3S-1S dock station
> (external power), I can transfer huge files w/o issue but I get the
> following when transferring small files (e.g. copy of some debian
> directory to a folder on the disk):
>
> # cp -rf /lib/ /bin/ /sbin/ /opt /var/ /usr/ test/
> [  327.130045] xhci_hcd 0000:02:00.0: xHCI host not responding to stop endpoint command.
> [  327.137899] xhci_hcd 0000:02:00.0: Assuming host is dying, halting host.
> [  327.144644] xhci_hcd 0000:02:00.0: HC died; cleaning up
> [  327.150065] usb 3-2: USB disconnect, device number 2
> [  327.155190] sd 4:0:0:0: Device offlined - not ready after error recovery
> [  327.170110] sd 4:0:0:0: [sdc] Unhandled error code
> [  327.174921] sd 4:0:0:0: [sdc]  
> [  327.178075] sd 4:0:0:0: [sdc] CDB: 
> [  327.181601] end_request: I/O error, dev sdc, sector 4949712
> [  327.187188] EXT4-fs warning (device sdc1): ext4_end_bio:317: I/O error writing to inode 7342793 (offset 318767104 size 8388608 starting block 618744)
> [  327.200604] Buffer I/O error on device sdc1, logical block 618458
> [  327.206711] Buffer I/O error on device sdc1, logical block 618459
> [  327.212818] Buffer I/O error on device sdc1, logical block 618460
> [  327.218922] Buffer I/O error on device sdc1, logical block 618461
> [  327.225030] Buffer I/O error on device sdc1, logical block 618462
> [  327.231137] Buffer I/O error on device sdc1, logical block 618463
> [  327.237240] Buffer I/O error on device sdc1, logical block 618464
> [  327.243347] Buffer I/O error on device sdc1, logical block 618465
> [  327.249451] Buffer I/O error on device sdc1, logical block 618466
> [  327.255557] Buffer I/O error on device sdc1, logical block 618467
> [  327.261717] EXT4-fs warning (device sdc1): ext4_end_bio:317: I/O error writing to inode 7342793 (offset 327155712 size 8388608 starting block 618782)
> [  327.275162] EXT4-fs warning (device sdc1): ext4_end_bio:317: I/O error writing to inode 7342793 (offset 327155712 size 8388608 starting block 618812)
> [  327.288603] EXT4-fs warning (device sdc1): ext4_end_bio:317: I/O error writing to inode 7342793 (offset 327155712 size 8388608 starting block 618842)
> [  327.302043] EXT4-fs warning (device sdc1): ext4_end_bio:317: I/O error writing to inode 7342793 (offset 327155712 size 8388608 starting block 618872)
> [  327.315483] EXT4-fs warning (device sdc1): ext4_end_bio:317: I/O error writing to inode 7342793 (offset 327155712 size 8388608 starting block 618902)
> [  327.328922] EXT4-fs warning (device sdc1): ext4_end_bio:317: I/O error writing to inode 7342793 (offset 327155712 size 8388608 starting block 618932)
> [  327.342362] EXT4-fs warning (device sdc1): ext4_end_bio:317: I/O error writing to inode 7342793 (offset 327155712 size 8388608 starting block 618962)
> [  327.355803] EXT4-fs warning (device sdc1): ext4_end_bio:317: I/O error writing to inode 7342793 (offset 327155712 size 8388608 starting block 618992)
> [  327.369243] EXT4-fs warning (device sdc1): ext4_end_bio:317: I/O error writing to inode 7342793 (offset 327155712 size 8388608 starting block 619022)
> [  327.390294] sd 4:0:0:0: [sdc] Unhandled error code
> [  327.395102] sd 4:0:0:0: [sdc]  
> [  327.398254] sd 4:0:0:0: [sdc] CDB: 
> [  327.401775] end_request: I/O error, dev sdc, sector 4949952
> [  327.413642] Aborting journal on device sdc1-8.
> [  327.418381] EXT4-fs error (device sdc1): ext4_journal_check_start:56: Detected aborted journal
> [  327.427057] EXT4-fs (sdc1): Remounting filesystem read-only
> [  327.432645] EXT4-fs (sdc1): previous I/O error to superblock detected
> [  327.439119] EXT4-fs (sdc1): ext4_writepages: jbd2_start: 0 pages, ino 7342793; err -30
> [  327.447223] JBD2: Error -5 detected when updating journal superblock for sdc1-8.
> [  327.465498] sd 4:0:0:0: [sdc] Synchronizing SCSI cache
> [  327.470799] sd 4:0:0:0: [sdc]  
> [  327.476202] EXT4-fs (sdc1): I/O error while writing superblock
>
>
> The trace above was obtained on a NETGEAR ReadyNAS 102 (Marvell Armada 370
> based NAS) using current 3.13.0-rc7. The NAS has a Fresco Logic FL1009
> XHCI controller: 

I can add the following:

1) I just tested the copy of the small files using the ICY DOCK
MB981U3S-1S connected to my ReadyNAS 102 *on a 3.11.7 kernel*
and it completed successfully. 

2) on current 3.13.0-rc7 kernel (w/ Bjørn patch applied just in case), I
just transferred 40GB through a Logitec LAN-GT JU3H3 (it is based on an
ASIX AX88179) connected to my ReadyNAS Duo v2 (NEC XHCI controller) at
an average rate of 82MB/s. Bottom line: no issue. When I try to do the
same on my ReadyNAS 102, I immediately get complaints:

[  383.280429] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20400 bytes untransferred
[  383.280892] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20400 bytes untransferred
[  411.620073] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20352 bytes untransferred
[  411.620750] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20400 bytes untransferred
[  411.621727] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20400 bytes untransferred
[  412.066651] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20352 bytes untransferred
[  412.067233] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20400 bytes untransferred
[  412.068196] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20400 bytes untransferred
[  412.242708] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20352 bytes untransferred
[  412.243333] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20400 bytes untransferred
[  412.244246] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20400 bytes untransferred
[  412.417956] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20352 bytes untransferred
[  412.418516] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20400 bytes untransferred
[  412.419473] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20400 bytes untransferred
[  412.545330] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20352 bytes untransferred

If someone has a system w/ a Fresco Logic FL1009, I would be interested
in some feedback on how it behaves w/ current kernel.
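
For anyone who wants to check their own logs for the same failure, here
is a rough sketch in Python. It assumes kernel log lines in the format
shown in the trace above; the names SIGNATURES, LINE and find_host_death
are hypothetical helpers for illustration, not part of any kernel tool.

```python
import re

# Messages xhci_hcd printed in the trace above when the host was declared dead.
SIGNATURES = [
    "xHCI host not responding to stop endpoint command",
    "Assuming host is dying, halting host",
    "HC died; cleaning up",
]

# "[  327.130045] xhci_hcd 0000:02:00.0: <message>"
LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+xhci_hcd (\S+): (.*)$")


def find_host_death(lines):
    """Return (timestamp, pci_device, message) for each matching log line."""
    hits = []
    for line in lines:
        m = LINE.match(line)
        if m is None:
            continue  # not an xhci_hcd line
        ts, dev, msg = m.groups()
        if any(sig in msg for sig in SIGNATURES):
            hits.append((float(ts), dev, msg))
    return hits


# Sample input copied from the trace in this thread.
sample = [
    "[  327.130045] xhci_hcd 0000:02:00.0: xHCI host not responding to stop endpoint command.",
    "[  327.137899] xhci_hcd 0000:02:00.0: Assuming host is dying, halting host.",
    "[  327.144644] xhci_hcd 0000:02:00.0: HC died; cleaning up",
    "[  327.150065] usb 3-2: USB disconnect, device number 2",
]

for ts, dev, msg in find_host_death(sample):
    print(f"{ts:.6f} {dev}: {msg}")
```

The same function can be fed the lines of a saved dmesg capture instead
of the sample list.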

Cheers,

a+


* [BUG] FL1009: xHCI host not responding to stop endpoint command.
  2014-01-12 21:36 ` Arnaud Ebalard
@ 2014-01-14 17:07   ` Sarah Sharp
  2014-01-14 18:11     ` Bjørn Mork
  2014-01-14 21:54     ` Arnaud Ebalard
  0 siblings, 2 replies; 28+ messages in thread
From: Sarah Sharp @ 2014-01-14 17:07 UTC (permalink / raw)
  To: linux-arm-kernel

On Sun, Jan 12, 2014 at 10:36:56PM +0100, Arnaud Ebalard wrote:
> Hi,
> 
> arno at natisbad.org (Arnaud Ebalard) writes:
> 
> > Both with a LaCie 2.5" USB 3.0 Rugged Mini disk (powered by the port)
> > and a 3.5" SATA disk connected via an ICY DOCK MB981U3S-1S dock station
> > (external power), I can transfer huge files w/o issue but I get the
> > following when transferring small files (e.g. copy of some debian
> > directory to a folder on the disk):
> >
> > # cp -rf /lib/ /bin/ /sbin/ /opt /var/ /usr/ test/
> > [  327.130045] xhci_hcd 0000:02:00.0: xHCI host not responding to stop endpoint command.
> > [  327.137899] xhci_hcd 0000:02:00.0: Assuming host is dying, halting host.
> > [  327.144644] xhci_hcd 0000:02:00.0: HC died; cleaning up

Hmm, you're the second person to report an issue with a host dying.

> I can add the following:
> 
> 1) I just tested the copy of the small files using the ICY DOCK
> MB981U3S-1S connected to my ReadyNAS 102 *on a 3.11.7 kernel*
> and it completed successfully. 

Please try a 3.13-rc7 kernel after running `git revert 35773dac5f86`.

> 2) on current 3.13.0-rc7 kernel (w/ Bjørn patch applied just in case), I

Which patch are you referring to?

> just transferred 40GB through a Logitec LAN-GT JU3H3 (it is based on an
> ASIX AX88179) connected to my ReadyNAS Duo v2 (NEC XHCI controller) at
> an average rate of 82MB/s. Bottom line: no issue. When I try to do the
> same on my ReadyNAS 102, I immediately get complaints:

Was the ReadyNAS 102 with the Fresco Logic host running 3.13.0-rc7 as
well?

> [  383.280429] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20400 bytes untransferred
> [  383.280892] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20400 bytes untransferred
> [  411.620073] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20352 bytes untransferred
> [  411.620750] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20400 bytes untransferred
> [  411.621727] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20400 bytes untransferred
> [  412.066651] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20352 bytes untransferred
> [  412.067233] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20400 bytes untransferred
> [  412.068196] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20400 bytes untransferred
> [  412.242708] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20352 bytes untransferred
> [  412.243333] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20400 bytes untransferred
> [  412.244246] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20400 bytes untransferred
> [  412.417956] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20352 bytes untransferred
> [  412.418516] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20400 bytes untransferred
> [  412.419473] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20400 bytes untransferred
> [  412.545330] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20352 bytes untransferred

Those messages are normal.  It just means the device transferred less
data than the host asked for, which is part of normal USB operation.
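
To make the arithmetic concrete, here is a minimal sketch in Python (not
part of the driver) that reads the message format quoted above and
computes how many bytes each transfer actually moved. The regular
expression and the helper name transferred_bytes are illustrative
assumptions based only on the log lines in this thread.

```python
import re

# Matches the xhci_hcd short-transfer message quoted in this thread.
SHORT_XFER = re.compile(
    r"ep (0x[0-9a-f]+) - asked for (\d+) bytes, (\d+) bytes untransferred"
)


def transferred_bytes(line):
    """Return (endpoint, bytes actually transferred), or None if no match."""
    m = SHORT_XFER.search(line)
    if m is None:
        return None
    asked = int(m.group(2))
    untransferred = int(m.group(3))
    return m.group(1), asked - untransferred


# Two of the quoted lines, one for each residual size seen in the log.
sample = [
    "[  383.280429] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20400 bytes untransferred",
    "[  411.620073] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20352 bytes untransferred",
]

for line in sample:
    print(transferred_bytes(line))
```

For these two lines the device returned 80 and 128 bytes respectively
out of the 20480 bytes requested on endpoint 0x82.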

Does the dock work despite the messages?

Please send me the output of `sudo lspci -vvv -n` for the ReadyNAS 102
Fresco Logic host.

Sarah Sharp


* [BUG] FL1009: xHCI host not responding to stop endpoint command.
  2014-01-14 17:07   ` Sarah Sharp
@ 2014-01-14 18:11     ` Bjørn Mork
  2014-01-14 21:54     ` Arnaud Ebalard
  1 sibling, 0 replies; 28+ messages in thread
From: Bjørn Mork @ 2014-01-14 18:11 UTC (permalink / raw)
  To: linux-arm-kernel

Sarah Sharp <sarah.a.sharp@linux.intel.com> writes:
> On Sun, Jan 12, 2014 at 10:36:56PM +0100, Arnaud Ebalard wrote:
>
>> arno at natisbad.org (Arnaud Ebalard) writes:
>> 
>> 2) on current 3.13.0-rc7 kernel (w/ Bjørn patch applied just in case), I
>
> Which patch are you referring to?

I wondered about that too at first :-)

But luckily I don't work that much, and the reference to the ASIX
AX88179 made it clearer.  I believe it must have been this patch:

 https://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=fdc3452cd2c7b2bfe0f378f92123f4f9a98fa2bd


Bjørn


* [BUG] FL1009: xHCI host not responding to stop endpoint command.
  2014-01-14 17:07   ` Sarah Sharp
  2014-01-14 18:11     ` Bjørn Mork
@ 2014-01-14 21:54     ` Arnaud Ebalard
  2014-01-15  9:59       ` David Laight
  1 sibling, 1 reply; 28+ messages in thread
From: Arnaud Ebalard @ 2014-01-14 21:54 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Sarah,

Sarah Sharp <sarah.a.sharp@linux.intel.com> writes:

>> I can add the following:
>> 
>> 1) I just tested the copy of the small files using the ICY DOCK
>> MB981U3S-1S connected to my ReadyNAS 102 *on a 3.11.7 kernel*
>> and it completed successfully. 
>
> Please try a 3.13-rc7 kernel after running `git revert 35773dac5f86`.

I tried current 3.13.0-rc8 w/ 35773dac5f86 reverted and the result is
the same:

Powering the dock station:

[   70.530128] usb 3-1: new SuperSpeed USB device number 2 using xhci_hcd
[   70.550649] usb 3-1: New USB device found, idVendor=174c, idProduct=5106
[   70.557368] usb 3-1: New USB device strings: Mfr=2, Product=3, SerialNumber=1
[   70.564529] usb 3-1: Product: AS2105
[   70.568112] usb 3-1: Manufacturer: ASMedia
[   70.572222] usb 3-1: SerialNumber: 00000000000000000000
[   70.579174] usb-storage 3-1:1.0: USB Mass Storage device detected
[   70.586379] scsi4 : usb-storage 3-1:1.0
[   71.590415] scsi 4:0:0:0: Direct-Access     ASMT     2105             0    PQ: 0 ANSI: 6
[   71.601630] sd 4:0:0:0: Attached scsi generic sg2 type 0
[   76.968663] sd 4:0:0:0: [sdc] 488281250 512-byte logical blocks: (250 GB/232 GiB)
[   76.976800] sd 4:0:0:0: [sdc] Write Protect is off
[   76.982190] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   77.028598]  sdc: sdc1 sdc2
[   77.035729] sd 4:0:0:0: [sdc] Attached SCSI disk


Starting the copy after mounting the disk and creating test/:

# cp -r /bin/ /lib /sbin /opt /etc /usr test/
[  398.130048] xhci_hcd 0000:02:00.0: xHCI host not responding to stop endpoint command.
[  398.137905] xhci_hcd 0000:02:00.0: Assuming host is dying, halting host.
[  398.144650] xhci_hcd 0000:02:00.0: HC died; cleaning up
[  398.150337] usb 3-1: USB disconnect, device number 2
[  398.155472] sd 4:0:0:0: Device offlined - not ready after error recovery
[  398.170111] sd 4:0:0:0: [sdc] Unhandled error code
[  398.174923] sd 4:0:0:0: [sdc]  
[  398.178076] sd 4:0:0:0: [sdc] CDB: 
[  398.181606] end_request: I/O error, dev sdc, sector 235538432
[  398.187367] EXT4-fs warning (device sdc1): ext4_end_bio:317: I/O error writing to inode 7347656 (offset 0 size 323584 starting block 29442334)
[  398.200175] Buffer I/O error on device sdc1, logical block 29442048
[  398.206457] Buffer I/O error on device sdc1, logical block 29442049
[  398.212739] Buffer I/O error on device sdc1, logical block 29442050
[  398.219016] Buffer I/O error on device sdc1, logical block 29442051
[  398.225298] Buffer I/O error on device sdc1, logical block 29442052
[  398.231579] Buffer I/O error on device sdc1, logical block 29442053
[  398.237857] Buffer I/O error on device sdc1, logical block 29442054
[  398.244137] Buffer I/O error on device sdc1, logical block 29442055
[  398.250418] Buffer I/O error on device sdc1, logical block 29442056
[  398.256696] Buffer I/O error on device sdc1, logical block 29442057
[  398.264481] sd 4:0:0:0: [sdc] Unhandled error code
[  398.269290] sd 4:0:0:0: [sdc]  
[  398.272466] sd 4:0:0:0: [sdc] CDB: 
[  398.275980] end_request: I/O error, dev sdc, sector 235538672
[  398.281744] EXT4-fs warning (device sdc1): ext4_end_bio:317: I/O error writing to inode 7347656 (offset 0 size 323584 starting block 29442364)
[  398.294606] EXT4-fs warning (device sdc1): ext4_end_bio:317: I/O error writing to inode 7347656 (offset 0 size 323584 starting block 29442383)

>> 2) on current 3.13.0-rc7 kernel (w/ Bjørn patch applied just in case), I
>
> Which patch are you referring to?

This one:

  Commit 60e453a940ac ("USBNET: fix handling padding packet")
  added an extra SG entry in case padding is necessary, but
  failed to update the initialisation of the list. This can
  cause list traversal to fall off the end of the list,
  resulting in an oops.
  
  Fixes: 60e453a940ac ("USBNET: fix handling padding packet")
  Reported-by: Thomas Kear <thomas@kear.co.nz>
  Cc: Ming Lei <ming.lei@canonical.com>
  Signed-off-by: Bjørn Mork <bjorn@mork.no>
  

>> just transferred 40GB through a Logitec LAN-GT JU3H3 (it is based on an
>> ASIX AX88179) connected to my ReadyNAS Duo v2 (NEC XHCI controller) at
>> an average rate of 82MB/s. Bottom line: no issue. When I try to do the
>> same on my ReadyNAS 102, I immediately get complaints:
>
> Was the ReadyNAS 102 with the Fresco Logic host running 3.13.0-rc7 as
> well?

Yes. Summary is:

 - RN102 w/ 3.11.7: OK
 - RN102 w/ 3.13-rc7: KO
 - Duo v2 w/ 3.13.0-rc7: OK

>> [  383.280429] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20400 bytes untransferred
>> [  383.280892] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20400 bytes untransferred
>> [  411.620073] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20352 bytes untransferred
>> [  411.620750] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20400 bytes untransferred
>> [  411.621727] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20400 bytes untransferred
>> [  412.066651] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20352 bytes untransferred
>> [  412.067233] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20400 bytes untransferred
>> [  412.068196] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20400 bytes untransferred
>> [  412.242708] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20352 bytes untransferred
>> [  412.243333] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20400 bytes untransferred
>> [  412.244246] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20400 bytes untransferred
>> [  412.417956] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20352 bytes untransferred
>> [  412.418516] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20400 bytes untransferred
>> [  412.419473] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20400 bytes untransferred
>> [  412.545330] xhci_hcd 0000:02:00.0: ep 0x82 - asked for 20480 bytes, 20352 bytes untransferred
>
> Those messages are normal.  It just means the device transferred less
> data than the host asked for, which is part of normal USB operation.
>
> Does the dock work despite the messages?

no

> Please send me the output of `sudo lspci -vvv -n` for the ReadyNAS 102
> Fresco Logic host.

# lspci -vvv -n
00:01.0 0604: 11ab:7846 (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: 00010000-00010fff
        Memory behind bridge: e0000000-e00fffff
        Prefetchable memory behind bridge: 00000000-000fffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-

00:02.0 0604: 11ab:7846 (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
        I/O behind bridge: 0000f000-00000fff
        Memory behind bridge: e0100000-e01fffff
        Prefetchable memory behind bridge: 00000000-000fffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-

01:00.0 0106: 1b4b:9170 (rev 12) (prog-if 01 [AHCI 1.0])
        Subsystem: 1b4b:9170
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 105
        Region 0: I/O ports at 10010 [size=8]
        Region 1: I/O ports at 10020 [size=4]
        Region 2: I/O ports at 10018 [size=8]
        Region 3: I/O ports at 10024 [size=4]
        Region 4: I/O ports at 10000 [size=16]
        Region 5: Memory at e0010000 (32-bit, non-prefetchable) [size=512]
        Expansion ROM at e0000000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit-
                Address: d0020a04  Data: 0f10
        Capabilities: [70] Express (v2) Legacy Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
        Kernel driver in use: ahci

02:00.0 0c03: 1b73:1009 (rev 02) (prog-if 30 [XHCI])
        Subsystem: 1b73:0000
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 104
        Region 0: Memory at e0100000 (64-bit, non-prefetchable) [size=64K]
        Region 2: Memory at e0110000 (64-bit, non-prefetchable) [size=4K]
        Region 4: Memory at e0111000 (64-bit, non-prefetchable) [size=4K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Latency L0 unlimited, L1 unlimited
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [b0] MSI-X: Enable+ Count=8 Masked-
                Vector table: BAR=2 offset=00000000
                PBA: BAR=4 offset=00000000
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Kernel driver in use: xhci_hcd

* [BUG] FL1009: xHCI host not responding to stop endpoint command.
  2014-01-14 21:54     ` Arnaud Ebalard
@ 2014-01-15  9:59       ` David Laight
  2014-01-15 19:04         ` Arnaud Ebalard
  0 siblings, 1 reply; 28+ messages in thread
From: David Laight @ 2014-01-15  9:59 UTC (permalink / raw)
  To: linux-arm-kernel

From: Arnaud Ebalard
> Sent: 14 January 2014 21:54
> To: Sarah Sharp
> Cc: linux-usb at vger.kernel.org; linux-arm-kernel at lists.infradead.org; Bjørn Mork
> Subject: Re: [BUG] FL1009: xHCI host not responding to stop endpoint command.
> 
> Hi Sarah,
> 
> Sarah Sharp <sarah.a.sharp@linux.intel.com> writes:
> 
> >> I can add the following:
> >>
> >> 2) I just tested the copy of the small files using the ICY DOCK
> >> MB981U3S-1S connected to my ReadyNAS 102 *on a 3.11.7 kernel*
> >> and it completed successfully.
> >
> > Please try a 3.13-rc7 kernel after running `git revert 35773dac5f86`.
> 
> I tried current 3.13.0-rc8 w/ 35773dac5f86 reverted and the result is
> the same:

That patch only affects an error code and stops the fs code retrying for ever.

Does everything work if you comment out the code in xhci-ring.c that adds
NOP TRBs to the ring end in order to stop the LINK TRB appearing in the middle
of a TB.
The ethernet code needs it, but the disk transfers are (probably) aligned
such that they don't.

If that all works I'll look at writing a patch that either doesn't use NOPs
or checks the alignment of all the fragments.

	David

* [BUG] FL1009: xHCI host not responding to stop endpoint command.
  2014-01-15  9:59       ` David Laight
@ 2014-01-15 19:04         ` Arnaud Ebalard
  2014-01-16 18:50           ` Sarah Sharp
  0 siblings, 1 reply; 28+ messages in thread
From: Arnaud Ebalard @ 2014-01-15 19:04 UTC (permalink / raw)
  To: linux-arm-kernel

Hi David,

David Laight <David.Laight@ACULAB.COM> writes:

>> I tried current 3.13.0-rc8 w/ 35773dac5f86 reverted and the result is
>> the same:
>
> That patch only affects an error code and stops the fs code retrying
> for ever.

Are you sure? ...

> Does everything work if you comment out the code in xhci-ring.c that adds
> NOP TRBs to the ring end in order to stop the LINK TRB appearing in the middle
> of a TB.
> The ethernet code needs it, but the disk transfers are (probably) aligned
> such that they don't.

... AFAICT, this is exactly what commit 35773dac5f86 does and reverting
it does not help. If I am mistaken, can you point which part you want me
to remove in the code to test?

I am slowly starting to see a bisect session coming ;-)

Cheers,

a+

* [BUG] FL1009: xHCI host not responding to stop endpoint command.
  2014-01-15 19:04         ` Arnaud Ebalard
@ 2014-01-16 18:50           ` Sarah Sharp
  2014-01-17  6:25             ` Arnaud Ebalard
  0 siblings, 1 reply; 28+ messages in thread
From: Sarah Sharp @ 2014-01-16 18:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 15, 2014 at 08:04:45PM +0100, Arnaud Ebalard wrote:
> Hi David,
> 
> David Laight <David.Laight@ACULAB.COM> writes:
> 
> >> I tried current 3.13.0-rc8 w/ 35773dac5f86 reverted and the result is
> >> the same:
> >
> > That patch only affects an error code and stops the fs code retrying
> > for ever.
> 
> Are you sure? ...
> 
> > Does everything work if you comment out the code in xhci-ring.c that adds
> > NOP TRBs to the ring end in order to stop the LINK TRB appearing in the middle
> > of a TB.
> > The ethernet code needs it, but the disk transfers are (probably) aligned
> > such that they don't.
> 
> ... AFAICT, this is exactly what commit 35773dac5f86 does and reverting
> it does not help. If I am mistaken, can you point which part you want me
> to remove in the code to test?
> 
> I am slowly starting to see a bisect session coming ;-)

Try reverting commit 60e102ac73cd40069d077014c93c86dc7205cb68.  That was
causing issues with another Fresco Logic host.  If that doesn't help,
then yes, please git bisect.

Sarah Sharp

* [BUG] FL1009: xHCI host not responding to stop endpoint command.
  2014-01-16 18:50           ` Sarah Sharp
@ 2014-01-17  6:25             ` Arnaud Ebalard
  2014-01-17  8:31               ` Bjørn Mork
  0 siblings, 1 reply; 28+ messages in thread
From: Arnaud Ebalard @ 2014-01-17  6:25 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

Sarah Sharp <sarah.a.sharp@linux.intel.com> writes:

>> ... AFAICT, this is exactly what commit 35773dac5f86 does and reverting
>> it does not help. If I am mistaken, can you point which part you want me
>> to remove in the code to test?
>> 
>> I am slowly starting to see a bisect session coming ;-)
>
> Try reverting commit 60e102ac73cd40069d077014c93c86dc7205cb68.

AFAICT, this commit does not exist in master (Linus tree), i.e. it is
not in 3.13.0-rc8.

> That was causing issues with another Fresco Logic host.  If that
> doesn't help, then yes, please git bisect.

I have started a bisect session yesterday. I will try and finish it this
evening.

Cheers,

a+

* [BUG] FL1009: xHCI host not responding to stop endpoint command.
  2014-01-17  6:25             ` Arnaud Ebalard
@ 2014-01-17  8:31               ` Bjørn Mork
  2014-01-17 20:54                 ` Sarah Sharp
  0 siblings, 1 reply; 28+ messages in thread
From: Bjørn Mork @ 2014-01-17  8:31 UTC (permalink / raw)
  To: linux-arm-kernel

arno at natisbad.org (Arnaud Ebalard) writes:
> Sarah Sharp <sarah.a.sharp@linux.intel.com> writes:
>
>>> ... AFAICT, this is exactly what commit 35773dac5f86 does and reverting
>>> it does not help. If I am mistaken, can you point which part you want me
>>> to remove in the code to test?
>>> 
>>> I am slowly starting to see a bisect session coming ;-)
>>
>> Try reverting commit 60e102ac73cd40069d077014c93c86dc7205cb68.
>
> AFAICT, this commit does not exist in master (Linus tree), i.e. it is
> not in 3.13.0-rc8.

That commit is a stable backport of 9df89d85b407690afa46ddfbccc80bec6869971d 
which is in v3.13-rc8:

bjorn at nemi:/usr/local/src/git/linux$ git tag --contains 9df89d85b407690afa46ddfbccc80bec6869971d
usb-3.13-rc1
usb-3.13-rc3
usb-3.13-rc5
v3.13-rc1
v3.13-rc2
v3.13-rc3
v3.13-rc4
v3.13-rc5
v3.13-rc6
v3.13-rc7
v3.13-rc8

The stable backport is in all v3.12.x releases:

bjorn at nemi:/usr/local/src/git/linux$ git tag --contains 60e102ac73cd40069d077014c93c86dc7205cb68
v3.12.1
v3.12.2
v3.12.3
v3.12.4
v3.12.5
v3.12.6
v3.12.7
v3.12.8



Bjørn

* [BUG] FL1009: xHCI host not responding to stop endpoint command.
  2014-01-17  8:31               ` Bjørn Mork
@ 2014-01-17 20:54                 ` Sarah Sharp
  2014-01-18 21:49                   ` Arnaud Ebalard
  0 siblings, 1 reply; 28+ messages in thread
From: Sarah Sharp @ 2014-01-17 20:54 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jan 17, 2014 at 09:31:16AM +0100, Bjørn Mork wrote:
> arno at natisbad.org (Arnaud Ebalard) writes:
> > Sarah Sharp <sarah.a.sharp@linux.intel.com> writes:
> >
> >>> ... AFAICT, this is exactly what commit 35773dac5f86 does and reverting
> >>> it does not help. If I am mistaken, can you point which part you want me
> >>> to remove in the code to test?
> >>> 
> >>> I am slowly starting to see a bisect session coming ;-)
> >>
> >> Try reverting commit 60e102ac73cd40069d077014c93c86dc7205cb68.
> >
> > AFAICT, this commit does not exist in master (Linus tree), i.e. it is
> > not in 3.13.0-rc8.
> 
> That commit is a stable backport of 9df89d85b407690afa46ddfbccc80bec6869971d 
> which is in v3.13-rc8:
> 
> bjorn at nemi:/usr/local/src/git/linux$ git tag --contains 9df89d85b407690afa46ddfbccc80bec6869971d
> usb-3.13-rc1
> usb-3.13-rc3
> usb-3.13-rc5
> v3.13-rc1
> v3.13-rc2
> v3.13-rc3
> v3.13-rc4
> v3.13-rc5
> v3.13-rc6
> v3.13-rc7
> v3.13-rc8

Sorry for using the stable commit ID.  Arnaud, please try reverting
commit 9df89d85b407690afa46ddfbccc80bec6869971d "usbcore: set
lpm_capable field for LPM capable root hubs" and see if it fixes your
issues.

Sarah Sharp

* [BUG] FL1009: xHCI host not responding to stop endpoint command.
  2014-01-17 20:54                 ` Sarah Sharp
@ 2014-01-18 21:49                   ` Arnaud Ebalard
  2014-01-21 21:17                     ` Sarah Sharp
  2014-02-18 13:10                     ` Thomas Petazzoni
  0 siblings, 2 replies; 28+ messages in thread
From: Arnaud Ebalard @ 2014-01-18 21:49 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

I have added Thomas in the recipients, because I guess he may be of some
help debugging the issue further. Thomas, the beginning of the thread is
here: http://thread.gmane.org/gmane.linux.usb.general/101531

Sarah Sharp <sarah.a.sharp@linux.intel.com> writes:

>> >>> I am slowly starting to see a bisect session coming ;-)
>> >>
>> >> Try reverting commit 60e102ac73cd40069d077014c93c86dc7205cb68.
>> >
>> > AFAICT, this commit does not exist in master (Linus tree), i.e. it is
>> > not in 3.13.0-rc8.
>> 
>> That commit is a stable backport of 9df89d85b407690afa46ddfbccc80bec6869971d 
>> which is in v3.13-rc8:
>> 
>> bjorn at nemi:/usr/local/src/git/linux$ git tag --contains 9df89d85b407690afa46ddfbccc80bec6869971d
>> usb-3.13-rc1
>> usb-3.13-rc3
>> usb-3.13-rc5
>> v3.13-rc1
>> v3.13-rc2
>> v3.13-rc3
>> v3.13-rc4
>> v3.13-rc5
>> v3.13-rc6
>> v3.13-rc7
>> v3.13-rc8
>
> Sorry for using the stable commit ID.  Arnaud, please try reverting
> commit 9df89d85b407690afa46ddfbccc80bec6869971d "usbcore: set
> lpm_capable field for LPM capable root hubs" and see if it fixes your
> issues.

Nope, 9df89d85b407690 does not fix the issue but I guess I found the
reason: I think the regression is not directly due to some usb/XHCI
related change. More below.

I started a git bisect session but I had to stop between the two
following commits, because the last ones I tested after those were just
not bootable.

 bad : f9efbce6334844c7f8b9b9459f6d7a6fbc2928e0 (merge commit)
 good: aac59e3efce3dca787b11e34726001603ce3d161 (merge commit)

At that point, I decided to switch to a manual review of the changes
introduced *between* those commits:

$ git log -p aac59e3efce3..f9efbce63348 | grep -c ^commit
524

I looked at which files were touched (387 in total) and dropped those
that are not used on my platform or cannot be suspected of the bug. I
ended up w/:

  drivers/irqchip/irq-armada-370-xp.c
  drivers/pci/host/pci-mvebu.c

Which are modified by those commits:

  commit f5072dfbac05: PCI: mvebu: make local functions static
  commit 032b4c0cc321: PCI: mvebu: add I/O access wrappers
  commit 9f352f0e6c0f: PCI: mvebu: Dynamically detect if the PEX link is up to enable hot plug
  commit cc54ccd9a696: PCI: mvebu: add support for Marvell Dove SoCs
  commit 52ba992e201f: PCI: mvebu: add support for reset on GPIO
  commit e5615c30c1c9: PCI: mvebu: remove subsys_initcall
  commit bf09b6ae588f: PCI: mvebu: increment nports only for registered ports
  commit b42285f66f87: PCI: mvebu: move clock enable before register access
  commit 5b4deb6526bd: PCI: mvebu: add support for MSI
  commit 31f614edb726: irqchip: armada-370-xp: implement MSI support
  commit 627dfcc249e2: irqchip: armada-370-xp: properly request resources
  
I started suspecting the introduction of MSI support in Marvell PCIe
host controller driver (FL1009 is on the PCIe bus) and compiled a
3.13.0-rc8 w/ CONFIG_PCI_MSI disabled (it was enabled in all my
previous tests): I did not manage to reproduce the issue with this
kernel. As a side note, commits 5b4deb6526bd, 31f614edb726 and
627dfcc249e2 are

ATM, I do not know if the problem is related to a bug in introduced MSI
support or some weird incompatibility of that functionality with the
FL1009 which would require some quirk in XHCI stack.

Thomas, I took a look at the changes but I am not familiar w/ how MSI
works. You may have an idea on what is going on here.

Cheers,

a+

ps: Thomas, this is completely unrelated but the code below caught my
eye at the beginning of a hunk in 31f614edb726. When CONFIG_PCI_MSI is
disabled, why is irqnr now compared to 1 instead of 0?

@@ -214,12 +365,39 @@ armada_370_xp_handle_irq(struct pt_regs *regs)
                if (irqnr > 1022)
                        break;
 
-               if (irqnr > 0) {
+               if (irqnr > 1) {
                        irqnr = irq_find_mapping(armada_370_xp_mpic_domain,
                                        irqnr);
                        handle_IRQ(irqnr, regs);
                        continue;
                }
+
+#ifdef CONFIG_PCI_MSI
+               /* MSI handling */
+               if (irqnr == 1) {


* [BUG] FL1009: xHCI host not responding to stop endpoint command.
  2014-01-18 21:49                   ` Arnaud Ebalard
@ 2014-01-21 21:17                     ` Sarah Sharp
  2014-01-22 22:23                       ` Arnaud Ebalard
  2014-02-18 13:10                     ` Thomas Petazzoni
  1 sibling, 1 reply; 28+ messages in thread
From: Sarah Sharp @ 2014-01-21 21:17 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, Jan 18, 2014 at 10:49:17PM +0100, Arnaud Ebalard wrote:
> Hi,
> 
> I have added Thomas in the recipients, because I guess he may be of some
> help debugging the issue further. Thomas, the beginning of the thread is
> here: http://thread.gmane.org/gmane.linux.usb.general/101531

...

> I started suspecting the introduction of MSI support in Marvell PCIe
> host controller driver (FL1009 is on the PCIe bus) and compiled a
> 3.13.0-rc8 w/ CONFIG_PCI_MSI disabled (it was enabled in all my
> previous tests): I did not manage to reproduce the issue with this
> kernel. As a side note, commits 5b4deb6526bd, 31f614edb726 and
> 627dfcc249e2 are
> 
> ATM, I do not know if the problem is related to a bug in introduced MSI
> support or some weird incompatibility of that functionality with the
> FL1009 which would require some quirk in XHCI stack.

We've actually had issues in the past with Fresco Logic hosts not
supporting MSI properly, even though the PCI devices claim to have MSI
support.  So turning off CONFIG_PCI_MSI may actually mean the Fresco
Logic host is to blame, rather than the Marvell patches.  I assume MSI
wouldn't have been turned on for the Fresco Logic host unless the parent
PCI host controller supported it.

Let's see if the Fresco Logic host is really the root cause.  Please
apply this patch to 3.13.0-rc8 and recompile with CONFIG_PCI_MSI
enabled:

diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index 6c03584ac15f..74748444c040 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -30,6 +30,7 @@
 /* Device for a quirk */
 #define PCI_VENDOR_ID_FRESCO_LOGIC	0x1b73
 #define PCI_DEVICE_ID_FRESCO_LOGIC_PDK	0x1000
+#define PCI_DEVICE_ID_FRESCO_LOGIC_FL1009	0x1009
 #define PCI_DEVICE_ID_FRESCO_LOGIC_FL1400	0x1400
 
 #define PCI_VENDOR_ID_ETRON		0x1b6f
@@ -63,6 +64,9 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
 
 	/* Look for vendor-specific quirks */
 	if (pdev->vendor == PCI_VENDOR_ID_FRESCO_LOGIC &&
+			pdev->device == PCI_DEVICE_ID_FRESCO_LOGIC_FL1009)
+		xhci->quirks |= XHCI_BROKEN_MSI;
+	if (pdev->vendor == PCI_VENDOR_ID_FRESCO_LOGIC &&
 			(pdev->device == PCI_DEVICE_ID_FRESCO_LOGIC_PDK ||
 			 pdev->device == PCI_DEVICE_ID_FRESCO_LOGIC_FL1400)) {
 		if (pdev->device == PCI_DEVICE_ID_FRESCO_LOGIC_PDK &&

Sarah Sharp

* [BUG] FL1009: xHCI host not responding to stop endpoint command.
  2014-01-21 21:17                     ` Sarah Sharp
@ 2014-01-22 22:23                       ` Arnaud Ebalard
  2014-01-22 22:26                         ` Jason Cooper
  0 siblings, 1 reply; 28+ messages in thread
From: Arnaud Ebalard @ 2014-01-22 22:23 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Sarah,

Sarah Sharp <sarah.a.sharp@linux.intel.com> writes:

> On Sat, Jan 18, 2014 at 10:49:17PM +0100, Arnaud Ebalard wrote:
>> Hi,
>> 
>> I have added Thomas in the recipients, because I guess he may be of some
>> help debugging the issue further. Thomas, the beginning of the thread is
>> here: http://thread.gmane.org/gmane.linux.usb.general/101531
>
> ...
>
>> I started suspecting the introduction of MSI support in Marvell PCIe
>> host controller driver (FL1009 is on the PCIe bus) and compiled a
>> 3.13.0-rc8 w/ CONFIG_PCI_MSI disabled (it was enabled in all my
>> previous tests): I did not manage to reproduce the issue with this
>> kernel. As a side note, commits 5b4deb6526bd, 31f614edb726 and
>> 627dfcc249e2 are
>> 
>> ATM, I do not know if the problem is related to a bug in introduced MSI
>> support or some weird incompatibility of that functionality with the
>> FL1009 which would require some quirk in XHCI stack.
>
> We've actually had issues in the past with Fresco Logic hosts not
> supporting MSI properly, even though the PCI devices claim to have MSI
> support.  So turning off CONFIG_PCI_MSI may actually mean the Fresco
> Logic host is to blame, rather than the Marvell patches.  I assume MSI
> wouldn't have been turned on for the Fresco Logic host unless the parent
> PCI host controller supported it.
>
> Let's see if the Fresco Logic host is really the root cause.  Please
> apply this patch to 3.13.0-rc8 and recompile with CONFIG_PCI_MSI
> enabled:
>
> diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
> index 6c03584ac15f..74748444c040 100644
> --- a/drivers/usb/host/xhci-pci.c
> +++ b/drivers/usb/host/xhci-pci.c
> @@ -30,6 +30,7 @@
>  /* Device for a quirk */
>  #define PCI_VENDOR_ID_FRESCO_LOGIC	0x1b73
>  #define PCI_DEVICE_ID_FRESCO_LOGIC_PDK	0x1000
> +#define PCI_DEVICE_ID_FRESCO_LOGIC_FL1009	0x1009
>  #define PCI_DEVICE_ID_FRESCO_LOGIC_FL1400	0x1400
>  
>  #define PCI_VENDOR_ID_ETRON		0x1b6f
> @@ -63,6 +64,9 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
>  
>  	/* Look for vendor-specific quirks */
>  	if (pdev->vendor == PCI_VENDOR_ID_FRESCO_LOGIC &&
> +			pdev->device == PCI_DEVICE_ID_FRESCO_LOGIC_FL1009)
> +		xhci->quirks |= XHCI_BROKEN_MSI;
> +	if (pdev->vendor == PCI_VENDOR_ID_FRESCO_LOGIC &&
>  			(pdev->device == PCI_DEVICE_ID_FRESCO_LOGIC_PDK ||
>  			 pdev->device == PCI_DEVICE_ID_FRESCO_LOGIC_FL1400)) {
>  		if (pdev->device == PCI_DEVICE_ID_FRESCO_LOGIC_PDK &&

With the patch applied on top of 3.13.0 kernel recompiled w/
CONFIG_PCI_MSI enabled, I cannot reproduce the bug. I guess
you can add my:

 Reported-and-tested-By: Arnaud Ebalard <arno@natisbad.org>

Since you'll have to push the patch to -stable team at least for 3.13,
I wonder if it would not make sense to extend that at least to 3.12
and possibly 3.10 (3.2 is still widely used but I wonder if it makes
sense to go that far).

Cheers,

a+

* [BUG] FL1009: xHCI host not responding to stop endpoint command.
  2014-01-22 22:23                       ` Arnaud Ebalard
@ 2014-01-22 22:26                         ` Jason Cooper
  2014-01-22 22:43                           ` Arnaud Ebalard
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Cooper @ 2014-01-22 22:26 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 22, 2014 at 11:23:23PM +0100, Arnaud Ebalard wrote:
> With the patch applied on top of 3.13.0 kernel recompiled w/
> CONFIG_PCI_MSI enabled, I cannot reproduce the bug. I guess
> you can add my:
> 
>  Reported-and-tested-By: Arnaud Ebalard <arno@natisbad.org>
> 
> Since you'll have to push the patch to -stable team at least for 3.13,
> I wonder if it would not make sense to extend that at least to 3.12
> and possibly 3.10 (3.2 is still widely used but I wonder if it makes
> sense to go that far).

Can you pinpoint the commit that introduced the regression?

thx,

Jason.

* [BUG] FL1009: xHCI host not responding to stop endpoint command.
  2014-01-22 22:26                         ` Jason Cooper
@ 2014-01-22 22:43                           ` Arnaud Ebalard
  2014-01-22 23:56                             ` Sarah Sharp
  0 siblings, 1 reply; 28+ messages in thread
From: Arnaud Ebalard @ 2014-01-22 22:43 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Jason,

Jason Cooper <jason@lakedaemon.net> writes:

> On Wed, Jan 22, 2014 at 11:23:23PM +0100, Arnaud Ebalard wrote:
>> With the patch applied on top of 3.13.0 kernel recompiled w/
>> CONFIG_PCI_MSI enabled, I cannot reproduce the bug. I guess
>> you can add my:
>> 
>>  Reported-and-tested-By: Arnaud Ebalard <arno@natisbad.org>
>> 
>> Since you'll have to push the patch to -stable team at least for 3.13,
>> I wonder if it would not make sense to extend that at least to 3.12
>> and possibly 3.10 (3.2 is still widely used but I wonder if it makes
>> sense to go that far).
>
> Can you pinpoint the commit that introduced the regression?

f5182b4155b9d686c5540a6822486400e34ddd98 "xhci: Disable MSI for some Fresco Logic hosts."

Technically, this is not per se the commit which introduced the
regression but the one that *partially* fixed it by introducing the XHCI
quirk to skip MSI enabling for Fresco Logic chips. The thing is it
should have included the FL1009 in the targets. Sarah, can you confirm
this?

Jason, the logic is summarized here, AFAICT:

commit 455f58925247e8a1a1941e159f3636ad6ee4c90b
Author: Oliver Neukum <oneukum@suse.de>
Date:   Mon Sep 30 15:50:54 2013 +0200

    xhci: quirk for extra long delay for S4
    
    It has been reported that this chipset really cannot
    sleep without this extraordinary delay.
    
    This patch should be backported, in order to ensure this host functions
    under stable kernels.  The last quirk for Fresco Logic hosts (commit
    bba18e33f25072ebf70fd8f7f0cdbf8cdb59a746 "xhci: Extend Fresco Logic MSI
    quirk.") was backported to stable kernels as old as 2.6.36.
    
    Signed-off-by: Oliver Neukum <oneukum@suse.de>
    Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
    Cc: stable at vger.kernel.org


commit bba18e33f25072ebf70fd8f7f0cdbf8cdb59a746
Author: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Date:   Wed Oct 17 13:44:06 2012 -0700

    xhci: Extend Fresco Logic MSI quirk.
    
    Ali reports that plugging a device into the Fresco Logic xHCI host with
    PCI device ID 1400 produces an IRQ error:
    
     do_IRQ: 3.176 No irq handler for vector (irq -1)
    
    Other early Fresco Logic host revisions don't support MSI, even though
    their PCI config space claims they do.  Extend the quirk to disabling
    MSI to this chipset revision.  Also enable the short transfer quirk,
    since it's likely this revision also has that quirk, and it should be
    harmless to enable.
    
    <SNIP>

    This patch should be backported to stable kernels as old as 2.6.36, that
    contain the commit f5182b4155b9d686c5540a6822486400e34ddd98 "xhci:
    Disable MSI for some Fresco Logic hosts."
    
    Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
    Reported-by: A Sh <smr.ash1991@gmail.com>
    Tested-by: A Sh <smr.ash1991@gmail.com>
    Cc: stable at vger.kernel.org


commit f5182b4155b9d686c5540a6822486400e34ddd98
Author: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Date:   Thu Jun 2 11:33:02 2011 -0700

    xhci: Disable MSI for some Fresco Logic hosts.
    
    Some Fresco Logic hosts, including those found in the AUAU N533V laptop,
    advertise MSI, but fail to actually generate MSI interrupts.  Add a new
    xHCI quirk to skip MSI enabling for the Fresco Logic host controllers.
    Fresco Logic confirms that all chips with PCI vendor ID 0x1b73 and device
    ID 0x1000, regardless of PCI revision ID, do not support MSI.
    
    This should be backported to stable kernels as far back as 2.6.36, which
    was the first kernel to support MSI on xHCI hosts.
    
    Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
    Reported-by: Sergey Galanov <sergey.e.galanov@gmail.com>
    Cc: stable at kernel.org

* [BUG] FL1009: xHCI host not responding to stop endpoint command.
  2014-01-22 22:43                           ` Arnaud Ebalard
@ 2014-01-22 23:56                             ` Sarah Sharp
  2014-01-23  8:24                               ` Arnaud Ebalard
  2014-02-10 18:57                               ` Arnaud Ebalard
  0 siblings, 2 replies; 28+ messages in thread
From: Sarah Sharp @ 2014-01-22 23:56 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jan 22, 2014 at 11:43:16PM +0100, Arnaud Ebalard wrote:
> Hi Jason,
> 
> Jason Cooper <jason@lakedaemon.net> writes:
> 
> > On Wed, Jan 22, 2014 at 11:23:23PM +0100, Arnaud Ebalard wrote:
> >> With the patch applied on top of 3.13.0 kernel recompiled w/
> >> CONFIG_PCI_MSI enabled, I cannot reproduce the bug. I guess
> >> you can add my:
> >> 
> >>  Reported-and-tested-By: Arnaud Ebalard <arno@natisbad.org>
> >> 
> >> Since you'll have to push the patch to -stable team at least for 3.13,
> >> I wonder if it would not make sense to extend that at least to 3.12
> >> and possibly 3.10 (3.2 is still widely used but I wonder if it makes
> >> sense to go that far).
> >
> > Can you pinpoint the commit that introduced the regression?
> 
> f5182b4155b9d686c5540a6822486400e34ddd98 "xhci: Disable MSI for some Fresco Logic hosts."
> 
> Technically, this is not per se the commit which introduced the
> regression but the one that *partially* fixed it by introducing the XHCI
> quirk to skip MSI enabling for Fresco Logic chips. The thing is it
> should have included the FL1009 in the targets. Sarah, can you confirm
> this?

I don't know if it should have included FL1009, it was just a guess,
based on the fact that the 0x1000 and 0x1400 devices did need MSI
disabled.  I can attempt to ask the Fresco Logic folks I know, but I'm
not sure if/when I'll get a response back.

That still doesn't necessarily rule out MSI issues in the Marvell PCI
host controller code.  Can you attach another PCI device with MSI
support under the host and see if it works?

Sarah Sharp

> Jason, the logic is summarized here, AFAICT:
> 
> commit 455f58925247e8a1a1941e159f3636ad6ee4c90b
> Author: Oliver Neukum <oneukum@suse.de>
> Date:   Mon Sep 30 15:50:54 2013 +0200
> 
>     xhci: quirk for extra long delay for S4
>     
>     It has been reported that this chipset really cannot
>     sleep without this extraordinary delay.
>     
>     This patch should be backported, in order to ensure this host functions
>     under stable kernels.  The last quirk for Fresco Logic hosts (commit
>     bba18e33f25072ebf70fd8f7f0cdbf8cdb59a746 "xhci: Extend Fresco Logic MSI
>     quirk.") was backported to stable kernels as old as 2.6.36.
>     
>     Signed-off-by: Oliver Neukum <oneukum@suse.de>
>     Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
>     Cc: stable@vger.kernel.org
> 
> 
> commit bba18e33f25072ebf70fd8f7f0cdbf8cdb59a746
> Author: Sarah Sharp <sarah.a.sharp@linux.intel.com>
> Date:   Wed Oct 17 13:44:06 2012 -0700
> 
>     xhci: Extend Fresco Logic MSI quirk.
>     
>     Ali reports that plugging a device into the Fresco Logic xHCI host with
>     PCI device ID 1400 produces an IRQ error:
>     
>      do_IRQ: 3.176 No irq handler for vector (irq -1)
>     
>     Other early Fresco Logic host revisions don't support MSI, even though
>     their PCI config space claims they do.  Extend the quirk to disabling
>     MSI to this chipset revision.  Also enable the short transfer quirk,
>     since it's likely this revision also has that quirk, and it should be
>     harmless to enable.
>     
>     <SNIP>
> 
>     This patch should be backported to stable kernels as old as 2.6.36, that
>     contain the commit f5182b4155b9d686c5540a6822486400e34ddd98 "xhci:
>     Disable MSI for some Fresco Logic hosts."
>     
>     Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
>     Reported-by: A Sh <smr.ash1991@gmail.com>
>     Tested-by: A Sh <smr.ash1991@gmail.com>
>     Cc: stable@vger.kernel.org
> 
> 
> commit f5182b4155b9d686c5540a6822486400e34ddd98
> Author: Sarah Sharp <sarah.a.sharp@linux.intel.com>
> Date:   Thu Jun 2 11:33:02 2011 -0700
> 
>     xhci: Disable MSI for some Fresco Logic hosts.
>     
>     Some Fresco Logic hosts, including those found in the AUAU N533V laptop,
>     advertise MSI, but fail to actually generate MSI interrupts.  Add a new
>     xHCI quirk to skip MSI enabling for the Fresco Logic host controllers.
>     Fresco Logic confirms that all chips with PCI vendor ID 0x1b73 and device
>     ID 0x1000, regardless of PCI revision ID, do not support MSI.
>     
>     This should be backported to stable kernels as far back as 2.6.36, which
>     was the first kernel to support MSI on xHCI hosts.
>     
>     Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
>     Reported-by: Sergey Galanov <sergey.e.galanov@gmail.com>
>     Cc: stable@kernel.org


* [BUG] FL1009: xHCI host not responding to stop endpoint command.
  2014-01-22 23:56                             ` Sarah Sharp
@ 2014-01-23  8:24                               ` Arnaud Ebalard
  2014-01-23 11:09                                 ` Willy Tarreau
  2014-01-26 13:30                                 ` Thomas Petazzoni
  2014-02-10 18:57                               ` Arnaud Ebalard
  1 sibling, 2 replies; 28+ messages in thread
From: Arnaud Ebalard @ 2014-01-23  8:24 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Sarah,

Sarah Sharp <sarah.a.sharp@linux.intel.com> writes:

>> > Can you pinpoint the commit that introduced the regression?
>> 
>> f5182b4155b9d686c5540a6822486400e34ddd98 "xhci: Disable MSI for some Fresco Logic hosts."
>> 
>> Technically, this is not per se the commit which introduced the
>> regression but the one that *partially* fixed it by introducing the XHCI
>> quirk to skip MSI enabling for Fresco Logic chips. The thing is it
>> should have included the FL1009 in the targets. Sarah, can you confirm
>> this?
>
> I don't know if it should have included FL1009, it was just a guess,
> based on the fact that the 0x1000 and 0x1400 devices did need MSI
> disabled.  I can attempt to ask the Fresco Logic folks I know, but I'm
> not sure if/when I'll get a response back.
>
> That still doesn't necessarily rule out MSI issues in the Marvell PCI
> host controller code.  Can you attach another PCI device with MSI
> support under the host and see if it works?

The various Armada-based devices I have are NAS units which do not have
PCIe slots for plugging in additional devices (everything is soldered).
I don't know which device Thomas used for his tests. Just in case, I
also added Willy in CC:, who has various boards and may also have done
more tests with additional PCIe devices and CONFIG_PCI_MSI enabled on a
3.13 kernel.

Cheers,

a+


* [BUG] FL1009: xHCI host not responding to stop endpoint command.
  2014-01-23  8:24                               ` Arnaud Ebalard
@ 2014-01-23 11:09                                 ` Willy Tarreau
  2014-01-27 22:20                                   ` Arnaud Ebalard
  2014-01-26 13:30                                 ` Thomas Petazzoni
  1 sibling, 1 reply; 28+ messages in thread
From: Willy Tarreau @ 2014-01-23 11:09 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jan 23, 2014 at 09:24:41AM +0100, Arnaud Ebalard wrote:
> Hi Sarah,
> 
> Sarah Sharp <sarah.a.sharp@linux.intel.com> writes:
> 
> >> > Can you pinpoint the commit that introduced the regression?
> >> 
> >> f5182b4155b9d686c5540a6822486400e34ddd98 "xhci: Disable MSI for some Fresco Logic hosts."
> >> 
> >> Technically, this is not per se the commit which introduced the
> >> regression but the one that *partially* fixed it by introducing the XHCI
> >> quirk to skip MSI enabling for Fresco Logic chips. The thing is it
> >> should have included the FL1009 in the targets. Sarah, can you confirm
> >> this?
> >
> > I don't know if it should have included FL1009, it was just a guess,
> > based on the fact that the 0x1000 and 0x1400 devices did need MSI
> > disabled.  I can attempt to ask the Fresco Logic folks I know, but I'm
> > not sure if/when I'll get a response back.
> >
> > That still doesn't necessarily rule out MSI issues in the Marvell PCI
> > host controller code.  Can you attach another PCI device with MSI
> > support under the host and see if it works?
> 
> The various Armada-based devices I have are NAS units which do not have
> PCIe slots for plugging in additional devices (everything is soldered).
> I don't know which device Thomas used for his tests. Just in case, I
> also added Willy in CC:, who has various boards and may also have done
> more tests with additional PCIe devices and CONFIG_PCI_MSI enabled on a
> 3.13 kernel.

I've been running an Intel i350 dual-port NIC (igb driver) supporting
MSI on the mirabox, and it used to work in 3.10 plus many of the patches
coming from the Free-electrons team. Some recent changes to the PCI
code introduced a regression preventing this driver from correctly
registering an MSI interrupt, and I did not have enough time to
investigate it deeply enough to fix it. That said, I know how to hack
it to work again, so if it can be of any use, I can run a test on
the mirabox (armada370) and on the XPGP board (armadaXP).

Willy


* [BUG] FL1009: xHCI host not responding to stop endpoint command.
  2014-01-23  8:24                               ` Arnaud Ebalard
  2014-01-23 11:09                                 ` Willy Tarreau
@ 2014-01-26 13:30                                 ` Thomas Petazzoni
  2014-01-27 18:36                                   ` Arnaud Ebalard
  1 sibling, 1 reply; 28+ messages in thread
From: Thomas Petazzoni @ 2014-01-26 13:30 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Arnaud Ebalard,

On Thu, 23 Jan 2014 09:24:41 +0100, Arnaud Ebalard wrote:

> The various Armada-based devices I have are NAS units which do not have
> PCIe slots for plugging in additional devices (everything is soldered).
> I don't know which device Thomas used for his tests. Just in case, I
> also added Willy in CC:, who has various boards and may also have done
> more tests with additional PCIe devices and CONFIG_PCI_MSI enabled on a
> 3.13 kernel.

The device I've used to test MSI is an e1000e PCIe Intel network card.
It uses one MSI interrupt, so admittedly, the MSI testing is quite
limited for now.

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com


* [BUG] FL1009: xHCI host not responding to stop endpoint command.
  2014-01-26 13:30                                 ` Thomas Petazzoni
@ 2014-01-27 18:36                                   ` Arnaud Ebalard
  0 siblings, 0 replies; 28+ messages in thread
From: Arnaud Ebalard @ 2014-01-27 18:36 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Thomas and Sarah,

Thomas Petazzoni <thomas.petazzoni@free-electrons.com> writes:

> On Thu, 23 Jan 2014 09:24:41 +0100, Arnaud Ebalard wrote:
>
>> The various Armada-based devices I have are NAS units which do not have
>> PCIe slots for plugging in additional devices (everything is soldered).
>> I don't know which device Thomas used for his tests. Just in case, I
>> also added Willy in CC:, who has various boards and may also have done
>> more tests with additional PCIe devices and CONFIG_PCI_MSI enabled on a
>> 3.13 kernel.
>
> The device I've used to test MSI is an e1000e PCIe Intel network card.
> It uses one MSI interrupt, so admittedly, the MSI testing is quite
> limited for now.

I had a second thought this weekend about a previous question from Sarah:
my platforms do not have PCIe extension slots to test other devices, but
the RN102 does have an additional device connected on the PCIe bus: a
Marvell 88SE9170 SATA Controller. I have put below the output of lspci
-vvv (on a 3.13.0-rc8 kernel w/ CONFIG_PCI_MSI enabled) in case you can
spot something obviously wrong in it:

00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 7846 (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	I/O behind bridge: 00010000-00010fff
	Memory behind bridge: e0000000-e00fffff
	Prefetchable memory behind bridge: 00000000-000fffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-

00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 7846 (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
	I/O behind bridge: 0000f000-00000fff
	Memory behind bridge: e0100000-e01fffff
	Prefetchable memory behind bridge: 00000000-000fffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-

01:00.0 SATA controller: Marvell Technology Group Ltd. Device 9170 (rev 12) (prog-if 01 [AHCI 1.0])
	Subsystem: Marvell Technology Group Ltd. Device 9170
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 105
	Region 0: I/O ports at 10010 [size=8]
	Region 1: I/O ports at 10020 [size=4]
	Region 2: I/O ports at 10018 [size=8]
	Region 3: I/O ports at 10024 [size=4]
	Region 4: I/O ports at 10000 [size=16]
	Region 5: Memory at e0010000 (32-bit, non-prefetchable) [size=512]
	Expansion ROM at e0000000 [disabled] [size=64K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit-
		Address: d0020a04  Data: 0f10
	Capabilities: [70] Express (v2) Legacy Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <64us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
	Kernel driver in use: ahci

02:00.0 USB controller: Fresco Logic FL1009 USB 3.0 Host Controller (rev 02) (prog-if 30 [XHCI])
	Subsystem: Fresco Logic Device 0000
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 104
	Region 0: Memory at e0100000 (64-bit, non-prefetchable) [size=64K]
	Region 2: Memory at e0110000 (64-bit, non-prefetchable) [size=4K]
	Region 4: Memory at e0111000 (64-bit, non-prefetchable) [size=4K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [70] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 unlimited
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [b0] MSI-X: Enable+ Count=8 Masked-
		Vector table: BAR=2 offset=00000000
		PBA: BAR=4 offset=00000000
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Kernel driver in use: xhci_hcd


* [BUG] FL1009: xHCI host not responding to stop endpoint command.
  2014-01-23 11:09                                 ` Willy Tarreau
@ 2014-01-27 22:20                                   ` Arnaud Ebalard
  0 siblings, 0 replies; 28+ messages in thread
From: Arnaud Ebalard @ 2014-01-27 22:20 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Willy,

Willy Tarreau <w@1wt.eu> writes:

>> > I don't know if it should have included FL1009, it was just a guess,
>> > based on the fact that the 0x1000 and 0x1400 devices did need MSI
>> > disabled.  I can attempt to ask the Fresco Logic folks I know, but I'm
>> > not sure if/when I'll get a response back.
>> >
>> > That still doesn't necessarily rule out MSI issues in the Marvell PCI
>> > host controller code.  Can you attach another PCI device with MSI
>> > support under the host and see if it works?
>> 
>> The various Armada-based devices I have are NAS units which do not have
>> PCIe slots for plugging in additional devices (everything is soldered).
>> I don't know which device Thomas used for his tests. Just in case, I
>> also added Willy in CC:, who has various boards and may also have done
>> more tests with additional PCIe devices and CONFIG_PCI_MSI enabled on a
>> 3.13 kernel.
>
> I've been running an Intel i350 dual-port NIC (igb driver) supporting
> MSI on the mirabox, and it used to work in 3.10 plus many of the patches
> coming from the Free-electrons team. Some recent changes to the PCI
> code introduced a regression preventing this driver from correctly
> registering an MSI interrupt, and I did not have enough time to
> investigate it deeply enough to fix it. That said, I know how to hack
> it to work again, so if it can be of any use, I can run a test on
> the mirabox (armada370) and on the XPGP board (armadaXP).

Thanks for the proposal, Willy. I guess Thomas can tell better than me
what kind of tests would help rule out a problem in MSI support and
put the blame on the FL chip ;-) Thomas, if you need me to test something
on some of my platforms, do not hesitate.

Cheers,

a+


* [BUG] FL1009: xHCI host not responding to stop endpoint command.
  2014-01-22 23:56                             ` Sarah Sharp
  2014-01-23  8:24                               ` Arnaud Ebalard
@ 2014-02-10 18:57                               ` Arnaud Ebalard
  2014-02-14  0:09                                 ` Sarah Sharp
  1 sibling, 1 reply; 28+ messages in thread
From: Arnaud Ebalard @ 2014-02-10 18:57 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Sarah,

Sarah Sharp <sarah.a.sharp@linux.intel.com> writes:

> On Wed, Jan 22, 2014 at 11:43:16PM +0100, Arnaud Ebalard wrote:
>> Hi Jason,
>> 
>> Jason Cooper <jason@lakedaemon.net> writes:
>> 
>> > On Wed, Jan 22, 2014 at 11:23:23PM +0100, Arnaud Ebalard wrote:
>> >> With the patch applied on top of 3.13.0 kernel recompiled w/
>> >> CONFIG_PCI_MSI enabled, I cannot reproduce the bug. I guess
>> >> you can add my:
>> >> 
>> >>  Reported-and-tested-By: Arnaud Ebalard <arno@natisbad.org>
>> >> 
>> >> Since you'll have to push the patch to -stable team at least for 3.13,
>> >> I wonder if it would not make sense to extend that at least to 3.12
>> >> and possibly 3.10 (3.2 is still widely used but I wonder if it makes
>> >> sense to go that far).
>> >
>> > Can you pinpoint the commit that introduced the regression?
>> 
>> f5182b4155b9d686c5540a6822486400e34ddd98 "xhci: Disable MSI for some Fresco Logic hosts."
>> 
>> Technically, this is not per se the commit which introduced the
>> regression but the one that *partially* fixed it by introducing the XHCI
>> quirk to skip MSI enabling for Fresco Logic chips. The thing is it
>> should have included the FL1009 in the targets. Sarah, can you confirm
>> this?
>
> I don't know if it should have included FL1009, it was just a guess,
> based on the fact that the 0x1000 and 0x1400 devices did need MSI
> disabled.  I can attempt to ask the Fresco Logic folks I know, but I'm
> not sure if/when I'll get a response back.
>
> That still doesn't necessarily rule out MSI issues in the Marvell PCI
> host controller code.  Can you attach another PCI device with MSI
> support under the host and see if it works?

Unless you have some objections or some positive feedback from Fresco
Logic people, can you queue your quirks for FL1009 for 3.14-rc* and
-stable? Note that I am just asking, i.e. if you want to wait a bit
more, I am not in that much of a hurry.

Cheers,

a+


* [BUG] FL1009: xHCI host not responding to stop endpoint command.
  2014-02-10 18:57                               ` Arnaud Ebalard
@ 2014-02-14  0:09                                 ` Sarah Sharp
  2014-02-14  8:26                                   ` Thomas Petazzoni
  0 siblings, 1 reply; 28+ messages in thread
From: Sarah Sharp @ 2014-02-14  0:09 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Feb 10, 2014 at 07:57:42PM +0100, Arnaud Ebalard wrote:
> Sarah Sharp <sarah.a.sharp@linux.intel.com> writes:
> 
> > On Wed, Jan 22, 2014 at 11:43:16PM +0100, Arnaud Ebalard wrote:
> >> f5182b4155b9d686c5540a6822486400e34ddd98 "xhci: Disable MSI for some Fresco Logic hosts."
> >> 
> >> Technically, this is not per se the commit which introduced the
> >> regression but the one that *partially* fixed it by introducing the XHCI
> >> quirk to skip MSI enabling for Fresco Logic chips. The thing is it
> >> should have included the FL1009 in the targets. Sarah, can you confirm
> >> this?
> >
> > I don't know if it should have included FL1009, it was just a guess,
> > based on the fact that the 0x1000 and 0x1400 devices did need MSI
> > disabled.  I can attempt to ask the Fresco Logic folks I know, but I'm
> > not sure if/when I'll get a response back.
> >
> > That still doesn't necessarily rule out MSI issues in the Marvell PCI
> > host controller code.  Can you attach another PCI device with MSI
> > support under the host and see if it works?
> 
> Unless you have some objections or some positive feedback from Fresco
> Logic people, can you queue your quirks for FL1009 for 3.14-rc* and
> -stable? Note that I am just asking, i.e. if you want to wait a bit
> more, I am not in that much of a hurry.

Sorry for not getting back to you sooner.  The Fresco Logic folks said
that the FL1000 and FL1400 hosts are actually the same chipset, and it
doesn't support MSI.  However, they say the FL1009 *should* support MSI.

So that doesn't rule out issues with the Marvell PCI MSI code.  I
suspect that's actually the root cause, since I haven't gotten any bug
reports that the FL1009 doesn't work with MSI enabled on other systems.

Sarah Sharp
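
For readers following the quirk discussion, here is a minimal sketch of the
ID matching described above. It is not the xhci-pci code: `struct pci_ident`,
`fl_no_msi_devices` and `fl_host_needs_msi_quirk()` are hypothetical names,
and only the vendor ID 0x1b73 and the device IDs 0x1000 and 0x1400 come from
the commit logs and replies quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the PCI identity of an xHCI host. */
struct pci_ident {
    uint16_t vendor;
    uint16_t device;
};

/* Fresco Logic vendor ID, as quoted in the commit log. */
#define VENDOR_FRESCO_LOGIC 0x1b73

/* Device IDs reported in the thread as not supporting MSI. */
static const uint16_t fl_no_msi_devices[] = { 0x1000, 0x1400 };

/* Returns true when MSI enabling should be skipped for this host. */
static bool fl_host_needs_msi_quirk(const struct pci_ident *id)
{
    size_t count = sizeof(fl_no_msi_devices) / sizeof(fl_no_msi_devices[0]);

    assert(id != NULL);
    if (id->vendor != VENDOR_FRESCO_LOGIC)
        return false;
    for (size_t i = 0; i < count; i++) {
        if (id->device == fl_no_msi_devices[i])
            return true;
    }
    return false;
}
```

With such a table, any other device ID under vendor 0x1b73 is left with MSI
enabled, which is consistent with Fresco Logic saying the FL1009 should
support MSI.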


* [BUG] FL1009: xHCI host not responding to stop endpoint command.
  2014-02-14  0:09                                 ` Sarah Sharp
@ 2014-02-14  8:26                                   ` Thomas Petazzoni
  0 siblings, 0 replies; 28+ messages in thread
From: Thomas Petazzoni @ 2014-02-14  8:26 UTC (permalink / raw)
  To: linux-arm-kernel

Sarah, Arnaud,

On Thu, 13 Feb 2014 16:09:10 -0800, Sarah Sharp wrote:

> > Unless you have some objections or some positive feedback from Fresco
> > Logic people, can you queue your quirks for FL1009 for 3.14-rc* and
> > -stable? Note that I am just asking, i.e. if you want to wait a bit
> > more, I am not in that much of a hurry.
> 
> Sorry for not getting back to you sooner.  The Fresco Logic folks said
> that the FL1000 and FL1400 hosts are actually the same chipset, and it
> doesn't support MSI.  However, they say the FL1009 *should* support MSI.
> 
> So that doesn't rule out issues with the Marvell PCI MSI code.  I
> suspect that's actually the root cause, since I haven't gotten any bug
> reports that the FL1009 doesn't work with MSI enabled on other systems.

Ok, I'll try to have a look into this, by re-reading the entire
thread, and trying to propose some patches that add debugging details
in the Marvell PCI MSI code to try to understand what's going on.

Thanks!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com


* [BUG] FL1009: xHCI host not responding to stop endpoint command.
  2014-01-18 21:49                   ` Arnaud Ebalard
  2014-01-21 21:17                     ` Sarah Sharp
@ 2014-02-18 13:10                     ` Thomas Petazzoni
  2014-02-18 20:54                       ` Arnaud Ebalard
  1 sibling, 1 reply; 28+ messages in thread
From: Thomas Petazzoni @ 2014-02-18 13:10 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Arnaud Ebalard,

On Sat, 18 Jan 2014 22:49:17 +0100, Arnaud Ebalard wrote:

> I started suspecting the introduction of MSI support in Marvell PCIe
> host controller driver (FL1009 is on the PCIe bus) and compiled a
> 3.13.0-rc8 w/ CONFIG_PCI_MSI disabled (it was enabled in all my
> previous tests): I did not manage to reproduce the issue with this
> kernel. As a side note, commits 5b4deb6526bd, 31f614edb726 and
> 627dfcc249e2 are
> 
> ATM, I do not know if the problem is related to a bug in the newly
> introduced MSI support or some weird incompatibility of that
> functionality with the FL1009 which would require some quirk in the
> XHCI stack.
> 
> Thomas, I took a look at the changes but I am not familiar w/ how MSI
> works. You may have an idea of what is going on here.

I finally got some idea: your kernel 3.13-rc7 lacks a very important
fix we did in the irqchip driver MSI handling. You really need to have
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/irqchip/irq-armada-370-xp.c?id=c7f7bd4a136e4b02dd2a66bf95aec545bd93e8db
applied to get proper MSI behavior. Without this patch, there is a race
condition, and some MSI interrupts might be lost.
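
The thread does not spell out the race, so the following is only a toy model
of one common way an interrupt can be lost: the handler snapshots a cause
register, a new interrupt arrives, and the handler then acknowledges every
bit instead of only the bits it saw. All names (`struct cause_reg`,
`handle_ack_all()`, `handle_ack_seen()`) are hypothetical, and nothing here
is taken from irq-armada-370-xp.c or from the linked commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of a "cause" register: one bit per pending interrupt. */
struct cause_reg {
    uint32_t pending;
};

static void cause_raise(struct cause_reg *r, uint32_t bits)
{
    assert(r != NULL);
    r->pending |= bits;
}

static void cause_ack(struct cause_reg *r, uint32_t bits)
{
    assert(r != NULL);
    r->pending &= ~bits;
}

/*
 * Racy variant: snapshot the pending bits, then acknowledge everything.
 * late_bits simulates interrupts arriving between the read and the ack.
 * Returns the bits the handler will actually service.
 */
static uint32_t handle_ack_all(struct cause_reg *r, uint32_t late_bits)
{
    uint32_t seen;

    assert(r != NULL);
    seen = r->pending;
    cause_raise(r, late_bits);   /* arrives inside the window */
    cause_ack(r, UINT32_MAX);    /* also clears bits never seen */
    return seen;
}

/* Safe variant: acknowledge only the bits captured in the snapshot. */
static uint32_t handle_ack_seen(struct cause_reg *r, uint32_t late_bits)
{
    uint32_t seen;

    assert(r != NULL);
    seen = r->pending;
    cause_raise(r, late_bits);   /* arrives inside the window */
    cause_ack(r, seen);          /* late bits stay pending */
    return seen;
}
```

In the first variant, a bit raised inside the window is cleared without ever
being serviced. In the second it stays pending and is picked up by the next
pass of the handler.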

This commit was merged in v3.14-rc2, and backported to 3.13 and
previous stable releases.

Can you test after applying this commit?

> ps: Thomas, this is completely unrelated but the code below caught my
> eye at the beginning of a hunk in 31f614edb726. When CONFIG_PCI_MSI is
> disabled, why is irqnr now compared to 1 instead of 0?

This is not important. IRQs 0 and 1 are reserved for doorbells, which
are only used for IPI (IRQ 0) and MSI (IRQ 1). Therefore, doing
irq_find_mapping() for either IRQ 0 or IRQ 1 is not useful.
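
A small userspace sketch of that split, assuming only what is stated above
(IRQ 0 is the IPI doorbell, IRQ 1 is the MSI doorbell, everything else is a
regular interrupt). `classify_irq()` and `count_mapped_irqs()` are
hypothetical helpers, not functions from the irqchip driver.

```c
#include <assert.h>
#include <stddef.h>

enum irq_kind {
    IRQ_KIND_IPI,      /* IRQ 0: doorbell used for IPIs */
    IRQ_KIND_MSI,      /* IRQ 1: doorbell used for MSIs */
    IRQ_KIND_REGULAR   /* IRQ 2 and above */
};

/* Decide how a hardware IRQ number would be handled. */
static enum irq_kind classify_irq(unsigned int irqnr)
{
    if (irqnr == 0)
        return IRQ_KIND_IPI;
    if (irqnr == 1)
        return IRQ_KIND_MSI;
    return IRQ_KIND_REGULAR;
}

/* Count the entries that would go through a per-IRQ mapping lookup. */
static size_t count_mapped_irqs(const unsigned int *irqs, size_t n)
{
    size_t mapped = 0;

    assert(irqs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (classify_irq(irqs[i]) == IRQ_KIND_REGULAR)
            mapped++;
    }
    return mapped;
}
```

Only the regular case needs a mapping lookup, which is why comparing against
1 rather than 0 makes no practical difference when MSI is disabled.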

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com


* [BUG] FL1009: xHCI host not responding to stop endpoint command.
  2014-02-18 13:10                     ` Thomas Petazzoni
@ 2014-02-18 20:54                       ` Arnaud Ebalard
  2014-02-18 21:24                         ` Thomas Petazzoni
  0 siblings, 1 reply; 28+ messages in thread
From: Arnaud Ebalard @ 2014-02-18 20:54 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Thomas,

Thomas Petazzoni <thomas.petazzoni@free-electrons.com> writes:

> Dear Arnaud Ebalard,
>
> On Sat, 18 Jan 2014 22:49:17 +0100, Arnaud Ebalard wrote:
>
>> I started suspecting the introduction of MSI support in Marvell PCIe
>> host controller driver (FL1009 is on the PCIe bus) and compiled a
>> 3.13.0-rc8 w/ CONFIG_PCI_MSI disabled (it was enabled in all my
>> previous tests): I did not manage to reproduce the issue with this
>> kernel. As a side note, commits 5b4deb6526bd, 31f614edb726 and
>> 627dfcc249e2 are
>> 
>> ATM, I do not know if the problem is related to a bug in the newly
>> introduced MSI support or some weird incompatibility of that
>> functionality with the FL1009 which would require some quirk in the
>> XHCI stack.
>> 
>> Thomas, I took a look at the changes but I am not familiar w/ how MSI
>> works. You may have an idea of what is going on here.
>
> I finally got some idea: your kernel 3.13-rc7 lacks a very important
> fix we did in the irqchip driver MSI handling. You really need to have
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/irqchip/irq-armada-370-xp.c?id=c7f7bd4a136e4b02dd2a66bf95aec545bd93e8db
> applied to get proper MSI behavior. Without this patch, there is a race
> condition, and some MSI interrupts might be lost.
>
> This commit was merged in v3.14-rc2, and backported to 3.13 and
> previous stable releases.
>
> Can you test after applying this commit?

Just to be sure, I compiled a 3.13 w/ PCI_MSI enabled and w/o the fix:
it failed as usual. Then, I just applied the fix on top of it and tested
again: I was unable to make it fail, i.e. this one-line change fixes the issue.

Sarah, I guess this also validates the fact that FL1009 has good MSI
support ;-)

Thanks for the time you both spent. Let's close the case.

Cheers,

a+


* [BUG] FL1009: xHCI host not responding to stop endpoint command.
  2014-02-18 20:54                       ` Arnaud Ebalard
@ 2014-02-18 21:24                         ` Thomas Petazzoni
  0 siblings, 0 replies; 28+ messages in thread
From: Thomas Petazzoni @ 2014-02-18 21:24 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Arnaud Ebalard,

On Tue, 18 Feb 2014 21:54:31 +0100, Arnaud Ebalard wrote:

> > I finally got some idea: your kernel 3.13-rc7 lacks a very important
> > fix we did in the irqchip driver MSI handling. You really need to have
> > http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/irqchip/irq-armada-370-xp.c?id=c7f7bd4a136e4b02dd2a66bf95aec545bd93e8db
> > applied to get proper MSI behavior. Without this patch, there is a race
> > condition, and some MSI interrupts might be lost.
> >
> > This commit was merged in v3.14-rc2, and backported to 3.13 and
> > previous stable releases.
> >
> > Can you test after applying this commit?
> 
> Just to be sure, I compiled a 3.13 w/ PCI_MSI enabled and w/o the fix:
> it failed as usual. Then, I just applied the fix on top of it and tested
> again: I was unable to make it fail, i.e. this one-liner fixes the issue.

Cool!

> Sarah, I guess this also validates the fact that FL1009 has good MSI
> support ;-)
> 
> Thanks for the time you both spent. Let's close the case.

You're welcome. Sorry for having realized so late where the problem
could be coming from, and that it was in fact already fixed.

Thanks to you for reporting and investigating the issue in the first
place!

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

Thread overview: 28+ messages
2014-01-12 20:13 [BUG] FL1009: xHCI host not responding to stop endpoint command Arnaud Ebalard
2014-01-12 21:36 ` Arnaud Ebalard
2014-01-14 17:07   ` Sarah Sharp
2014-01-14 18:11     ` Bjørn Mork
2014-01-14 21:54     ` Arnaud Ebalard
2014-01-15  9:59       ` David Laight
2014-01-15 19:04         ` Arnaud Ebalard
2014-01-16 18:50           ` Sarah Sharp
2014-01-17  6:25             ` Arnaud Ebalard
2014-01-17  8:31               ` Bjørn Mork
2014-01-17 20:54                 ` Sarah Sharp
2014-01-18 21:49                   ` Arnaud Ebalard
2014-01-21 21:17                     ` Sarah Sharp
2014-01-22 22:23                       ` Arnaud Ebalard
2014-01-22 22:26                         ` Jason Cooper
2014-01-22 22:43                           ` Arnaud Ebalard
2014-01-22 23:56                             ` Sarah Sharp
2014-01-23  8:24                               ` Arnaud Ebalard
2014-01-23 11:09                                 ` Willy Tarreau
2014-01-27 22:20                                   ` Arnaud Ebalard
2014-01-26 13:30                                 ` Thomas Petazzoni
2014-01-27 18:36                                   ` Arnaud Ebalard
2014-02-10 18:57                               ` Arnaud Ebalard
2014-02-14  0:09                                 ` Sarah Sharp
2014-02-14  8:26                                   ` Thomas Petazzoni
2014-02-18 13:10                     ` Thomas Petazzoni
2014-02-18 20:54                       ` Arnaud Ebalard
2014-02-18 21:24                         ` Thomas Petazzoni
