linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 0/1] USB EHCI: repeated resets on full and low speed devices
@ 2020-08-31 16:08 Khalid Aziz
  2020-08-31 16:08 ` [RFC PATCH 1/1] usb: ehci: Remove erroneous return of EPROTO upon detection of stall Khalid Aziz
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Khalid Aziz @ 2020-08-31 16:08 UTC (permalink / raw)
  To: stern, gregkh, erkka.talvitie; +Cc: Khalid Aziz, inux-usb, linux-kernel

I recently replaced the motherboard on my desktop with an MSI B450-A
Pro Max motherboard. Since then my keyboards, mouse and tablet have
become very unreliable. I see messages like this over and over in
dmesg:

Aug 23 00:01:49 rhapsody kernel: [198769.314732] usb 1-2.4: reset full-speed USB device number 27 using ehci-pci
Aug 23 00:01:49 rhapsody kernel: [198769.562234] usb 1-2.1: reset full-speed USB device number 28 using ehci-pci
Aug 23 00:01:52 rhapsody kernel: [198772.570704] usb 1-2.1: reset full-speed USB device number 28 using ehci-pci
Aug 23 00:02:02 rhapsody kernel: [198782.526669] usb 1-2.4: reset full-speed USB device number 27 using ehci-pci
Aug 23 00:02:03 rhapsody kernel: [198782.714660] usb 1-2.1: reset full-speed USB device number 28 using ehci-pci
Aug 23 00:02:04 rhapsody kernel: [198784.210171] usb 1-2.3: reset low-speed USB device number 26 using ehci-pci
Aug 23 00:02:06 rhapsody kernel: [198786.110181] usb 1-2.4: reset full-speed USB device number 27 using ehci-pci
Aug 23 00:02:08 rhapsody kernel: [198787.726158] usb 1-2.4: reset full-speed USB device number 27 using ehci-pci
Aug 23 00:02:10 rhapsody kernel: [198790.126628] usb 1-2.1: reset full-speed USB device number 28 using ehci-pci
Aug 23 00:02:10 rhapsody kernel: [198790.314141] usb 1-2.4: reset full-speed USB device number 27 using ehci-pci
Aug 23 00:02:12 rhapsody kernel: [198792.518765] usb 1-2.4: reset full-speed USB device number 27 using ehci-pci

The devices I am using are:

- Logitech K360 wireless keyboard
- Wired Lenovo USB keyboard
- Wired Lenovo USB mouse
- Wired Wacom Intuos tablet

After a reset, the wireless keyboard simply stops working. The rest of
the devices keep seeing intermittent failures.

I tried various combinations of hubs and USB controllers to see what
works. MSI B450-A motherboard has USB 3.0 and USB 3.1 controllers. I
added a USB 2.0 PCI card as well for this test:

03:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset USB 3.1 XHCI Controller (rev 01)
29:01.0 USB controller: NEC Corporation OHCI USB Controller (rev 43)
29:01.1 USB controller: NEC Corporation OHCI USB Controller (rev 43)
29:01.2 USB controller: NEC Corporation uPD72010x USB 2.0 Controller (rev 04)
2c:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller

I have a bus powered USB 3.0 hub, a bus powered USB 2.0 hub and a
self powered USB 2.0 hub built into my monitor.

I have connected my devices directly to the ports on the motherboard
and the PCI card as well as to the external hubs. Here are the results I
saw when devices were plugged into various combinations of ports:

1. USB 3.0/3.1 controller - does NOT work
2. USB 2.0 controller - WORKS
3. USB 3.0/3.1 controller -> Self powered USB 2.0 hub in monitor - does
   NOT work
4. USB 3.0/3.1 controller -> bus powered USB 3.0 hub - does NOT work
5. USB 3.0/3.1 controller -> Bus powered USB 2.0 hub - WORKS
7. USB 2.0 controller -> Bus powered USB 3.0 hub - does NOT work
8. USB 2.0 controller -> Bus powered 2.0 hub - Does not work

I narrowed the failure down to the following lines (this code was added
in 5.5 with commit 64cc3f12d1c7 "USB: EHCI: Do not return -EPIPE
when hub is disconnected"):

drivers/usb/host/ehci-q.c:

		} else if ((token & QTD_STS_MMF) &&
					(QTD_PID(token) == PID_CODE_IN)) {
			status = -EPROTO;
		/* CERR nonzero + halt --> stall */

At the time of failure, when we reach this conditional, token is
either 0x80408d46 or 0x408d46, which means the following bits are set
(QTD_TOGGLE only in the first value):

QTD_STS_STS, QTD_STS_MMF, QTD_STS_HALT, QTD_IOC, QTD_TOGGLE

and

        QTD_PID = 1
        QTD_CERR = 3
        QTD_LENGTH = 0x40 (64)
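
For anyone who wants to re-check this decoding, here is a minimal userspace
sketch that pulls the fields out of the two token values. The macro
definitions are assumptions written to mirror the qTD token layout in
drivers/usb/host/ehci.h and should be verified against the tree in use;
struct qtd_fields, qtd_decode() and qtd_print() are hypothetical names, not
kernel code.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed to mirror the qTD token macros in drivers/usb/host/ehci.h. */
#define QTD_TOGGLE	(1u << 31)		/* data toggle */
#define QTD_LENGTH(tok)	(((tok) >> 16) & 0x7fff)	/* bytes to transfer */
#define QTD_IOC		(1u << 15)		/* interrupt on complete */
#define QTD_CERR(tok)	(((tok) >> 10) & 0x3)	/* error counter */
#define QTD_PID(tok)	(((tok) >> 8) & 0x3)	/* 1 == IN */
#define QTD_STS_HALT	(1u << 6)		/* halted on error */
#define QTD_STS_MMF	(1u << 2)		/* missed micro-frame */
#define QTD_STS_STS	(1u << 1)		/* split transaction state */

/* Hypothetical container for the decoded fields. */
struct qtd_fields {
	unsigned int toggle;
	unsigned int length;
	unsigned int ioc;
	unsigned int cerr;
	unsigned int pid;
	unsigned int status;	/* low 8 status bits of the token */
};

/* Split a raw qTD token into its fields. */
static struct qtd_fields qtd_decode(uint32_t token)
{
	struct qtd_fields f;

	f.toggle = (token & QTD_TOGGLE) ? 1 : 0;
	f.length = QTD_LENGTH(token);
	f.ioc = (token & QTD_IOC) ? 1 : 0;
	f.cerr = QTD_CERR(token);
	f.pid = QTD_PID(token);
	f.status = token & 0xff;
	return f;
}

/* Print the fields plus the three status flags discussed in this mail. */
static void qtd_print(uint32_t token)
{
	struct qtd_fields f = qtd_decode(token);

	printf("token 0x%08x: toggle=%u length=%u ioc=%u cerr=%u pid=%u status=0x%02x%s%s%s\n",
	       (unsigned int)token, f.toggle, f.length, f.ioc, f.cerr, f.pid,
	       f.status,
	       (f.status & QTD_STS_HALT) ? " HALT" : "",
	       (f.status & QTD_STS_MMF) ? " MMF" : "",
	       (f.status & QTD_STS_STS) ? " STS" : "");
}
```

For 0x80408d46 this gives toggle=1, length=64, ioc=1, cerr=3, pid=1 and
status=0x46 (HALT, MMF and STS set); 0x408d46 differs only in the toggle bit.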

This causes the branch "(token & QTD_STS_MMF) && (QTD_PID(token) ==
PID_CODE_IN)" to be taken and qtd_copy_status() returns -EPROTO. This
return value in qh_completions() results in ehci_clear_tt_buffer()
being called:

drivers/usb/host/ehci-q.c:

			/* As part of low/full-speed endpoint-halt processing
			 * we must clear the TT buffer (11.17.5).
			 */
			if (unlikely(last_status != -EINPROGRESS &&
					last_status != -EREMOTEIO)) {
				/* The TT's in some hubs malfunction when they
				 * receive this request following a STALL (they
				 * stop sending isochronous packets).  Since a
				 * STALL can't leave the TT buffer in a busy
				 * state (if you believe Figures 11-48 - 11-51
				 * in the USB 2.0 spec), we won't clear the TT
				 * buffer in this case.  Strictly speaking this
				 * is a violation of the spec.
				 */
				if (last_status != -EPIPE)
					ehci_clear_tt_buffer(ehci, qh, urb,
							token);
			}

It seems like clearing the TT buffers in this case causes the hub to
hang. A USB reset gets it going again until the cycle repeats all
over again. The comment in this code says "The TT's in some hubs
malfunction when they receive this request following a STALL (they
stop sending isochronous packets)". That may be what is happening.
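
The interaction between the two quoted snippets can be modelled in a few
lines. This is a simplified sketch, not the kernel functions themselves:
halted_status() and reaches_clear_tt() are hypothetical helpers that
reproduce only the branches quoted in this mail, the mmf_in_check flag stands
for the presence or absence of the check added by commit 64cc3f12d1c7, and
the macro definitions are assumed to match drivers/usb/host/ehci.h.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#ifndef EREMOTEIO
#define EREMOTEIO 121	/* Linux value, for hosts whose errno.h lacks it */
#endif

/* Assumed to mirror the qTD token macros in drivers/usb/host/ehci.h. */
#define QTD_STS_HALT	(1u << 6)
#define QTD_STS_BABBLE	(1u << 4)
#define QTD_STS_MMF	(1u << 2)
#define QTD_CERR(tok)	(((tok) >> 10) & 0x3)
#define QTD_PID(tok)	(((tok) >> 8) & 0x3)
#define PID_CODE_IN	1

/*
 * Hypothetical model of the halt handling in qtd_copy_status().
 * Only the branches relevant here are kept; every other fault is
 * lumped into -EPROTO and a non-halted token simply returns 0.
 */
static int halted_status(uint32_t token, bool mmf_in_check)
{
	if (!(token & QTD_STS_HALT))
		return 0;
	if (token & QTD_STS_BABBLE)
		return -EOVERFLOW;
	/* The check under discussion: MMF set on an IN transfer. */
	if (mmf_in_check && (token & QTD_STS_MMF) &&
	    QTD_PID(token) == PID_CODE_IN)
		return -EPROTO;
	/* CERR nonzero + halt --> stall */
	if (QTD_CERR(token))
		return -EPIPE;
	return -EPROTO;
}

/*
 * Hypothetical model of the quoted condition in qh_completions():
 * true when that code path would go on to call ehci_clear_tt_buffer().
 * Any further checks inside ehci_clear_tt_buffer() are not modelled.
 */
static bool reaches_clear_tt(int last_status)
{
	if (last_status == -EINPROGRESS || last_status == -EREMOTEIO)
		return false;
	return last_status != -EPIPE;
}
```

With token 0x80408d46, halted_status() gives -EPROTO when the check is
present, so the quoted condition goes on to ehci_clear_tt_buffer(). Without
the check the same token falls through to -EPIPE and the TT buffer clear is
skipped.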

Removing the code that returns -EPROTO in this case solves the
problem on my machine (as in the RFC patch) but that is probably not
the right solution. I do not understand the USB protocol well enough to
propose a better solution. Does anyone have a better idea?


Khalid Aziz (1):
  usb: ehci: Remove erroneous return of EPROTO upon detection of stall 

 drivers/usb/host/ehci-q.c | 4 ----
 1 file changed, 4 deletions(-)

-- 
2.25.1



* [RFC PATCH 1/1] usb: ehci: Remove erroneous return of EPROTO upon detection of stall
  2020-08-31 16:08 [RFC PATCH 0/1] USB EHCI: repeated resets on full and low speed devices Khalid Aziz
@ 2020-08-31 16:08 ` Khalid Aziz
  2020-08-31 16:23   ` [RFC RESEND " Khalid Aziz
  2020-09-04 15:19   ` [RFC " Greg KH
  2020-08-31 16:23 ` [RFC RESEND PATCH 0/1] USB EHCI: repeated resets on full and low speed devices Khalid Aziz
  2020-09-01  2:31 ` Alan Stern
  2 siblings, 2 replies; 14+ messages in thread
From: Khalid Aziz @ 2020-08-31 16:08 UTC (permalink / raw)
  To: stern, gregkh, erkka.talvitie
  Cc: Khalid Aziz, inux-usb, linux-kernel, Khalid Aziz

With the USB 3.0/3.1 controller on the MSI B450-A Pro Max motherboard,
full-speed and low-speed devices see constant resets, making
keyboards and mice unreliable and unusable. These resets are caused
by detection of a stall in qtd_copy_status() and returning -EPROTO,
which in turn results in the TT buffers in the hub being cleared. Hubs
do not seem to respond well to this and seem to hang, which causes
further USB transactions to time out. A reset finally clears the
issue until the cycle repeats all over again.

Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
---
 drivers/usb/host/ehci-q.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/usb/host/ehci-q.c b/drivers/usb/host/ehci-q.c
index 8a5c9b3ebe1e..7d4b2bc4633c 100644
--- a/drivers/usb/host/ehci-q.c
+++ b/drivers/usb/host/ehci-q.c
@@ -214,10 +214,6 @@ static int qtd_copy_status (
 		 * When MMF is active and PID Code is IN, queue is halted.
 		 * EHCI Specification, Table 4-13.
 		 */
-		} else if ((token & QTD_STS_MMF) &&
-					(QTD_PID(token) == PID_CODE_IN)) {
-			status = -EPROTO;
-		/* CERR nonzero + halt --> stall */
 		} else if (QTD_CERR(token)) {
 			status = -EPIPE;
 
-- 
2.25.1



* [RFC RESEND PATCH 0/1] USB EHCI: repeated resets on full and low speed devices
  2020-08-31 16:08 [RFC PATCH 0/1] USB EHCI: repeated resets on full and low speed devices Khalid Aziz
  2020-08-31 16:08 ` [RFC PATCH 1/1] usb: ehci: Remove erroneous return of EPROTO upon detection of stall Khalid Aziz
@ 2020-08-31 16:23 ` Khalid Aziz
  2020-09-01  2:31 ` Alan Stern
  2 siblings, 0 replies; 14+ messages in thread
From: Khalid Aziz @ 2020-08-31 16:23 UTC (permalink / raw)
  To: stern, gregkh, erkka.talvitie; +Cc: Khalid Aziz, linux-usb, linux-kernel

[Resending since I screwed up linux-usb mailing list address in
cut-n-paste in original email]


I recently replaced the motherboard on my desktop with an MSI B450-A
Pro Max motherboard. Since then my keyboards, mouse and tablet have
become very unreliable. I see messages like this over and over in
dmesg:

Aug 23 00:01:49 rhapsody kernel: [198769.314732] usb 1-2.4: reset full-speed USB device number 27 using ehci-pci
Aug 23 00:01:49 rhapsody kernel: [198769.562234] usb 1-2.1: reset full-speed USB device number 28 using ehci-pci
Aug 23 00:01:52 rhapsody kernel: [198772.570704] usb 1-2.1: reset full-speed USB device number 28 using ehci-pci
Aug 23 00:02:02 rhapsody kernel: [198782.526669] usb 1-2.4: reset full-speed USB device number 27 using ehci-pci
Aug 23 00:02:03 rhapsody kernel: [198782.714660] usb 1-2.1: reset full-speed USB device number 28 using ehci-pci
Aug 23 00:02:04 rhapsody kernel: [198784.210171] usb 1-2.3: reset low-speed USB device number 26 using ehci-pci
Aug 23 00:02:06 rhapsody kernel: [198786.110181] usb 1-2.4: reset full-speed USB device number 27 using ehci-pci
Aug 23 00:02:08 rhapsody kernel: [198787.726158] usb 1-2.4: reset full-speed USB device number 27 using ehci-pci
Aug 23 00:02:10 rhapsody kernel: [198790.126628] usb 1-2.1: reset full-speed USB device number 28 using ehci-pci
Aug 23 00:02:10 rhapsody kernel: [198790.314141] usb 1-2.4: reset full-speed USB device number 27 using ehci-pci
Aug 23 00:02:12 rhapsody kernel: [198792.518765] usb 1-2.4: reset full-speed USB device number 27 using ehci-pci

The devices I am using are:

- Logitech K360 wireless keyboard
- Wired Lenovo USB keyboard
- Wired Lenovo USB mouse
- Wired Wacom Intuos tablet

After a reset, the wireless keyboard simply stops working. The rest of
the devices keep seeing intermittent failures.

I tried various combinations of hubs and USB controllers to see what
works. MSI B450-A motherboard has USB 3.0 and USB 3.1 controllers. I
added a USB 2.0 PCI card as well for this test:

03:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset USB 3.1 XHCI Controller (rev 01)
29:01.0 USB controller: NEC Corporation OHCI USB Controller (rev 43)
29:01.1 USB controller: NEC Corporation OHCI USB Controller (rev 43)
29:01.2 USB controller: NEC Corporation uPD72010x USB 2.0 Controller (rev 04)
2c:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller

I have a bus powered USB 3.0 hub, a bus powered USB 2.0 hub and a
self powered USB 2.0 hub built into my monitor.

I have connected my devices directly to the ports on the motherboard
and the PCI card as well as to the external hubs. Here are the results I
saw when devices were plugged into various combinations of ports:

1. USB 3.0/3.1 controller - does NOT work
2. USB 2.0 controller - WORKS
3. USB 3.0/3.1 controller -> Self powered USB 2.0 hub in monitor - does
   NOT work
4. USB 3.0/3.1 controller -> bus powered USB 3.0 hub - does NOT work
5. USB 3.0/3.1 controller -> Bus powered USB 2.0 hub - WORKS
7. USB 2.0 controller -> Bus powered USB 3.0 hub - does NOT work
8. USB 2.0 controller -> Bus powered 2.0 hub - Does not work

I narrowed the failure down to the following lines (this code was added
in 5.5 with commit 64cc3f12d1c7 "USB: EHCI: Do not return -EPIPE
when hub is disconnected"):

drivers/usb/host/ehci-q.c:

		} else if ((token & QTD_STS_MMF) &&
					(QTD_PID(token) == PID_CODE_IN)) {
			status = -EPROTO;
		/* CERR nonzero + halt --> stall */

At the time of failure, when we reach this conditional, token is
either 0x80408d46 or 0x408d46, which means the following bits are set
(QTD_TOGGLE only in the first value):

QTD_STS_STS, QTD_STS_MMF, QTD_STS_HALT, QTD_IOC, QTD_TOGGLE

and

        QTD_PID = 1
        QTD_CERR = 3
        QTD_LENGTH = 0x40 (64)

This causes the branch "(token & QTD_STS_MMF) && (QTD_PID(token) ==
PID_CODE_IN)" to be taken and qtd_copy_status() returns -EPROTO. This
return value in qh_completions() results in ehci_clear_tt_buffer()
being called:

drivers/usb/host/ehci-q.c:

			/* As part of low/full-speed endpoint-halt processing
			 * we must clear the TT buffer (11.17.5).
			 */
			if (unlikely(last_status != -EINPROGRESS &&
					last_status != -EREMOTEIO)) {
				/* The TT's in some hubs malfunction when they
				 * receive this request following a STALL (they
				 * stop sending isochronous packets).  Since a
				 * STALL can't leave the TT buffer in a busy
				 * state (if you believe Figures 11-48 - 11-51
				 * in the USB 2.0 spec), we won't clear the TT
				 * buffer in this case.  Strictly speaking this
				 * is a violation of the spec.
				 */
				if (last_status != -EPIPE)
					ehci_clear_tt_buffer(ehci, qh, urb,
							token);
			}

It seems like clearing the TT buffers in this case causes the hub to
hang. A USB reset gets it going again until the cycle repeats all
over again. The comment in this code says "The TT's in some hubs
malfunction when they receive this request following a STALL (they
stop sending isochronous packets)". That may be what is happening.

Removing the code that returns -EPROTO in this case solves the
problem on my machine (as in the RFC patch) but that is probably not
the right solution. I do not understand the USB protocol well enough to
propose a better solution. Does anyone have a better idea?


Khalid Aziz (1):
  usb: ehci: Remove erroneous return of EPROTO upon detection of stall 

 drivers/usb/host/ehci-q.c | 4 ----
 1 file changed, 4 deletions(-)

-- 
2.25.1



* [RFC RESEND PATCH 1/1] usb: ehci: Remove erroneous return of EPROTO upon detection of stall
  2020-08-31 16:08 ` [RFC PATCH 1/1] usb: ehci: Remove erroneous return of EPROTO upon detection of stall Khalid Aziz
@ 2020-08-31 16:23   ` Khalid Aziz
  2020-09-04 15:19   ` [RFC " Greg KH
  1 sibling, 0 replies; 14+ messages in thread
From: Khalid Aziz @ 2020-08-31 16:23 UTC (permalink / raw)
  To: stern, gregkh, erkka.talvitie
  Cc: Khalid Aziz, linux-usb, linux-kernel, Khalid Aziz

With the USB 3.0/3.1 controller on the MSI B450-A Pro Max motherboard,
full-speed and low-speed devices see constant resets, making
keyboards and mice unreliable and unusable. These resets are caused
by detection of a stall in qtd_copy_status() and returning -EPROTO,
which in turn results in the TT buffers in the hub being cleared. Hubs
do not seem to respond well to this and seem to hang, which causes
further USB transactions to time out. A reset finally clears the
issue until the cycle repeats all over again.

Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
---
 drivers/usb/host/ehci-q.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/usb/host/ehci-q.c b/drivers/usb/host/ehci-q.c
index 8a5c9b3ebe1e..7d4b2bc4633c 100644
--- a/drivers/usb/host/ehci-q.c
+++ b/drivers/usb/host/ehci-q.c
@@ -214,10 +214,6 @@ static int qtd_copy_status (
 		 * When MMF is active and PID Code is IN, queue is halted.
 		 * EHCI Specification, Table 4-13.
 		 */
-		} else if ((token & QTD_STS_MMF) &&
-					(QTD_PID(token) == PID_CODE_IN)) {
-			status = -EPROTO;
-		/* CERR nonzero + halt --> stall */
 		} else if (QTD_CERR(token)) {
 			status = -EPIPE;
 
-- 
2.25.1



* Re: [RFC RESEND PATCH 0/1] USB EHCI: repeated resets on full and low speed devices
  2020-08-31 16:08 [RFC PATCH 0/1] USB EHCI: repeated resets on full and low speed devices Khalid Aziz
  2020-08-31 16:08 ` [RFC PATCH 1/1] usb: ehci: Remove erroneous return of EPROTO upon detection of stall Khalid Aziz
  2020-08-31 16:23 ` [RFC RESEND PATCH 0/1] USB EHCI: repeated resets on full and low speed devices Khalid Aziz
@ 2020-09-01  2:31 ` Alan Stern
  2020-09-01 15:51   ` Khalid Aziz
       [not found]   ` <608418fa-b0ce-c2a4-ad79-fe505c842587@oracle.com>
  2 siblings, 2 replies; 14+ messages in thread
From: Alan Stern @ 2020-09-01  2:31 UTC (permalink / raw)
  To: Khalid Aziz; +Cc: gregkh, erkka.talvitie, linux-usb, linux-kernel

On Mon, Aug 31, 2020 at 10:23:30AM -0600, Khalid Aziz wrote:
> [Resending since I screwed up linux-usb mailing list address in
> cut-n-paste in original email]
> 
> 
> I recently replaced the motherboard on my desktop with an MSI B450-A
> Pro Max motherboard. Since then my keyboards, mouse and tablet have
> become very unreliable. I see messages like this over and over in
> dmesg:
> 
> Aug 23 00:01:49 rhapsody kernel: [198769.314732] usb 1-2.4: reset full-speed USB device number 27 using ehci-pci
> Aug 23 00:01:49 rhapsody kernel: [198769.562234] usb 1-2.1: reset full-speed USB device number 28 using ehci-pci
> Aug 23 00:01:52 rhapsody kernel: [198772.570704] usb 1-2.1: reset full-speed USB device number 28 using ehci-pci
> Aug 23 00:02:02 rhapsody kernel: [198782.526669] usb 1-2.4: reset full-speed USB device number 27 using ehci-pci
> Aug 23 00:02:03 rhapsody kernel: [198782.714660] usb 1-2.1: reset full-speed USB device number 28 using ehci-pci
> Aug 23 00:02:04 rhapsody kernel: [198784.210171] usb 1-2.3: reset low-speed USB device number 26 using ehci-pci
> Aug 23 00:02:06 rhapsody kernel: [198786.110181] usb 1-2.4: reset full-speed USB device number 27 using ehci-pci
> Aug 23 00:02:08 rhapsody kernel: [198787.726158] usb 1-2.4: reset full-speed USB device number 27 using ehci-pci
> Aug 23 00:02:10 rhapsody kernel: [198790.126628] usb 1-2.1: reset full-speed USB device number 28 using ehci-pci
> Aug 23 00:02:10 rhapsody kernel: [198790.314141] usb 1-2.4: reset full-speed USB device number 27 using ehci-pci
> Aug 23 00:02:12 rhapsody kernel: [198792.518765] usb 1-2.4: reset full-speed USB device number 27 using ehci-pci
> 
> The devices I am using are:
> 
> - Logitech K360 wireless keyboard
> - Wired Lenovo USB keyboard
> - Wired Lenovo USB mouse
> - Wired Wacom Intuos tablet
> 
> After a reset, the wireless keyboard simply stops working. The rest of
> the devices keep seeing intermittent failures.
> 
> I tried various combinations of hubs and USB controllers to see what
> works. MSI B450-A motherboard has USB 3.0 and USB 3.1 controllers. I
> added a USB 2.0 PCI card as well for this test:
> 
> 03:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset USB 3.1 XHCI Controller (rev 01)
> 29:01.0 USB controller: NEC Corporation OHCI USB Controller (rev 43)
> 29:01.1 USB controller: NEC Corporation OHCI USB Controller (rev 43)
> 29:01.2 USB controller: NEC Corporation uPD72010x USB 2.0 Controller (rev 04)
> 2c:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
> 
> I have a bus powered USB 3.0 hub, a bus powered USB 2.0 hub and a
> self powered USB 2.0 hub built into my monitor.
> 
> I have connected my devices directly to the ports on the motherboard
> and the PCI card as well as to the external hubs. Here are the results I
> saw when devices were plugged into various combinations of ports:
> 
> 1. USB 3.0/3.1 controller - does NOT work
> 2. USB 2.0 controller - WORKS
> 3. USB 3.0/3.1 controller -> Self powered USB 2.0 hub in monitor - does
>    NOT work
> 4. USB 3.0/3.1 controller -> bus powered USB 3.0 hub - does NOT work
> 5. USB 3.0/3.1 controller -> Bus powered USB 2.0 hub - WORKS
> 7. USB 2.0 controller -> Bus powered USB 3.0 hub - does NOT work
> 8. USB 2.0 controller -> Bus powered 2.0 hub - Does not work

The error messages in your log extract all refer to ehci-pci, which is 
the driver for a USB-2 controller.  They are completely unrelated to any 
problems you may be having with USB-3 controllers.
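
A quick way to check which host controller driver each reset message came
from is to look at the name after "using". A minimal sketch, assuming the log
lines keep the format shown above; reset_line_driver() is a hypothetical
helper, not part of any kernel or library API:

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the driver name that follows " using " in a
 * "reset ... USB device number N using <driver>" log line into out.
 * Returns 1 on success, 0 if the line has no such suffix or if out
 * is too small to hold the name.
 */
static int reset_line_driver(const char *line, char *out, size_t outsz)
{
	static const char key[] = " using ";
	const char *p = strstr(line, key);
	size_t len;

	if (!p)
		return 0;
	p += sizeof(key) - 1;		/* skip past " using " */
	len = strcspn(p, " \t\r\n");	/* name ends at whitespace or EOL */
	if (len == 0 || len >= outsz)
		return 0;
	memcpy(out, p, len);
	out[len] = '\0';
	return 1;
}
```

Lines that yield ehci-pci come from the USB-2 (EHCI) controller; messages from
a USB-3 controller would name xhci_hcd instead.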

> I narrowed the failure down to the following lines (this code was added
> in 5.5 with commit 64cc3f12d1c7 "USB: EHCI: Do not return -EPIPE
> when hub is disconnected"):
> 
> drivers/usb/host/ehci-q.c:
> 
> 		} else if ((token & QTD_STS_MMF) &&
> 					(QTD_PID(token) == PID_CODE_IN)) {
> 			status = -EPROTO;
> 		/* CERR nonzero + halt --> stall */
> 
> At the time of failure, when we reach this conditional, token is
> either 0x80408d46 or 0x408d46, which means the following bits are set
> (QTD_TOGGLE only in the first value):
> 
> QTD_STS_STS, QTD_STS_MMF, QTD_STS_HALT, QTD_IOC, QTD_TOGGLE
> 
> and
> 
>         QTD_PID = 1
>         QTD_CERR = 3
>         QTD_LENGTH = 0x40 (64)
> 
> This causes the branch "(token & QTD_STS_MMF) && (QTD_PID(token) ==
> PID_CODE_IN)" to be taken and qtd_copy_status() returns -EPROTO. This
> return value in qh_completions() results in ehci_clear_tt_buffer()
> being called:
> 
> drivers/usb/host/ehci-q.c:
> 
> 			/* As part of low/full-speed endpoint-halt processing
> 			 * we must clear the TT buffer (11.17.5).
> 			 */
> 			if (unlikely(last_status != -EINPROGRESS &&
> 					last_status != -EREMOTEIO)) {
> 				/* The TT's in some hubs malfunction when they
> 				 * receive this request following a STALL (they
> 				 * stop sending isochronous packets).  Since a
> 				 * STALL can't leave the TT buffer in a busy
> 				 * state (if you believe Figures 11-48 - 11-51
> 				 * in the USB 2.0 spec), we won't clear the TT
> 				 * buffer in this case.  Strictly speaking this
> 				 * is a violation of the spec.
> 				 */
> 				if (last_status != -EPIPE)
> 					ehci_clear_tt_buffer(ehci, qh, urb,
> 							token);
> 			}
> 
> It seems like clearing the TT buffers in this case causes the hub to
> hang. A USB reset gets it going again until the cycle repeats all
> over again. The comment in this code says "The TT's in some hubs
> malfunction when they receive this request following a STALL (they
> stop sending isochronous packets)". That may be what is happening.

What makes you think that?  Do you have any evidence that the hub is 
receiving a STALL?  Indeed, the commit you referenced above specifically 
mentions that when MMF is set and the PID code is IN then it is not a 
STALL.

> Removing the code that returns -EPROTO in this case solves the
> problem on my machine (as in the RFC patch)

It certainly can't solve the problem for any USB-3 connections, because 
the patch doesn't touch any of the USB-3 driver code.

>  but that is probably not
> the right solution. I do not understand the USB protocol well enough to
> propose a better solution. Does anyone have a better idea?

Can you collect a usbmon trace showing an example of this problem?

One possibility is to introduce a special quirk for the NEC uPD72010x 
EHCI controller.  But we should hold off on that until we know exactly 
what is happening.

Alan Stern


* Re: [RFC RESEND PATCH 0/1] USB EHCI: repeated resets on full and low speed devices
  2020-09-01  2:31 ` Alan Stern
@ 2020-09-01 15:51   ` Khalid Aziz
  2020-09-01 16:18     ` Alan Stern
       [not found]   ` <608418fa-b0ce-c2a4-ad79-fe505c842587@oracle.com>
  1 sibling, 1 reply; 14+ messages in thread
From: Khalid Aziz @ 2020-09-01 15:51 UTC (permalink / raw)
  To: Alan Stern; +Cc: gregkh, erkka.talvitie, linux-usb, linux-kernel

On 8/31/20 8:31 PM, Alan Stern wrote:
> On Mon, Aug 31, 2020 at 10:23:30AM -0600, Khalid Aziz wrote:
>> [Resending since I screwed up linux-usb mailing list address in
>> cut-n-paste in original email]
>>
>>
>> I recently replaced the motherboard on my desktop with an MSI B450-A
>> Pro Max motherboard. Since then my keyboards, mouse and tablet have
>> become very unreliable. I see messages like this over and over in
>> dmesg:
>>
>> Aug 23 00:01:49 rhapsody kernel: [198769.314732] usb 1-2.4: reset full-speed USB device number 27 using ehci-pci
>> Aug 23 00:01:49 rhapsody kernel: [198769.562234] usb 1-2.1: reset full-speed USB device number 28 using ehci-pci
>> Aug 23 00:01:52 rhapsody kernel: [198772.570704] usb 1-2.1: reset full-speed USB device number 28 using ehci-pci
>> Aug 23 00:02:02 rhapsody kernel: [198782.526669] usb 1-2.4: reset full-speed USB device number 27 using ehci-pci
>> Aug 23 00:02:03 rhapsody kernel: [198782.714660] usb 1-2.1: reset full-speed USB device number 28 using ehci-pci
>> Aug 23 00:02:04 rhapsody kernel: [198784.210171] usb 1-2.3: reset low-speed USB device number 26 using ehci-pci
>> Aug 23 00:02:06 rhapsody kernel: [198786.110181] usb 1-2.4: reset full-speed USB device number 27 using ehci-pci
>> Aug 23 00:02:08 rhapsody kernel: [198787.726158] usb 1-2.4: reset full-speed USB device number 27 using ehci-pci
>> Aug 23 00:02:10 rhapsody kernel: [198790.126628] usb 1-2.1: reset full-speed USB device number 28 using ehci-pci
>> Aug 23 00:02:10 rhapsody kernel: [198790.314141] usb 1-2.4: reset full-speed USB device number 27 using ehci-pci
>> Aug 23 00:02:12 rhapsody kernel: [198792.518765] usb 1-2.4: reset full-speed USB device number 27 using ehci-pci
>>
>> The devices I am using are:
>>
>> - Logitech K360 wireless keyboard
>> - Wired Lenovo USB keyboard
>> - Wired Lenovo USB mouse
>> - Wired Wacom Intuos tablet
>>
>> After a reset, the wireless keyboard simply stops working. The rest of
>> the devices keep seeing intermittent failures.
>>
>> I tried various combinations of hubs and USB controllers to see what
>> works. MSI B450-A motherboard has USB 3.0 and USB 3.1 controllers. I
>> added a USB 2.0 PCI card as well for this test:
>>
>> 03:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset USB 3.1 XHCI Controller (rev 01)
>> 29:01.0 USB controller: NEC Corporation OHCI USB Controller (rev 43)
>> 29:01.1 USB controller: NEC Corporation OHCI USB Controller (rev 43)
>> 29:01.2 USB controller: NEC Corporation uPD72010x USB 2.0 Controller (rev 04)
>> 2c:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
>>
>> I have a bus powered USB 3.0 hub, a bus powered USB 2.0 hub and a
>> self powered USB 2.0 hub built into my monitor.
>>
>> I have connected my devices directly to the ports on the motherboard
>> and the PCI card as well as to the external hubs. Here are the results I
>> saw when devices were plugged into various combinations of ports:
>>
>> 1. USB 3.0/3.1 controller - does NOT work
>> 2. USB 2.0 controller - WORKS
>> 3. USB 3.0/3.1 controller -> Self powered USB 2.0 hub in monitor - does
>>    NOT work
>> 4. USB 3.0/3.1 controller -> bus powered USB 3.0 hub - does NOT work
>> 5. USB 3.0/3.1 controller -> Bus powered USB 2.0 hub - WORKS
>> 7. USB 2.0 controller -> Bus powered USB 3.0 hub - does NOT work
>> 8. USB 2.0 controller -> Bus powered 2.0 hub - Does not work
> 
> The error messages in your log extract all refer to ehci-pci, which is 
> the driver for a USB-2 controller.  They are completely unrelated to any 
> problems you may be having with USB-3 controllers.

I just happened to cut and paste the messages from when I was testing
with the USB 2.0 controller. Here are the messages from when I ran the
test with the USB 3.0 controller:

Aug 13 14:25:48 rhapsody kernel: [78779.868354] usb 1-9.4: reset full-speed USB device number 38 using xhci_hcd
Aug 13 14:26:18 rhapsody kernel: [78809.939457] usb 1-9.4: reset full-speed USB device number 38 using xhci_hcd
Aug 13 14:26:39 rhapsody kernel: [78830.899982] usb 1-9.4: reset full-speed USB device number 38 using xhci_hcd
Aug 13 14:26:39 rhapsody kernel: [78831.379883] usb 1-9.2: reset low-speed USB device number 36 using xhci_hcd
Aug 13 14:26:40 rhapsody kernel: [78832.043900] usb 1-9.3: reset low-speed USB device number 37 using xhci_hcd
Aug 13 14:26:47 rhapsody kernel: [78839.520211] usb 1-9.4: reset full-speed USB device number 38 using xhci_hcd
Aug 13 14:26:49 rhapsody kernel: [78841.035843] usb 1-9.2: reset low-speed USB device number 36 using xhci_hcd
Aug 13 14:26:50 rhapsody kernel: [78841.695837] usb 1-9.3: reset low-speed USB device number 37 using xhci_hcd
Aug 13 14:27:57 rhapsody kernel: [78909.299772] usb 1-9.4: reset full-speed USB device number 38 using xhci_hcd
Aug 13 14:27:58 rhapsody kernel: [78909.779179] usb 1-9.2: reset low-speed USB device number 36 using xhci_hcd
Aug 13 14:28:05 rhapsody kernel: [78916.650851] usb 1-9.4: reset full-speed USB device number 38 using xhci_hcd
Aug 13 14:32:02 rhapsody kernel: [79153.986777] usb 1-9.4: reset full-speed USB device number 38 using xhci_hcd
Aug 13 14:32:22 rhapsody kernel: [79173.898757] usb 1-9.4: reset full-speed USB device number 38 using xhci_hcd
Aug 13 14:32:23 rhapsody kernel: [79175.174206] usb 1-9.3: reset low-speed USB device number 37 using xhci_hcd
Aug 13 14:32:24 rhapsody kernel: [79175.833619] usb 1-9.2: reset low-speed USB device number 36 using xhci_hcd
Aug 13 14:34:23 rhapsody kernel: [79295.230293] usb 1-9.4: reset full-speed USB device number 38 using xhci_hcd

> 
>> I narrowed the failure down to the following lines (this code was added
>> in 5.5 with commit 64cc3f12d1c7 "USB: EHCI: Do not return -EPIPE
>> when hub is disconnected"):
>>
>> drivers/usb/host/ehci-q.c:
>>
>> 		} else if ((token & QTD_STS_MMF) &&
>> 					(QTD_PID(token) == PID_CODE_IN)) {
>> 			status = -EPROTO;
>> 		/* CERR nonzero + halt --> stall */
>>
>> At the time of failure, when we reach this conditional, token is
>> either 0x80408d46 or 0x408d46, which means the following bits are set
>> (QTD_TOGGLE only in the first value):
>>
>> QTD_STS_STS, QTD_STS_MMF, QTD_STS_HALT, QTD_IOC, QTD_TOGGLE
>>
>> and
>>
>>         QTD_PID = 1
>>         QTD_CERR = 3
>>         QTD_LENGTH = 0x40 (64)
>>
>> This causes the branch "(token & QTD_STS_MMF) && (QTD_PID(token) ==
>> PID_CODE_IN)" to be taken and qtd_copy_status() returns -EPROTO. This
>> return value in qh_completions() results in ehci_clear_tt_buffer()
>> being called:
>>
>> drivers/usb/host/ehci-q.c:
>>  472                         /* As part of low/full-speed endpoint-halt processing
>>  473                          * we must clear the TT buffer (11.17.5).
>>  474                          */
>>  475                         if (unlikely(last_status != -EINPROGRESS &&
>>  476                                         last_status != -EREMOTEIO)) {
>>  477                                 /* The TT's in some hubs malfunction when they
>>  478                                  * receive this request following a STALL (they
>>  479                                  * stop sending isochronous packets).  Since a
>>  480                                  * STALL can't leave the TT buffer in a busy
>>  481                                  * state (if you believe Figures 11-48 - 11-51
>>  482                                  * in the USB 2.0 spec), we won't clear the TT
>>  483                                  * buffer in this case.  Strictly speaking this
>>  484                                  * is a violation of the spec.
>>  485                                  */
>>  486                                 if (last_status != -EPIPE)
>>  487                                         ehci_clear_tt_buffer(ehci, qh, urb,
>>  488                                                         token);
>>  489                         }
>>  489                         }
>>
>> It seems like clearing TT buffers in this case is resulting in hub
>> hanging. A USB reset gets it going again until we repeat the cycle
>> over again. The comment in this code says "The TT's in some hubs
>> malfunction when they receive this request following a STALL (they
>> stop sending isochronous packets)". That may be what is happening.
> 
> What makes you think that?  Do you have any evidence that the hub is 
> receiving a STALL?  Indeed, the commit you referenced above specifically 
> mentions that when MMF is set and the PID code is IN then it is not a 
> STALL.
> 

You are probably right about that. I do not understand the USB protocol
well enough. Eliminating the TT buffer clear when the split transaction is
incomplete fixed the problem for me. When I changed qtd_copy_status() to
return EPIPE, as it did before commit 64cc3f12d1c7, the USB resets went
away on my machine, so I am wondering if the comment at
drivers/usb/host/ehci-q.c:477 is applicable here.

>> Removing the code that returns EPROTO for such case solves the
>> problem on my machine (as in the RFC patch)
> 
> It certainly can't solve the problem for any USB-3 connections, because 
> the patch doesn't touch any of the USB-3 driver code.

Right. It solves the problem I see with the USB 2.0 controller. I continue
to see issues with USB 3.0 if I move the hub to a USB 3.0 port.

> 
>>  but that probably is not
>> the right solution. I do not understand USB protocol well enough to
>> propose a better solution. Does anyone have a better idea?
> 
> Can you collect a usbmon trace showing an example of this problem?

Sure. I will do that. Tracing the code while debugging the USB 2.0
controller led me to that specific line of code. As I said, I do not
understand USB well enough to say if changing that code is the right
solution, and it indeed solves the problem for USB 2.0 only.

> 
> One possibility is to introduce a special quirk for the NEC uPD72010x 
> EHCI controller.  But we should hold off on that until we know exactly 
> what is happening.

I do not believe whatever is causing the USB resets is unique to the NEC
chip. I am seeing issues on the USB 3.0 controllers as well.

> 
> Alan Stern
> 

Thanks,
Khalid



* Re: [RFC RESEND PATCH 0/1] USB EHCI: repeated resets on full and low speed devices
  2020-09-01 15:51   ` Khalid Aziz
@ 2020-09-01 16:18     ` Alan Stern
  0 siblings, 0 replies; 14+ messages in thread
From: Alan Stern @ 2020-09-01 16:18 UTC (permalink / raw)
  To: Khalid Aziz; +Cc: gregkh, erkka.talvitie, linux-usb, linux-kernel

On Tue, Sep 01, 2020 at 08:51:14AM -0700, Khalid Aziz wrote:
> >> At the time of failure, when we reach this conditional, token is
> >> either 0x80408d46 or 0x408d46 which means following bits are set:
> >>
> >> QTD_STS_STS, QTD_STS_MMF, QTD_STS_HALT, QTD_IOC, QTD_TOGGLE
> >>
> >> and 
> >>
> >>         QTD_PID = 1
> >>         QTD_CERR = 3
> >>         QTD_LENGTH = 0x40 (64)
> >>
> >> This causes the branch "(token & QTD_STS_MMF) && (QTD_PID(token) ==
> >> PID_CODE_IN)" to be taken and qtd_copy_status() returns EPROTO. This
> >> return value in qh_completions() results in ehci_clear_tt_buffer()
> >> being called:

I didn't mention this before, but that combination of events doesn't 
make sense.  The MMF bit is supposed to get set only for queue heads in 
the periodic list, that is, only for interrupt transactions.  But 
ehci_clear_tt_buffer() doesn't do anything for interrupt endpoints; it 
tests specifically for that right at the start.
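
For reference, the decoding of the quoted token values can be reproduced
with a small standalone program. This is only a sketch: the masks and
shifts below are written out by hand from the qTD token layout (status in
bits 0-7, PID in bits 8-9, CERR in bits 10-11, IOC in bit 15, length in
bits 16-30, toggle in bit 31) and are meant to mirror the QTD_* names used
in this thread, not to replace the kernel's own definitions. The struct
qtd_fields and decode_qtd_token() names are hypothetical.

```c
#include <stdint.h>

/* Hand-written copies of the token fields named in the thread. */
#define QTD_TOGGLE      (1u << 31)
#define QTD_LENGTH(tok) (((tok) >> 16) & 0x7fffu)
#define QTD_IOC         (1u << 15)
#define QTD_CERR(tok)   (((tok) >> 10) & 0x3u)
#define QTD_PID(tok)    (((tok) >> 8) & 0x3u)
#define QTD_STS_HALT    (1u << 6)
#define QTD_STS_MMF     (1u << 2)
#define QTD_STS_STS     (1u << 1)

/* Hypothetical container for the decoded fields (illustration only). */
struct qtd_fields {
	unsigned toggle;	/* data toggle, bit 31 */
	unsigned length;	/* bytes to transfer, bits 16-30 */
	unsigned ioc;		/* interrupt on complete, bit 15 */
	unsigned cerr;		/* error counter, bits 10-11 */
	unsigned pid;		/* PID code, bits 8-9 (1 = IN) */
	unsigned halted;	/* status bit 6 */
	unsigned mmf;		/* missed micro-frame, status bit 2 */
	unsigned sts;		/* split transaction state, status bit 1 */
};

/* Split a raw 32-bit qTD token into the fields discussed above. */
static struct qtd_fields decode_qtd_token(uint32_t token)
{
	struct qtd_fields f;

	f.toggle = (token & QTD_TOGGLE) != 0;
	f.length = QTD_LENGTH(token);
	f.ioc    = (token & QTD_IOC) != 0;
	f.cerr   = QTD_CERR(token);
	f.pid    = QTD_PID(token);
	f.halted = (token & QTD_STS_HALT) != 0;
	f.mmf    = (token & QTD_STS_MMF) != 0;
	f.sts    = (token & QTD_STS_STS) != 0;
	return f;
}
```

Calling decode_qtd_token(0x80408d46) gives PID 1, CERR 3 and length 64
with HALT, MMF, STS and IOC set, which matches the values reported. The
second value, 0x408d46, differs only in bit 31, so the data toggle is
clear there.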

Maybe your EHCI controller is setting the MMF bit when it shouldn't.  
The usbmon output will help clear this up.

Or maybe the hubs you are testing don't work right.  That's the only 
reason I can think of for the failures you see with the USB-3 
controller; the way it operates is very different from the way EHCI 
does.

Alan Stern


* Re: [RFC RESEND PATCH 0/1] USB EHCI: repeated resets on full and low speed devices
       [not found]   ` <608418fa-b0ce-c2a4-ad79-fe505c842587@oracle.com>
@ 2020-09-01 16:36     ` Alan Stern
  2020-09-01 17:00       ` Khalid Aziz
  0 siblings, 1 reply; 14+ messages in thread
From: Alan Stern @ 2020-09-01 16:36 UTC (permalink / raw)
  To: Khalid Aziz; +Cc: gregkh, erkka.talvitie, linux-usb, linux-kernel

On Tue, Sep 01, 2020 at 09:15:46AM -0700, Khalid Aziz wrote:
> On 8/31/20 8:31 PM, Alan Stern wrote:
> > Can you collect a usbmon trace showing an example of this problem?
> > 
> 
> I have attached usbmon traces for when USB hub with keyboards and mouse
> is plugged into USB 2.0 port and when it is plugged into the NEC USB 3.0
> port.

The usbmon traces show lots of errors, but no Clear-TT events.  The 
large number of errors suggests that you've got a hardware problem; 
either a bad hub or bad USB connections.

Alan Stern


* Re: [RFC RESEND PATCH 0/1] USB EHCI: repeated resets on full and low speed devices
  2020-09-01 16:36     ` Alan Stern
@ 2020-09-01 17:00       ` Khalid Aziz
  2020-09-01 19:51         ` Alan Stern
  0 siblings, 1 reply; 14+ messages in thread
From: Khalid Aziz @ 2020-09-01 17:00 UTC (permalink / raw)
  To: Alan Stern; +Cc: gregkh, erkka.talvitie, linux-usb, linux-kernel

On 9/1/20 10:36 AM, Alan Stern wrote:
> On Tue, Sep 01, 2020 at 09:15:46AM -0700, Khalid Aziz wrote:
>> On 8/31/20 8:31 PM, Alan Stern wrote:
>>> Can you collect a usbmon trace showing an example of this problem?
>>>
>>
>> I have attached usbmon traces for when USB hub with keyboards and mouse
>> is plugged into USB 2.0 port and when it is plugged into the NEC USB 3.0
>> port.
> 
> The usbmon traces show lots of errors, but no Clear-TT events.  The 
> large number of errors suggests that you've got a hardware problem; 
> either a bad hub or bad USB connections.

That is what I thought initially, which is why I got additional hubs and
a USB 2.0 PCI card to test. I am seeing errors across 3 USB controllers,
4 USB hubs and 4 low/full speed devices. All of the hubs and low/full
speed devices work with zero errors on my laptop. My keyboard/mouse
devices and 2 of my USB hubs predate the motherboard replacement and they
all worked flawlessly before it. Some combinations of these also work
with no errors on my desktop with the new motherboard, as I had listed in
my original email:

2. USB 2.0 controller - WORKS
5. USB 3.0/3.1 controller -> Bus powered USB 2.0 hub - WORKS

I am not seeing a common failure here that would point to any specific
hardware being bad. Besides, that one code change (which I still can't
say is the right code change) in ehci-q.c makes USB 2.0 controller work
reliably with all my devices.

--
Khalid



* Re: [RFC RESEND PATCH 0/1] USB EHCI: repeated resets on full and low speed devices
  2020-09-01 17:00       ` Khalid Aziz
@ 2020-09-01 19:51         ` Alan Stern
  2020-09-01 22:54           ` Khalid Aziz
  0 siblings, 1 reply; 14+ messages in thread
From: Alan Stern @ 2020-09-01 19:51 UTC (permalink / raw)
  To: Khalid Aziz; +Cc: gregkh, erkka.talvitie, linux-usb, linux-kernel

On Tue, Sep 01, 2020 at 11:00:16AM -0600, Khalid Aziz wrote:
> On 9/1/20 10:36 AM, Alan Stern wrote:
> > On Tue, Sep 01, 2020 at 09:15:46AM -0700, Khalid Aziz wrote:
> >> On 8/31/20 8:31 PM, Alan Stern wrote:
> >>> Can you collect a usbmon trace showing an example of this problem?
> >>>
> >>
> >> I have attached usbmon traces for when USB hub with keyboards and mouse
> >> is plugged into USB 2.0 port and when it is plugged into the NEC USB 3.0
> >> port.
> > 
> > The usbmon traces show lots of errors, but no Clear-TT events.  The 
> > large number of errors suggests that you've got a hardware problem; 
> > either a bad hub or bad USB connections.
> 
> That is what I thought initially which is why I got additional hubs and
> a USB 2.0 PCI card to test. I am seeing errors across 3 USB controllers,
> 4 USB hubs and 4 slow/full speed devices. All of the hubs and slow/full
> devices work with zero errors on my laptop. My keyboard/mouse devices
> and 2 of my USB hubs predate motherboard update and they all worked
> flawlessly before the motherboard upgrade. Some combinations of these
> also works with no errors on my desktop with new motherboard that I had
> listed in my original email:

It's a very puzzling situation.

One thing which probably would work well, surprisingly, would be to buy 
an old USB-1.1 hub and plug it into the PCI card.  That combination is 
likely to be similar to what you see when plugging the devices directly 
into the PCI card.  It might even work okay with the USB-3 controllers.

> 2. USB 2.0 controller - WORKS
> 5. USB 3.0/3.1 controller -> Bus powered USB 2.0 hub - WORKS
> 
> I am not seeing a common failure here that would point to any specific
> hardware being bad. Besides, that one code change (which I still can't
> say is the right code change) in ehci-q.c makes USB 2.0 controller work
> reliably with all my devices.

The USB and EHCI designs are flawed in that under the circumstances 
you're seeing, they don't have any way to tell the difference between a 
STALL and a host timing error.  The current code treats these situations 
as timing/transmission errors (resulting in device resets); your change 
causes them to be treated as STALLs.  However, there are known, common 
situations in which those same symptoms really are caused by 
transmission errors, so we don't want to start treating them as STALLs.
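
The effect of the proposed change on the token value reported in this
thread can be illustrated with a reduced model. This is a sketch under
stated assumptions, not the driver code: halted_status() and
wants_clear_tt() are hypothetical helpers that model only the two
branches of qtd_copy_status() and the condition in qh_completions()
quoted in this thread. The babble test that precedes these branches is
left out and all remaining error cases are collapsed into one return.

```c
#include <errno.h>
#include <stdint.h>

#ifndef EREMOTEIO
#define EREMOTEIO 121	/* Linux value, fallback for other platforms */
#endif

#define QTD_CERR(tok)	(((tok) >> 10) & 0x3u)
#define QTD_PID(tok)	(((tok) >> 8) & 0x3u)
#define QTD_STS_MMF	(1u << 2)
#define PID_CODE_IN	1u

/*
 * Hypothetical model of the status chosen for a halted qTD.
 * mmf_in_check != 0 models the current code; 0 models the RFC patch,
 * which removes the MMF + IN test.
 */
static int halted_status(uint32_t token, int mmf_in_check)
{
	if (mmf_in_check && (token & QTD_STS_MMF) &&
	    QTD_PID(token) == PID_CODE_IN)
		return -EPROTO;
	if (QTD_CERR(token))
		return -EPIPE;	/* CERR nonzero + halt --> stall */
	return -EPROTO;		/* simplification: other cases collapsed */
}

/* Mirrors the quoted condition guarding ehci_clear_tt_buffer(). */
static int wants_clear_tt(int last_status)
{
	return last_status != -EINPROGRESS &&
	       last_status != -EREMOTEIO &&
	       last_status != -EPIPE;
}
```

For token 0x80408d46 the model returns -EPROTO with the check in place,
which passes the quoted condition for calling ehci_clear_tt_buffer().
With the check removed, CERR = 3 selects -EPIPE and the call is skipped.
The model covers only the caller's condition; it says nothing about what
ehci_clear_tt_buffer() itself does for a given endpoint type.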

Besides, I suspect that your code change does _not_ make the USB-2 
controller work reliably with your devices.  You should collect a usbmon 
trace under those conditions; I predict it will be full of STALLs.  And 
furthermore, I believe these STALLs will not show up in a usbmon trace 
made with the devices plugged directly into the PCI card.  If I'm right 
about these things, the errors are still present even with your patch; 
all it does is hide them.

Short of a USB bus analyzer, however, there's no way to tell what's 
really going on.

Alan Stern


* Re: [RFC RESEND PATCH 0/1] USB EHCI: repeated resets on full and low speed devices
  2020-09-01 19:51         ` Alan Stern
@ 2020-09-01 22:54           ` Khalid Aziz
  2020-09-02  1:44             ` Alan Stern
  0 siblings, 1 reply; 14+ messages in thread
From: Khalid Aziz @ 2020-09-01 22:54 UTC (permalink / raw)
  To: Alan Stern; +Cc: gregkh, erkka.talvitie, linux-usb, linux-kernel

On 9/1/20 1:51 PM, Alan Stern wrote:
> On Tue, Sep 01, 2020 at 11:00:16AM -0600, Khalid Aziz wrote:
>> On 9/1/20 10:36 AM, Alan Stern wrote:
>>> On Tue, Sep 01, 2020 at 09:15:46AM -0700, Khalid Aziz wrote:
>>>> On 8/31/20 8:31 PM, Alan Stern wrote:
>>>>> Can you collect a usbmon trace showing an example of this problem?
>>>>>
>>>>
>>>> I have attached usbmon traces for when USB hub with keyboards and mouse
>>>> is plugged into USB 2.0 port and when it is plugged into the NEC USB 3.0
>>>> port.
>>>
>>> The usbmon traces show lots of errors, but no Clear-TT events.  The 
>>> large number of errors suggests that you've got a hardware problem; 
>>> either a bad hub or bad USB connections.
>>
>> That is what I thought initially which is why I got additional hubs and
>> a USB 2.0 PCI card to test. I am seeing errors across 3 USB controllers,
>> 4 USB hubs and 4 slow/full speed devices. All of the hubs and slow/full
>> devices work with zero errors on my laptop. My keyboard/mouse devices
>> and 2 of my USB hubs predate motherboard update and they all worked
>> flawlessly before the motherboard upgrade. Some combinations of these
>> also works with no errors on my desktop with new motherboard that I had
>> listed in my original email:
> 
> It's a very puzzling situation.
> 
> One thing which probably would work well, surprisingly, would be to buy 
> an old USB-1.1 hub and plug it into the PCI card.  That combination is 
> likely to be similar to what you see when plugging the devices directly 
> into the PCI card.  It might even work okay with the USB-3 controllers.
> 
>> 2. USB 2.0 controller - WORKS
>> 5. USB 3.0/3.1 controller -> Bus powered USB 2.0 hub - WORKS
>>
>> I am not seeing a common failure here that would point to any specific
>> hardware being bad. Besides, that one code change (which I still can't
>> say is the right code change) in ehci-q.c makes USB 2.0 controller work
>> reliably with all my devices.
> 
> The USB and EHCI designs are flawed in that under the circumstances 
> you're seeing, they don't have any way to tell the difference between a 
> STALL and a host timing error.  The current code treats these situations 
> as timing/transmission errors (resulting in device resets); your change 
> causes them to be treated as STALLs.  However, there are known, common 
> situations in which those same symptoms really are caused by 
> transmission errors, so we don't want to start treating them as STALLs.
> 
> Besides, I suspect that your code change does _not_ make the USB-2 
> controller work reliably with your devices.  You should collect a usbmon 
> trace under those conditions; I predict it will be full of STALLs.  And 
> furthermore, I believe these STALLs will not show up in a usbmon trace 
> made with the devices plugged directly into the PCI card.  If I'm right 
> about these things, the errors are still present even with your patch; 
> all it does is hide them.
> 
> Short of a USB bus analyzer, however, there's no way to tell what's 
> really going on.

I have managed to find a hardware combination that seems to work, so for
now at least my machine is usable. I will figure out how to interpret
usbmon output and run more experiments. There seems to be a real problem
in the driver somewhere and it should be solved.

Thanks,
Khalid




* Re: [RFC RESEND PATCH 0/1] USB EHCI: repeated resets on full and low speed devices
  2020-09-01 22:54           ` Khalid Aziz
@ 2020-09-02  1:44             ` Alan Stern
  0 siblings, 0 replies; 14+ messages in thread
From: Alan Stern @ 2020-09-02  1:44 UTC (permalink / raw)
  To: Khalid Aziz; +Cc: gregkh, erkka.talvitie, linux-usb, linux-kernel

On Tue, Sep 01, 2020 at 04:54:48PM -0600, Khalid Aziz wrote:
> I have managed to find a hardware combination that seems to work, so for
> now at least my machine is usable. I will figure out how to interpret
> usbmon output and run more experiments. There seems to be a real problem
> in the driver somewhere and should be solved.

Correction: You're using two different drivers.  Although it's not 
impossible, it seems very unlikely that they both contain the same bug.

Alan Stern


* Re: [RFC PATCH 1/1] usb: ehci: Remove erroneous return of EPROTO upon detection of stall
  2020-08-31 16:08 ` [RFC PATCH 1/1] usb: ehci: Remove erroneous return of EPROTO upon detection of stall Khalid Aziz
  2020-08-31 16:23   ` [RFC RESEND " Khalid Aziz
@ 2020-09-04 15:19   ` Greg KH
  2020-09-04 16:43     ` Khalid Aziz
  1 sibling, 1 reply; 14+ messages in thread
From: Greg KH @ 2020-09-04 15:19 UTC (permalink / raw)
  To: Khalid Aziz; +Cc: stern, erkka.talvitie, linux-usb, linux-kernel, Khalid Aziz

On Mon, Aug 31, 2020 at 10:08:43AM -0600, Khalid Aziz wrote:
> With the USB 3.0/3.1 controller on MSI B450-A Pro Max motherboard,
> full speed and low speed devices see constant resets making
> keyboards and mouse unreliable and unusable. These resets are caused
> by detection of stall in qtd_copy_status() and returning EPROTO
> which in turn results in TT buffers in hub being cleared. Hubs do
> not seem to respond well to this and seem to hang which causes
> further USB transactions to time out. A reset finally clears the
> issue until we repeat the cycle all over again.
> 
> Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
> Cc: Khalid Aziz <khalid@gonehiking.org>
> ---
>  drivers/usb/host/ehci-q.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/drivers/usb/host/ehci-q.c b/drivers/usb/host/ehci-q.c
> index 8a5c9b3ebe1e..7d4b2bc4633c 100644
> --- a/drivers/usb/host/ehci-q.c
> +++ b/drivers/usb/host/ehci-q.c
> @@ -214,10 +214,6 @@ static int qtd_copy_status (
>  		 * When MMF is active and PID Code is IN, queue is halted.
>  		 * EHCI Specification, Table 4-13.
>  		 */
> -		} else if ((token & QTD_STS_MMF) &&
> -					(QTD_PID(token) == PID_CODE_IN)) {
> -			status = -EPROTO;
> -		/* CERR nonzero + halt --> stall */
>  		} else if (QTD_CERR(token)) {
>  			status = -EPIPE;
>  

Removing this check is not a good idea, any chance you can come up with
some other test instead for this broken hardware?

What about getting a USB hub that works?  :)

thanks,

greg k-h


* Re: [RFC PATCH 1/1] usb: ehci: Remove erroneous return of EPROTO upon detection of stall
  2020-09-04 15:19   ` [RFC " Greg KH
@ 2020-09-04 16:43     ` Khalid Aziz
  0 siblings, 0 replies; 14+ messages in thread
From: Khalid Aziz @ 2020-09-04 16:43 UTC (permalink / raw)
  To: Greg KH; +Cc: stern, erkka.talvitie, linux-usb, linux-kernel, Khalid Aziz

On 9/4/20 9:19 AM, Greg KH wrote:
> On Mon, Aug 31, 2020 at 10:08:43AM -0600, Khalid Aziz wrote:
>> With the USB 3.0/3.1 controller on MSI B450-A Pro Max motherboard,
>> full speed and low speed devices see constant resets making
>> keyboards and mouse unreliable and unusable. These resets are caused
>> by detection of stall in qtd_copy_status() and returning EPROTO
>> which in turn results in TT buffers in hub being cleared. Hubs do
>> not seem to respond well to this and seem to hang which causes
>> further USB transactions to time out. A reset finally clears the
>> issue until we repeat the cycle all over again.
>>
>> Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
>> Cc: Khalid Aziz <khalid@gonehiking.org>
>> ---
>>  drivers/usb/host/ehci-q.c | 4 ----
>>  1 file changed, 4 deletions(-)
>>
>> diff --git a/drivers/usb/host/ehci-q.c b/drivers/usb/host/ehci-q.c
>> index 8a5c9b3ebe1e..7d4b2bc4633c 100644
>> --- a/drivers/usb/host/ehci-q.c
>> +++ b/drivers/usb/host/ehci-q.c
>> @@ -214,10 +214,6 @@ static int qtd_copy_status (
>>  		 * When MMF is active and PID Code is IN, queue is halted.
>>  		 * EHCI Specification, Table 4-13.
>>  		 */
>> -		} else if ((token & QTD_STS_MMF) &&
>> -					(QTD_PID(token) == PID_CODE_IN)) {
>> -			status = -EPROTO;
>> -		/* CERR nonzero + halt --> stall */
>>  		} else if (QTD_CERR(token)) {
>>  			status = -EPIPE;
>>  
> 
> Removing this check is not a good idea, any chance you can come up with
> some other test instead for this broken hardware?
> 
> What about getting a USB hub that works?  :)
> 

I agree removing that check is not the right way to fix this problem. It
just so happens, the USB resets disappear when that check is removed. It
is more likely that check needs to be refined further to differentiate
between a hub that was unplugged (reason for the original commit) and a
hub that is seeing split transaction errors on full/low speed devices.

I am not sure the hardware is broken. I am currently using one of the
four hubs I have in a working configuration. The hub I was using before
the motherboard replacement on my desktop stopped working with the new
motherboard. Suspecting a hardware defect on the motherboard, I bought a
PCI plug-in USB 2.0 card, but that showed the same failure. So I got two
more USB hubs just in case my existing hubs were broken. In all I tried
seven combinations of hardware and five of them failed the same way.
Every one of these hubs, keyboards, mouse and tablet works with no
problems on my laptop. All high speed and super speed devices (various
storage devices I have) work flawlessly on my desktop plugged into any
port or any hub. My desktop is a Ryzen 5 3600X in an MSI B450-A Pro Max
motherboard. The previous motherboard on my desktop was an ASRock Z77
Extreme with an Intel Core i7-3770. My laptop is an Intel i5-7300U in a
Lenovo ThinkPad. Somehow hubs are getting set up differently for split
transactions with full/low speed devices on the two machines.

Since I have a working configuration of hardware, my next steps are to
use my desktop with working configuration of hardware and then go deeper
into USB debugging to find out what is wrong with non-working
configurations.

Thanks,
Khalid




Thread overview: 14+ messages
2020-08-31 16:08 [RFC PATCH 0/1] USB EHCI: repeated resets on full and low speed devices Khalid Aziz
2020-08-31 16:08 ` [RFC PATCH 1/1] usb: ehci: Remove erroneous return of EPROTO upon detection of stall Khalid Aziz
2020-08-31 16:23   ` [RFC RESEND " Khalid Aziz
2020-09-04 15:19   ` [RFC " Greg KH
2020-09-04 16:43     ` Khalid Aziz
2020-08-31 16:23 ` [RFC RESEND PATCH 0/1] USB EHCI: repeated resets on full and low speed devices Khalid Aziz
2020-09-01  2:31 ` Alan Stern
2020-09-01 15:51   ` Khalid Aziz
2020-09-01 16:18     ` Alan Stern
     [not found]   ` <608418fa-b0ce-c2a4-ad79-fe505c842587@oracle.com>
2020-09-01 16:36     ` Alan Stern
2020-09-01 17:00       ` Khalid Aziz
2020-09-01 19:51         ` Alan Stern
2020-09-01 22:54           ` Khalid Aziz
2020-09-02  1:44             ` Alan Stern
