Linux-USB Archive on lore.kernel.org
 help / color / Atom feed
* [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected
@ 2019-11-29 14:08 Erkka Talvitie
  2019-12-02 19:43 ` Alan Stern
  0 siblings, 1 reply; 13+ messages in thread
From: Erkka Talvitie @ 2019-11-29 14:08 UTC (permalink / raw)
  To: stern, gregkh; +Cc: linux-usb, claus.baumgartner, Erkka Talvitie

When disconnecting a USB hub that has some child device(s) connected to it
(such as a USB mouse), then the stack tries to clear halt and
reset device(s) which are _already_ physically disconnected.

The issue has been reproduced with:

CPU: IMX6D5EYM10AD or MCIMX6D5EYM10AE.
SW: U-Boot 2019.07 and kernel 4.19.40.

In this situation there will be error bit for MMF active yet the
CERR equals EHCI_TUNE_CERR + halt. Existing implementation
interprets this as a stall [1] (chapter 8.4.5).

Fix for the issue is at first to check for a stall that comes after
an error (the CERR has been decreased).

Then after that, check for other errors.

And at last check for stall without other errors (the CERR equals
EHCI_TUNE_CERR as stall does not decrease the CERR [2] (table 3-16)).

What happens after the fix is that when disconnecting a hub with
attached device(s) the situation is not interpret as a stall.

The specification [2] is not clear about error priorities, but
since there is no explicit error bit for the stall, it is
assumed to be lower priority than other errors.

[1] https://www.usb.org/document-library/usb-20-specification, usb_20.pdf
[2] https://www.intel.com/content/dam/www/public/us/en/documents/technical-specifications/ehci-specification-for-usb.pdf

Signed-off-by: Erkka Talvitie <erkka.talvitie@vincit.fi>
---
Following test steps were executed to first demonstrate the issue and then
to verify the fix:

--> Attach a mouse into host port and then disconnect it.
--> Attach a keyboard into host port and then disconnect it.
--> Attach a KVM switch into host port and then disconnect it.
--> Attach a KVM switch into host port, attach a mouse into the KVM and then disconnect it.
--> Re-attach the mouse into the KVM and then disconnect the KVM switch.
--> Attach a KVM switch into host port, attach a keyboard into the KVM and then disconnect it.
--> Re-attach the keyboard into the KVM and then disconnect the KVM switch.
--> Attach a KVM switch into host port, attach a mouse and a keyboard into the KVM and then disconnect the KVM switch.
--> End test.

Output without the fix:

--> Attach a mouse into host port and then disconnect it.
kernel: [  374.409837] usb 2-1.2: new low-speed USB device number 16 using ci_hdrc
kernel: [  374.673010] input: Logitech USB Optical Mouse as /devices/soc0/soc/2100000.aips-bus/2184200.usb/ci_hdrc.1/usb2/2-1/2-1.2/2-1.2:1.0/0003:046D:C05F.0006/input/input7
kernel: [  374.674044] hid-generic 0003:046D:C05F.0006: input: USB HID v1.11 Mouse [Logitech USB Optical Mouse] on usb-ci_hdrc.1-1.2/input0
weston[464]: [11:14:07.717] event6  - Logitech USB Optical Mouse: is tagged by udev as: Mouse
weston[464]: [11:14:07.717] event6  - Logitech USB Optical Mouse: device is a pointer
weston[464]: [11:14:07.718] libinput: configuring device "Logitech USB Optical Mouse".
weston[464]: [11:14:07.718] associating input device event6 with output HDMI-A-1 (none by udev)
kernel: [  378.931298] usb 2-1.2: USB disconnect, device number 16
weston[464]: [11:14:11.668] event6  - Logitech USB Optical Mouse: device removed
--> Attach a keyboard into host port and then disconnect it.
kernel: [  400.759884] usb 2-1.2: new low-speed USB device number 17 using ci_hdrc
kernel: [  401.025695] input: Dell Dell QuietKey Keyboard as /devices/soc0/soc/2100000.aips-bus/2184200.usb/ci_hdrc.1/usb2/2-1/2-1.2/2-1.2:1.0/0003:413C:2106.0007/input/input8
kernel: [  401.090413] hid-generic 0003:413C:2106.0007: input: USB HID v1.10 Keyboard [Dell Dell QuietKey Keyboard] on usb-ci_hdrc.1-1.2/input0
systemd-logind[392]: Watching system buttons on /dev/input/event6 (Dell Dell QuietKey Keyboard)
weston[464]: [11:14:33.992] event6  - Dell Dell QuietKey Keyboard: is tagged by udev as: Keyboard
weston[464]: [11:14:33.994] event6  - Dell Dell QuietKey Keyboard: device is a keyboard
weston[464]: [11:14:33.995] libinput: configuring device "Dell Dell QuietKey Keyboard".
weston[464]: [11:14:33.996] associating input device event6 with output HDMI-A-1 (none by udev)
kernel: [  404.230882] usb 2-1.2: USB disconnect, device number 17
weston[464]: [11:14:37.047] event6  - Dell Dell QuietKey Keyboard: device removed
--> Attach a KVM switch into host port and then disconnect it.
kernel: [  424.059844] usb 2-1.2: new high-speed USB device number 18 using ci_hdrc
kernel: [  424.315581] hub 2-1.2:1.0: USB hub found
kernel: [  424.316179] hub 2-1.2:1.0: 4 ports detected
kernel: [  428.268438] usb 2-1.2: USB disconnect, device number 18
--> Attach a KVM switch into host port, attach a mouse into the KVM and then disconnect it.
kernel: [  443.509853] usb 2-1.2: new high-speed USB device number 19 using ci_hdrc
kernel: [  443.763137] hub 2-1.2:1.0: USB hub found
kernel: [  443.763860] hub 2-1.2:1.0: 4 ports detected
kernel: [  448.299911] usb 2-1.2.2: new low-speed USB device number 20 using ci_hdrc
kernel: [  448.463160] input: Logitech USB Optical Mouse as /devices/soc0/soc/2100000.aips-bus/2184200.usb/ci_hdrc.1/usb2/2-1/2-1.2/2-1.2.2/2-1.2.2:1.0/0003:046D:C05F.0008/input/input9
kernel: [  448.463380] hid-generic 0003:046D:C05F.0008: input: USB HID v1.11 Mouse [Logitech USB Optical Mouse] on usb-ci_hdrc.1-1.2.2/input0
weston[464]: [11:15:21.508] event6  - Logitech USB Optical Mouse: is tagged by udev as: Mouse
weston[464]: [11:15:21.508] event6  - Logitech USB Optical Mouse: device is a pointer
weston[464]: [11:15:21.509] libinput: configuring device "Logitech USB Optical Mouse".
weston[464]: [11:15:21.509] associating input device event6 with output HDMI-A-1 (none by udev)
kernel: [  453.915419] usb 2-1.2.2: USB disconnect, device number 20
weston[464]: [11:15:26.655] event6  - Logitech USB Optical Mouse: device removed
--> Re-attach the mouse into the KVM and then disconnect the KVM switch.
kernel: [  468.519841] usb 2-1.2.2: new low-speed USB device number 21 using ci_hdrc
kernel: [  468.683002] input: Logitech USB Optical Mouse as /devices/soc0/soc/2100000.aips-bus/2184200.usb/ci_hdrc.1/usb2/2-1/2-1.2/2-1.2.2/2-1.2.2:1.0/0003:046D:C05F.0009/input/input10
kernel: [  468.683202] hid-generic 0003:046D:C05F.0009: input: USB HID v1.11 Mouse [Logitech USB Optical Mouse] on usb-ci_hdrc.1-1.2.2/input0
weston[464]: [11:15:41.727] event6  - Logitech USB Optical Mouse: is tagged by udev as: Mouse
weston[464]: [11:15:41.728] event6  - Logitech USB Optical Mouse: device is a pointer
weston[464]: [11:15:41.728] libinput: configuring device "Logitech USB Optical Mouse".
weston[464]: [11:15:41.728] associating input device event6 with output HDMI-A-1 (none by udev)
kernel: [  471.118091] usb 2-1.2: clear tt 1 (0150) error -71
kernel: [  471.134041] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  471.138266] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  471.142515] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  471.146773] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  471.151013] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  471.151020] usb 2-1.2-port2: Cannot enable. Maybe the USB cable is bad?
kernel: [  471.155259] usb 2-1.2-port2: cannot disable (err = -71)
kernel: [  471.159542] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  471.163765] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  471.168007] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  471.172267] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  471.176507] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  471.176512] usb 2-1.2-port2: Cannot enable. Maybe the USB cable is bad?
kernel: [  471.180763] usb 2-1.2-port2: cannot disable (err = -71)
kernel: [  471.185022] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  471.189261] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  471.194198] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  471.198273] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  471.202514] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  471.202521] usb 2-1.2-port2: Cannot enable. Maybe the USB cable is bad?
kernel: [  471.206757] usb 2-1.2-port2: cannot disable (err = -71)
kernel: [  471.211024] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  471.215257] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  471.219524] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  471.223767] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  471.228007] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  471.228013] usb 2-1.2-port2: Cannot enable. Maybe the USB cable is bad?
kernel: [  471.232261] usb 2-1.2-port2: cannot disable (err = -71)
kernel: [  471.236514] usb 2-1.2-port2: cannot disable (err = -71)
kernel: [  471.240801] hub 2-1.2:1.0: hub_ext_port_status failed (err = -71)
weston[464]: [11:15:43.974] event6  - Logitech USB Optical Mouse: device removed
kernel: [  471.276392] usb 2-1.2: USB disconnect, device number 19
kernel: [  471.276404] usb 2-1.2.2: USB disconnect, device number 21
--> Attach a KVM switch into host port, attach a keyboard into the KVM and then disconnect it.
kernel: [  489.339839] usb 2-1.2: new high-speed USB device number 22 using ci_hdrc
kernel: [  489.602886] hub 2-1.2:1.0: USB hub found
kernel: [  489.603397] hub 2-1.2:1.0: 4 ports detected
kernel: [  493.869845] usb 2-1.2.2: new low-speed USB device number 23 using ci_hdrc
kernel: [  494.046950] input: Dell Dell QuietKey Keyboard as /devices/soc0/soc/2100000.aips-bus/2184200.usb/ci_hdrc.1/usb2/2-1/2-1.2/2-1.2.2/2-1.2.2:1.0/0003:413C:2106.000A/input/input11
kernel: [  494.112909] hid-generic 0003:413C:2106.000A: input: USB HID v1.10 Keyboard [Dell Dell QuietKey Keyboard] on usb-ci_hdrc.1-1.2.2/input0
systemd-logind[392]: Watching system buttons on /dev/input/event6 (Dell Dell QuietKey Keyboard)
weston[464]: [11:16:07.033] event6  - Dell Dell QuietKey Keyboard: is tagged by udev as: Keyboard
weston[464]: [11:16:07.038] event6  - Dell Dell QuietKey Keyboard: device is a keyboard
weston[464]: [11:16:07.040] libinput: configuring device "Dell Dell QuietKey Keyboard".
weston[464]: [11:16:07.040] associating input device event6 with output HDMI-A-1 (none by udev)
kernel: [  498.303010] usb 2-1.2.2: USB disconnect, device number 23
weston[464]: [11:16:11.135] event6  - Dell Dell QuietKey Keyboard: device removed
--> Re-attach the keyboard into the KVM and then disconnect the KVM switch.
kernel: [  510.509858] usb 2-1.2.2: new low-speed USB device number 24 using ci_hdrc
kernel: [  510.673159] input: Dell Dell QuietKey Keyboard as /devices/soc0/soc/2100000.aips-bus/2184200.usb/ci_hdrc.1/usb2/2-1/2-1.2/2-1.2.2/2-1.2.2:1.0/0003:413C:2106.000B/input/input12
kernel: [  510.740488] hid-generic 0003:413C:2106.000B: input: USB HID v1.10 Keyboard [Dell Dell QuietKey Keyboard] on usb-ci_hdrc.1-1.2.2/input0
systemd-logind[392]: Watching system buttons on /dev/input/event6 (Dell Dell QuietKey Keyboard)
weston[464]: [11:16:23.669] event6  - Dell Dell QuietKey Keyboard: is tagged by udev as: Keyboard
weston[464]: [11:16:23.670] event6  - Dell Dell QuietKey Keyboard: device is a keyboard
weston[464]: [11:16:23.672] libinput: configuring device "Dell Dell QuietKey Keyboard".
weston[464]: [11:16:23.673] associating input device event6 with output HDMI-A-1 (none by udev)
kernel: [  513.808762] usb 2-1.2: clear tt 1 (0180) error -71
kernel: [  513.824014] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  513.828254] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  513.832510] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  513.837034] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  513.841135] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  513.841141] usb 2-1.2-port2: Cannot enable. Maybe the USB cable is bad?
kernel: [  513.845391] usb 2-1.2-port2: cannot disable (err = -71)
kernel: [  513.849575] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  513.853760] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  513.858009] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  513.862264] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  513.866514] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  513.866521] usb 2-1.2-port2: Cannot enable. Maybe the USB cable is bad?
kernel: [  513.870812] usb 2-1.2-port2: cannot disable (err = -71)
kernel: [  513.875012] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  513.879253] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  513.883511] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  513.887763] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  513.892012] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  513.892019] usb 2-1.2-port2: Cannot enable. Maybe the USB cable is bad?
kernel: [  513.896267] usb 2-1.2-port2: cannot disable (err = -71)
kernel: [  513.900572] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  513.904776] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  513.909004] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  513.913290] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  513.917512] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  513.917517] usb 2-1.2-port2: Cannot enable. Maybe the USB cable is bad?
kernel: [  513.921769] usb 2-1.2-port2: cannot disable (err = -71)
kernel: [  513.926009] usb 2-1.2-port2: cannot disable (err = -71)
kernel: [  513.930372] hub 2-1.2:1.0: hub_ext_port_status failed (err = -71)
weston[464]: [11:16:26.752] event6  - Dell Dell QuietKey Keyboard: device removed
kernel: [  514.028708] usb 2-1.2: USB disconnect, device number 22
kernel: [  514.028721] usb 2-1.2.2: USB disconnect, device number 24
--> Attach a KVM switch into host port, attach a mouse and a keyboard into the KVM and then disconnect the KVM switch.
kernel: [  530.299896] usb 2-1.2: new high-speed USB device number 25 using ci_hdrc
kernel: [  530.553126] hub 2-1.2:1.0: USB hub found
kernel: [  530.553656] hub 2-1.2:1.0: 4 ports detected
kernel: [  534.580137] usb 2-1.2.2: new low-speed USB device number 26 using ci_hdrc
kernel: [  534.743092] input: Logitech USB Optical Mouse as /devices/soc0/soc/2100000.aips-bus/2184200.usb/ci_hdrc.1/usb2/2-1/2-1.2/2-1.2.2/2-1.2.2:1.0/0003:046D:C05F.000C/input/input13
kernel: [  534.744022] hid-generic 0003:046D:C05F.000C: input: USB HID v1.11 Mouse [Logitech USB Optical Mouse] on usb-ci_hdrc.1-1.2.2/input0
weston[464]: [11:16:47.778] event6  - Logitech USB Optical Mouse: is tagged by udev as: Mouse
weston[464]: [11:16:47.778] event6  - Logitech USB Optical Mouse: device is a pointer
weston[464]: [11:16:47.779] libinput: configuring device "Logitech USB Optical Mouse".
weston[464]: [11:16:47.779] associating input device event6 with output HDMI-A-1 (none by udev)
kernel: [  536.889928] usb 2-1.2.4: new low-speed USB device number 27 using ci_hdrc
kernel: [  537.054252] input: Dell Dell QuietKey Keyboard as /devices/soc0/soc/2100000.aips-bus/2184200.usb/ci_hdrc.1/usb2/2-1/2-1.2/2-1.2.4/2-1.2.4:1.0/0003:413C:2106.000D/input/input14
kernel: [  537.120711] hid-generic 0003:413C:2106.000D: input: USB HID v1.10 Keyboard [Dell Dell QuietKey Keyboard] on usb-ci_hdrc.1-1.2.4/input0
systemd-logind[392]: Watching system buttons on /dev/input/event7 (Dell Dell QuietKey Keyboard)
weston[464]: [11:16:50.022] event7  - Dell Dell QuietKey Keyboard: is tagged by udev as: Keyboard
weston[464]: [11:16:50.024] event7  - Dell Dell QuietKey Keyboard: device is a keyboard
weston[464]: [11:16:50.025] libinput: configuring device "Dell Dell QuietKey Keyboard".
weston[464]: [11:16:50.026] associating input device event7 with output HDMI-A-1 (none by udev)
kernel: [  540.962649] usb 2-1.2: clear tt 1 (01a0) error -71
kernel: [  540.966889] usb 2-1.2: clear tt 1 (01b0) error -71
kernel: [  540.974144] usb 2-1.2-port4: cannot reset (err = -71)
kernel: [  540.978389] usb 2-1.2-port4: cannot reset (err = -71)
kernel: [  540.982519] usb 2-1.2-port4: cannot reset (err = -71)
kernel: [  540.986762] usb 2-1.2-port4: cannot reset (err = -71)
kernel: [  540.991019] usb 2-1.2-port4: cannot reset (err = -71)
kernel: [  540.991025] usb 2-1.2-port4: Cannot enable. Maybe the USB cable is bad?
kernel: [  540.995254] usb 2-1.2-port4: cannot disable (err = -71)
kernel: [  540.999529] usb 2-1.2-port4: cannot reset (err = -71)
kernel: [  541.003785] usb 2-1.2-port4: cannot reset (err = -71)
kernel: [  541.008015] usb 2-1.2-port4: cannot reset (err = -71)
kernel: [  541.012276] usb 2-1.2-port4: cannot reset (err = -71)
kernel: [  541.016511] usb 2-1.2-port4: cannot reset (err = -71)
kernel: [  541.016517] usb 2-1.2-port4: Cannot enable. Maybe the USB cable is bad?
kernel: [  541.020910] usb 2-1.2-port4: cannot disable (err = -71)
kernel: [  541.025107] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  541.029286] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  541.033522] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  541.037793] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  541.042021] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  541.042028] usb 2-1.2-port2: Cannot enable. Maybe the USB cable is bad?
kernel: [  541.046269] usb 2-1.2-port2: cannot disable (err = -71)
kernel: [  541.050543] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  541.054753] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  541.059005] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  541.063266] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  541.067513] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  541.067518] usb 2-1.2-port2: Cannot enable. Maybe the USB cable is bad?
kernel: [  541.071769] usb 2-1.2-port2: cannot disable (err = -71)
kernel: [  541.076005] usb 2-1.2-port4: cannot reset (err = -71)
kernel: [  541.080324] usb 2-1.2-port4: cannot reset (err = -71)
kernel: [  541.084543] usb 2-1.2-port4: cannot reset (err = -71)
kernel: [  541.088762] usb 2-1.2-port4: cannot reset (err = -71)
kernel: [  541.093018] usb 2-1.2-port4: cannot reset (err = -71)
kernel: [  541.093024] usb 2-1.2-port4: Cannot enable. Maybe the USB cable is bad?
kernel: [  541.097261] usb 2-1.2-port4: cannot disable (err = -71)
kernel: [  541.101515] usb 2-1.2-port4: cannot reset (err = -71)
kernel: [  541.105751] usb 2-1.2-port4: cannot reset (err = -71)
kernel: [  541.110169] usb 2-1.2-port4: cannot reset (err = -71)
kernel: [  541.114265] usb 2-1.2-port4: cannot reset (err = -71)
kernel: [  541.118507] usb 2-1.2-port4: cannot reset (err = -71)
kernel: [  541.118512] usb 2-1.2-port4: Cannot enable. Maybe the USB cable is bad?
kernel: [  541.122763] usb 2-1.2-port4: cannot disable (err = -71)
kernel: [  541.127006] usb 2-1.2-port4: cannot disable (err = -71)
kernel: [  541.131266] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  541.135547] hub 2-1.2:1.0: hub_ext_port_status failed (err = -71)
kernel: [  541.139763] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  541.144017] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  541.148262] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  541.152513] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  541.152519] usb 2-1.2-port2: Cannot enable. Maybe the USB cable is bad?
kernel: [  541.156752] usb 2-1.2-port2: cannot disable (err = -71)
kernel: [  541.161031] usb 2-1.2-port2: cannot reset (err = -71)
kernel: [  541.164659] usb 2-1.2: USB disconnect, device number 25
kernel: [  541.164670] usb 2-1.2.2: USB disconnect, device number 26
kernel: [  541.165358] usb 2-1.2-port2: cannot reset (err = -71)
weston[464]: [11:16:53.936] event7  - Dell Dell QuietKey Keyboard: device removed
weston[464]: [11:16:53.976] event6  - Logitech USB Optical Mouse: device removed
kernel: [  541.271277] usb 2-1.2.4: USB disconnect, device number 27
--> End test.

Output with the fix:

--> Attach a mouse into host port and then disconnect it.
kernel: [  101.229751] usb 2-1.2: new low-speed USB device number 16 using ci_hdrc
kernel: [  101.496424] input: Logitech USB Optical Mouse as /devices/soc0/soc/2100000.aips-bus/2184200.usb/ci_hdrc.1/usb2/2-1/2-1.2/2-1.2:1.0/0003:046D:C05F.0006/input/input7
kernel: [  101.497252] hid-generic 0003:046D:C05F.0006: input: USB HID v1.11 Mouse [Logitech USB Optical Mouse] on usb-ci_hdrc.1-1.2/input0
weston[486]: [11:34:50.974] event6  - Logitech USB Optical Mouse: is tagged by udev as: Mouse
weston[486]: [11:34:50.975] event6  - Logitech USB Optical Mouse: device is a pointer
weston[486]: [11:34:50.975] libinput: configuring device "Logitech USB Optical Mouse".
weston[486]: [11:34:50.975] associating input device event6 with output HDMI-A-1 (none by udev)
kernel: [  104.418306] usb 2-1.2: USB disconnect, device number 16
weston[486]: [11:34:53.604] event6  - Logitech USB Optical Mouse: device removed
--> Attach a keyboard into host port and then disconnect it.
kernel: [  126.829750] usb 2-1.2: new low-speed USB device number 17 using ci_hdrc
kernel: [  127.096213] input: Dell Dell QuietKey Keyboard as /devices/soc0/soc/2100000.aips-bus/2184200.usb/ci_hdrc.1/usb2/2-1/2-1.2/2-1.2:1.0/0003:413C:2106.0007/input/input8
kernel: [  127.160286] hid-generic 0003:413C:2106.0007: input: USB HID v1.10 Keyboard [Dell Dell QuietKey Keyboard] on usb-ci_hdrc.1-1.2/input0
systemd-logind[418]: Watching system buttons on /dev/input/event6 (Dell Dell QuietKey Keyboard)
weston[486]: [11:35:16.511] event6  - Dell Dell QuietKey Keyboard: is tagged by udev as: Keyboard
weston[486]: [11:35:16.511] event6  - Dell Dell QuietKey Keyboard: device is a keyboard
weston[486]: [11:35:16.513] libinput: configuring device "Dell Dell QuietKey Keyboard".
weston[486]: [11:35:16.513] associating input device event6 with output HDMI-A-1 (none by udev)
kernel: [  129.506479] usb 2-1.2: USB disconnect, device number 17
weston[486]: [11:35:18.775] event6  - Dell Dell QuietKey Keyboard: device removed
--> Attach a KVM switch into host port and then disconnect it.
kernel: [  160.369756] usb 2-1.2: new high-speed USB device number 18 using ci_hdrc
kernel: [  160.622949] hub 2-1.2:1.0: USB hub found
kernel: [  160.623495] hub 2-1.2:1.0: 4 ports detected
kernel: [  163.298449] usb 2-1.2: USB disconnect, device number 18
--> Attach a KVM switch into host port, attach a mouse into the KVM and then disconnect it.
kernel: [  184.939760] usb 2-1.2: new high-speed USB device number 19 using ci_hdrc
kernel: [  185.195174] hub 2-1.2:1.0: USB hub found
kernel: [  185.195706] hub 2-1.2:1.0: 4 ports detected
kernel: [  190.249755] usb 2-1.2.2: new low-speed USB device number 20 using ci_hdrc
kernel: [  190.416307] input: Logitech USB Optical Mouse as /devices/soc0/soc/2100000.aips-bus/2184200.usb/ci_hdrc.1/usb2/2-1/2-1.2/2-1.2.2/2-1.2.2:1.0/0003:046D:C05F.0008/input/input9
kernel: [  190.417250] hid-generic 0003:046D:C05F.0008: input: USB HID v1.11 Mouse [Logitech USB Optical Mouse] on usb-ci_hdrc.1-1.2.2/input0
weston[486]: [11:36:19.907] event6  - Logitech USB Optical Mouse: is tagged by udev as: Mouse
weston[486]: [11:36:19.907] event6  - Logitech USB Optical Mouse: device is a pointer
weston[486]: [11:36:19.908] libinput: configuring device "Logitech USB Optical Mouse".
weston[486]: [11:36:19.908] associating input device event6 with output HDMI-A-1 (none by udev)
kernel: [  193.531312] usb 2-1.2.2: USB disconnect, device number 20
weston[486]: [11:36:22.721] event6  - Logitech USB Optical Mouse: device removed
--> Re-attach the mouse into the KVM and then disconnect the KVM switch.
kernel: [  208.169755] usb 2-1.2.2: new low-speed USB device number 21 using ci_hdrc
kernel: [  208.333289] input: Logitech USB Optical Mouse as /devices/soc0/soc/2100000.aips-bus/2184200.usb/ci_hdrc.1/usb2/2-1/2-1.2/2-1.2.2/2-1.2.2:1.0/0003:046D:C05F.0009/input/input10
kernel: [  208.335566] hid-generic 0003:046D:C05F.0009: input: USB HID v1.11 Mouse [Logitech USB Optical Mouse] on usb-ci_hdrc.1-1.2.2/input0
weston[486]: [11:36:37.805] event6  - Logitech USB Optical Mouse: is tagged by udev as: Mouse
weston[486]: [11:36:37.806] event6  - Logitech USB Optical Mouse: device is a pointer
weston[486]: [11:36:37.806] libinput: configuring device "Logitech USB Optical Mouse".
weston[486]: [11:36:37.806] associating input device event6 with output HDMI-A-1 (none by udev)
kernel: [  211.426306] usb 2-1.2: USB disconnect, device number 19
kernel: [  211.426317] usb 2-1.2.2: USB disconnect, device number 21
weston[486]: [11:36:40.616] event6  - Logitech USB Optical Mouse: device removed
--> Attach a KVM switch into host port, attach a keyboard into the KVM and then disconnect it.
kernel: [  230.249761] usb 2-1.2: new high-speed USB device number 22 using ci_hdrc
kernel: [  230.502823] hub 2-1.2:1.0: USB hub found
kernel: [  230.503332] hub 2-1.2:1.0: 4 ports detected
kernel: [  237.349770] usb 2-1.2.2: new low-speed USB device number 23 using ci_hdrc
kernel: [  237.516824] input: Dell Dell QuietKey Keyboard as /devices/soc0/soc/2100000.aips-bus/2184200.usb/ci_hdrc.1/usb2/2-1/2-1.2/2-1.2.2/2-1.2.2:1.0/0003:413C:2106.000A/input/input11
kernel: [  237.580297] hid-generic 0003:413C:2106.000A: input: USB HID v1.10 Keyboard [Dell Dell QuietKey Keyboard] on usb-ci_hdrc.1-1.2.2/input0
systemd-logind[418]: Watching system buttons on /dev/input/event6 (Dell Dell QuietKey Keyboard)
weston[486]: [11:37:06.967] event6  - Dell Dell QuietKey Keyboard: is tagged by udev as: Keyboard
weston[486]: [11:37:06.969] event6  - Dell Dell QuietKey Keyboard: device is a keyboard
weston[486]: [11:37:06.971] libinput: configuring device "Dell Dell QuietKey Keyboard".
weston[486]: [11:37:06.972] associating input device event6 with output HDMI-A-1 (none by udev)
kernel: [  240.637437] usb 2-1.2.2: USB disconnect, device number 23
weston[486]: [11:37:09.897] event6  - Dell Dell QuietKey Keyboard: device removed
--> Re-attach the keyboard into the KVM and then disconnect the KVM switch.
kernel: [  257.319980] usb 2-1.2.2: new low-speed USB device number 24 using ci_hdrc
kernel: [  257.483579] input: Dell Dell QuietKey Keyboard as /devices/soc0/soc/2100000.aips-bus/2184200.usb/ci_hdrc.1/usb2/2-1/2-1.2/2-1.2.2/2-1.2.2:1.0/0003:413C:2106.000B/input/input12
kernel: [  257.560285] hid-generic 0003:413C:2106.000B: input: USB HID v1.10 Keyboard [Dell Dell QuietKey Keyboard] on usb-ci_hdrc.1-1.2.2/input0
systemd-logind[418]: Watching system buttons on /dev/input/event6 (Dell Dell QuietKey Keyboard)
weston[486]: [11:37:26.926] event6  - Dell Dell QuietKey Keyboard: is tagged by udev as: Keyboard
weston[486]: [11:37:26.928] event6  - Dell Dell QuietKey Keyboard: device is a keyboard
weston[486]: [11:37:26.930] libinput: configuring device "Dell Dell QuietKey Keyboard".
weston[486]: [11:37:26.931] associating input device event6 with output HDMI-A-1 (none by udev)
kernel: [  260.323187] usb 2-1.2: USB disconnect, device number 22
kernel: [  260.323199] usb 2-1.2.2: USB disconnect, device number 24
weston[486]: [11:37:29.585] event6  - Dell Dell QuietKey Keyboard: device removed
--> Attach a KVM switch into host port, attach a mouse and a keyboard into the KVM and then disconnect the KVM switch.
kernel: [  282.479747] usb 2-1.2: new high-speed USB device number 25 using ci_hdrc
kernel: [  282.732817] hub 2-1.2:1.0: USB hub found
kernel: [  282.733327] hub 2-1.2:1.0: 4 ports detected
kernel: [  288.299751] usb 2-1.2.2: new low-speed USB device number 26 using ci_hdrc
kernel: [  288.476005] input: Logitech USB Optical Mouse as /devices/soc0/soc/2100000.aips-bus/2184200.usb/ci_hdrc.1/usb2/2-1/2-1.2/2-1.2.2/2-1.2.2:1.0/0003:046D:C05F.000C/input/input13
kernel: [  288.476222] hid-generic 0003:046D:C05F.000C: input: USB HID v1.11 Mouse [Logitech USB Optical Mouse] on usb-ci_hdrc.1-1.2.2/input0
weston[486]: [11:37:57.956] event6  - Logitech USB Optical Mouse: is tagged by udev as: Mouse
weston[486]: [11:37:57.956] event6  - Logitech USB Optical Mouse: device is a pointer
weston[486]: [11:37:57.957] libinput: configuring device "Logitech USB Optical Mouse".
weston[486]: [11:37:57.957] associating input device event6 with output HDMI-A-1 (none by udev)
kernel: [  291.629751] usb 2-1.2.4: new low-speed USB device number 27 using ci_hdrc
kernel: [  291.793640] input: Dell Dell QuietKey Keyboard as /devices/soc0/soc/2100000.aips-bus/2184200.usb/ci_hdrc.1/usb2/2-1/2-1.2/2-1.2.4/2-1.2.4:1.0/0003:413C:2106.000D/input/input14
kernel: [  291.860657] hid-generic 0003:413C:2106.000D: input: USB HID v1.10 Keyboard [Dell Dell QuietKey Keyboard] on usb-ci_hdrc.1-1.2.4/input0
systemd-logind[418]: Watching system buttons on /dev/input/event7 (Dell Dell QuietKey Keyboard)
weston[486]: [11:38:01.238] event7  - Dell Dell QuietKey Keyboard: is tagged by udev as: Keyboard
weston[486]: [11:38:01.242] event7  - Dell Dell QuietKey Keyboard: device is a keyboard
weston[486]: [11:38:01.243] libinput: configuring device "Dell Dell QuietKey Keyboard".
weston[486]: [11:38:01.243] associating input device event7 with output HDMI-A-1 (none by udev)
kernel: [  295.394440] usb 2-1.2: USB disconnect, device number 25
kernel: [  295.394450] usb 2-1.2.2: USB disconnect, device number 26
weston[486]: [11:38:04.580] event6  - Logitech USB Optical Mouse: device removed
kernel: [  295.431931] usb 2-1.2.4: USB disconnect, device number 27
weston[486]: [11:38:04.725] event7  - Dell Dell QuietKey Keyboard: device removed
--> End test.
---
 drivers/usb/host/ehci-q.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/host/ehci-q.c b/drivers/usb/host/ehci-q.c
index 3276304..da7fd12 100644
--- a/drivers/usb/host/ehci-q.c
+++ b/drivers/usb/host/ehci-q.c
@@ -206,8 +206,9 @@ static int qtd_copy_status (
 		if (token & QTD_STS_BABBLE) {
 			/* FIXME "must" disable babbling device's port too */
 			status = -EOVERFLOW;
-		/* CERR nonzero + halt --> stall */
-		} else if (QTD_CERR(token)) {
+		/* CERR nonzero and less than EHCI_TUNE_CERR + halt --> stall.
+		   This handles situation where stall comes after an error. */
+		} else if (QTD_CERR(token) && QTD_CERR(token) < EHCI_TUNE_CERR) {
 			status = -EPIPE;
 
 		/* In theory, more than one of the following bits can be set
@@ -228,6 +229,10 @@ static int qtd_copy_status (
 				usb_pipeendpoint(urb->pipe),
 				usb_pipein(urb->pipe) ? "in" : "out");
 			status = -EPROTO;
+		/* CERR equals EHCI_TUNE_CERR, no other errors + halt --> stall.
+		   This handles situation where stall comes without error bits set. */
+		} else if (QTD_CERR(token)) {
+			status = -EPIPE;
 		} else {	/* unknown */
 			status = -EPROTO;
 		}
-- 
2.7.4


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected
  2019-11-29 14:08 [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected Erkka Talvitie
@ 2019-12-02 19:43 ` Alan Stern
  2019-12-03  9:38   ` Erkka Talvitie
  0 siblings, 1 reply; 13+ messages in thread
From: Alan Stern @ 2019-12-02 19:43 UTC (permalink / raw)
  To: Erkka Talvitie; +Cc: gregkh, linux-usb, claus.baumgartner

On Fri, 29 Nov 2019, Erkka Talvitie wrote:

> When disconnecting a USB hub that has some child device(s) connected to it
> (such as a USB mouse), then the stack tries to clear halt and
> reset device(s) which are _already_ physically disconnected.

That behavior is understandable.  The kernel doesn't know that the
device has been disconnected until it can process the notification from
an upstream hub, and it can't process that notification while it's
trying to reset the device.

> The issue has been reproduced with:
> 
> CPU: IMX6D5EYM10AD or MCIMX6D5EYM10AE.
> SW: U-Boot 2019.07 and kernel 4.19.40.
> 
> In this situation there will be error bit for MMF active yet the
> CERR equals EHCI_TUNE_CERR + halt.

Why?  In general, setting the MMF bit does not cause the halt bit to be 
set (set Table 4-13 in the EHCI spec).  In fact, MMF refers to errors 
that occur on the host, not bus errors caused by a disconnected device.

> Existing implementation
> interprets this as a stall [1] (chapter 8.4.5).

That is the correct thing to do.  When a transaction error occurs
during a Complete-Split transaction, the host controller is supposed to
decrement the CERR value, set the XACT bit, and retry the transaction
unless the CERR value is 0 or there isn't enough time left in the
microframe.

The fact that you saw CERR equal to EHCI_TUNE_CERR and XACT clear
probably means that your EHCI hardware is not behaving properly.

> Fix for the issue is at first to check for a stall that comes after
> an error (the CERR has been decreased).
> 
> Then after that, check for other errors.
> 
> And at last check for stall without other errors (the CERR equals
> EHCI_TUNE_CERR as stall does not decrease the CERR [2] (table 3-16)).
> 
> What happens after the fix is that when disconnecting a hub with
> attached device(s) the situation is not interpret as a stall.
> 
> The specification [2] is not clear about error priorities, but
> since there is no explicit error bit for the stall, it is
> assumed to be lower priority than other errors.

On the contrary, the specification is very clear.  Since transaction
errors cause CERR to be decremented until it reaches 0, a nonzero value
for CERR means the endpoint was halted for some other reason.  And the
only other reason is a stall.  (Or end of the microframe, but there's 
no way to tell if that happened.)

> [1] https://www.usb.org/document-library/usb-20-specification, usb_20.pdf
> [2] https://www.intel.com/content/dam/www/public/us/en/documents/technical-specifications/ehci-specification-for-usb.pdf
> 
> Signed-off-by: Erkka Talvitie <erkka.talvitie@vincit.fi>

Can you duplicate this behavior on a standard PC, say with an Intel
EHCI controller?

>  drivers/usb/host/ehci-q.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/usb/host/ehci-q.c b/drivers/usb/host/ehci-q.c
> index 3276304..da7fd12 100644
> --- a/drivers/usb/host/ehci-q.c
> +++ b/drivers/usb/host/ehci-q.c
> @@ -206,8 +206,9 @@ static int qtd_copy_status (
>  		if (token & QTD_STS_BABBLE) {
>  			/* FIXME "must" disable babbling device's port too */
>  			status = -EOVERFLOW;
> -		/* CERR nonzero + halt --> stall */
> -		} else if (QTD_CERR(token)) {
> +		/* CERR nonzero and less than EHCI_TUNE_CERR + halt --> stall.
> +		   This handles situation where stall comes after an error. */

This comment doesn't make sense.  Who cares whether a stall comes after
an error or not?  It's still a stall and should be reported.

> +		} else if (QTD_CERR(token) && QTD_CERR(token) < EHCI_TUNE_CERR) {
>  			status = -EPIPE;

If an error occurs and the transaction is retried and the retry gets a
stall, then the final status should be -EPIPE, not something else.

>  		/* In theory, more than one of the following bits can be set
> @@ -228,6 +229,10 @@ static int qtd_copy_status (
>  				usb_pipeendpoint(urb->pipe),
>  				usb_pipein(urb->pipe) ? "in" : "out");
>  			status = -EPROTO;
> +		/* CERR equals EHCI_TUNE_CERR, no other errors + halt --> stall.
> +		   This handles situation where stall comes without error bits set. */

If CERR is equal to EHCI_TUNE_CERR then no other errors could have 
occurred (since any error will decrement CERR).  So why shouldn't this 
case be included with the earlier case?

> +		} else if (QTD_CERR(token)) {
> +			status = -EPIPE;
>  		} else {	/* unknown */
>  			status = -EPROTO;
>  		}

Alan Stern


^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected
  2019-12-02 19:43 ` Alan Stern
@ 2019-12-03  9:38   ` Erkka Talvitie
  2019-12-03 10:54     ` Erkka Talvitie
  2019-12-03 19:01     ` Alan Stern
  0 siblings, 2 replies; 13+ messages in thread
From: Erkka Talvitie @ 2019-12-03  9:38 UTC (permalink / raw)
  To: Alan Stern; +Cc: gregkh, linux-usb, claus.baumgartner

Thank you for the comments.

> -----Original Message-----
> From: Alan Stern <stern@rowland.harvard.edu>
> Sent: maanantai 2. joulukuuta 2019 21.44
> To: Erkka Talvitie <erkka.talvitie@vincit.fi>
> Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org;
> claus.baumgartner@med.ge.com
> Subject: Re: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is
> disconnected
> 
> On Fri, 29 Nov 2019, Erkka Talvitie wrote:
> 
> > When disconnecting a USB hub that has some child device(s) connected
> > to it (such as a USB mouse), then the stack tries to clear halt and
> > reset device(s) which are _already_ physically disconnected.
> 
> That behavior is understandable.  The kernel doesn't know that the device
> has been disconnected until it can process the notification from an
upstream
> hub, and it can't process that notification while it's trying to reset the
device.
> 

Ok. I was thinking that in this use case , it should not be trying to clear
the halt (and reset the device when the clear halt fails). And this behavior
was altered by this RFC.

> > The issue has been reproduced with:
> >
> > CPU: IMX6D5EYM10AD or MCIMX6D5EYM10AE.
> > SW: U-Boot 2019.07 and kernel 4.19.40.
> >
> > In this situation there will be error bit for MMF active yet the CERR
> > equals EHCI_TUNE_CERR + halt.
> 
> Why?  In general, setting the MMF bit does not cause the halt bit to be
set
> (set Table 4-13 in the EHCI spec).  In fact, MMF refers to errors that
occur on
> the host, not bus errors caused by a disconnected device.
 
I do not know for sure why that happens. I was suspecting that there has
been MMF error and a stall at the same time. And in this RFC it was assumed
that MMF is with greater priority than stall.
The disconnecting of a hub with attached devices cause the MMF error bit set
even though it is a host side error.

> 
> > Existing implementation
> > interprets this as a stall [1] (chapter 8.4.5).
> 
> That is the correct thing to do.  When a transaction error occurs during a
> Complete-Split transaction, the host controller is supposed to decrement
the
> CERR value, set the XACT bit, and retry the transaction unless the CERR
value
> is 0 or there isn't enough time left in the microframe.
> 
> The fact that you saw CERR equal to EHCI_TUNE_CERR and XACT clear
> probably means that your EHCI hardware is not behaving properly.
 
If you refer to the XactErr  bit (Table 4-13 [2] )with the "XACT clear" then
unfortunately I did not check it's state ,so I am not sure if it was clear.
In this patch, like also in the existing implementation, the MMF bit is
checked first and since it is active in this situation the XactErr is not
checked.

I will check this.

But as in this use case the CERR has not been decreased yet there is error
bit active (MMF) do you see it is still correct to interpret it as a stall
(even when the halt bit is set)?

I have tried to find out more details about our EHCI controller version, but
I have only found out those CPU versions. It might help in a search whether
this could be a HW issue.

> 
> > Fix for the issue is at first to check for a stall that comes after an
> > error (the CERR has been decreased).
> >
> > Then after that, check for other errors.
> >
> > And at last check for stall without other errors (the CERR equals
> > EHCI_TUNE_CERR as stall does not decrease the CERR [2] (table 3-16)).
> >
> > What happens after the fix is that when disconnecting a hub with
> > attached device(s) the situation is not interpret as a stall.
> >
> > The specification [2] is not clear about error priorities, but since
> > there is no explicit error bit for the stall, it is assumed to be
> > lower priority than other errors.
> 
> On the contrary, the specification is very clear.  Since transaction
errors cause
> CERR to be decremented until it reaches 0, a nonzero value for CERR means
> the endpoint was halted for some other reason.  And the only other reason
> is a stall.  (Or end of the microframe, but there's no way to tell if that
> happened.)
 
I see your point. EHCI specification states that babble is a serious error
and it will also cause the halt. The babble error bit is checked first. But
the specification does not say about order of the other errors or about what
to do if there is an error, no retries executed, yet a halt (stall). For
example should the XactErr be checked before the MMF.

>(Or end of the microframe, but there's no way to tell if that happened.)

I was not able to locate this from the specification. Could you please point
out where this statement is from?
Could the way to tell if "end of microframe" happened, be what is done here
- check for MMF error bit and if CERR has not been decreased?

> 
> > [1] https://www.usb.org/document-library/usb-20-specification,
> > usb_20.pdf [2]
> >
> https://www.intel.com/content/dam/www/public/us/en/documents/techn
> ical
> > -specifications/ehci-specification-for-usb.pdf
> >
> > Signed-off-by: Erkka Talvitie <erkka.talvitie@vincit.fi>
> 
> Can you duplicate this behavior on a standard PC, say with an Intel EHCI
> controller?

We tested with native Linux PC and the error did not reproduce. However I am
not sure about the used host controller in that PC.
I will check that or try to get a setup with Intel EHCI. 

> 
> >  drivers/usb/host/ehci-q.c | 9 +++++++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/usb/host/ehci-q.c b/drivers/usb/host/ehci-q.c
> > index 3276304..da7fd12 100644
> > --- a/drivers/usb/host/ehci-q.c
> > +++ b/drivers/usb/host/ehci-q.c
> > @@ -206,8 +206,9 @@ static int qtd_copy_status (
> >  		if (token & QTD_STS_BABBLE) {
> >  			/* FIXME "must" disable babbling
> device's port too */
> >  			status = -EOVERFLOW;
> > -		/* CERR nonzero + halt --> stall */
> > -		} else if (QTD_CERR(token)) {
> > +		/* CERR nonzero and less than
> EHCI_TUNE_CERR + halt --> stall.
> > +		   This handles situation where stall comes after
> an error. */
> 
> This comment doesn't make sense.  Who cares whether a stall comes after
> an error or not?  It's still a stall and should be reported.

This was basically a comment trying to answer to this commit:
ba516de332c0  USB: EHCI: check for STALL before other errors

    "The existing code doesn't do this properly, because it tests for MMF
    (Missed MicroFrame) and DBE (Data Buffer Error) before testing the
    retry counter.  Thus, if a transaction gets either MMF or DBE the
    corresponding flag is set and the transaction is retried.  If the
    second attempt receives a STALL then -EPIPE is the correct return
    value.  But the existing code would see the MMF or DBE flag instead
    and return -EPROTO, -ENOSR, or -ECOMM."

The comment tries to explain that it will not revert the fix made in the
commit ba516de332c0.


> 
> > +		} else if (QTD_CERR(token) &&
> QTD_CERR(token) < EHCI_TUNE_CERR) {
> >  			status = -EPIPE;
> 
> If an error occurs and the transaction is retried and the retry gets a
stall, then
> the final status should be -EPIPE, not something else.

This is how the RFC also works. If the transaction has been retried and gets
stall then -EPIPE is returned.
Or if there are no errors but there is a stall then -EPIPE is returned.

The only difference in this patch in comparison to the existing
implementation is that if there is an error but the 
transaction has not been retried it is not interpret as a stall even if
there is a halt.
 
> 
> >  		/* In theory, more than one of the following bits
> can be set @@
> > -228,6 +229,10 @@ static int qtd_copy_status (
> >
> 	usb_pipeendpoint(urb->pipe),
> >  				usb_pipein(urb-
> >pipe) ? "in" : "out");
> >  			status = -EPROTO;
> > +		/* CERR equals EHCI_TUNE_CERR, no other
> errors + halt --> stall.
> > +		   This handles situation where stall comes
> without error bits set.
> > +*/
> 
> If CERR is equal to EHCI_TUNE_CERR then no other errors could have
> occurred (since any error will decrement CERR).  So why shouldn't this
case
> be included with the earlier case?

That is what I also understood from the EHCI specification. If there is an
error the CERR should decrease. Only babble, data buffer error and stall (or
no error) will not decrement the CERR.
However in this use case there is an error (MMF) but the CERR still equals
to the EHCI_TUNE_CERR.

So that's why the RFC separates these. This is the logic in the RFC:

1. The first if handles the situation where the stall comes after there has
been an error AND a retry. CERR has been decreased. This is so that
ba516de332c0 is not reverted.
2. The second if handles the situation where the halt has been caused by the
stall AND there are no other errors.
3. If there are errors + halt, but no retries executed (CERR equals
EHCI_TUNE_CERR) the response here is to return error value according to the
error bit, not returning EPIPE according to the stall.
 

> 
> > +		} else if (QTD_CERR(token)) {
> > +			status = -EPIPE;
> >  		} else {	/* unknown */
> >  			status = -EPROTO;
> >  		}
> 
> Alan Stern

Erkka Talvitie



^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected
  2019-12-03  9:38   ` Erkka Talvitie
@ 2019-12-03 10:54     ` Erkka Talvitie
  2019-12-03 13:35       ` Erkka Talvitie
  2019-12-03 19:01     ` Alan Stern
  1 sibling, 1 reply; 13+ messages in thread
From: Erkka Talvitie @ 2019-12-03 10:54 UTC (permalink / raw)
  To: Alan Stern; +Cc: gregkh, linux-usb, claus.baumgartner

> -----Original Message-----
> From: Erkka Talvitie <erkka.talvitie@vincit.fi>
> Sent: tiistai 3. joulukuuta 2019 11.39
> To: 'Alan Stern' <stern@rowland.harvard.edu>
> Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org;
> claus.baumgartner@med.ge.com
> Subject: RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is
> disconnected
> 
> Thank you for the comments.
> 
> > -----Original Message-----
> > From: Alan Stern <stern@rowland.harvard.edu>
> > Sent: maanantai 2. joulukuuta 2019 21.44
> > To: Erkka Talvitie <erkka.talvitie@vincit.fi>
> > Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org;
> > claus.baumgartner@med.ge.com
> > Subject: Re: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is
> > disconnected
> >
> > On Fri, 29 Nov 2019, Erkka Talvitie wrote:
> >
> > > When disconnecting a USB hub that has some child device(s) connected
> > > to it (such as a USB mouse), then the stack tries to clear halt and
> > > reset device(s) which are _already_ physically disconnected.
> >
> > That behavior is understandable.  The kernel doesn't know that the
> > device has been disconnected until it can process the notification
> > from an
> upstream
> > hub, and it can't process that notification while it's trying to reset
> > the
> device.
> >
> 
> Ok. I was thinking that in this use case , it should not be trying to
clear the halt
> (and reset the device when the clear halt fails). And this behavior was
altered
> by this RFC.
> 
> > > The issue has been reproduced with:
> > >
> > > CPU: IMX6D5EYM10AD or MCIMX6D5EYM10AE.
> > > SW: U-Boot 2019.07 and kernel 4.19.40.
> > >
> > > In this situation there will be error bit for MMF active yet the
> > > CERR equals EHCI_TUNE_CERR + halt.
> >
> > Why?  In general, setting the MMF bit does not cause the halt bit to
> > be
> set
> > (set Table 4-13 in the EHCI spec).  In fact, MMF refers to errors that
> occur on
> > the host, not bus errors caused by a disconnected device.
> 
> I do not know for sure why that happens. I was suspecting that there has
> been MMF error and a stall at the same time. And in this RFC it was
assumed
> that MMF is with greater priority than stall.
> The disconnecting of a hub with attached devices cause the MMF error bit
> set even though it is a host side error.
> 
> >
> > > Existing implementation
> > > interprets this as a stall [1] (chapter 8.4.5).
> >
> > That is the correct thing to do.  When a transaction error occurs
> > during a Complete-Split transaction, the host controller is supposed
> > to decrement
> the
> > CERR value, set the XACT bit, and retry the transaction unless the
> > CERR
> value
> > is 0 or there isn't enough time left in the microframe.
> >
> > The fact that you saw CERR equal to EHCI_TUNE_CERR and XACT clear
> > probably means that your EHCI hardware is not behaving properly.
> 
> If you refer to the XactErr  bit (Table 4-13 [2] )with the "XACT clear"
then
> unfortunately I did not check it's state ,so I am not sure if it was
clear.
> In this patch, like also in the existing implementation, the MMF bit is
checked
> first and since it is active in this situation the XactErr is not checked.
> 
> I will check this.
> 
> But as in this use case the CERR has not been decreased yet there is error
bit
> active (MMF) do you see it is still correct to interpret it as a stall
(even when
> the halt bit is set)?
> 
> I have tried to find out more details about our EHCI controller version,
but I
> have only found out those CPU versions. It might help in a search whether
> this could be a HW issue.
> 
> >
> > > Fix for the issue is at first to check for a stall that comes after
> > > an error (the CERR has been decreased).
> > >
> > > Then after that, check for other errors.
> > >
> > > And at last check for stall without other errors (the CERR equals
> > > EHCI_TUNE_CERR as stall does not decrease the CERR [2] (table 3-16)).
> > >
> > > What happens after the fix is that when disconnecting a hub with
> > > attached device(s) the situation is not interpret as a stall.
> > >
> > > The specification [2] is not clear about error priorities, but since
> > > there is no explicit error bit for the stall, it is assumed to be
> > > lower priority than other errors.
> >
> > On the contrary, the specification is very clear.  Since transaction
> errors cause
> > CERR to be decremented until it reaches 0, a nonzero value for CERR
> > means the endpoint was halted for some other reason.  And the only
> > other reason is a stall.  (Or end of the microframe, but there's no
> > way to tell if that
> > happened.)
> 
> I see your point. EHCI specification states that babble is a serious error
and it
> will also cause the halt. The babble error bit is checked first. But the
> specification does not say about order of the other errors or about what
to
> do if there is an error, no retries executed, yet a halt (stall). For
example
> should the XactErr be checked before the MMF.
> 
> >(Or end of the microframe, but there's no way to tell if that
> >happened.)
> 
> I was not able to locate this from the specification. Could you please
point
> out where this statement is from?
> Could the way to tell if "end of microframe" happened, be what is done
here
> - check for MMF error bit and if CERR has not been decreased?
> 
> >
> > > [1] https://www.usb.org/document-library/usb-20-specification,
> > > usb_20.pdf [2]
> > >
> >
> https://www.intel.com/content/dam/www/public/us/en/documents/techn
> > ical
> > > -specifications/ehci-specification-for-usb.pdf
> > >
> > > Signed-off-by: Erkka Talvitie <erkka.talvitie@vincit.fi>
> >
> > Can you duplicate this behavior on a standard PC, say with an Intel
> > EHCI controller?
> 
> We tested with native Linux PC and the error did not reproduce. However I
> am not sure about the used host controller in that PC.
> I will check that or try to get a setup with Intel EHCI.

The PC where the issue did not reproduce was ThinkPad T480 with:

3c:00.0 USB controller: Intel Corporation Device 15c1 (rev 01)

[    4.229310] xhci_hcd 0000:3c:00.0: xHCI Host Controller
[    4.238578] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    4.239754] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[    4.240857] ehci-pci: EHCI PCI platform driver
[    4.241437] ohci-pci: OHCI PCI platform driver
[    4.243080] uhci_hcd: USB Universal Host Controller Interface driver 

> 
> >
> > >  drivers/usb/host/ehci-q.c | 9 +++++++--
> > >  1 file changed, 7 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/usb/host/ehci-q.c b/drivers/usb/host/ehci-q.c
> > > index 3276304..da7fd12 100644
> > > --- a/drivers/usb/host/ehci-q.c
> > > +++ b/drivers/usb/host/ehci-q.c
> > > @@ -206,8 +206,9 @@ static int qtd_copy_status (
> > >  		if (token & QTD_STS_BABBLE) {
> > >  			/* FIXME "must" disable babbling
> > device's port too */
> > >  			status = -EOVERFLOW;
> > > -		/* CERR nonzero + halt --> stall */
> > > -		} else if (QTD_CERR(token)) {
> > > +		/* CERR nonzero and less than
> > EHCI_TUNE_CERR + halt --> stall.
> > > +		   This handles situation where stall comes after
> > an error. */
> >
> > This comment doesn't make sense.  Who cares whether a stall comes
> > after an error or not?  It's still a stall and should be reported.
> 
> This was basically a comment trying to answer to this commit:
> ba516de332c0  USB: EHCI: check for STALL before other errors
> 
>     "The existing code doesn't do this properly, because it tests for MMF
>     (Missed MicroFrame) and DBE (Data Buffer Error) before testing the
>     retry counter.  Thus, if a transaction gets either MMF or DBE the
>     corresponding flag is set and the transaction is retried.  If the
>     second attempt receives a STALL then -EPIPE is the correct return
>     value.  But the existing code would see the MMF or DBE flag instead
>     and return -EPROTO, -ENOSR, or -ECOMM."
> 
> The comment tries to explain that it will not revert the fix made in the
> commit ba516de332c0.
> 
> 
> >
> > > +		} else if (QTD_CERR(token) &&
> > QTD_CERR(token) < EHCI_TUNE_CERR) {
> > >  			status = -EPIPE;
> >
> > If an error occurs and the transaction is retried and the retry gets a
> stall, then
> > the final status should be -EPIPE, not something else.
> 
> This is how the RFC also works. If the transaction has been retried and
gets
> stall then -EPIPE is returned.
> Or if there are no errors but there is a stall then -EPIPE is returned.
> 
> The only difference in this patch in comparison to the existing
> implementation is that if there is an error but the transaction has not
been
> retried it is not interpret as a stall even if there is a halt.
> 
> >
> > >  		/* In theory, more than one of the following bits
> > can be set @@
> > > -228,6 +229,10 @@ static int qtd_copy_status (
> > >
> > 	usb_pipeendpoint(urb->pipe),
> > >  				usb_pipein(urb-
> > >pipe) ? "in" : "out");
> > >  			status = -EPROTO;
> > > +		/* CERR equals EHCI_TUNE_CERR, no other
> > errors + halt --> stall.
> > > +		   This handles situation where stall comes
> > without error bits set.
> > > +*/
> >
> > If CERR is equal to EHCI_TUNE_CERR then no other errors could have
> > occurred (since any error will decrement CERR).  So why shouldn't this
> case
> > be included with the earlier case?
> 
> That is what I also understood from the EHCI specification. If there is an
error
> the CERR should decrease. Only babble, data buffer error and stall (or no
> error) will not decrement the CERR.
> However in this use case there is an error (MMF) but the CERR still equals
to
> the EHCI_TUNE_CERR.
> 
> So that's why the RFC separates these. This is the logic in the RFC:
> 
> 1. The first if handles the situation where the stall comes after there
has
> been an error AND a retry. CERR has been decreased. This is so that
> ba516de332c0 is not reverted.
> 2. The second if handles the situation where the halt has been caused by
the
> stall AND there are no other errors.
> 3. If there are errors + halt, but no retries executed (CERR equals
> EHCI_TUNE_CERR) the response here is to return error value according to
> the error bit, not returning EPIPE according to the stall.
 
I am using CERR here confusingly. It is not a retry counter, instead it is
an error counter.

> 
> >
> > > +		} else if (QTD_CERR(token)) {
> > > +			status = -EPIPE;
> > >  		} else {	/* unknown */
> > >  			status = -EPROTO;
> > >  		}
> >
> > Alan Stern
> 
> Erkka Talvitie
> 

Erkka Talvitie



^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected
  2019-12-03 10:54     ` Erkka Talvitie
@ 2019-12-03 13:35       ` Erkka Talvitie
  0 siblings, 0 replies; 13+ messages in thread
From: Erkka Talvitie @ 2019-12-03 13:35 UTC (permalink / raw)
  To: Alan Stern; +Cc: gregkh, linux-usb, claus.baumgartner



> -----Original Message-----
> From: Erkka Talvitie <erkka.talvitie@vincit.fi>
> Sent: tiistai 3. joulukuuta 2019 12.54
> To: 'Alan Stern' <stern@rowland.harvard.edu>
> Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org;
> claus.baumgartner@med.ge.com
> Subject: RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is
> disconnected
> 
> > -----Original Message-----
> > From: Erkka Talvitie <erkka.talvitie@vincit.fi>
> > Sent: tiistai 3. joulukuuta 2019 11.39
> > To: 'Alan Stern' <stern@rowland.harvard.edu>
> > Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org;
> > claus.baumgartner@med.ge.com
> > Subject: RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is
> > disconnected
> >
> > Thank you for the comments.
> >
> > > -----Original Message-----
> > > From: Alan Stern <stern@rowland.harvard.edu>
> > > Sent: maanantai 2. joulukuuta 2019 21.44
> > > To: Erkka Talvitie <erkka.talvitie@vincit.fi>
> > > Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org;
> > > claus.baumgartner@med.ge.com
> > > Subject: Re: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is
> > > disconnected
> > >
> > > On Fri, 29 Nov 2019, Erkka Talvitie wrote:
> > >
> > > > When disconnecting a USB hub that has some child device(s)
> > > > connected to it (such as a USB mouse), then the stack tries to
> > > > clear halt and reset device(s) which are _already_ physically
> disconnected.
> > >
> > > That behavior is understandable.  The kernel doesn't know that the
> > > device has been disconnected until it can process the notification
> > > from an
> > upstream
> > > hub, and it can't process that notification while it's trying to
> > > reset the
> > device.
> > >
> >
> > Ok. I was thinking that in this use case , it should not be trying to
> clear the halt
> > (and reset the device when the clear halt fails). And this behavior
> > was
> altered
> > by this RFC.
> >
> > > > The issue has been reproduced with:
> > > >
> > > > CPU: IMX6D5EYM10AD or MCIMX6D5EYM10AE.
> > > > SW: U-Boot 2019.07 and kernel 4.19.40.
> > > >
> > > > In this situation there will be error bit for MMF active yet the
> > > > CERR equals EHCI_TUNE_CERR + halt.
> > >
> > > Why?  In general, setting the MMF bit does not cause the halt bit to
> > > be
> > set
> > > (set Table 4-13 in the EHCI spec).  In fact, MMF refers to errors
> > > that
> > occur on
> > > the host, not bus errors caused by a disconnected device.
> >
> > I do not know for sure why that happens. I was suspecting that there
> > has been MMF error and a stall at the same time. And in this RFC it
> > was
> assumed
> > that MMF is with greater priority than stall.
> > The disconnecting of a hub with attached devices cause the MMF error
> > bit set even though it is a host side error.
> >
> > >
> > > > Existing implementation
> > > > interprets this as a stall [1] (chapter 8.4.5).
> > >
> > > That is the correct thing to do.  When a transaction error occurs
> > > during a Complete-Split transaction, the host controller is supposed
> > > to decrement
> > the
> > > CERR value, set the XACT bit, and retry the transaction unless the
> > > CERR
> > value
> > > is 0 or there isn't enough time left in the microframe.
> > >
> > > The fact that you saw CERR equal to EHCI_TUNE_CERR and XACT clear
> > > probably means that your EHCI hardware is not behaving properly.
> >
> > If you refer to the XactErr  bit (Table 4-13 [2] )with the "XACT clear"
> then
> > unfortunately I did not check it's state ,so I am not sure if it was
> clear.
> > In this patch, like also in the existing implementation, the MMF bit
> > is
> checked
> > first and since it is active in this situation the XactErr is not
checked.
> >
> > I will check this.
> >
> > But as in this use case the CERR has not been decreased yet there is
> > error
> bit
> > active (MMF) do you see it is still correct to interpret it as a stall
> (even when
> > the halt bit is set)?
> >
> > I have tried to find out more details about our EHCI controller
> > version,
> but I
> > have only found out those CPU versions. It might help in a search
> > whether this could be a HW issue.
> >
> > >
> > > > Fix for the issue is at first to check for a stall that comes
> > > > after an error (the CERR has been decreased).
> > > >
> > > > Then after that, check for other errors.
> > > >
> > > > And at last check for stall without other errors (the CERR equals
> > > > EHCI_TUNE_CERR as stall does not decrease the CERR [2] (table
3-16)).
> > > >
> > > > What happens after the fix is that when disconnecting a hub with
> > > > attached device(s) the situation is not interpret as a stall.
> > > >
> > > > The specification [2] is not clear about error priorities, but
> > > > since there is no explicit error bit for the stall, it is assumed
> > > > to be lower priority than other errors.
> > >
> > > On the contrary, the specification is very clear.  Since transaction
> > errors cause
> > > CERR to be decremented until it reaches 0, a nonzero value for CERR
> > > means the endpoint was halted for some other reason.  And the only
> > > other reason is a stall.  (Or end of the microframe, but there's no
> > > way to tell if that
> > > happened.)
> >
> > I see your point. EHCI specification states that babble is a serious
> > error
> and it
> > will also cause the halt. The babble error bit is checked first. But
> > the specification does not say about order of the other errors or
> > about what
> to
> > do if there is an error, no retries executed, yet a halt (stall). For
> example
> > should the XactErr be checked before the MMF.
> >
> > >(Or end of the microframe, but there's no way to tell if that
> > >happened.)
> >
> > I was not able to locate this from the specification. Could you please
> point
> > out where this statement is from?
> > Could the way to tell if "end of microframe" happened, be what is done
> here
> > - check for MMF error bit and if CERR has not been decreased?
> >
> > >
> > > > [1] https://www.usb.org/document-library/usb-20-specification,
> > > > usb_20.pdf [2]
> > > >
> > >
> >
> https://www.intel.com/content/dam/www/public/us/en/documents/techn
> > > ical
> > > > -specifications/ehci-specification-for-usb.pdf
> > > >
> > > > Signed-off-by: Erkka Talvitie <erkka.talvitie@vincit.fi>
> > >
> > > Can you duplicate this behavior on a standard PC, say with an Intel
> > > EHCI controller?
> >
> > We tested with native Linux PC and the error did not reproduce.
> > However I am not sure about the used host controller in that PC.
> > I will check that or try to get a setup with Intel EHCI.
> 
> The PC where the issue did not reproduce was ThinkPad T480 with:
> 
> 3c:00.0 USB controller: Intel Corporation Device 15c1 (rev 01)
> 
> [    4.229310] xhci_hcd 0000:3c:00.0: xHCI Host Controller
> [    4.238578] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
> [    4.239754] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
> [    4.240857] ehci-pci: EHCI PCI platform driver
> [    4.241437] ohci-pci: OHCI PCI platform driver
> [    4.243080] uhci_hcd: USB Universal Host Controller Interface driver
> 

Another test, HP Proliant Microserver Gen8
Linux version 4.2.3-300.fc23.x86_64

ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
ehci-pci: EHCI PCI platform driver

With this setup the behavior reproduces.

kernel: usb 3-1.2: clear tt 1 (0080) error -71
kernel: usb 3-1.2-port1: cannot reset (err = -71)
kernel: usb 3-1.2-port1: cannot reset (err = -71)
kernel: usb 3-1.2-port1: cannot reset (err = -71)
kernel: usb 3-1.2-port1: cannot reset (err = -71)
kernel: usb 3-1.2-port1: cannot reset (err = -71)
kernel: usb 3-1.2-port1: Cannot enable. Maybe the USB cable is bad?
kernel: usb 3-1.2-port1: cannot disable (err = -71)
kernel: usb 3-1.2-port1: cannot reset (err = -71)
kernel: usb 3-1.2-port1: cannot reset (err = -71)
kernel: usb 3-1.2-port1: cannot reset (err = -71)
kernel: usb 3-1.2-port1: cannot reset (err = -71)
kernel: usb 3-1.2-port1: cannot reset (err = -71)
kernel: usb 3-1.2-port1: Cannot enable. Maybe the USB cable is bad?
kernel: usb 3-1.2-port1: cannot disable (err = -71)
kernel: usb 3-1.2-port1: cannot reset (err = -71)
kernel: usb 3-1.2-port1: cannot reset (err = -71)
kernel: usb 3-1.2-port1: cannot reset (err = -71)
kernel: usb 3-1.2-port1: cannot reset (err = -71)
kernel: usb 3-1.2-port1: cannot reset (err = -71)
kernel: usb 3-1.2-port1: Cannot enable. Maybe the USB cable is bad?
kernel: usb 3-1.2-port1: cannot disable (err = -71)
kernel: usb 3-1.2-port1: cannot reset (err = -71)
kernel: usb 3-1.2-port1: cannot reset (err = -71)
kernel: usb 3-1.2-port1: cannot reset (err = -71)
kernel: usb 3-1.2-port1: cannot reset (err = -71)
kernel: usb 3-1.2-port1: cannot reset (err = -71)
kernel: usb 3-1.2-port1: Cannot enable. Maybe the USB cable is bad?
kernel: usb 3-1.2-port1: cannot disable (err = -71)
kernel: usb 3-1.2-port1: cannot disable (err = -71)
kernel: hub 3-1.2:1.0: hub_port_status failed (err = -71)

> >
> > >
> > > >  drivers/usb/host/ehci-q.c | 9 +++++++--
> > > >  1 file changed, 7 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/usb/host/ehci-q.c b/drivers/usb/host/ehci-q.c
> > > > index 3276304..da7fd12 100644
> > > > --- a/drivers/usb/host/ehci-q.c
> > > > +++ b/drivers/usb/host/ehci-q.c
> > > > @@ -206,8 +206,9 @@ static int qtd_copy_status (
> > > >  		if (token & QTD_STS_BABBLE) {
> > > >  			/* FIXME "must" disable babbling
> > > device's port too */
> > > >  			status = -EOVERFLOW;
> > > > -		/* CERR nonzero + halt --> stall */
> > > > -		} else if (QTD_CERR(token)) {
> > > > +		/* CERR nonzero and less than
> > > EHCI_TUNE_CERR + halt --> stall.
> > > > +		   This handles situation where stall comes after
> > > an error. */
> > >
> > > This comment doesn't make sense.  Who cares whether a stall comes
> > > after an error or not?  It's still a stall and should be reported.
> >
> > This was basically a comment trying to answer to this commit:
> > ba516de332c0  USB: EHCI: check for STALL before other errors
> >
> >     "The existing code doesn't do this properly, because it tests for
MMF
> >     (Missed MicroFrame) and DBE (Data Buffer Error) before testing the
> >     retry counter.  Thus, if a transaction gets either MMF or DBE the
> >     corresponding flag is set and the transaction is retried.  If the
> >     second attempt receives a STALL then -EPIPE is the correct return
> >     value.  But the existing code would see the MMF or DBE flag instead
> >     and return -EPROTO, -ENOSR, or -ECOMM."
> >
> > The comment tries to explain that it will not revert the fix made in
> > the commit ba516de332c0.
> >
> >
> > >
> > > > +		} else if (QTD_CERR(token) &&
> > > QTD_CERR(token) < EHCI_TUNE_CERR) {
> > > >  			status = -EPIPE;
> > >
> > > If an error occurs and the transaction is retried and the retry gets
> > > a
> > stall, then
> > > the final status should be -EPIPE, not something else.
> >
> > This is how the RFC also works. If the transaction has been retried
> > and
> gets
> > stall then -EPIPE is returned.
> > Or if there are no errors but there is a stall then -EPIPE is returned.
> >
> > The only difference in this patch in comparison to the existing
> > implementation is that if there is an error but the transaction has
> > not
> been
> > retried it is not interpret as a stall even if there is a halt.
> >
> > >
> > > >  		/* In theory, more than one of the following bits
> > > can be set @@
> > > > -228,6 +229,10 @@ static int qtd_copy_status (
> > > >
> > > 	usb_pipeendpoint(urb->pipe),
> > > >  				usb_pipein(urb-
> > > >pipe) ? "in" : "out");
> > > >  			status = -EPROTO;
> > > > +		/* CERR equals EHCI_TUNE_CERR, no other
> > > errors + halt --> stall.
> > > > +		   This handles situation where stall comes
> > > without error bits set.
> > > > +*/
> > >
> > > If CERR is equal to EHCI_TUNE_CERR then no other errors could have
> > > occurred (since any error will decrement CERR).  So why shouldn't
> > > this
> > case
> > > be included with the earlier case?
> >
> > That is what I also understood from the EHCI specification. If there
> > is an
> error
> > the CERR should decrease. Only babble, data buffer error and stall (or
> > no
> > error) will not decrement the CERR.
> > However in this use case there is an error (MMF) but the CERR still
> > equals
> to
> > the EHCI_TUNE_CERR.
> >
> > So that's why the RFC separates these. This is the logic in the RFC:
> >
> > 1. The first if handles the situation where the stall comes after
> > there
> has
> > been an error AND a retry. CERR has been decreased. This is so that
> > ba516de332c0 is not reverted.
> > 2. The second if handles the situation where the halt has been caused
> > by
> the
> > stall AND there are no other errors.
> > 3. If there are errors + halt, but no retries executed (CERR equals
> > EHCI_TUNE_CERR) the response here is to return error value according
> > to the error bit, not returning EPIPE according to the stall.
> 
> I am using CERR here confusingly. It is not a retry counter, instead it is
an
> error counter.
> 
> >
> > >
> > > > +		} else if (QTD_CERR(token)) {
> > > > +			status = -EPIPE;
> > > >  		} else {	/* unknown */
> > > >  			status = -EPROTO;
> > > >  		}
> > >
> > > Alan Stern
> >
> > Erkka Talvitie
> >
> 
> Erkka Talvitie
>
 
Erkka Talvitie



^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected
  2019-12-03  9:38   ` Erkka Talvitie
  2019-12-03 10:54     ` Erkka Talvitie
@ 2019-12-03 19:01     ` Alan Stern
  2019-12-04  8:55       ` Erkka Talvitie
  1 sibling, 1 reply; 13+ messages in thread
From: Alan Stern @ 2019-12-03 19:01 UTC (permalink / raw)
  To: Erkka Talvitie; +Cc: gregkh, linux-usb, claus.baumgartner

On Tue, 3 Dec 2019, Erkka Talvitie wrote:

> Thank you for the comments.
> 
> > -----Original Message-----
> > From: Alan Stern <stern@rowland.harvard.edu>
> > Sent: maanantai 2. joulukuuta 2019 21.44
> > To: Erkka Talvitie <erkka.talvitie@vincit.fi>
> > Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org;
> > claus.baumgartner@med.ge.com
> > Subject: Re: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is
> > disconnected
> > 
> > On Fri, 29 Nov 2019, Erkka Talvitie wrote:
> > 
> > > When disconnecting a USB hub that has some child device(s) connected
> > > to it (such as a USB mouse), then the stack tries to clear halt and
> > > reset device(s) which are _already_ physically disconnected.
> > 
> > That behavior is understandable.  The kernel doesn't know that the device
> > has been disconnected until it can process the notification from an
> upstream
> > hub, and it can't process that notification while it's trying to reset the
> device.
> > 
> 
> Ok. I was thinking that in this use case , it should not be trying to clear
> the halt (and reset the device when the clear halt fails). And this behavior
> was altered by this RFC.

Actually, the situation is a little different from what I described
above.  When you unplug the high-speed hub, the kernel doesn't know the
hub has been disconnected until it receives a notification from the
upstream hub.  The kernel checks for those notifications at roughly
250-ms intervals, so it can take up to that long before the kernel
realizes the high-speed hub is gone.  Until that time, the kernel will
keep trying to reset and communicate with the hub and the devices that
were attached to it.

You can see this in the logs that you posted in your original report.  
In each case, the "cannot reset" and -71 errors lasted for less than 
250 ms.

I just tried doing the same experiment on my PC (which does use all 
Intel hardware and an EHCI controller).  Here's the output from when I 
unplugged the high-speed hub:

[ 6321.245528] usb 1-1.4: clear tt 4 (00a0) error -71
[ 6321.250903] usb 1-1.4-port4: cannot reset (err = -71)
[ 6321.255155] usb 1-1.4-port4: cannot reset (err = -71)
[ 6321.259403] usb 1-1.4-port4: cannot reset (err = -71)
[ 6321.263657] usb 1-1.4-port4: cannot reset (err = -71)
[ 6321.267905] usb 1-1.4-port4: cannot reset (err = -71)
[ 6321.267910] usb 1-1.4-port4: Cannot enable. Maybe the USB cable is bad?
[ 6321.272155] usb 1-1.4-port4: cannot disable (err = -71)
[ 6321.276405] usb 1-1.4-port4: cannot reset (err = -71)
[ 6321.280653] usb 1-1.4-port4: cannot reset (err = -71)
[ 6321.284905] usb 1-1.4-port4: cannot reset (err = -71)
[ 6321.289155] usb 1-1.4-port4: cannot reset (err = -71)
[ 6321.293403] usb 1-1.4-port4: cannot reset (err = -71)
[ 6321.293407] usb 1-1.4-port4: Cannot enable. Maybe the USB cable is bad?
[ 6321.297656] usb 1-1.4-port4: cannot disable (err = -71)
[ 6321.301904] usb 1-1.4-port4: cannot reset (err = -71)
[ 6321.306152] usb 1-1.4-port4: cannot reset (err = -71)
[ 6321.310402] usb 1-1.4-port4: cannot reset (err = -71)
[ 6321.314653] usb 1-1.4-port4: cannot reset (err = -71)
[ 6321.318904] usb 1-1.4-port4: cannot reset (err = -71)
[ 6321.318908] usb 1-1.4-port4: Cannot enable. Maybe the USB cable is bad?
[ 6321.323154] usb 1-1.4-port4: cannot disable (err = -71)
[ 6321.327404] usb 1-1.4-port4: cannot reset (err = -71)
[ 6321.331651] usb 1-1.4-port4: cannot reset (err = -71)
[ 6321.335902] usb 1-1.4-port4: cannot reset (err = -71)
[ 6321.340155] usb 1-1.4-port4: cannot reset (err = -71)
[ 6321.344402] usb 1-1.4-port4: cannot reset (err = -71)
[ 6321.344406] usb 1-1.4-port4: Cannot enable. Maybe the USB cable is bad?
[ 6321.348652] usb 1-1.4-port4: cannot disable (err = -71)
[ 6321.352904] usb 1-1.4-port4: cannot disable (err = -71)
[ 6321.357154] hub 1-1.4:1.0: hub_ext_port_status failed (err = -71)
[ 6321.437000] usb 1-1.4: USB disconnect, device number 9
[ 6321.437010] usb 1-1.4.4: USB disconnect, device number 10

As you can see, the time interval runs from 6321.245 to 6321.437, 
roughly 192 ms < 250 ms.  This is the expected behavior.

I did not try to check whether the MMF bit got set or what value CERR 
had.


> But as in this use case the CERR has not been decreased yet there is error
> bit active (MMF) do you see it is still correct to interpret it as a stall
> (even when the halt bit is set)?

See below.

> I have tried to find out more details about our EHCI controller version, but
> I have only found out those CPU versions. It might help in a search whether
> this could be a HW issue.


> > > The specification [2] is not clear about error priorities, but since
> > > there is no explicit error bit for the stall, it is assumed to be
> > > lower priority than other errors.
> > 
> > On the contrary, the specification is very clear.  Since transaction
> errors cause
> > CERR to be decremented until it reaches 0, a nonzero value for CERR means
> > the endpoint was halted for some other reason.  And the only other reason
> > is a stall.  (Or end of the microframe, but there's no way to tell if that
> > happened.)
>  
> I see your point. EHCI specification states that babble is a serious error
> and it will also cause the halt. The babble error bit is checked first. But
> the specification does not say about order of the other errors or about what
> to do if there is an error, no retries executed, yet a halt (stall). For
> example should the XactErr be checked before the MMF.

I think the order doesn't matter.  In fact, it's possible that both 
errors occurred, since the transaction gets retried multiple times.

> >(Or end of the microframe, but there's no way to tell if that happened.)
> 
> I was not able to locate this from the specification. Could you please point
> out where this statement is from?

"Enhanced Host Controller Interface Specification for Universal Serial 
Bus", rev 1.0 (2002), p. 110:

Transaction Error (XactErr). Timeout, data CRC failure, etc. The CErr
field is decremented and the XactErr bit in the Status field is set to
a one. The complete split transaction is immediately retried (if Cerr
is non-zero). If there is not enough time in the micro-frame to
complete the retry and the endpoint is an IN, or CErr is decremented to
a zero from a one, the queue is halted. If there is not enough time in
the micro-frame to complete the retry and the endpoint is an OUT and
CErr is not zero, then this state is exited (i.e. return to Do Start
Split). This results in a retry of the entire OUT split transaction, at
the next poll period. Refer to Chapter 11 Hubs (specifically the
section full- and low-speed Interrupts) in the USB Specification
Revision 2.0 for detailed requirements on why these errors must be
immediately retried. •

> Could the way to tell if "end of microframe" happened, be what is done here
> - check for MMF error bit and if CERR has not been decreased?

No, because the "end of microframe" situation happens when the host 
controller is handling a transaction error, whereas MMF gets set when 
the host controller detects an error on the host.

> > > diff --git a/drivers/usb/host/ehci-q.c b/drivers/usb/host/ehci-q.c
> > > index 3276304..da7fd12 100644
> > > --- a/drivers/usb/host/ehci-q.c
> > > +++ b/drivers/usb/host/ehci-q.c
> > > @@ -206,8 +206,9 @@ static int qtd_copy_status (
> > >  		if (token & QTD_STS_BABBLE) {
> > >  			/* FIXME "must" disable babbling
> > device's port too */
> > >  			status = -EOVERFLOW;
> > > -		/* CERR nonzero + halt --> stall */
> > > -		} else if (QTD_CERR(token)) {
> > > +		/* CERR nonzero and less than
> > EHCI_TUNE_CERR + halt --> stall.
> > > +		   This handles situation where stall comes after
> > an error. */
> > 
> > This comment doesn't make sense.  Who cares whether a stall comes after
> > an error or not?  It's still a stall and should be reported.
> 
> This was basically a comment trying to answer to this commit:
> ba516de332c0  USB: EHCI: check for STALL before other errors
> 
>     "The existing code doesn't do this properly, because it tests for MMF
>     (Missed MicroFrame) and DBE (Data Buffer Error) before testing the
>     retry counter.  Thus, if a transaction gets either MMF or DBE the
>     corresponding flag is set and the transaction is retried.  If the
>     second attempt receives a STALL then -EPIPE is the correct return
>     value.  But the existing code would see the MMF or DBE flag instead
>     and return -EPROTO, -ENOSR, or -ECOMM."
> 
> The comment tries to explain that it will not revert the fix made in the
> commit ba516de332c0.

Okay, I get it.  You're trying to rely on the strange behavior of the 
MMF bit.

I'm not sure this is a good idea.  Suppose MMF gets set for some other 
reason (a genuine error on the host) and then the transaction gets a 
STALL on the next retry.  Since host errors don't decrement CERR, your 
patch would cause the driver to return -EPROTO instead of -EPIPE.

> > > +		} else if (QTD_CERR(token) &&
> > QTD_CERR(token) < EHCI_TUNE_CERR) {
> > >  			status = -EPIPE;
> > 
> > If an error occurs and the transaction is retried and the retry gets a
> stall, then
> > the final status should be -EPIPE, not something else.
> 
> This is how the RFC also works. If the transaction has been retried and gets
> stall then -EPIPE is returned.
> Or if there are no errors but there is a stall then -EPIPE is returned.
> 
> The only difference in this patch in comparison to the existing
> implementation is that if there is an error but the 
> transaction has not been retried it is not interpret as a stall even if
> there is a halt.

Sometimes that will be the right behavior and other times it won't.  
However, it looks like there may be a way to tell which situation we
are in.  Setting the MMF bit will cause the queue to halt immediately
if the transaction is IN, but not if it is OUT (see Table 4-13 in the 
EHCI spec).

So if CERR == EHCI_TUNE_CERR and the QTD_PID != 1 (not IN) then we
should return -EPIPE, as the existing code does.  But if QTD_PID == 1
then the code should continue, as your patch does -- with one
difference: Put the additional check for EHCI_TUNE_CERR between the
tests for DBE and XACT instead of after XACT (because XACT would
decrement CERR whereas DBE wouldn't).

Can you make that change and test it?

Alan Stern


^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected
  2019-12-03 19:01     ` Alan Stern
@ 2019-12-04  8:55       ` Erkka Talvitie
  2019-12-04 13:18         ` Erkka Talvitie
  0 siblings, 1 reply; 13+ messages in thread
From: Erkka Talvitie @ 2019-12-04  8:55 UTC (permalink / raw)
  To: Alan Stern; +Cc: gregkh, linux-usb, claus.baumgartner

Thank you for the comments.

> -----Original Message-----
> From: Alan Stern <stern@rowland.harvard.edu>
> Sent: tiistai 3. joulukuuta 2019 21.01
> To: Erkka Talvitie <erkka.talvitie@vincit.fi>
> Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org;
> claus.baumgartner@med.ge.com
> Subject: RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is
> disconnected
> 
> On Tue, 3 Dec 2019, Erkka Talvitie wrote:
> 
> > Thank you for the comments.
> >
> > > -----Original Message-----
> > > From: Alan Stern <stern@rowland.harvard.edu>
> > > Sent: maanantai 2. joulukuuta 2019 21.44
> > > To: Erkka Talvitie <erkka.talvitie@vincit.fi>
> > > Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org;
> > > claus.baumgartner@med.ge.com
> > > Subject: Re: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is
> > > disconnected
> > >
> > > On Fri, 29 Nov 2019, Erkka Talvitie wrote:
> > >
> > > > When disconnecting a USB hub that has some child device(s)
> > > > connected to it (such as a USB mouse), then the stack tries to
> > > > clear halt and reset device(s) which are _already_ physically
> disconnected.
> > >
> > > That behavior is understandable.  The kernel doesn't know that the
> > > device has been disconnected until it can process the notification
> > > from an
> > upstream
> > > hub, and it can't process that notification while it's trying to
> > > reset the
> > device.
> > >
> >
> > Ok. I was thinking that in this use case , it should not be trying to
> > clear the halt (and reset the device when the clear halt fails). And
> > this behavior was altered by this RFC.
> 
> Actually, the situation is a little different from what I described above.  When
> you unplug the high-speed hub, the kernel doesn't know the hub has been
> disconnected until it receives a notification from the upstream hub.  The
> kernel checks for those notifications at roughly 250-ms intervals, so it can
> take up to that long before the kernel realizes the high-speed hub is gone.
> Until that time, the kernel will keep trying to reset and communicate with the
> hub and the devices that were attached to it.
> 
> You can see this in the logs that you posted in your original report.
> In each case, the "cannot reset" and -71 errors lasted for less than
> 250 ms.
> 
> I just tried doing the same experiment on my PC (which does use all Intel
> hardware and an EHCI controller).  Here's the output from when I unplugged
> the high-speed hub:
> 
> [ 6321.245528] usb 1-1.4: clear tt 4 (00a0) error -71 [ 6321.250903] usb 1-1.4-
> port4: cannot reset (err = -71) [ 6321.255155] usb 1-1.4-port4: cannot reset
> (err = -71) [ 6321.259403] usb 1-1.4-port4: cannot reset (err = -71) [
> 6321.263657] usb 1-1.4-port4: cannot reset (err = -71) [ 6321.267905] usb 1-
> 1.4-port4: cannot reset (err = -71) [ 6321.267910] usb 1-1.4-port4: Cannot
> enable. Maybe the USB cable is bad?
> [ 6321.272155] usb 1-1.4-port4: cannot disable (err = -71) [ 6321.276405] usb
> 1-1.4-port4: cannot reset (err = -71) [ 6321.280653] usb 1-1.4-port4: cannot
> reset (err = -71) [ 6321.284905] usb 1-1.4-port4: cannot reset (err = -71) [
> 6321.289155] usb 1-1.4-port4: cannot reset (err = -71) [ 6321.293403] usb 1-
> 1.4-port4: cannot reset (err = -71) [ 6321.293407] usb 1-1.4-port4: Cannot
> enable. Maybe the USB cable is bad?
> [ 6321.297656] usb 1-1.4-port4: cannot disable (err = -71) [ 6321.301904] usb
> 1-1.4-port4: cannot reset (err = -71) [ 6321.306152] usb 1-1.4-port4: cannot
> reset (err = -71) [ 6321.310402] usb 1-1.4-port4: cannot reset (err = -71) [
> 6321.314653] usb 1-1.4-port4: cannot reset (err = -71) [ 6321.318904] usb 1-
> 1.4-port4: cannot reset (err = -71) [ 6321.318908] usb 1-1.4-port4: Cannot
> enable. Maybe the USB cable is bad?
> [ 6321.323154] usb 1-1.4-port4: cannot disable (err = -71) [ 6321.327404] usb
> 1-1.4-port4: cannot reset (err = -71) [ 6321.331651] usb 1-1.4-port4: cannot
> reset (err = -71) [ 6321.335902] usb 1-1.4-port4: cannot reset (err = -71) [
> 6321.340155] usb 1-1.4-port4: cannot reset (err = -71) [ 6321.344402] usb 1-
> 1.4-port4: cannot reset (err = -71) [ 6321.344406] usb 1-1.4-port4: Cannot
> enable. Maybe the USB cable is bad?
> [ 6321.348652] usb 1-1.4-port4: cannot disable (err = -71) [ 6321.352904] usb
> 1-1.4-port4: cannot disable (err = -71) [ 6321.357154] hub 1-1.4:1.0:
> hub_ext_port_status failed (err = -71) [ 6321.437000] usb 1-1.4: USB
> disconnect, device number 9 [ 6321.437010] usb 1-1.4.4: USB disconnect,
> device number 10
> 
> As you can see, the time interval runs from 6321.245 to 6321.437, roughly 192
> ms < 250 ms.  This is the expected behavior.
> 
> I did not try to check whether the MMF bit got set or what value CERR had.
> 

Ok, makes sense. Thank you for testing this with your setup.
 
> 
> > But as in this use case the CERR has not been decreased yet there is error
> > bit active (MMF) do you see it is still correct to interpret it as a stall
> > (even when the halt bit is set)?
> 
> See below.
> 
> > I have tried to find out more details about our EHCI controller version, but
> > I have only found out those CPU versions. It might help in a search whether
> > this could be a HW issue.
> 
> 
> > > > The specification [2] is not clear about error priorities, but since
> > > > there is no explicit error bit for the stall, it is assumed to be
> > > > lower priority than other errors.
> > >
> > > On the contrary, the specification is very clear.  Since transaction
> > errors cause
> > > CERR to be decremented until it reaches 0, a nonzero value for CERR
> means
> > > the endpoint was halted for some other reason.  And the only other
> reason
> > > is a stall.  (Or end of the microframe, but there's no way to tell if that
> > > happened.)
> >
> > I see your point. EHCI specification states that babble is a serious error
> > and it will also cause the halt. The babble error bit is checked first. But
> > the specification does not say about order of the other errors or about
> what
> > to do if there is an error, no retries executed, yet a halt (stall). For
> > example should the XactErr be checked before the MMF.
> 
> I think the order doesn't matter.  In fact, it's possible that both
> errors occurred, since the transaction gets retried multiple times.
 
True as the error bits are "sticky".

For example XactErr in EHCI specification [2] table 3-16:
" If the host controller sets this bit to a one, then it remains a one for the duration of the transfer."

> 
> > >(Or end of the microframe, but there's no way to tell if that happened.)
> >
> > I was not able to locate this from the specification. Could you please point
> > out where this statement is from?
> 
> "Enhanced Host Controller Interface Specification for Universal Serial
> Bus", rev 1.0 (2002), p. 110:
> 
> Transaction Error (XactErr). Timeout, data CRC failure, etc. The CErr
> field is decremented and the XactErr bit in the Status field is set to
> a one. The complete split transaction is immediately retried (if Cerr
> is non-zero). If there is not enough time in the micro-frame to
> complete the retry and the endpoint is an IN, or CErr is decremented to
> a zero from a one, the queue is halted. If there is not enough time in
> the micro-frame to complete the retry and the endpoint is an OUT and
> CErr is not zero, then this state is exited (i.e. return to Do Start
> Split). This results in a retry of the entire OUT split transaction, at
> the next poll period. Refer to Chapter 11 Hubs (specifically the
> section full- and low-speed Interrupts) in the USB Specification
> Revision 2.0 for detailed requirements on why these errors must be
> immediately retried. •
> 

There it is, thank you.

> > Could the way to tell if "end of microframe" happened, be what is done
> here
> > - check for MMF error bit and if CERR has not been decreased?
> 
> No, because the "end of microframe" situation happens when the host
> controller is handling a transaction error, whereas MMF gets set when
> the host controller detects an error on the host.

Ok, I see.
 
> 
> > > > diff --git a/drivers/usb/host/ehci-q.c b/drivers/usb/host/ehci-q.c
> > > > index 3276304..da7fd12 100644
> > > > --- a/drivers/usb/host/ehci-q.c
> > > > +++ b/drivers/usb/host/ehci-q.c
> > > > @@ -206,8 +206,9 @@ static int qtd_copy_status (
> > > >  		if (token & QTD_STS_BABBLE) {
> > > >  			/* FIXME "must" disable babbling
> > > device's port too */
> > > >  			status = -EOVERFLOW;
> > > > -		/* CERR nonzero + halt --> stall */
> > > > -		} else if (QTD_CERR(token)) {
> > > > +		/* CERR nonzero and less than
> > > EHCI_TUNE_CERR + halt --> stall.
> > > > +		   This handles situation where stall comes after
> > > an error. */
> > >
> > > This comment doesn't make sense.  Who cares whether a stall comes
> after
> > > an error or not?  It's still a stall and should be reported.
> >
> > This was basically a comment trying to answer to this commit:
> > ba516de332c0  USB: EHCI: check for STALL before other errors
> >
> >     "The existing code doesn't do this properly, because it tests for MMF
> >     (Missed MicroFrame) and DBE (Data Buffer Error) before testing the
> >     retry counter.  Thus, if a transaction gets either MMF or DBE the
> >     corresponding flag is set and the transaction is retried.  If the
> >     second attempt receives a STALL then -EPIPE is the correct return
> >     value.  But the existing code would see the MMF or DBE flag instead
> >     and return -EPROTO, -ENOSR, or -ECOMM."
> >
> > The comment tries to explain that it will not revert the fix made in the
> > commit ba516de332c0.
> 
> Okay, I get it.  You're trying to rely on the strange behavior of the
> MMF bit.
> 
> I'm not sure this is a good idea.  Suppose MMF gets set for some other
> reason (a genuine error on the host) and then the transaction gets a
> STALL on the next retry.  Since host errors don't decrement CERR, your
> patch would cause the driver to return -EPROTO instead of -EPIPE.

Hmm. I originally understood from the documentation that the MMF does decrement the CERR. I followed the EHCI specification table 3-16, the row about the CERR.
There are listed the errors that do decrement the CERR and which do not decrement. Data buffer error, stall, babble and no error (of course) do not decrement.
Transaction errors do decrement the CERR. As there was no mention about MMF I have been in the impression that it is included to the transaction errors of the table.

Now I was able to find from the documentation you pointed out (in your next comment below)  that your statement is correct. EHCI specification, table 4-13 contains Interrupt IN / OUT
conditions which prove your point.

> 
> > > > +		} else if (QTD_CERR(token) &&
> > > QTD_CERR(token) < EHCI_TUNE_CERR) {
> > > >  			status = -EPIPE;
> > >
> > > If an error occurs and the transaction is retried and the retry gets a
> > stall, then
> > > the final status should be -EPIPE, not something else.
> >
> > This is how the RFC also works. If the transaction has been retried and gets
> > stall then -EPIPE is returned.
> > Or if there are no errors but there is a stall then -EPIPE is returned.
> >
> > The only difference in this patch in comparison to the existing
> > implementation is that if there is an error but the
> > transaction has not been retried it is not interpret as a stall even if
> > there is a halt.
> 
> Sometimes that will be the right behavior and other times it won't.
> However, it looks like there may be a way to tell which situation we
> are in.  Setting the MMF bit will cause the queue to halt immediately
> if the transaction is IN, but not if it is OUT (see Table 4-13 in the
> EHCI spec).

Thank you, this Table 4-13 was a vital piece of documentation that I either missed or failed  to read carefully enough.
Now this could explain why the queue is halted with MMF set, even if it would not be a stall. Well done!
 
> 
> So if CERR == EHCI_TUNE_CERR and the QTD_PID != 1 (not IN) then we
> should return -EPIPE, as the existing code does.  But if QTD_PID == 1
> then the code should continue, as your patch does -- with one
> difference: Put the additional check for EHCI_TUNE_CERR between the
> tests for DBE and XACT instead of after XACT (because XACT would
> decrement CERR whereas DBE wouldn't).

Good point, I agree.
 
> 
> Can you make that change and test it?

Sure, I have made the change and test it as soon as possible.

> 
> Alan Stern

Erkka Talvitie



^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected
  2019-12-04  8:55       ` Erkka Talvitie
@ 2019-12-04 13:18         ` Erkka Talvitie
  2019-12-04 14:24           ` Alan Stern
  0 siblings, 1 reply; 13+ messages in thread
From: Erkka Talvitie @ 2019-12-04 13:18 UTC (permalink / raw)
  To: Alan Stern; +Cc: gregkh, linux-usb, claus.baumgartner



> -----Original Message-----
> From: Erkka Talvitie <erkka.talvitie@vincit.fi>
> Sent: keskiviikko 4. joulukuuta 2019 10.55
> To: 'Alan Stern' <stern@rowland.harvard.edu>
> Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org;
> claus.baumgartner@med.ge.com
> Subject: RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is
> disconnected
> 
> Thank you for the comments.
> 
> > -----Original Message-----
> > From: Alan Stern <stern@rowland.harvard.edu>
> > Sent: tiistai 3. joulukuuta 2019 21.01
> > To: Erkka Talvitie <erkka.talvitie@vincit.fi>
> > Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org;
> > claus.baumgartner@med.ge.com
> > Subject: RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is
> > disconnected
> >
> > On Tue, 3 Dec 2019, Erkka Talvitie wrote:
> >
> > > Thank you for the comments.
> > >
> > > > -----Original Message-----
> > > > From: Alan Stern <stern@rowland.harvard.edu>
> > > > Sent: maanantai 2. joulukuuta 2019 21.44
> > > > To: Erkka Talvitie <erkka.talvitie@vincit.fi>
> > > > Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org;
> > > > claus.baumgartner@med.ge.com
> > > > Subject: Re: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub
> > > > is disconnected
> > > >
> > > > On Fri, 29 Nov 2019, Erkka Talvitie wrote:
> > > >
> > > > > When disconnecting a USB hub that has some child device(s)
> > > > > connected to it (such as a USB mouse), then the stack tries to
> > > > > clear halt and reset device(s) which are _already_ physically
> > disconnected.
> > > >
> > > > That behavior is understandable.  The kernel doesn't know that the
> > > > device has been disconnected until it can process the notification
> > > > from an
> > > upstream
> > > > hub, and it can't process that notification while it's trying to
> > > > reset the
> > > device.
> > > >
> > >
> > > Ok. I was thinking that in this use case , it should not be trying
> > > to clear the halt (and reset the device when the clear halt fails).
> > > And this behavior was altered by this RFC.
> >
> > Actually, the situation is a little different from what I described
> > above.  When you unplug the high-speed hub, the kernel doesn't know
> > the hub has been disconnected until it receives a notification from
> > the upstream hub.  The kernel checks for those notifications at
> > roughly 250-ms intervals, so it can take up to that long before the kernel
> realizes the high-speed hub is gone.
> > Until that time, the kernel will keep trying to reset and communicate
> > with the hub and the devices that were attached to it.
> >
> > You can see this in the logs that you posted in your original report.
> > In each case, the "cannot reset" and -71 errors lasted for less than
> > 250 ms.
> >
> > I just tried doing the same experiment on my PC (which does use all
> > Intel hardware and an EHCI controller).  Here's the output from when I
> > unplugged the high-speed hub:
> >
> > [ 6321.245528] usb 1-1.4: clear tt 4 (00a0) error -71 [ 6321.250903]
> > usb 1-1.4-
> > port4: cannot reset (err = -71) [ 6321.255155] usb 1-1.4-port4: cannot
> > reset (err = -71) [ 6321.259403] usb 1-1.4-port4: cannot reset (err =
> > -71) [ 6321.263657] usb 1-1.4-port4: cannot reset (err = -71) [
> > 6321.267905] usb 1-
> > 1.4-port4: cannot reset (err = -71) [ 6321.267910] usb 1-1.4-port4:
> > Cannot enable. Maybe the USB cable is bad?
> > [ 6321.272155] usb 1-1.4-port4: cannot disable (err = -71) [
> > 6321.276405] usb
> > 1-1.4-port4: cannot reset (err = -71) [ 6321.280653] usb 1-1.4-port4:
> > cannot reset (err = -71) [ 6321.284905] usb 1-1.4-port4: cannot reset
> > (err = -71) [ 6321.289155] usb 1-1.4-port4: cannot reset (err = -71) [
> > 6321.293403] usb 1-
> > 1.4-port4: cannot reset (err = -71) [ 6321.293407] usb 1-1.4-port4:
> > Cannot enable. Maybe the USB cable is bad?
> > [ 6321.297656] usb 1-1.4-port4: cannot disable (err = -71) [
> > 6321.301904] usb
> > 1-1.4-port4: cannot reset (err = -71) [ 6321.306152] usb 1-1.4-port4:
> > cannot reset (err = -71) [ 6321.310402] usb 1-1.4-port4: cannot reset
> > (err = -71) [ 6321.314653] usb 1-1.4-port4: cannot reset (err = -71) [
> > 6321.318904] usb 1-
> > 1.4-port4: cannot reset (err = -71) [ 6321.318908] usb 1-1.4-port4:
> > Cannot enable. Maybe the USB cable is bad?
> > [ 6321.323154] usb 1-1.4-port4: cannot disable (err = -71) [
> > 6321.327404] usb
> > 1-1.4-port4: cannot reset (err = -71) [ 6321.331651] usb 1-1.4-port4:
> > cannot reset (err = -71) [ 6321.335902] usb 1-1.4-port4: cannot reset
> > (err = -71) [ 6321.340155] usb 1-1.4-port4: cannot reset (err = -71) [
> > 6321.344402] usb 1-
> > 1.4-port4: cannot reset (err = -71) [ 6321.344406] usb 1-1.4-port4:
> > Cannot enable. Maybe the USB cable is bad?
> > [ 6321.348652] usb 1-1.4-port4: cannot disable (err = -71) [
> > 6321.352904] usb
> > 1-1.4-port4: cannot disable (err = -71) [ 6321.357154] hub 1-1.4:1.0:
> > hub_ext_port_status failed (err = -71) [ 6321.437000] usb 1-1.4: USB
> > disconnect, device number 9 [ 6321.437010] usb 1-1.4.4: USB
> > disconnect, device number 10
> >
> > As you can see, the time interval runs from 6321.245 to 6321.437,
> > roughly 192 ms < 250 ms.  This is the expected behavior.
> >
> > I did not try to check whether the MMF bit got set or what value CERR had.
> >
> 
> Ok, makes sense. Thank you for testing this with your setup.
> 
> >
> > > But as in this use case the CERR has not been decreased yet there is
> > > error bit active (MMF) do you see it is still correct to interpret
> > > it as a stall (even when the halt bit is set)?
> >
> > See below.
> >
> > > I have tried to find out more details about our EHCI controller
> > > version, but I have only found out those CPU versions. It might help
> > > in a search whether this could be a HW issue.
> >
> >
> > > > > The specification [2] is not clear about error priorities, but
> > > > > since there is no explicit error bit for the stall, it is
> > > > > assumed to be lower priority than other errors.
> > > >
> > > > On the contrary, the specification is very clear.  Since
> > > > transaction
> > > errors cause
> > > > CERR to be decremented until it reaches 0, a nonzero value for
> > > > CERR
> > means
> > > > the endpoint was halted for some other reason.  And the only other
> > reason
> > > > is a stall.  (Or end of the microframe, but there's no way to tell
> > > > if that
> > > > happened.)
> > >
> > > I see your point. EHCI specification states that babble is a serious
> > > error and it will also cause the halt. The babble error bit is
> > > checked first. But the specification does not say about order of the
> > > other errors or about
> > what
> > > to do if there is an error, no retries executed, yet a halt (stall).
> > > For example should the XactErr be checked before the MMF.
> >
> > I think the order doesn't matter.  In fact, it's possible that both
> > errors occurred, since the transaction gets retried multiple times.
> 
> True as the error bits are "sticky".
> 
> For example XactErr in EHCI specification [2] table 3-16:
> " If the host controller sets this bit to a one, then it remains a one for the
> duration of the transfer."
> 
> >
> > > >(Or end of the microframe, but there's no way to tell if that
> > > >happened.)
> > >
> > > I was not able to locate this from the specification. Could you
> > > please point out where this statement is from?
> >
> > "Enhanced Host Controller Interface Specification for Universal Serial
> > Bus", rev 1.0 (2002), p. 110:
> >
> > Transaction Error (XactErr). Timeout, data CRC failure, etc. The CErr
> > field is decremented and the XactErr bit in the Status field is set to
> > a one. The complete split transaction is immediately retried (if Cerr
> > is non-zero). If there is not enough time in the micro-frame to
> > complete the retry and the endpoint is an IN, or CErr is decremented
> > to a zero from a one, the queue is halted. If there is not enough time
> > in the micro-frame to complete the retry and the endpoint is an OUT
> > and CErr is not zero, then this state is exited (i.e. return to Do
> > Start Split). This results in a retry of the entire OUT split
> > transaction, at the next poll period. Refer to Chapter 11 Hubs
> > (specifically the section full- and low-speed Interrupts) in the USB
> > Specification Revision 2.0 for detailed requirements on why these
> > errors must be immediately retried. •
> >
> 
> There it is, thank you.
> 
> > > Could the way to tell if "end of microframe" happened, be what is
> > > done
> > here
> > > - check for MMF error bit and if CERR has not been decreased?
> >
> > No, because the "end of microframe" situation happens when the host
> > controller is handling a transaction error, whereas MMF gets set when
> > the host controller detects an error on the host.
> 
> Ok, I see.
> 
> >
> > > > > diff --git a/drivers/usb/host/ehci-q.c
> > > > > b/drivers/usb/host/ehci-q.c index 3276304..da7fd12 100644
> > > > > --- a/drivers/usb/host/ehci-q.c
> > > > > +++ b/drivers/usb/host/ehci-q.c
> > > > > @@ -206,8 +206,9 @@ static int qtd_copy_status (
> > > > >  		if (token & QTD_STS_BABBLE) {
> > > > >  			/* FIXME "must" disable babbling
> > > > device's port too */
> > > > >  			status = -EOVERFLOW;
> > > > > -		/* CERR nonzero + halt --> stall */
> > > > > -		} else if (QTD_CERR(token)) {
> > > > > +		/* CERR nonzero and less than
> > > > EHCI_TUNE_CERR + halt --> stall.
> > > > > +		   This handles situation where stall comes after
> > > > an error. */
> > > >
> > > > This comment doesn't make sense.  Who cares whether a stall comes
> > after
> > > > an error or not?  It's still a stall and should be reported.
> > >
> > > This was basically a comment trying to answer to this commit:
> > > ba516de332c0  USB: EHCI: check for STALL before other errors
> > >
> > >     "The existing code doesn't do this properly, because it tests for MMF
> > >     (Missed MicroFrame) and DBE (Data Buffer Error) before testing the
> > >     retry counter.  Thus, if a transaction gets either MMF or DBE the
> > >     corresponding flag is set and the transaction is retried.  If the
> > >     second attempt receives a STALL then -EPIPE is the correct return
> > >     value.  But the existing code would see the MMF or DBE flag instead
> > >     and return -EPROTO, -ENOSR, or -ECOMM."
> > >
> > > The comment tries to explain that it will not revert the fix made in
> > > the commit ba516de332c0.
> >
> > Okay, I get it.  You're trying to rely on the strange behavior of the
> > MMF bit.
> >
> > I'm not sure this is a good idea.  Suppose MMF gets set for some other
> > reason (a genuine error on the host) and then the transaction gets a
> > STALL on the next retry.  Since host errors don't decrement CERR, your
> > patch would cause the driver to return -EPROTO instead of -EPIPE.
> 
> Hmm. I originally understood from the documentation that the MMF does
> decrement the CERR. I followed the EHCI specification table 3-16, the row
> about the CERR.
> There are listed the errors that do decrement the CERR and which do not
> decrement. Data buffer error, stall, babble and no error (of course) do not
> decrement.
> Transaction errors do decrement the CERR. As there was no mention about
> MMF I have been in the impression that it is included to the transaction
> errors of the table.
> 
> Now I was able to find from the documentation you pointed out (in your
> next comment below)  that your statement is correct. EHCI specification,
> table 4-13 contains Interrupt IN / OUT conditions which prove your point.
> 
> >
> > > > > +		} else if (QTD_CERR(token) &&
> > > > QTD_CERR(token) < EHCI_TUNE_CERR) {
> > > > >  			status = -EPIPE;
> > > >
> > > > If an error occurs and the transaction is retried and the retry
> > > > gets a
> > > stall, then
> > > > the final status should be -EPIPE, not something else.
> > >
> > > This is how the RFC also works. If the transaction has been retried
> > > and gets stall then -EPIPE is returned.
> > > Or if there are no errors but there is a stall then -EPIPE is returned.
> > >
> > > The only difference in this patch in comparison to the existing
> > > implementation is that if there is an error but the transaction has
> > > not been retried it is not interpret as a stall even if there is a
> > > halt.
> >
> > Sometimes that will be the right behavior and other times it won't.
> > However, it looks like there may be a way to tell which situation we
> > are in.  Setting the MMF bit will cause the queue to halt immediately
> > if the transaction is IN, but not if it is OUT (see Table 4-13 in the
> > EHCI spec).
> 
> Thank you, this Table 4-13 was a vital piece of documentation that I either
> missed or failed  to read carefully enough.
> Now this could explain why the queue is halted with MMF set, even if it
> would not be a stall. Well done!
> 
> >
> > So if CERR == EHCI_TUNE_CERR and the QTD_PID != 1 (not IN) then we
> > should return -EPIPE, as the existing code does.  But if QTD_PID == 1
> > then the code should continue, as your patch does -- with one
> > difference: Put the additional check for EHCI_TUNE_CERR between the
> > tests for DBE and XACT instead of after XACT (because XACT would
> > decrement CERR whereas DBE wouldn't).
> 
> Good point, I agree.
> 
> >
> > Can you make that change and test it?
> 
> Sure, I have made the change and test it as soon as possible.

I am not actually totally sure if I understood you correctly, but I tested a change where the first stall check is like this (PID_CODE_IN is defined as 1):

-               } else if (QTD_CERR(token)) {
+               } else if (QTD_CERR(token) && (QTD_PID (token) != PID_CODE_IN)) {
                        status = -EPIPE;

And the second stall check (now between DBE and XACT):
+               } else if (QTD_CERR(token)) {
+                       status = -EPIPE;

Is this what you meant? Please correct me if I am wrong.

Anyways with these changes the issue does not reproduce.

> 
> >
> > Alan Stern
> 
> Erkka Talvitie
> 

Erkka Talvitie


^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected
  2019-12-04 13:18         ` Erkka Talvitie
@ 2019-12-04 14:24           ` Alan Stern
  2019-12-04 14:37             ` Erkka Talvitie
  0 siblings, 1 reply; 13+ messages in thread
From: Alan Stern @ 2019-12-04 14:24 UTC (permalink / raw)
  To: Erkka Talvitie; +Cc: gregkh, linux-usb, claus.baumgartner

On Wed, 4 Dec 2019, Erkka Talvitie wrote:

> > > So if CERR == EHCI_TUNE_CERR and the QTD_PID != 1 (not IN) then we
> > > should return -EPIPE, as the existing code does.  But if QTD_PID == 1
> > > then the code should continue, as your patch does -- with one
> > > difference: Put the additional check for EHCI_TUNE_CERR between the
> > > tests for DBE and XACT instead of after XACT (because XACT would
> > > decrement CERR whereas DBE wouldn't).
> > 
> > Good point, I agree.
> > 
> > >
> > > Can you make that change and test it?
> > 
> > Sure, I have made the change and test it as soon as possible.
> 
> I am not actually totally sure if I understood you correctly, but I tested a change where the first stall check is like this (PID_CODE_IN is defined as 1):
> 
> -               } else if (QTD_CERR(token)) {
> +               } else if (QTD_CERR(token) && (QTD_PID (token) != PID_CODE_IN)) {
>                         status = -EPIPE;
> 
> And the second stall check (now between DBE and XACT):
> +               } else if (QTD_CERR(token)) {
> +                       status = -EPIPE;
> 
> Is this what you meant? Please correct me if I am wrong.

Actually, what I meant for the first part was:

		} else if (QTD_CERR(token) &&
				(QTD_CERR(token) < EHCI_TUNE_CERR ||
				 QTD_PID(token) != PID_CODE_IN)) {

And of course there should be a comment before this line, explaining 
what it does.  By the way, the accepted format for multi-line comments 
is:

		/*
		 * Blah blah blah
		 * Blah blah blah
		 */

The second part of the patch looks okay (but again, with a comment).

> Anyways with these changes the issue does not reproduce.

Very good.

Alan Stern


^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected
  2019-12-04 14:24           ` Alan Stern
@ 2019-12-04 14:37             ` Erkka Talvitie
  2019-12-05 10:35               ` Erkka Talvitie
  0 siblings, 1 reply; 13+ messages in thread
From: Erkka Talvitie @ 2019-12-04 14:37 UTC (permalink / raw)
  To: Alan Stern; +Cc: gregkh, linux-usb, claus.baumgartner



> -----Original Message-----
> From: Alan Stern <stern@rowland.harvard.edu>
> Sent: keskiviikko 4. joulukuuta 2019 16.24
> To: Erkka Talvitie <erkka.talvitie@vincit.fi>
> Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org;
> claus.baumgartner@med.ge.com
> Subject: RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is
> disconnected
> 
> On Wed, 4 Dec 2019, Erkka Talvitie wrote:
> 
> > > > So if CERR == EHCI_TUNE_CERR and the QTD_PID != 1 (not IN) then we
> > > > should return -EPIPE, as the existing code does.  But if QTD_PID
> > > > == 1 then the code should continue, as your patch does -- with one
> > > > difference: Put the additional check for EHCI_TUNE_CERR between
> > > > the tests for DBE and XACT instead of after XACT (because XACT
> > > > would decrement CERR whereas DBE wouldn't).
> > >
> > > Good point, I agree.
> > >
> > > >
> > > > Can you make that change and test it?
> > >
> > > Sure, I have made the change and test it as soon as possible.
> >
> > I am not actually totally sure if I understood you correctly, but I
tested a
> change where the first stall check is like this (PID_CODE_IN is defined as
1):
> >
> > -               } else if (QTD_CERR(token)) {
> > +               } else if (QTD_CERR(token) && (QTD_PID (token) !=
> > + PID_CODE_IN)) {
> >                         status = -EPIPE;
> >
> > And the second stall check (now between DBE and XACT):
> > +               } else if (QTD_CERR(token)) {
> > +                       status = -EPIPE;
> >
> > Is this what you meant? Please correct me if I am wrong.
> 
> Actually, what I meant for the first part was:
> 
> 		} else if (QTD_CERR(token) &&
> 				(QTD_CERR(token)
> < EHCI_TUNE_CERR ||
> 				 QTD_PID(token) !=
> PID_CODE_IN)) {

Ok, now I understand the change. Good.

> 
> And of course there should be a comment before this line, explaining what
it
> does.  By the way, the accepted format for multi-line comments
> is:
> 
> 		/*
> 		 * Blah blah blah
> 		 * Blah blah blah
> 		 */
> 

Thanks for the information. I noticed that my comments were different than
the existing ones in the file and I was already about to change my comments
to match the existing style.

> The second part of the patch looks okay (but again, with a comment).

Yes, I will add the comment here also.

> 
> > Anyways with these changes the issue does not reproduce.
> 
> Very good.

I will do the changes and re-test.

> 
> Alan Stern

Erkka Talvitie


^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected
  2019-12-04 14:37             ` Erkka Talvitie
@ 2019-12-05 10:35               ` Erkka Talvitie
  2019-12-05 14:37                 ` Alan Stern
  0 siblings, 1 reply; 13+ messages in thread
From: Erkka Talvitie @ 2019-12-05 10:35 UTC (permalink / raw)
  To: Alan Stern; +Cc: gregkh, linux-usb, claus.baumgartner



> -----Original Message-----
> From: Erkka Talvitie <erkka.talvitie@vincit.fi>
> Sent: keskiviikko 4. joulukuuta 2019 16.38
> To: 'Alan Stern' <stern@rowland.harvard.edu>
> Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org;
> claus.baumgartner@med.ge.com
> Subject: RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is
> disconnected
> 
> 
> 
> > -----Original Message-----
> > From: Alan Stern <stern@rowland.harvard.edu>
> > Sent: keskiviikko 4. joulukuuta 2019 16.24
> > To: Erkka Talvitie <erkka.talvitie@vincit.fi>
> > Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org;
> > claus.baumgartner@med.ge.com
> > Subject: RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is
> > disconnected
> >
> > On Wed, 4 Dec 2019, Erkka Talvitie wrote:
> >
> > > > > So if CERR == EHCI_TUNE_CERR and the QTD_PID != 1 (not IN) then
> > > > > we should return -EPIPE, as the existing code does.  But if
> > > > > QTD_PID == 1 then the code should continue, as your patch does
> > > > > -- with one
> > > > > difference: Put the additional check for EHCI_TUNE_CERR between
> > > > > the tests for DBE and XACT instead of after XACT (because XACT
> > > > > would decrement CERR whereas DBE wouldn't).
> > > >
> > > > Good point, I agree.
> > > >
> > > > >
> > > > > Can you make that change and test it?
> > > >
> > > > Sure, I have made the change and test it as soon as possible.
> > >
> > > I am not actually totally sure if I understood you correctly, but I
> tested a
> > change where the first stall check is like this (PID_CODE_IN is
> > defined as
> 1):
> > >
> > > -               } else if (QTD_CERR(token)) {
> > > +               } else if (QTD_CERR(token) && (QTD_PID (token) !=
> > > + PID_CODE_IN)) {
> > >                         status = -EPIPE;
> > >
> > > And the second stall check (now between DBE and XACT):
> > > +               } else if (QTD_CERR(token)) {
> > > +                       status = -EPIPE;
> > >
> > > Is this what you meant? Please correct me if I am wrong.
> >
> > Actually, what I meant for the first part was:
> >
> > 		} else if (QTD_CERR(token) &&
> > 				(QTD_CERR(token)
> > < EHCI_TUNE_CERR ||
> > 				 QTD_PID(token) !=
> > PID_CODE_IN)) {
> 
> Ok, now I understand the change. Good.

I tested this change and the issue did not reproduce.

However when I was writing the comments for the patch I started to think
what happens in this following scenario:

The PID Code is IN.

1. First there will be XACT, the CERR is decremented, let's say from 3 to 2
and the host controller executes a retry.
2. On this next try there will happen the condition mentioned in the Table
4-13 of the EHCI specification so that the MMF is set and the queue is
halted (because it is IN).
3. To my understanding now the execution enters to this first stall check
if, as CERR is > 0 and CERR < EHCI_TUNE_CERR.
4. The -EPIPE (stall) is returned when actually the queue was halted due to
MMF - not stall - and the -EPROTO should be returned.

Is my logic correct or am I missing something?

If you agree with me then I would like to present you a bit more bold (in a
sense of amount of refactoring) RFC. In high level this another RFC
separates 1. error check and 2. stall check. For me this approach is a bit
easier to understand from the code. Or then please  propose another
solution.

If you do not agree with this scenario then never mind the above suggestion
about RFC doing more refactoring.

> 
> >
> > And of course there should be a comment before this line, explaining
> > what
> it
> > does.  By the way, the accepted format for multi-line comments
> > is:
> >
> > 		/*
> > 		 * Blah blah blah
> > 		 * Blah blah blah
> > 		 */
> >
> 
> Thanks for the information. I noticed that my comments were different than
> the existing ones in the file and I was already about to change my
comments
> to match the existing style.
> 
> > The second part of the patch looks okay (but again, with a comment).
> 
> Yes, I will add the comment here also.
> 
> >
> > > Anyways with these changes the issue does not reproduce.
> >
> > Very good.
> 
> I will do the changes and re-test.
> 
> >
> > Alan Stern
> 
> Erkka Talvitie

Erkka Talvitie


^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected
  2019-12-05 10:35               ` Erkka Talvitie
@ 2019-12-05 14:37                 ` Alan Stern
  2019-12-05 15:00                   ` Erkka Talvitie
  0 siblings, 1 reply; 13+ messages in thread
From: Alan Stern @ 2019-12-05 14:37 UTC (permalink / raw)
  To: Erkka Talvitie; +Cc: gregkh, linux-usb, claus.baumgartner

On Thu, 5 Dec 2019, Erkka Talvitie wrote:

> I tested this change and the issue did not reproduce.
> 
> However when I was writing the comments for the patch I started to think
> what happens in this following scenario:
> 
> The PID Code is IN.
> 
> 1. First there will be XACT, the CERR is decremented, let's say from 3 to 2
> and the host controller executes a retry.
> 2. On this next try there will happen the condition mentioned in the Table
> 4-13 of the EHCI specification so that the MMF is set and the queue is
> halted (because it is IN).
> 3. To my understanding now the execution enters to this first stall check
> if, as CERR is > 0 and CERR < EHCI_TUNE_CERR.
> 4. The -EPIPE (stall) is returned when actually the queue was halted due to
> MMF - not stall - and the -EPROTO should be returned.
> 
> Is my logic correct or am I missing something?

The same thought had occurred to me.

> If you agree with me then I would like to present you a bit more bold (in a
> sense of amount of refactoring) RFC. In high level this another RFC
> separates 1. error check and 2. stall check. For me this approach is a bit
> easier to understand from the code. Or then please  propose another
> solution.

I was going to suggest: Just check for MMF and PID == IN before
checking for STALL.  Everything else can remain the way it is.

Alan Stern



^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected
  2019-12-05 14:37                 ` Alan Stern
@ 2019-12-05 15:00                   ` Erkka Talvitie
  0 siblings, 0 replies; 13+ messages in thread
From: Erkka Talvitie @ 2019-12-05 15:00 UTC (permalink / raw)
  To: Alan Stern; +Cc: gregkh, linux-usb, claus.baumgartner



> -----Original Message-----
> From: Alan Stern <stern@rowland.harvard.edu>
> Sent: torstai 5. joulukuuta 2019 16.38
> To: Erkka Talvitie <erkka.talvitie@vincit.fi>
> Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org;
> claus.baumgartner@med.ge.com
> Subject: RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is
> disconnected
> 
> On Thu, 5 Dec 2019, Erkka Talvitie wrote:
> 
> > I tested this change and the issue did not reproduce.
> >
> > However when I was writing the comments for the patch I started to
> > think what happens in this following scenario:
> >
> > The PID Code is IN.
> >
> > 1. First there will be XACT, the CERR is decremented, let's say from 3
> > to 2 and the host controller executes a retry.
> > 2. On this next try there will happen the condition mentioned in the
> > Table
> > 4-13 of the EHCI specification so that the MMF is set and the queue is
> > halted (because it is IN).
> > 3. To my understanding now the execution enters to this first stall
> > check if, as CERR is > 0 and CERR < EHCI_TUNE_CERR.
> > 4. The -EPIPE (stall) is returned when actually the queue was halted
> > due to MMF - not stall - and the -EPROTO should be returned.
> >
> > Is my logic correct or am I missing something?
> 
> The same thought had occurred to me.
> 
> > If you agree with me then I would like to present you a bit more bold
> > (in a sense of amount of refactoring) RFC. In high level this another
> > RFC separates 1. error check and 2. stall check. For me this approach
> > is a bit easier to understand from the code. Or then please  propose
> > another solution.
> 
> I was going to suggest: Just check for MMF and PID == IN before checking
for
> STALL.  Everything else can remain the way it is.

Ok, just to double check:

I take the existing driver code (I will not apply the earlier RFC on top of
that) and apply one more check before the original stall check (that is):
} else if (QTD_CERR(token)) {

The check that I will add is checking MMF bit && PID == IN, and this check
comes right after the babble check, right?

Good, seems like a simple change. Yet I still prefer to test the change.
Unfortunately that goes to the next week as we have a national holiday
tomorrow. 
I will get back to you most likely on Monday.

> 
> Alan Stern
> 
Erkka Talvitie


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, back to index

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-29 14:08 [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected Erkka Talvitie
2019-12-02 19:43 ` Alan Stern
2019-12-03  9:38   ` Erkka Talvitie
2019-12-03 10:54     ` Erkka Talvitie
2019-12-03 13:35       ` Erkka Talvitie
2019-12-03 19:01     ` Alan Stern
2019-12-04  8:55       ` Erkka Talvitie
2019-12-04 13:18         ` Erkka Talvitie
2019-12-04 14:24           ` Alan Stern
2019-12-04 14:37             ` Erkka Talvitie
2019-12-05 10:35               ` Erkka Talvitie
2019-12-05 14:37                 ` Alan Stern
2019-12-05 15:00                   ` Erkka Talvitie

Linux-USB Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-usb/0 linux-usb/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-usb linux-usb/ https://lore.kernel.org/linux-usb \
		linux-usb@vger.kernel.org
	public-inbox-index linux-usb

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-usb


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git