* [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected @ 2019-11-29 14:08 Erkka Talvitie 2019-12-02 19:43 ` Alan Stern 0 siblings, 1 reply; 16+ messages in thread From: Erkka Talvitie @ 2019-11-29 14:08 UTC (permalink / raw) To: stern, gregkh; +Cc: linux-usb, claus.baumgartner, Erkka Talvitie When disconnecting a USB hub that has some child device(s) connected to it (such as a USB mouse), then the stack tries to clear halt and reset device(s) which are _already_ physically disconnected. The issue has been reproduced with: CPU: IMX6D5EYM10AD or MCIMX6D5EYM10AE. SW: U-Boot 2019.07 and kernel 4.19.40. In this situation there will be error bit for MMF active yet the CERR equals EHCI_TUNE_CERR + halt. Existing implementation interprets this as a stall [1] (chapter 8.4.5). Fix for the issue is at first to check for a stall that comes after an error (the CERR has been decreased). Then after that, check for other errors. And at last check for stall without other errors (the CERR equals EHCI_TUNE_CERR as stall does not decrease the CERR [2] (table 3-16)). What happens after the fix is that when disconnecting a hub with attached device(s) the situation is not interpret as a stall. The specification [2] is not clear about error priorities, but since there is no explicit error bit for the stall, it is assumed to be lower priority than other errors. [1] https://www.usb.org/document-library/usb-20-specification, usb_20.pdf [2] https://www.intel.com/content/dam/www/public/us/en/documents/technical-specifications/ehci-specification-for-usb.pdf Signed-off-by: Erkka Talvitie <erkka.talvitie@vincit.fi> --- Following test steps were executed to first demonstrate the issue and then to verify the fix: --> Attach a mouse into host port and then disconnect it. --> Attach a keyboard into host port and then disconnect it. --> Attach a KVM switch into host port and then disconnect it. --> Attach a KVM switch into host port, attach a mouse into the KVM and then disconnect it. --> Re-attach the mouse into the KVM and then disconnect the KVM switch. --> Attach a KVM switch into host port, attach a keyboard into the KVM and then disconnect it. --> Re-attach the keyboard into the KVM and then disconnect the KVM switch. --> Attach a KVM switch into host port, attach a mouse and a keyboard into the KVM and then disconnect the KVM switch. --> End test. Output without the fix: --> Attach a mouse into host port and then disconnect it. kernel: [ 374.409837] usb 2-1.2: new low-speed USB device number 16 using ci_hdrc kernel: [ 374.673010] input: Logitech USB Optical Mouse as /devices/soc0/soc/2100000.aips-bus/2184200.usb/ci_hdrc.1/usb2/2-1/2-1.2/2-1.2:1.0/0003:046D:C05F.0006/input/input7 kernel: [ 374.674044] hid-generic 0003:046D:C05F.0006: input: USB HID v1.11 Mouse [Logitech USB Optical Mouse] on usb-ci_hdrc.1-1.2/input0 weston[464]: [11:14:07.717] event6 - Logitech USB Optical Mouse: is tagged by udev as: Mouse weston[464]: [11:14:07.717] event6 - Logitech USB Optical Mouse: device is a pointer weston[464]: [11:14:07.718] libinput: configuring device "Logitech USB Optical Mouse". weston[464]: [11:14:07.718] associating input device event6 with output HDMI-A-1 (none by udev) kernel: [ 378.931298] usb 2-1.2: USB disconnect, device number 16 weston[464]: [11:14:11.668] event6 - Logitech USB Optical Mouse: device removed --> Attach a keyboard into host port and then disconnect it. kernel: [ 400.759884] usb 2-1.2: new low-speed USB device number 17 using ci_hdrc kernel: [ 401.025695] input: Dell Dell QuietKey Keyboard as /devices/soc0/soc/2100000.aips-bus/2184200.usb/ci_hdrc.1/usb2/2-1/2-1.2/2-1.2:1.0/0003:413C:2106.0007/input/input8 kernel: [ 401.090413] hid-generic 0003:413C:2106.0007: input: USB HID v1.10 Keyboard [Dell Dell QuietKey Keyboard] on usb-ci_hdrc.1-1.2/input0 systemd-logind[392]: Watching system buttons on /dev/input/event6 (Dell Dell QuietKey Keyboard) weston[464]: [11:14:33.992] event6 - Dell Dell QuietKey Keyboard: is tagged by udev as: Keyboard weston[464]: [11:14:33.994] event6 - Dell Dell QuietKey Keyboard: device is a keyboard weston[464]: [11:14:33.995] libinput: configuring device "Dell Dell QuietKey Keyboard". weston[464]: [11:14:33.996] associating input device event6 with output HDMI-A-1 (none by udev) kernel: [ 404.230882] usb 2-1.2: USB disconnect, device number 17 weston[464]: [11:14:37.047] event6 - Dell Dell QuietKey Keyboard: device removed --> Attach a KVM switch into host port and then disconnect it. kernel: [ 424.059844] usb 2-1.2: new high-speed USB device number 18 using ci_hdrc kernel: [ 424.315581] hub 2-1.2:1.0: USB hub found kernel: [ 424.316179] hub 2-1.2:1.0: 4 ports detected kernel: [ 428.268438] usb 2-1.2: USB disconnect, device number 18 --> Attach a KVM switch into host port, attach a mouse into the KVM and then disconnect it. kernel: [ 443.509853] usb 2-1.2: new high-speed USB device number 19 using ci_hdrc kernel: [ 443.763137] hub 2-1.2:1.0: USB hub found kernel: [ 443.763860] hub 2-1.2:1.0: 4 ports detected kernel: [ 448.299911] usb 2-1.2.2: new low-speed USB device number 20 using ci_hdrc kernel: [ 448.463160] input: Logitech USB Optical Mouse as /devices/soc0/soc/2100000.aips-bus/2184200.usb/ci_hdrc.1/usb2/2-1/2-1.2/2-1.2.2/2-1.2.2:1.0/0003:046D:C05F.0008/input/input9 kernel: [ 448.463380] hid-generic 0003:046D:C05F.0008: input: USB HID v1.11 Mouse [Logitech USB Optical Mouse] on usb-ci_hdrc.1-1.2.2/input0 weston[464]: [11:15:21.508] event6 - Logitech USB Optical Mouse: is tagged by udev as: Mouse weston[464]: [11:15:21.508] event6 - Logitech USB Optical Mouse: device is a pointer weston[464]: [11:15:21.509] libinput: configuring device "Logitech USB Optical Mouse". weston[464]: [11:15:21.509] associating input device event6 with output HDMI-A-1 (none by udev) kernel: [ 453.915419] usb 2-1.2.2: USB disconnect, device number 20 weston[464]: [11:15:26.655] event6 - Logitech USB Optical Mouse: device removed --> Re-attach the mouse into the KVM and then disconnect the KVM switch. kernel: [ 468.519841] usb 2-1.2.2: new low-speed USB device number 21 using ci_hdrc kernel: [ 468.683002] input: Logitech USB Optical Mouse as /devices/soc0/soc/2100000.aips-bus/2184200.usb/ci_hdrc.1/usb2/2-1/2-1.2/2-1.2.2/2-1.2.2:1.0/0003:046D:C05F.0009/input/input10 kernel: [ 468.683202] hid-generic 0003:046D:C05F.0009: input: USB HID v1.11 Mouse [Logitech USB Optical Mouse] on usb-ci_hdrc.1-1.2.2/input0 weston[464]: [11:15:41.727] event6 - Logitech USB Optical Mouse: is tagged by udev as: Mouse weston[464]: [11:15:41.728] event6 - Logitech USB Optical Mouse: device is a pointer weston[464]: [11:15:41.728] libinput: configuring device "Logitech USB Optical Mouse". weston[464]: [11:15:41.728] associating input device event6 with output HDMI-A-1 (none by udev) kernel: [ 471.118091] usb 2-1.2: clear tt 1 (0150) error -71 kernel: [ 471.134041] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 471.138266] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 471.142515] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 471.146773] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 471.151013] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 471.151020] usb 2-1.2-port2: Cannot enable. Maybe the USB cable is bad? kernel: [ 471.155259] usb 2-1.2-port2: cannot disable (err = -71) kernel: [ 471.159542] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 471.163765] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 471.168007] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 471.172267] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 471.176507] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 471.176512] usb 2-1.2-port2: Cannot enable. Maybe the USB cable is bad? kernel: [ 471.180763] usb 2-1.2-port2: cannot disable (err = -71) kernel: [ 471.185022] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 471.189261] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 471.194198] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 471.198273] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 471.202514] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 471.202521] usb 2-1.2-port2: Cannot enable. Maybe the USB cable is bad? kernel: [ 471.206757] usb 2-1.2-port2: cannot disable (err = -71) kernel: [ 471.211024] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 471.215257] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 471.219524] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 471.223767] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 471.228007] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 471.228013] usb 2-1.2-port2: Cannot enable. Maybe the USB cable is bad? kernel: [ 471.232261] usb 2-1.2-port2: cannot disable (err = -71) kernel: [ 471.236514] usb 2-1.2-port2: cannot disable (err = -71) kernel: [ 471.240801] hub 2-1.2:1.0: hub_ext_port_status failed (err = -71) weston[464]: [11:15:43.974] event6 - Logitech USB Optical Mouse: device removed kernel: [ 471.276392] usb 2-1.2: USB disconnect, device number 19 kernel: [ 471.276404] usb 2-1.2.2: USB disconnect, device number 21 --> Attach a KVM switch into host port, attach a keyboard into the KVM and then disconnect it. kernel: [ 489.339839] usb 2-1.2: new high-speed USB device number 22 using ci_hdrc kernel: [ 489.602886] hub 2-1.2:1.0: USB hub found kernel: [ 489.603397] hub 2-1.2:1.0: 4 ports detected kernel: [ 493.869845] usb 2-1.2.2: new low-speed USB device number 23 using ci_hdrc kernel: [ 494.046950] input: Dell Dell QuietKey Keyboard as /devices/soc0/soc/2100000.aips-bus/2184200.usb/ci_hdrc.1/usb2/2-1/2-1.2/2-1.2.2/2-1.2.2:1.0/0003:413C:2106.000A/input/input11 kernel: [ 494.112909] hid-generic 0003:413C:2106.000A: input: USB HID v1.10 Keyboard [Dell Dell QuietKey Keyboard] on usb-ci_hdrc.1-1.2.2/input0 systemd-logind[392]: Watching system buttons on /dev/input/event6 (Dell Dell QuietKey Keyboard) weston[464]: [11:16:07.033] event6 - Dell Dell QuietKey Keyboard: is tagged by udev as: Keyboard weston[464]: [11:16:07.038] event6 - Dell Dell QuietKey Keyboard: device is a keyboard weston[464]: [11:16:07.040] libinput: configuring device "Dell Dell QuietKey Keyboard". weston[464]: [11:16:07.040] associating input device event6 with output HDMI-A-1 (none by udev) kernel: [ 498.303010] usb 2-1.2.2: USB disconnect, device number 23 weston[464]: [11:16:11.135] event6 - Dell Dell QuietKey Keyboard: device removed --> Re-attach the keyboard into the KVM and then disconnect the KVM switch. kernel: [ 510.509858] usb 2-1.2.2: new low-speed USB device number 24 using ci_hdrc kernel: [ 510.673159] input: Dell Dell QuietKey Keyboard as /devices/soc0/soc/2100000.aips-bus/2184200.usb/ci_hdrc.1/usb2/2-1/2-1.2/2-1.2.2/2-1.2.2:1.0/0003:413C:2106.000B/input/input12 kernel: [ 510.740488] hid-generic 0003:413C:2106.000B: input: USB HID v1.10 Keyboard [Dell Dell QuietKey Keyboard] on usb-ci_hdrc.1-1.2.2/input0 systemd-logind[392]: Watching system buttons on /dev/input/event6 (Dell Dell QuietKey Keyboard) weston[464]: [11:16:23.669] event6 - Dell Dell QuietKey Keyboard: is tagged by udev as: Keyboard weston[464]: [11:16:23.670] event6 - Dell Dell QuietKey Keyboard: device is a keyboard weston[464]: [11:16:23.672] libinput: configuring device "Dell Dell QuietKey Keyboard". weston[464]: [11:16:23.673] associating input device event6 with output HDMI-A-1 (none by udev) kernel: [ 513.808762] usb 2-1.2: clear tt 1 (0180) error -71 kernel: [ 513.824014] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 513.828254] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 513.832510] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 513.837034] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 513.841135] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 513.841141] usb 2-1.2-port2: Cannot enable. Maybe the USB cable is bad? kernel: [ 513.845391] usb 2-1.2-port2: cannot disable (err = -71) kernel: [ 513.849575] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 513.853760] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 513.858009] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 513.862264] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 513.866514] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 513.866521] usb 2-1.2-port2: Cannot enable. Maybe the USB cable is bad? kernel: [ 513.870812] usb 2-1.2-port2: cannot disable (err = -71) kernel: [ 513.875012] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 513.879253] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 513.883511] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 513.887763] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 513.892012] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 513.892019] usb 2-1.2-port2: Cannot enable. Maybe the USB cable is bad? kernel: [ 513.896267] usb 2-1.2-port2: cannot disable (err = -71) kernel: [ 513.900572] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 513.904776] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 513.909004] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 513.913290] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 513.917512] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 513.917517] usb 2-1.2-port2: Cannot enable. Maybe the USB cable is bad? kernel: [ 513.921769] usb 2-1.2-port2: cannot disable (err = -71) kernel: [ 513.926009] usb 2-1.2-port2: cannot disable (err = -71) kernel: [ 513.930372] hub 2-1.2:1.0: hub_ext_port_status failed (err = -71) weston[464]: [11:16:26.752] event6 - Dell Dell QuietKey Keyboard: device removed kernel: [ 514.028708] usb 2-1.2: USB disconnect, device number 22 kernel: [ 514.028721] usb 2-1.2.2: USB disconnect, device number 24 --> Attach a KVM switch into host port, attach a mouse and a keyboard into the KVM and then disconnect the KVM switch. kernel: [ 530.299896] usb 2-1.2: new high-speed USB device number 25 using ci_hdrc kernel: [ 530.553126] hub 2-1.2:1.0: USB hub found kernel: [ 530.553656] hub 2-1.2:1.0: 4 ports detected kernel: [ 534.580137] usb 2-1.2.2: new low-speed USB device number 26 using ci_hdrc kernel: [ 534.743092] input: Logitech USB Optical Mouse as /devices/soc0/soc/2100000.aips-bus/2184200.usb/ci_hdrc.1/usb2/2-1/2-1.2/2-1.2.2/2-1.2.2:1.0/0003:046D:C05F.000C/input/input13 kernel: [ 534.744022] hid-generic 0003:046D:C05F.000C: input: USB HID v1.11 Mouse [Logitech USB Optical Mouse] on usb-ci_hdrc.1-1.2.2/input0 weston[464]: [11:16:47.778] event6 - Logitech USB Optical Mouse: is tagged by udev as: Mouse weston[464]: [11:16:47.778] event6 - Logitech USB Optical Mouse: device is a pointer weston[464]: [11:16:47.779] libinput: configuring device "Logitech USB Optical Mouse". weston[464]: [11:16:47.779] associating input device event6 with output HDMI-A-1 (none by udev) kernel: [ 536.889928] usb 2-1.2.4: new low-speed USB device number 27 using ci_hdrc kernel: [ 537.054252] input: Dell Dell QuietKey Keyboard as /devices/soc0/soc/2100000.aips-bus/2184200.usb/ci_hdrc.1/usb2/2-1/2-1.2/2-1.2.4/2-1.2.4:1.0/0003:413C:2106.000D/input/input14 kernel: [ 537.120711] hid-generic 0003:413C:2106.000D: input: USB HID v1.10 Keyboard [Dell Dell QuietKey Keyboard] on usb-ci_hdrc.1-1.2.4/input0 systemd-logind[392]: Watching system buttons on /dev/input/event7 (Dell Dell QuietKey Keyboard) weston[464]: [11:16:50.022] event7 - Dell Dell QuietKey Keyboard: is tagged by udev as: Keyboard weston[464]: [11:16:50.024] event7 - Dell Dell QuietKey Keyboard: device is a keyboard weston[464]: [11:16:50.025] libinput: configuring device "Dell Dell QuietKey Keyboard". weston[464]: [11:16:50.026] associating input device event7 with output HDMI-A-1 (none by udev) kernel: [ 540.962649] usb 2-1.2: clear tt 1 (01a0) error -71 kernel: [ 540.966889] usb 2-1.2: clear tt 1 (01b0) error -71 kernel: [ 540.974144] usb 2-1.2-port4: cannot reset (err = -71) kernel: [ 540.978389] usb 2-1.2-port4: cannot reset (err = -71) kernel: [ 540.982519] usb 2-1.2-port4: cannot reset (err = -71) kernel: [ 540.986762] usb 2-1.2-port4: cannot reset (err = -71) kernel: [ 540.991019] usb 2-1.2-port4: cannot reset (err = -71) kernel: [ 540.991025] usb 2-1.2-port4: Cannot enable. Maybe the USB cable is bad? kernel: [ 540.995254] usb 2-1.2-port4: cannot disable (err = -71) kernel: [ 540.999529] usb 2-1.2-port4: cannot reset (err = -71) kernel: [ 541.003785] usb 2-1.2-port4: cannot reset (err = -71) kernel: [ 541.008015] usb 2-1.2-port4: cannot reset (err = -71) kernel: [ 541.012276] usb 2-1.2-port4: cannot reset (err = -71) kernel: [ 541.016511] usb 2-1.2-port4: cannot reset (err = -71) kernel: [ 541.016517] usb 2-1.2-port4: Cannot enable. Maybe the USB cable is bad? kernel: [ 541.020910] usb 2-1.2-port4: cannot disable (err = -71) kernel: [ 541.025107] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 541.029286] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 541.033522] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 541.037793] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 541.042021] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 541.042028] usb 2-1.2-port2: Cannot enable. Maybe the USB cable is bad? kernel: [ 541.046269] usb 2-1.2-port2: cannot disable (err = -71) kernel: [ 541.050543] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 541.054753] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 541.059005] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 541.063266] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 541.067513] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 541.067518] usb 2-1.2-port2: Cannot enable. Maybe the USB cable is bad? kernel: [ 541.071769] usb 2-1.2-port2: cannot disable (err = -71) kernel: [ 541.076005] usb 2-1.2-port4: cannot reset (err = -71) kernel: [ 541.080324] usb 2-1.2-port4: cannot reset (err = -71) kernel: [ 541.084543] usb 2-1.2-port4: cannot reset (err = -71) kernel: [ 541.088762] usb 2-1.2-port4: cannot reset (err = -71) kernel: [ 541.093018] usb 2-1.2-port4: cannot reset (err = -71) kernel: [ 541.093024] usb 2-1.2-port4: Cannot enable. Maybe the USB cable is bad? kernel: [ 541.097261] usb 2-1.2-port4: cannot disable (err = -71) kernel: [ 541.101515] usb 2-1.2-port4: cannot reset (err = -71) kernel: [ 541.105751] usb 2-1.2-port4: cannot reset (err = -71) kernel: [ 541.110169] usb 2-1.2-port4: cannot reset (err = -71) kernel: [ 541.114265] usb 2-1.2-port4: cannot reset (err = -71) kernel: [ 541.118507] usb 2-1.2-port4: cannot reset (err = -71) kernel: [ 541.118512] usb 2-1.2-port4: Cannot enable. Maybe the USB cable is bad? kernel: [ 541.122763] usb 2-1.2-port4: cannot disable (err = -71) kernel: [ 541.127006] usb 2-1.2-port4: cannot disable (err = -71) kernel: [ 541.131266] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 541.135547] hub 2-1.2:1.0: hub_ext_port_status failed (err = -71) kernel: [ 541.139763] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 541.144017] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 541.148262] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 541.152513] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 541.152519] usb 2-1.2-port2: Cannot enable. Maybe the USB cable is bad? kernel: [ 541.156752] usb 2-1.2-port2: cannot disable (err = -71) kernel: [ 541.161031] usb 2-1.2-port2: cannot reset (err = -71) kernel: [ 541.164659] usb 2-1.2: USB disconnect, device number 25 kernel: [ 541.164670] usb 2-1.2.2: USB disconnect, device number 26 kernel: [ 541.165358] usb 2-1.2-port2: cannot reset (err = -71) weston[464]: [11:16:53.936] event7 - Dell Dell QuietKey Keyboard: device removed weston[464]: [11:16:53.976] event6 - Logitech USB Optical Mouse: device removed kernel: [ 541.271277] usb 2-1.2.4: USB disconnect, device number 27 --> End test. Output with the fix: --> Attach a mouse into host port and then disconnect it. kernel: [ 101.229751] usb 2-1.2: new low-speed USB device number 16 using ci_hdrc kernel: [ 101.496424] input: Logitech USB Optical Mouse as /devices/soc0/soc/2100000.aips-bus/2184200.usb/ci_hdrc.1/usb2/2-1/2-1.2/2-1.2:1.0/0003:046D:C05F.0006/input/input7 kernel: [ 101.497252] hid-generic 0003:046D:C05F.0006: input: USB HID v1.11 Mouse [Logitech USB Optical Mouse] on usb-ci_hdrc.1-1.2/input0 weston[486]: [11:34:50.974] event6 - Logitech USB Optical Mouse: is tagged by udev as: Mouse weston[486]: [11:34:50.975] event6 - Logitech USB Optical Mouse: device is a pointer weston[486]: [11:34:50.975] libinput: configuring device "Logitech USB Optical Mouse". weston[486]: [11:34:50.975] associating input device event6 with output HDMI-A-1 (none by udev) kernel: [ 104.418306] usb 2-1.2: USB disconnect, device number 16 weston[486]: [11:34:53.604] event6 - Logitech USB Optical Mouse: device removed --> Attach a keyboard into host port and then disconnect it. kernel: [ 126.829750] usb 2-1.2: new low-speed USB device number 17 using ci_hdrc kernel: [ 127.096213] input: Dell Dell QuietKey Keyboard as /devices/soc0/soc/2100000.aips-bus/2184200.usb/ci_hdrc.1/usb2/2-1/2-1.2/2-1.2:1.0/0003:413C:2106.0007/input/input8 kernel: [ 127.160286] hid-generic 0003:413C:2106.0007: input: USB HID v1.10 Keyboard [Dell Dell QuietKey Keyboard] on usb-ci_hdrc.1-1.2/input0 systemd-logind[418]: Watching system buttons on /dev/input/event6 (Dell Dell QuietKey Keyboard) weston[486]: [11:35:16.511] event6 - Dell Dell QuietKey Keyboard: is tagged by udev as: Keyboard weston[486]: [11:35:16.511] event6 - Dell Dell QuietKey Keyboard: device is a keyboard weston[486]: [11:35:16.513] libinput: configuring device "Dell Dell QuietKey Keyboard". weston[486]: [11:35:16.513] associating input device event6 with output HDMI-A-1 (none by udev) kernel: [ 129.506479] usb 2-1.2: USB disconnect, device number 17 weston[486]: [11:35:18.775] event6 - Dell Dell QuietKey Keyboard: device removed --> Attach a KVM switch into host port and then disconnect it. kernel: [ 160.369756] usb 2-1.2: new high-speed USB device number 18 using ci_hdrc kernel: [ 160.622949] hub 2-1.2:1.0: USB hub found kernel: [ 160.623495] hub 2-1.2:1.0: 4 ports detected kernel: [ 163.298449] usb 2-1.2: USB disconnect, device number 18 --> Attach a KVM switch into host port, attach a mouse into the KVM and then disconnect it. kernel: [ 184.939760] usb 2-1.2: new high-speed USB device number 19 using ci_hdrc kernel: [ 185.195174] hub 2-1.2:1.0: USB hub found kernel: [ 185.195706] hub 2-1.2:1.0: 4 ports detected kernel: [ 190.249755] usb 2-1.2.2: new low-speed USB device number 20 using ci_hdrc kernel: [ 190.416307] input: Logitech USB Optical Mouse as /devices/soc0/soc/2100000.aips-bus/2184200.usb/ci_hdrc.1/usb2/2-1/2-1.2/2-1.2.2/2-1.2.2:1.0/0003:046D:C05F.0008/input/input9 kernel: [ 190.417250] hid-generic 0003:046D:C05F.0008: input: USB HID v1.11 Mouse [Logitech USB Optical Mouse] on usb-ci_hdrc.1-1.2.2/input0 weston[486]: [11:36:19.907] event6 - Logitech USB Optical Mouse: is tagged by udev as: Mouse weston[486]: [11:36:19.907] event6 - Logitech USB Optical Mouse: device is a pointer weston[486]: [11:36:19.908] libinput: configuring device "Logitech USB Optical Mouse". weston[486]: [11:36:19.908] associating input device event6 with output HDMI-A-1 (none by udev) kernel: [ 193.531312] usb 2-1.2.2: USB disconnect, device number 20 weston[486]: [11:36:22.721] event6 - Logitech USB Optical Mouse: device removed --> Re-attach the mouse into the KVM and then disconnect the KVM switch. kernel: [ 208.169755] usb 2-1.2.2: new low-speed USB device number 21 using ci_hdrc kernel: [ 208.333289] input: Logitech USB Optical Mouse as /devices/soc0/soc/2100000.aips-bus/2184200.usb/ci_hdrc.1/usb2/2-1/2-1.2/2-1.2.2/2-1.2.2:1.0/0003:046D:C05F.0009/input/input10 kernel: [ 208.335566] hid-generic 0003:046D:C05F.0009: input: USB HID v1.11 Mouse [Logitech USB Optical Mouse] on usb-ci_hdrc.1-1.2.2/input0 weston[486]: [11:36:37.805] event6 - Logitech USB Optical Mouse: is tagged by udev as: Mouse weston[486]: [11:36:37.806] event6 - Logitech USB Optical Mouse: device is a pointer weston[486]: [11:36:37.806] libinput: configuring device "Logitech USB Optical Mouse". weston[486]: [11:36:37.806] associating input device event6 with output HDMI-A-1 (none by udev) kernel: [ 211.426306] usb 2-1.2: USB disconnect, device number 19 kernel: [ 211.426317] usb 2-1.2.2: USB disconnect, device number 21 weston[486]: [11:36:40.616] event6 - Logitech USB Optical Mouse: device removed --> Attach a KVM switch into host port, attach a keyboard into the KVM and then disconnect it. kernel: [ 230.249761] usb 2-1.2: new high-speed USB device number 22 using ci_hdrc kernel: [ 230.502823] hub 2-1.2:1.0: USB hub found kernel: [ 230.503332] hub 2-1.2:1.0: 4 ports detected kernel: [ 237.349770] usb 2-1.2.2: new low-speed USB device number 23 using ci_hdrc kernel: [ 237.516824] input: Dell Dell QuietKey Keyboard as /devices/soc0/soc/2100000.aips-bus/2184200.usb/ci_hdrc.1/usb2/2-1/2-1.2/2-1.2.2/2-1.2.2:1.0/0003:413C:2106.000A/input/input11 kernel: [ 237.580297] hid-generic 0003:413C:2106.000A: input: USB HID v1.10 Keyboard [Dell Dell QuietKey Keyboard] on usb-ci_hdrc.1-1.2.2/input0 systemd-logind[418]: Watching system buttons on /dev/input/event6 (Dell Dell QuietKey Keyboard) weston[486]: [11:37:06.967] event6 - Dell Dell QuietKey Keyboard: is tagged by udev as: Keyboard weston[486]: [11:37:06.969] event6 - Dell Dell QuietKey Keyboard: device is a keyboard weston[486]: [11:37:06.971] libinput: configuring device "Dell Dell QuietKey Keyboard". weston[486]: [11:37:06.972] associating input device event6 with output HDMI-A-1 (none by udev) kernel: [ 240.637437] usb 2-1.2.2: USB disconnect, device number 23 weston[486]: [11:37:09.897] event6 - Dell Dell QuietKey Keyboard: device removed --> Re-attach the keyboard into the KVM and then disconnect the KVM switch. kernel: [ 257.319980] usb 2-1.2.2: new low-speed USB device number 24 using ci_hdrc kernel: [ 257.483579] input: Dell Dell QuietKey Keyboard as /devices/soc0/soc/2100000.aips-bus/2184200.usb/ci_hdrc.1/usb2/2-1/2-1.2/2-1.2.2/2-1.2.2:1.0/0003:413C:2106.000B/input/input12 kernel: [ 257.560285] hid-generic 0003:413C:2106.000B: input: USB HID v1.10 Keyboard [Dell Dell QuietKey Keyboard] on usb-ci_hdrc.1-1.2.2/input0 systemd-logind[418]: Watching system buttons on /dev/input/event6 (Dell Dell QuietKey Keyboard) weston[486]: [11:37:26.926] event6 - Dell Dell QuietKey Keyboard: is tagged by udev as: Keyboard weston[486]: [11:37:26.928] event6 - Dell Dell QuietKey Keyboard: device is a keyboard weston[486]: [11:37:26.930] libinput: configuring device "Dell Dell QuietKey Keyboard". weston[486]: [11:37:26.931] associating input device event6 with output HDMI-A-1 (none by udev) kernel: [ 260.323187] usb 2-1.2: USB disconnect, device number 22 kernel: [ 260.323199] usb 2-1.2.2: USB disconnect, device number 24 weston[486]: [11:37:29.585] event6 - Dell Dell QuietKey Keyboard: device removed --> Attach a KVM switch into host port, attach a mouse and a keyboard into the KVM and then disconnect the KVM switch. kernel: [ 282.479747] usb 2-1.2: new high-speed USB device number 25 using ci_hdrc kernel: [ 282.732817] hub 2-1.2:1.0: USB hub found kernel: [ 282.733327] hub 2-1.2:1.0: 4 ports detected kernel: [ 288.299751] usb 2-1.2.2: new low-speed USB device number 26 using ci_hdrc kernel: [ 288.476005] input: Logitech USB Optical Mouse as /devices/soc0/soc/2100000.aips-bus/2184200.usb/ci_hdrc.1/usb2/2-1/2-1.2/2-1.2.2/2-1.2.2:1.0/0003:046D:C05F.000C/input/input13 kernel: [ 288.476222] hid-generic 0003:046D:C05F.000C: input: USB HID v1.11 Mouse [Logitech USB Optical Mouse] on usb-ci_hdrc.1-1.2.2/input0 weston[486]: [11:37:57.956] event6 - Logitech USB Optical Mouse: is tagged by udev as: Mouse weston[486]: [11:37:57.956] event6 - Logitech USB Optical Mouse: device is a pointer weston[486]: [11:37:57.957] libinput: configuring device "Logitech USB Optical Mouse". weston[486]: [11:37:57.957] associating input device event6 with output HDMI-A-1 (none by udev) kernel: [ 291.629751] usb 2-1.2.4: new low-speed USB device number 27 using ci_hdrc kernel: [ 291.793640] input: Dell Dell QuietKey Keyboard as /devices/soc0/soc/2100000.aips-bus/2184200.usb/ci_hdrc.1/usb2/2-1/2-1.2/2-1.2.4/2-1.2.4:1.0/0003:413C:2106.000D/input/input14 kernel: [ 291.860657] hid-generic 0003:413C:2106.000D: input: USB HID v1.10 Keyboard [Dell Dell QuietKey Keyboard] on usb-ci_hdrc.1-1.2.4/input0 systemd-logind[418]: Watching system buttons on /dev/input/event7 (Dell Dell QuietKey Keyboard) weston[486]: [11:38:01.238] event7 - Dell Dell QuietKey Keyboard: is tagged by udev as: Keyboard weston[486]: [11:38:01.242] event7 - Dell Dell QuietKey Keyboard: device is a keyboard weston[486]: [11:38:01.243] libinput: configuring device "Dell Dell QuietKey Keyboard". weston[486]: [11:38:01.243] associating input device event7 with output HDMI-A-1 (none by udev) kernel: [ 295.394440] usb 2-1.2: USB disconnect, device number 25 kernel: [ 295.394450] usb 2-1.2.2: USB disconnect, device number 26 weston[486]: [11:38:04.580] event6 - Logitech USB Optical Mouse: device removed kernel: [ 295.431931] usb 2-1.2.4: USB disconnect, device number 27 weston[486]: [11:38:04.725] event7 - Dell Dell QuietKey Keyboard: device removed --> End test. --- drivers/usb/host/ehci-q.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/usb/host/ehci-q.c b/drivers/usb/host/ehci-q.c index 3276304..da7fd12 100644 --- a/drivers/usb/host/ehci-q.c +++ b/drivers/usb/host/ehci-q.c @@ -206,8 +206,9 @@ static int qtd_copy_status ( if (token & QTD_STS_BABBLE) { /* FIXME "must" disable babbling device's port too */ status = -EOVERFLOW; - /* CERR nonzero + halt --> stall */ - } else if (QTD_CERR(token)) { + /* CERR nonzero and less than EHCI_TUNE_CERR + halt --> stall. + This handles situation where stall comes after an error. */ + } else if (QTD_CERR(token) && QTD_CERR(token) < EHCI_TUNE_CERR) { status = -EPIPE; /* In theory, more than one of the following bits can be set @@ -228,6 +229,10 @@ static int qtd_copy_status ( usb_pipeendpoint(urb->pipe), usb_pipein(urb->pipe) ? "in" : "out"); status = -EPROTO; + /* CERR equals EHCI_TUNE_CERR, no other errors + halt --> stall. + This handles situation where stall comes without error bits set. */ + } else if (QTD_CERR(token)) { + status = -EPIPE; } else { /* unknown */ status = -EPROTO; } -- 2.7.4 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected 2019-11-29 14:08 [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected Erkka Talvitie @ 2019-12-02 19:43 ` Alan Stern 2019-12-03 9:38 ` Erkka Talvitie 0 siblings, 1 reply; 16+ messages in thread From: Alan Stern @ 2019-12-02 19:43 UTC (permalink / raw) To: Erkka Talvitie; +Cc: gregkh, linux-usb, claus.baumgartner On Fri, 29 Nov 2019, Erkka Talvitie wrote: > When disconnecting a USB hub that has some child device(s) connected to it > (such as a USB mouse), then the stack tries to clear halt and > reset device(s) which are _already_ physically disconnected. That behavior is understandable. The kernel doesn't know that the device has been disconnected until it can process the notification from an upstream hub, and it can't process that notification while it's trying to reset the device. > The issue has been reproduced with: > > CPU: IMX6D5EYM10AD or MCIMX6D5EYM10AE. > SW: U-Boot 2019.07 and kernel 4.19.40. > > In this situation there will be error bit for MMF active yet the > CERR equals EHCI_TUNE_CERR + halt. Why? In general, setting the MMF bit does not cause the halt bit to be set (set Table 4-13 in the EHCI spec). In fact, MMF refers to errors that occur on the host, not bus errors caused by a disconnected device. > Existing implementation > interprets this as a stall [1] (chapter 8.4.5). That is the correct thing to do. When a transaction error occurs during a Complete-Split transaction, the host controller is supposed to decrement the CERR value, set the XACT bit, and retry the transaction unless the CERR value is 0 or there isn't enough time left in the microframe. The fact that you saw CERR equal to EHCI_TUNE_CERR and XACT clear probably means that your EHCI hardware is not behaving properly. > Fix for the issue is at first to check for a stall that comes after > an error (the CERR has been decreased). > > Then after that, check for other errors. > > And at last check for stall without other errors (the CERR equals > EHCI_TUNE_CERR as stall does not decrease the CERR [2] (table 3-16)). > > What happens after the fix is that when disconnecting a hub with > attached device(s) the situation is not interpret as a stall. > > The specification [2] is not clear about error priorities, but > since there is no explicit error bit for the stall, it is > assumed to be lower priority than other errors. On the contrary, the specification is very clear. Since transaction errors cause CERR to be decremented until it reaches 0, a nonzero value for CERR means the endpoint was halted for some other reason. And the only other reason is a stall. (Or end of the microframe, but there's no way to tell if that happened.) > [1] https://www.usb.org/document-library/usb-20-specification, usb_20.pdf > [2] https://www.intel.com/content/dam/www/public/us/en/documents/technical-specifications/ehci-specification-for-usb.pdf > > Signed-off-by: Erkka Talvitie <erkka.talvitie@vincit.fi> Can you duplicate this behavior on a standard PC, say with an Intel EHCI controller? > drivers/usb/host/ehci-q.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/drivers/usb/host/ehci-q.c b/drivers/usb/host/ehci-q.c > index 3276304..da7fd12 100644 > --- a/drivers/usb/host/ehci-q.c > +++ b/drivers/usb/host/ehci-q.c > @@ -206,8 +206,9 @@ static int qtd_copy_status ( > if (token & QTD_STS_BABBLE) { > /* FIXME "must" disable babbling device's port too */ > status = -EOVERFLOW; > - /* CERR nonzero + halt --> stall */ > - } else if (QTD_CERR(token)) { > + /* CERR nonzero and less than EHCI_TUNE_CERR + halt --> stall. > + This handles situation where stall comes after an error. */ This comment doesn't make sense. Who cares whether a stall comes after an error or not? It's still a stall and should be reported. > + } else if (QTD_CERR(token) && QTD_CERR(token) < EHCI_TUNE_CERR) { > status = -EPIPE; If an error occurs and the transaction is retried and the retry gets a stall, then the final status should be -EPIPE, not something else. > /* In theory, more than one of the following bits can be set > @@ -228,6 +229,10 @@ static int qtd_copy_status ( > usb_pipeendpoint(urb->pipe), > usb_pipein(urb->pipe) ? "in" : "out"); > status = -EPROTO; > + /* CERR equals EHCI_TUNE_CERR, no other errors + halt --> stall. > + This handles situation where stall comes without error bits set. */ If CERR is equal to EHCI_TUNE_CERR then no other errors could have occurred (since any error will decrement CERR). So why shouldn't this case be included with the earlier case? > + } else if (QTD_CERR(token)) { > + status = -EPIPE; > } else { /* unknown */ > status = -EPROTO; > } Alan Stern ^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected 2019-12-02 19:43 ` Alan Stern @ 2019-12-03 9:38 ` Erkka Talvitie 2019-12-03 10:54 ` Erkka Talvitie 2019-12-03 19:01 ` Alan Stern 0 siblings, 2 replies; 16+ messages in thread From: Erkka Talvitie @ 2019-12-03 9:38 UTC (permalink / raw) To: 'Alan Stern'; +Cc: gregkh, linux-usb, claus.baumgartner Thank you for the comments. > -----Original Message----- > From: Alan Stern <stern@rowland.harvard.edu> > Sent: maanantai 2. joulukuuta 2019 21.44 > To: Erkka Talvitie <erkka.talvitie@vincit.fi> > Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org; > claus.baumgartner@med.ge.com > Subject: Re: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is > disconnected > > On Fri, 29 Nov 2019, Erkka Talvitie wrote: > > > When disconnecting a USB hub that has some child device(s) connected > > to it (such as a USB mouse), then the stack tries to clear halt and > > reset device(s) which are _already_ physically disconnected. > > That behavior is understandable. The kernel doesn't know that the device > has been disconnected until it can process the notification from an upstream > hub, and it can't process that notification while it's trying to reset the device. > Ok. I was thinking that in this use case , it should not be trying to clear the halt (and reset the device when the clear halt fails). And this behavior was altered by this RFC. > > The issue has been reproduced with: > > > > CPU: IMX6D5EYM10AD or MCIMX6D5EYM10AE. > > SW: U-Boot 2019.07 and kernel 4.19.40. > > > > In this situation there will be error bit for MMF active yet the CERR > > equals EHCI_TUNE_CERR + halt. > > Why? In general, setting the MMF bit does not cause the halt bit to be set > (set Table 4-13 in the EHCI spec). In fact, MMF refers to errors that occur on > the host, not bus errors caused by a disconnected device. I do not know for sure why that happens. I was suspecting that there has been MMF error and a stall at the same time. And in this RFC it was assumed that MMF is with greater priority than stall. The disconnecting of a hub with attached devices cause the MMF error bit set even though it is a host side error. > > > Existing implementation > > interprets this as a stall [1] (chapter 8.4.5). > > That is the correct thing to do. When a transaction error occurs during a > Complete-Split transaction, the host controller is supposed to decrement the > CERR value, set the XACT bit, and retry the transaction unless the CERR value > is 0 or there isn't enough time left in the microframe. > > The fact that you saw CERR equal to EHCI_TUNE_CERR and XACT clear > probably means that your EHCI hardware is not behaving properly. If you refer to the XactErr bit (Table 4-13 [2] )with the "XACT clear" then unfortunately I did not check it's state ,so I am not sure if it was clear. In this patch, like also in the existing implementation, the MMF bit is checked first and since it is active in this situation the XactErr is not checked. I will check this. But as in this use case the CERR has not been decreased yet there is error bit active (MMF) do you see it is still correct to interpret it as a stall (even when the halt bit is set)? I have tried to find out more details about our EHCI controller version, but I have only found out those CPU versions. It might help in a search whether this could be a HW issue. > > > Fix for the issue is at first to check for a stall that comes after an > > error (the CERR has been decreased). > > > > Then after that, check for other errors. > > > > And at last check for stall without other errors (the CERR equals > > EHCI_TUNE_CERR as stall does not decrease the CERR [2] (table 3-16)). > > > > What happens after the fix is that when disconnecting a hub with > > attached device(s) the situation is not interpret as a stall. > > > > The specification [2] is not clear about error priorities, but since > > there is no explicit error bit for the stall, it is assumed to be > > lower priority than other errors. > > On the contrary, the specification is very clear. Since transaction errors cause > CERR to be decremented until it reaches 0, a nonzero value for CERR means > the endpoint was halted for some other reason. And the only other reason > is a stall. (Or end of the microframe, but there's no way to tell if that > happened.) I see your point. EHCI specification states that babble is a serious error and it will also cause the halt. The babble error bit is checked first. But the specification does not say about order of the other errors or about what to do if there is an error, no retries executed, yet a halt (stall). For example should the XactErr be checked before the MMF. >(Or end of the microframe, but there's no way to tell if that happened.) I was not able to locate this from the specification. Could you please point out where this statement is from? Could the way to tell if "end of microframe" happened, be what is done here - check for MMF error bit and if CERR has not been decreased? > > > [1] https://www.usb.org/document-library/usb-20-specification, > > usb_20.pdf [2] > > > https://www.intel.com/content/dam/www/public/us/en/documents/techn > ical > > -specifications/ehci-specification-for-usb.pdf > > > > Signed-off-by: Erkka Talvitie <erkka.talvitie@vincit.fi> > > Can you duplicate this behavior on a standard PC, say with an Intel EHCI > controller? We tested with native Linux PC and the error did not reproduce. However I am not sure about the used host controller in that PC. I will check that or try to get a setup with Intel EHCI. > > > drivers/usb/host/ehci-q.c | 9 +++++++-- > > 1 file changed, 7 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/usb/host/ehci-q.c b/drivers/usb/host/ehci-q.c > > index 3276304..da7fd12 100644 > > --- a/drivers/usb/host/ehci-q.c > > +++ b/drivers/usb/host/ehci-q.c > > @@ -206,8 +206,9 @@ static int qtd_copy_status ( > > if (token & QTD_STS_BABBLE) { > > /* FIXME "must" disable babbling > device's port too */ > > status = -EOVERFLOW; > > - /* CERR nonzero + halt --> stall */ > > - } else if (QTD_CERR(token)) { > > + /* CERR nonzero and less than > EHCI_TUNE_CERR + halt --> stall. > > + This handles situation where stall comes after > an error. */ > > This comment doesn't make sense. Who cares whether a stall comes after > an error or not? It's still a stall and should be reported. This was basically a comment trying to answer to this commit: ba516de332c0 USB: EHCI: check for STALL before other errors "The existing code doesn't do this properly, because it tests for MMF (Missed MicroFrame) and DBE (Data Buffer Error) before testing the retry counter. Thus, if a transaction gets either MMF or DBE the corresponding flag is set and the transaction is retried. If the second attempt receives a STALL then -EPIPE is the correct return value. But the existing code would see the MMF or DBE flag instead and return -EPROTO, -ENOSR, or -ECOMM." The comment tries to explain that it will not revert the fix made in the commit ba516de332c0. > > > + } else if (QTD_CERR(token) && > QTD_CERR(token) < EHCI_TUNE_CERR) { > > status = -EPIPE; > > If an error occurs and the transaction is retried and the retry gets a stall, then > the final status should be -EPIPE, not something else. This is how the RFC also works. If the transaction has been retried and gets stall then -EPIPE is returned. Or if there are no errors but there is a stall then -EPIPE is returned. The only difference in this patch in comparison to the existing implementation is that if there is an error but the transaction has not been retried it is not interpret as a stall even if there is a halt. > > > /* In theory, more than one of the following bits > can be set @@ > > -228,6 +229,10 @@ static int qtd_copy_status ( > > > usb_pipeendpoint(urb->pipe), > > usb_pipein(urb- > >pipe) ? "in" : "out"); > > status = -EPROTO; > > + /* CERR equals EHCI_TUNE_CERR, no other > errors + halt --> stall. > > + This handles situation where stall comes > without error bits set. > > +*/ > > If CERR is equal to EHCI_TUNE_CERR then no other errors could have > occurred (since any error will decrement CERR). So why shouldn't this case > be included with the earlier case? That is what I also understood from the EHCI specification. If there is an error the CERR should decrease. Only babble, data buffer error and stall (or no error) will not decrement the CERR. However in this use case there is an error (MMF) but the CERR still equals to the EHCI_TUNE_CERR. So that's why the RFC separates these. This is the logic in the RFC: 1. The first if handles the situation where the stall comes after there has been an error AND a retry. CERR has been decreased. This is so that ba516de332c0 is not reverted. 2. The second if handles the situation where the halt has been caused by the stall AND there are no other errors. 3. If there are errors + halt, but no retries executed (CERR equals EHCI_TUNE_CERR) the response here is to return error value according to the error bit, not returning EPIPE according to the stall. > > > + } else if (QTD_CERR(token)) { > > + status = -EPIPE; > > } else { /* unknown */ > > status = -EPROTO; > > } > > Alan Stern Erkka Talvitie ^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected 2019-12-03 9:38 ` Erkka Talvitie @ 2019-12-03 10:54 ` Erkka Talvitie 2019-12-03 13:35 ` Erkka Talvitie 2019-12-03 19:01 ` Alan Stern 1 sibling, 1 reply; 16+ messages in thread From: Erkka Talvitie @ 2019-12-03 10:54 UTC (permalink / raw) To: 'Alan Stern'; +Cc: gregkh, linux-usb, claus.baumgartner > -----Original Message----- > From: Erkka Talvitie <erkka.talvitie@vincit.fi> > Sent: tiistai 3. joulukuuta 2019 11.39 > To: 'Alan Stern' <stern@rowland.harvard.edu> > Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org; > claus.baumgartner@med.ge.com > Subject: RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is > disconnected > > Thank you for the comments. > > > -----Original Message----- > > From: Alan Stern <stern@rowland.harvard.edu> > > Sent: maanantai 2. joulukuuta 2019 21.44 > > To: Erkka Talvitie <erkka.talvitie@vincit.fi> > > Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org; > > claus.baumgartner@med.ge.com > > Subject: Re: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is > > disconnected > > > > On Fri, 29 Nov 2019, Erkka Talvitie wrote: > > > > > When disconnecting a USB hub that has some child device(s) connected > > > to it (such as a USB mouse), then the stack tries to clear halt and > > > reset device(s) which are _already_ physically disconnected. > > > > That behavior is understandable. The kernel doesn't know that the > > device has been disconnected until it can process the notification > > from an > upstream > > hub, and it can't process that notification while it's trying to reset > > the > device. > > > > Ok. I was thinking that in this use case , it should not be trying to clear the halt > (and reset the device when the clear halt fails). And this behavior was altered > by this RFC. > > > > The issue has been reproduced with: > > > > > > CPU: IMX6D5EYM10AD or MCIMX6D5EYM10AE. > > > SW: U-Boot 2019.07 and kernel 4.19.40. > > > > > > In this situation there will be error bit for MMF active yet the > > > CERR equals EHCI_TUNE_CERR + halt. > > > > Why? In general, setting the MMF bit does not cause the halt bit to > > be > set > > (set Table 4-13 in the EHCI spec). In fact, MMF refers to errors that > occur on > > the host, not bus errors caused by a disconnected device. > > I do not know for sure why that happens. I was suspecting that there has > been MMF error and a stall at the same time. And in this RFC it was assumed > that MMF is with greater priority than stall. > The disconnecting of a hub with attached devices cause the MMF error bit > set even though it is a host side error. > > > > > > Existing implementation > > > interprets this as a stall [1] (chapter 8.4.5). > > > > That is the correct thing to do. When a transaction error occurs > > during a Complete-Split transaction, the host controller is supposed > > to decrement > the > > CERR value, set the XACT bit, and retry the transaction unless the > > CERR > value > > is 0 or there isn't enough time left in the microframe. > > > > The fact that you saw CERR equal to EHCI_TUNE_CERR and XACT clear > > probably means that your EHCI hardware is not behaving properly. > > If you refer to the XactErr bit (Table 4-13 [2] )with the "XACT clear" then > unfortunately I did not check it's state ,so I am not sure if it was clear. > In this patch, like also in the existing implementation, the MMF bit is checked > first and since it is active in this situation the XactErr is not checked. > > I will check this. > > But as in this use case the CERR has not been decreased yet there is error bit > active (MMF) do you see it is still correct to interpret it as a stall (even when > the halt bit is set)? > > I have tried to find out more details about our EHCI controller version, but I > have only found out those CPU versions. It might help in a search whether > this could be a HW issue. > > > > > > Fix for the issue is at first to check for a stall that comes after > > > an error (the CERR has been decreased). > > > > > > Then after that, check for other errors. > > > > > > And at last check for stall without other errors (the CERR equals > > > EHCI_TUNE_CERR as stall does not decrease the CERR [2] (table 3-16)). > > > > > > What happens after the fix is that when disconnecting a hub with > > > attached device(s) the situation is not interpret as a stall. > > > > > > The specification [2] is not clear about error priorities, but since > > > there is no explicit error bit for the stall, it is assumed to be > > > lower priority than other errors. > > > > On the contrary, the specification is very clear. Since transaction > errors cause > > CERR to be decremented until it reaches 0, a nonzero value for CERR > > means the endpoint was halted for some other reason. And the only > > other reason is a stall. (Or end of the microframe, but there's no > > way to tell if that > > happened.) > > I see your point. EHCI specification states that babble is a serious error and it > will also cause the halt. The babble error bit is checked first. But the > specification does not say about order of the other errors or about what to > do if there is an error, no retries executed, yet a halt (stall). For example > should the XactErr be checked before the MMF. > > >(Or end of the microframe, but there's no way to tell if that > >happened.) > > I was not able to locate this from the specification. Could you please point > out where this statement is from? > Could the way to tell if "end of microframe" happened, be what is done here > - check for MMF error bit and if CERR has not been decreased? > > > > > > [1] https://www.usb.org/document-library/usb-20-specification, > > > usb_20.pdf [2] > > > > > > https://www.intel.com/content/dam/www/public/us/en/documents/techn > > ical > > > -specifications/ehci-specification-for-usb.pdf > > > > > > Signed-off-by: Erkka Talvitie <erkka.talvitie@vincit.fi> > > > > Can you duplicate this behavior on a standard PC, say with an Intel > > EHCI controller? > > We tested with native Linux PC and the error did not reproduce. However I > am not sure about the used host controller in that PC. > I will check that or try to get a setup with Intel EHCI. The PC where the issue did not reproduce was ThinkPad T480 with: 3c:00.0 USB controller: Intel Corporation Device 15c1 (rev 01) [ 4.229310] xhci_hcd 0000:3c:00.0: xHCI Host Controller [ 4.238578] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver [ 4.239754] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver [ 4.240857] ehci-pci: EHCI PCI platform driver [ 4.241437] ohci-pci: OHCI PCI platform driver [ 4.243080] uhci_hcd: USB Universal Host Controller Interface driver > > > > > > drivers/usb/host/ehci-q.c | 9 +++++++-- > > > 1 file changed, 7 insertions(+), 2 deletions(-) > > > > > > diff --git a/drivers/usb/host/ehci-q.c b/drivers/usb/host/ehci-q.c > > > index 3276304..da7fd12 100644 > > > --- a/drivers/usb/host/ehci-q.c > > > +++ b/drivers/usb/host/ehci-q.c > > > @@ -206,8 +206,9 @@ static int qtd_copy_status ( > > > if (token & QTD_STS_BABBLE) { > > > /* FIXME "must" disable babbling > > device's port too */ > > > status = -EOVERFLOW; > > > - /* CERR nonzero + halt --> stall */ > > > - } else if (QTD_CERR(token)) { > > > + /* CERR nonzero and less than > > EHCI_TUNE_CERR + halt --> stall. > > > + This handles situation where stall comes after > > an error. */ > > > > This comment doesn't make sense. Who cares whether a stall comes > > after an error or not? It's still a stall and should be reported. > > This was basically a comment trying to answer to this commit: > ba516de332c0 USB: EHCI: check for STALL before other errors > > "The existing code doesn't do this properly, because it tests for MMF > (Missed MicroFrame) and DBE (Data Buffer Error) before testing the > retry counter. Thus, if a transaction gets either MMF or DBE the > corresponding flag is set and the transaction is retried. If the > second attempt receives a STALL then -EPIPE is the correct return > value. But the existing code would see the MMF or DBE flag instead > and return -EPROTO, -ENOSR, or -ECOMM." > > The comment tries to explain that it will not revert the fix made in the > commit ba516de332c0. > > > > > > > + } else if (QTD_CERR(token) && > > QTD_CERR(token) < EHCI_TUNE_CERR) { > > > status = -EPIPE; > > > > If an error occurs and the transaction is retried and the retry gets a > stall, then > > the final status should be -EPIPE, not something else. > > This is how the RFC also works. If the transaction has been retried and gets > stall then -EPIPE is returned. > Or if there are no errors but there is a stall then -EPIPE is returned. > > The only difference in this patch in comparison to the existing > implementation is that if there is an error but the transaction has not been > retried it is not interpret as a stall even if there is a halt. > > > > > > /* In theory, more than one of the following bits > > can be set @@ > > > -228,6 +229,10 @@ static int qtd_copy_status ( > > > > > usb_pipeendpoint(urb->pipe), > > > usb_pipein(urb- > > >pipe) ? "in" : "out"); > > > status = -EPROTO; > > > + /* CERR equals EHCI_TUNE_CERR, no other > > errors + halt --> stall. > > > + This handles situation where stall comes > > without error bits set. > > > +*/ > > > > If CERR is equal to EHCI_TUNE_CERR then no other errors could have > > occurred (since any error will decrement CERR). So why shouldn't this > case > > be included with the earlier case? > > That is what I also understood from the EHCI specification. If there is an error > the CERR should decrease. Only babble, data buffer error and stall (or no > error) will not decrement the CERR. > However in this use case there is an error (MMF) but the CERR still equals to > the EHCI_TUNE_CERR. > > So that's why the RFC separates these. This is the logic in the RFC: > > 1. The first if handles the situation where the stall comes after there has > been an error AND a retry. CERR has been decreased. This is so that > ba516de332c0 is not reverted. > 2. The second if handles the situation where the halt has been caused by the > stall AND there are no other errors. > 3. If there are errors + halt, but no retries executed (CERR equals > EHCI_TUNE_CERR) the response here is to return error value according to > the error bit, not returning EPIPE according to the stall. I am using CERR here confusingly. It is not a retry counter, instead it is an error counter. > > > > > > + } else if (QTD_CERR(token)) { > > > + status = -EPIPE; > > > } else { /* unknown */ > > > status = -EPROTO; > > > } > > > > Alan Stern > > Erkka Talvitie > Erkka Talvitie ^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected 2019-12-03 10:54 ` Erkka Talvitie @ 2019-12-03 13:35 ` Erkka Talvitie 0 siblings, 0 replies; 16+ messages in thread From: Erkka Talvitie @ 2019-12-03 13:35 UTC (permalink / raw) To: 'Alan Stern'; +Cc: gregkh, linux-usb, claus.baumgartner > -----Original Message----- > From: Erkka Talvitie <erkka.talvitie@vincit.fi> > Sent: tiistai 3. joulukuuta 2019 12.54 > To: 'Alan Stern' <stern@rowland.harvard.edu> > Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org; > claus.baumgartner@med.ge.com > Subject: RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is > disconnected > > > -----Original Message----- > > From: Erkka Talvitie <erkka.talvitie@vincit.fi> > > Sent: tiistai 3. joulukuuta 2019 11.39 > > To: 'Alan Stern' <stern@rowland.harvard.edu> > > Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org; > > claus.baumgartner@med.ge.com > > Subject: RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is > > disconnected > > > > Thank you for the comments. > > > > > -----Original Message----- > > > From: Alan Stern <stern@rowland.harvard.edu> > > > Sent: maanantai 2. joulukuuta 2019 21.44 > > > To: Erkka Talvitie <erkka.talvitie@vincit.fi> > > > Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org; > > > claus.baumgartner@med.ge.com > > > Subject: Re: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is > > > disconnected > > > > > > On Fri, 29 Nov 2019, Erkka Talvitie wrote: > > > > > > > When disconnecting a USB hub that has some child device(s) > > > > connected to it (such as a USB mouse), then the stack tries to > > > > clear halt and reset device(s) which are _already_ physically > disconnected. > > > > > > That behavior is understandable. The kernel doesn't know that the > > > device has been disconnected until it can process the notification > > > from an > > upstream > > > hub, and it can't process that notification while it's trying to > > > reset the > > device. > > > > > > > Ok. I was thinking that in this use case , it should not be trying to > clear the halt > > (and reset the device when the clear halt fails). And this behavior > > was > altered > > by this RFC. > > > > > > The issue has been reproduced with: > > > > > > > > CPU: IMX6D5EYM10AD or MCIMX6D5EYM10AE. > > > > SW: U-Boot 2019.07 and kernel 4.19.40. > > > > > > > > In this situation there will be error bit for MMF active yet the > > > > CERR equals EHCI_TUNE_CERR + halt. > > > > > > Why? In general, setting the MMF bit does not cause the halt bit to > > > be > > set > > > (set Table 4-13 in the EHCI spec). In fact, MMF refers to errors > > > that > > occur on > > > the host, not bus errors caused by a disconnected device. > > > > I do not know for sure why that happens. I was suspecting that there > > has been MMF error and a stall at the same time. And in this RFC it > > was > assumed > > that MMF is with greater priority than stall. > > The disconnecting of a hub with attached devices cause the MMF error > > bit set even though it is a host side error. > > > > > > > > > Existing implementation > > > > interprets this as a stall [1] (chapter 8.4.5). > > > > > > That is the correct thing to do. When a transaction error occurs > > > during a Complete-Split transaction, the host controller is supposed > > > to decrement > > the > > > CERR value, set the XACT bit, and retry the transaction unless the > > > CERR > > value > > > is 0 or there isn't enough time left in the microframe. > > > > > > The fact that you saw CERR equal to EHCI_TUNE_CERR and XACT clear > > > probably means that your EHCI hardware is not behaving properly. > > > > If you refer to the XactErr bit (Table 4-13 [2] )with the "XACT clear" > then > > unfortunately I did not check it's state ,so I am not sure if it was > clear. > > In this patch, like also in the existing implementation, the MMF bit > > is > checked > > first and since it is active in this situation the XactErr is not checked. > > > > I will check this. > > > > But as in this use case the CERR has not been decreased yet there is > > error > bit > > active (MMF) do you see it is still correct to interpret it as a stall > (even when > > the halt bit is set)? > > > > I have tried to find out more details about our EHCI controller > > version, > but I > > have only found out those CPU versions. It might help in a search > > whether this could be a HW issue. > > > > > > > > > Fix for the issue is at first to check for a stall that comes > > > > after an error (the CERR has been decreased). > > > > > > > > Then after that, check for other errors. > > > > > > > > And at last check for stall without other errors (the CERR equals > > > > EHCI_TUNE_CERR as stall does not decrease the CERR [2] (table 3-16)). > > > > > > > > What happens after the fix is that when disconnecting a hub with > > > > attached device(s) the situation is not interpret as a stall. > > > > > > > > The specification [2] is not clear about error priorities, but > > > > since there is no explicit error bit for the stall, it is assumed > > > > to be lower priority than other errors. > > > > > > On the contrary, the specification is very clear. Since transaction > > errors cause > > > CERR to be decremented until it reaches 0, a nonzero value for CERR > > > means the endpoint was halted for some other reason. And the only > > > other reason is a stall. (Or end of the microframe, but there's no > > > way to tell if that > > > happened.) > > > > I see your point. EHCI specification states that babble is a serious > > error > and it > > will also cause the halt. The babble error bit is checked first. But > > the specification does not say about order of the other errors or > > about what > to > > do if there is an error, no retries executed, yet a halt (stall). For > example > > should the XactErr be checked before the MMF. > > > > >(Or end of the microframe, but there's no way to tell if that > > >happened.) > > > > I was not able to locate this from the specification. Could you please > point > > out where this statement is from? > > Could the way to tell if "end of microframe" happened, be what is done > here > > - check for MMF error bit and if CERR has not been decreased? > > > > > > > > > [1] https://www.usb.org/document-library/usb-20-specification, > > > > usb_20.pdf [2] > > > > > > > > > > https://www.intel.com/content/dam/www/public/us/en/documents/techn > > > ical > > > > -specifications/ehci-specification-for-usb.pdf > > > > > > > > Signed-off-by: Erkka Talvitie <erkka.talvitie@vincit.fi> > > > > > > Can you duplicate this behavior on a standard PC, say with an Intel > > > EHCI controller? > > > > We tested with native Linux PC and the error did not reproduce. > > However I am not sure about the used host controller in that PC. > > I will check that or try to get a setup with Intel EHCI. > > The PC where the issue did not reproduce was ThinkPad T480 with: > > 3c:00.0 USB controller: Intel Corporation Device 15c1 (rev 01) > > [ 4.229310] xhci_hcd 0000:3c:00.0: xHCI Host Controller > [ 4.238578] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver > [ 4.239754] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver > [ 4.240857] ehci-pci: EHCI PCI platform driver > [ 4.241437] ohci-pci: OHCI PCI platform driver > [ 4.243080] uhci_hcd: USB Universal Host Controller Interface driver > Another test, HP Proliant Microserver Gen8 Linux version 4.2.3-300.fc23.x86_64 ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver ehci-pci: EHCI PCI platform driver With this setup the behavior reproduces. kernel: usb 3-1.2: clear tt 1 (0080) error -71 kernel: usb 3-1.2-port1: cannot reset (err = -71) kernel: usb 3-1.2-port1: cannot reset (err = -71) kernel: usb 3-1.2-port1: cannot reset (err = -71) kernel: usb 3-1.2-port1: cannot reset (err = -71) kernel: usb 3-1.2-port1: cannot reset (err = -71) kernel: usb 3-1.2-port1: Cannot enable. Maybe the USB cable is bad? kernel: usb 3-1.2-port1: cannot disable (err = -71) kernel: usb 3-1.2-port1: cannot reset (err = -71) kernel: usb 3-1.2-port1: cannot reset (err = -71) kernel: usb 3-1.2-port1: cannot reset (err = -71) kernel: usb 3-1.2-port1: cannot reset (err = -71) kernel: usb 3-1.2-port1: cannot reset (err = -71) kernel: usb 3-1.2-port1: Cannot enable. Maybe the USB cable is bad? kernel: usb 3-1.2-port1: cannot disable (err = -71) kernel: usb 3-1.2-port1: cannot reset (err = -71) kernel: usb 3-1.2-port1: cannot reset (err = -71) kernel: usb 3-1.2-port1: cannot reset (err = -71) kernel: usb 3-1.2-port1: cannot reset (err = -71) kernel: usb 3-1.2-port1: cannot reset (err = -71) kernel: usb 3-1.2-port1: Cannot enable. Maybe the USB cable is bad? kernel: usb 3-1.2-port1: cannot disable (err = -71) kernel: usb 3-1.2-port1: cannot reset (err = -71) kernel: usb 3-1.2-port1: cannot reset (err = -71) kernel: usb 3-1.2-port1: cannot reset (err = -71) kernel: usb 3-1.2-port1: cannot reset (err = -71) kernel: usb 3-1.2-port1: cannot reset (err = -71) kernel: usb 3-1.2-port1: Cannot enable. Maybe the USB cable is bad? kernel: usb 3-1.2-port1: cannot disable (err = -71) kernel: usb 3-1.2-port1: cannot disable (err = -71) kernel: hub 3-1.2:1.0: hub_port_status failed (err = -71) > > > > > > > > > drivers/usb/host/ehci-q.c | 9 +++++++-- > > > > 1 file changed, 7 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/drivers/usb/host/ehci-q.c b/drivers/usb/host/ehci-q.c > > > > index 3276304..da7fd12 100644 > > > > --- a/drivers/usb/host/ehci-q.c > > > > +++ b/drivers/usb/host/ehci-q.c > > > > @@ -206,8 +206,9 @@ static int qtd_copy_status ( > > > > if (token & QTD_STS_BABBLE) { > > > > /* FIXME "must" disable babbling > > > device's port too */ > > > > status = -EOVERFLOW; > > > > - /* CERR nonzero + halt --> stall */ > > > > - } else if (QTD_CERR(token)) { > > > > + /* CERR nonzero and less than > > > EHCI_TUNE_CERR + halt --> stall. > > > > + This handles situation where stall comes after > > > an error. */ > > > > > > This comment doesn't make sense. Who cares whether a stall comes > > > after an error or not? It's still a stall and should be reported. > > > > This was basically a comment trying to answer to this commit: > > ba516de332c0 USB: EHCI: check for STALL before other errors > > > > "The existing code doesn't do this properly, because it tests for MMF > > (Missed MicroFrame) and DBE (Data Buffer Error) before testing the > > retry counter. Thus, if a transaction gets either MMF or DBE the > > corresponding flag is set and the transaction is retried. If the > > second attempt receives a STALL then -EPIPE is the correct return > > value. But the existing code would see the MMF or DBE flag instead > > and return -EPROTO, -ENOSR, or -ECOMM." > > > > The comment tries to explain that it will not revert the fix made in > > the commit ba516de332c0. > > > > > > > > > > > + } else if (QTD_CERR(token) && > > > QTD_CERR(token) < EHCI_TUNE_CERR) { > > > > status = -EPIPE; > > > > > > If an error occurs and the transaction is retried and the retry gets > > > a > > stall, then > > > the final status should be -EPIPE, not something else. > > > > This is how the RFC also works. If the transaction has been retried > > and > gets > > stall then -EPIPE is returned. > > Or if there are no errors but there is a stall then -EPIPE is returned. > > > > The only difference in this patch in comparison to the existing > > implementation is that if there is an error but the transaction has > > not > been > > retried it is not interpret as a stall even if there is a halt. > > > > > > > > > /* In theory, more than one of the following bits > > > can be set @@ > > > > -228,6 +229,10 @@ static int qtd_copy_status ( > > > > > > > usb_pipeendpoint(urb->pipe), > > > > usb_pipein(urb- > > > >pipe) ? "in" : "out"); > > > > status = -EPROTO; > > > > + /* CERR equals EHCI_TUNE_CERR, no other > > > errors + halt --> stall. > > > > + This handles situation where stall comes > > > without error bits set. > > > > +*/ > > > > > > If CERR is equal to EHCI_TUNE_CERR then no other errors could have > > > occurred (since any error will decrement CERR). So why shouldn't > > > this > > case > > > be included with the earlier case? > > > > That is what I also understood from the EHCI specification. If there > > is an > error > > the CERR should decrease. Only babble, data buffer error and stall (or > > no > > error) will not decrement the CERR. > > However in this use case there is an error (MMF) but the CERR still > > equals > to > > the EHCI_TUNE_CERR. > > > > So that's why the RFC separates these. This is the logic in the RFC: > > > > 1. The first if handles the situation where the stall comes after > > there > has > > been an error AND a retry. CERR has been decreased. This is so that > > ba516de332c0 is not reverted. > > 2. The second if handles the situation where the halt has been caused > > by > the > > stall AND there are no other errors. > > 3. If there are errors + halt, but no retries executed (CERR equals > > EHCI_TUNE_CERR) the response here is to return error value according > > to the error bit, not returning EPIPE according to the stall. > > I am using CERR here confusingly. It is not a retry counter, instead it is an > error counter. > > > > > > > > > > + } else if (QTD_CERR(token)) { > > > > + status = -EPIPE; > > > > } else { /* unknown */ > > > > status = -EPROTO; > > > > } > > > > > > Alan Stern > > > > Erkka Talvitie > > > > Erkka Talvitie > Erkka Talvitie ^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected 2019-12-03 9:38 ` Erkka Talvitie 2019-12-03 10:54 ` Erkka Talvitie @ 2019-12-03 19:01 ` Alan Stern 2019-12-04 8:55 ` Erkka Talvitie 1 sibling, 1 reply; 16+ messages in thread From: Alan Stern @ 2019-12-03 19:01 UTC (permalink / raw) To: Erkka Talvitie; +Cc: gregkh, linux-usb, claus.baumgartner On Tue, 3 Dec 2019, Erkka Talvitie wrote: > Thank you for the comments. > > > -----Original Message----- > > From: Alan Stern <stern@rowland.harvard.edu> > > Sent: maanantai 2. joulukuuta 2019 21.44 > > To: Erkka Talvitie <erkka.talvitie@vincit.fi> > > Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org; > > claus.baumgartner@med.ge.com > > Subject: Re: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is > > disconnected > > > > On Fri, 29 Nov 2019, Erkka Talvitie wrote: > > > > > When disconnecting a USB hub that has some child device(s) connected > > > to it (such as a USB mouse), then the stack tries to clear halt and > > > reset device(s) which are _already_ physically disconnected. > > > > That behavior is understandable. The kernel doesn't know that the device > > has been disconnected until it can process the notification from an > upstream > > hub, and it can't process that notification while it's trying to reset the > device. > > > > Ok. I was thinking that in this use case , it should not be trying to clear > the halt (and reset the device when the clear halt fails). And this behavior > was altered by this RFC. Actually, the situation is a little different from what I described above. When you unplug the high-speed hub, the kernel doesn't know the hub has been disconnected until it receives a notification from the upstream hub. The kernel checks for those notifications at roughly 250-ms intervals, so it can take up to that long before the kernel realizes the high-speed hub is gone. Until that time, the kernel will keep trying to reset and communicate with the hub and the devices that were attached to it. You can see this in the logs that you posted in your original report. In each case, the "cannot reset" and -71 errors lasted for less than 250 ms. I just tried doing the same experiment on my PC (which does use all Intel hardware and an EHCI controller). Here's the output from when I unplugged the high-speed hub: [ 6321.245528] usb 1-1.4: clear tt 4 (00a0) error -71 [ 6321.250903] usb 1-1.4-port4: cannot reset (err = -71) [ 6321.255155] usb 1-1.4-port4: cannot reset (err = -71) [ 6321.259403] usb 1-1.4-port4: cannot reset (err = -71) [ 6321.263657] usb 1-1.4-port4: cannot reset (err = -71) [ 6321.267905] usb 1-1.4-port4: cannot reset (err = -71) [ 6321.267910] usb 1-1.4-port4: Cannot enable. Maybe the USB cable is bad? [ 6321.272155] usb 1-1.4-port4: cannot disable (err = -71) [ 6321.276405] usb 1-1.4-port4: cannot reset (err = -71) [ 6321.280653] usb 1-1.4-port4: cannot reset (err = -71) [ 6321.284905] usb 1-1.4-port4: cannot reset (err = -71) [ 6321.289155] usb 1-1.4-port4: cannot reset (err = -71) [ 6321.293403] usb 1-1.4-port4: cannot reset (err = -71) [ 6321.293407] usb 1-1.4-port4: Cannot enable. Maybe the USB cable is bad? [ 6321.297656] usb 1-1.4-port4: cannot disable (err = -71) [ 6321.301904] usb 1-1.4-port4: cannot reset (err = -71) [ 6321.306152] usb 1-1.4-port4: cannot reset (err = -71) [ 6321.310402] usb 1-1.4-port4: cannot reset (err = -71) [ 6321.314653] usb 1-1.4-port4: cannot reset (err = -71) [ 6321.318904] usb 1-1.4-port4: cannot reset (err = -71) [ 6321.318908] usb 1-1.4-port4: Cannot enable. Maybe the USB cable is bad? [ 6321.323154] usb 1-1.4-port4: cannot disable (err = -71) [ 6321.327404] usb 1-1.4-port4: cannot reset (err = -71) [ 6321.331651] usb 1-1.4-port4: cannot reset (err = -71) [ 6321.335902] usb 1-1.4-port4: cannot reset (err = -71) [ 6321.340155] usb 1-1.4-port4: cannot reset (err = -71) [ 6321.344402] usb 1-1.4-port4: cannot reset (err = -71) [ 6321.344406] usb 1-1.4-port4: Cannot enable. Maybe the USB cable is bad? [ 6321.348652] usb 1-1.4-port4: cannot disable (err = -71) [ 6321.352904] usb 1-1.4-port4: cannot disable (err = -71) [ 6321.357154] hub 1-1.4:1.0: hub_ext_port_status failed (err = -71) [ 6321.437000] usb 1-1.4: USB disconnect, device number 9 [ 6321.437010] usb 1-1.4.4: USB disconnect, device number 10 As you can see, the time interval runs from 6321.245 to 6321.437, roughly 192 ms < 250 ms. This is the expected behavior. I did not try to check whether the MMF bit got set or what value CERR had. > But as in this use case the CERR has not been decreased yet there is error > bit active (MMF) do you see it is still correct to interpret it as a stall > (even when the halt bit is set)? See below. > I have tried to find out more details about our EHCI controller version, but > I have only found out those CPU versions. It might help in a search whether > this could be a HW issue. > > > The specification [2] is not clear about error priorities, but since > > > there is no explicit error bit for the stall, it is assumed to be > > > lower priority than other errors. > > > > On the contrary, the specification is very clear. Since transaction > errors cause > > CERR to be decremented until it reaches 0, a nonzero value for CERR means > > the endpoint was halted for some other reason. And the only other reason > > is a stall. (Or end of the microframe, but there's no way to tell if that > > happened.) > > I see your point. EHCI specification states that babble is a serious error > and it will also cause the halt. The babble error bit is checked first. But > the specification does not say about order of the other errors or about what > to do if there is an error, no retries executed, yet a halt (stall). For > example should the XactErr be checked before the MMF. I think the order doesn't matter. In fact, it's possible that both errors occurred, since the transaction gets retried multiple times. > >(Or end of the microframe, but there's no way to tell if that happened.) > > I was not able to locate this from the specification. Could you please point > out where this statement is from? "Enhanced Host Controller Interface Specification for Universal Serial Bus", rev 1.0 (2002), p. 110: Transaction Error (XactErr). Timeout, data CRC failure, etc. The CErr field is decremented and the XactErr bit in the Status field is set to a one. The complete split transaction is immediately retried (if Cerr is non-zero). If there is not enough time in the micro-frame to complete the retry and the endpoint is an IN, or CErr is decremented to a zero from a one, the queue is halted. If there is not enough time in the micro-frame to complete the retry and the endpoint is an OUT and CErr is not zero, then this state is exited (i.e. return to Do Start Split). This results in a retry of the entire OUT split transaction, at the next poll period. Refer to Chapter 11 Hubs (specifically the section full- and low-speed Interrupts) in the USB Specification Revision 2.0 for detailed requirements on why these errors must be immediately retried. • > Could the way to tell if "end of microframe" happened, be what is done here > - check for MMF error bit and if CERR has not been decreased? No, because the "end of microframe" situation happens when the host controller is handling a transaction error, whereas MMF gets set when the host controller detects an error on the host. > > > diff --git a/drivers/usb/host/ehci-q.c b/drivers/usb/host/ehci-q.c > > > index 3276304..da7fd12 100644 > > > --- a/drivers/usb/host/ehci-q.c > > > +++ b/drivers/usb/host/ehci-q.c > > > @@ -206,8 +206,9 @@ static int qtd_copy_status ( > > > if (token & QTD_STS_BABBLE) { > > > /* FIXME "must" disable babbling > > device's port too */ > > > status = -EOVERFLOW; > > > - /* CERR nonzero + halt --> stall */ > > > - } else if (QTD_CERR(token)) { > > > + /* CERR nonzero and less than > > EHCI_TUNE_CERR + halt --> stall. > > > + This handles situation where stall comes after > > an error. */ > > > > This comment doesn't make sense. Who cares whether a stall comes after > > an error or not? It's still a stall and should be reported. > > This was basically a comment trying to answer to this commit: > ba516de332c0 USB: EHCI: check for STALL before other errors > > "The existing code doesn't do this properly, because it tests for MMF > (Missed MicroFrame) and DBE (Data Buffer Error) before testing the > retry counter. Thus, if a transaction gets either MMF or DBE the > corresponding flag is set and the transaction is retried. If the > second attempt receives a STALL then -EPIPE is the correct return > value. But the existing code would see the MMF or DBE flag instead > and return -EPROTO, -ENOSR, or -ECOMM." > > The comment tries to explain that it will not revert the fix made in the > commit ba516de332c0. Okay, I get it. You're trying to rely on the strange behavior of the MMF bit. I'm not sure this is a good idea. Suppose MMF gets set for some other reason (a genuine error on the host) and then the transaction gets a STALL on the next retry. Since host errors don't decrement CERR, your patch would cause the driver to return -EPROTO instead of -EPIPE. > > > + } else if (QTD_CERR(token) && > > QTD_CERR(token) < EHCI_TUNE_CERR) { > > > status = -EPIPE; > > > > If an error occurs and the transaction is retried and the retry gets a > stall, then > > the final status should be -EPIPE, not something else. > > This is how the RFC also works. If the transaction has been retried and gets > stall then -EPIPE is returned. > Or if there are no errors but there is a stall then -EPIPE is returned. > > The only difference in this patch in comparison to the existing > implementation is that if there is an error but the > transaction has not been retried it is not interpret as a stall even if > there is a halt. Sometimes that will be the right behavior and other times it won't. However, it looks like there may be a way to tell which situation we are in. Setting the MMF bit will cause the queue to halt immediately if the transaction is IN, but not if it is OUT (see Table 4-13 in the EHCI spec). So if CERR == EHCI_TUNE_CERR and the QTD_PID != 1 (not IN) then we should return -EPIPE, as the existing code does. But if QTD_PID == 1 then the code should continue, as your patch does -- with one difference: Put the additional check for EHCI_TUNE_CERR between the tests for DBE and XACT instead of after XACT (because XACT would decrement CERR whereas DBE wouldn't). Can you make that change and test it? Alan Stern ^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected 2019-12-03 19:01 ` Alan Stern @ 2019-12-04 8:55 ` Erkka Talvitie 2019-12-04 13:18 ` Erkka Talvitie 0 siblings, 1 reply; 16+ messages in thread From: Erkka Talvitie @ 2019-12-04 8:55 UTC (permalink / raw) To: 'Alan Stern'; +Cc: gregkh, linux-usb, claus.baumgartner Thank you for the comments. > -----Original Message----- > From: Alan Stern <stern@rowland.harvard.edu> > Sent: tiistai 3. joulukuuta 2019 21.01 > To: Erkka Talvitie <erkka.talvitie@vincit.fi> > Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org; > claus.baumgartner@med.ge.com > Subject: RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is > disconnected > > On Tue, 3 Dec 2019, Erkka Talvitie wrote: > > > Thank you for the comments. > > > > > -----Original Message----- > > > From: Alan Stern <stern@rowland.harvard.edu> > > > Sent: maanantai 2. joulukuuta 2019 21.44 > > > To: Erkka Talvitie <erkka.talvitie@vincit.fi> > > > Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org; > > > claus.baumgartner@med.ge.com > > > Subject: Re: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is > > > disconnected > > > > > > On Fri, 29 Nov 2019, Erkka Talvitie wrote: > > > > > > > When disconnecting a USB hub that has some child device(s) > > > > connected to it (such as a USB mouse), then the stack tries to > > > > clear halt and reset device(s) which are _already_ physically > disconnected. > > > > > > That behavior is understandable. The kernel doesn't know that the > > > device has been disconnected until it can process the notification > > > from an > > upstream > > > hub, and it can't process that notification while it's trying to > > > reset the > > device. > > > > > > > Ok. I was thinking that in this use case , it should not be trying to > > clear the halt (and reset the device when the clear halt fails). And > > this behavior was altered by this RFC. > > Actually, the situation is a little different from what I described above. When > you unplug the high-speed hub, the kernel doesn't know the hub has been > disconnected until it receives a notification from the upstream hub. The > kernel checks for those notifications at roughly 250-ms intervals, so it can > take up to that long before the kernel realizes the high-speed hub is gone. > Until that time, the kernel will keep trying to reset and communicate with the > hub and the devices that were attached to it. > > You can see this in the logs that you posted in your original report. > In each case, the "cannot reset" and -71 errors lasted for less than > 250 ms. > > I just tried doing the same experiment on my PC (which does use all Intel > hardware and an EHCI controller). Here's the output from when I unplugged > the high-speed hub: > > [ 6321.245528] usb 1-1.4: clear tt 4 (00a0) error -71 [ 6321.250903] usb 1-1.4- > port4: cannot reset (err = -71) [ 6321.255155] usb 1-1.4-port4: cannot reset > (err = -71) [ 6321.259403] usb 1-1.4-port4: cannot reset (err = -71) [ > 6321.263657] usb 1-1.4-port4: cannot reset (err = -71) [ 6321.267905] usb 1- > 1.4-port4: cannot reset (err = -71) [ 6321.267910] usb 1-1.4-port4: Cannot > enable. Maybe the USB cable is bad? > [ 6321.272155] usb 1-1.4-port4: cannot disable (err = -71) [ 6321.276405] usb > 1-1.4-port4: cannot reset (err = -71) [ 6321.280653] usb 1-1.4-port4: cannot > reset (err = -71) [ 6321.284905] usb 1-1.4-port4: cannot reset (err = -71) [ > 6321.289155] usb 1-1.4-port4: cannot reset (err = -71) [ 6321.293403] usb 1- > 1.4-port4: cannot reset (err = -71) [ 6321.293407] usb 1-1.4-port4: Cannot > enable. Maybe the USB cable is bad? > [ 6321.297656] usb 1-1.4-port4: cannot disable (err = -71) [ 6321.301904] usb > 1-1.4-port4: cannot reset (err = -71) [ 6321.306152] usb 1-1.4-port4: cannot > reset (err = -71) [ 6321.310402] usb 1-1.4-port4: cannot reset (err = -71) [ > 6321.314653] usb 1-1.4-port4: cannot reset (err = -71) [ 6321.318904] usb 1- > 1.4-port4: cannot reset (err = -71) [ 6321.318908] usb 1-1.4-port4: Cannot > enable. Maybe the USB cable is bad? > [ 6321.323154] usb 1-1.4-port4: cannot disable (err = -71) [ 6321.327404] usb > 1-1.4-port4: cannot reset (err = -71) [ 6321.331651] usb 1-1.4-port4: cannot > reset (err = -71) [ 6321.335902] usb 1-1.4-port4: cannot reset (err = -71) [ > 6321.340155] usb 1-1.4-port4: cannot reset (err = -71) [ 6321.344402] usb 1- > 1.4-port4: cannot reset (err = -71) [ 6321.344406] usb 1-1.4-port4: Cannot > enable. Maybe the USB cable is bad? > [ 6321.348652] usb 1-1.4-port4: cannot disable (err = -71) [ 6321.352904] usb > 1-1.4-port4: cannot disable (err = -71) [ 6321.357154] hub 1-1.4:1.0: > hub_ext_port_status failed (err = -71) [ 6321.437000] usb 1-1.4: USB > disconnect, device number 9 [ 6321.437010] usb 1-1.4.4: USB disconnect, > device number 10 > > As you can see, the time interval runs from 6321.245 to 6321.437, roughly 192 > ms < 250 ms. This is the expected behavior. > > I did not try to check whether the MMF bit got set or what value CERR had. > Ok, makes sense. Thank you for testing this with your setup. > > > But as in this use case the CERR has not been decreased yet there is error > > bit active (MMF) do you see it is still correct to interpret it as a stall > > (even when the halt bit is set)? > > See below. > > > I have tried to find out more details about our EHCI controller version, but > > I have only found out those CPU versions. It might help in a search whether > > this could be a HW issue. > > > > > > The specification [2] is not clear about error priorities, but since > > > > there is no explicit error bit for the stall, it is assumed to be > > > > lower priority than other errors. > > > > > > On the contrary, the specification is very clear. Since transaction > > errors cause > > > CERR to be decremented until it reaches 0, a nonzero value for CERR > means > > > the endpoint was halted for some other reason. And the only other > reason > > > is a stall. (Or end of the microframe, but there's no way to tell if that > > > happened.) > > > > I see your point. EHCI specification states that babble is a serious error > > and it will also cause the halt. The babble error bit is checked first. But > > the specification does not say about order of the other errors or about > what > > to do if there is an error, no retries executed, yet a halt (stall). For > > example should the XactErr be checked before the MMF. > > I think the order doesn't matter. In fact, it's possible that both > errors occurred, since the transaction gets retried multiple times. True as the error bits are "sticky". For example XactErr in EHCI specification [2] table 3-16: " If the host controller sets this bit to a one, then it remains a one for the duration of the transfer." > > > >(Or end of the microframe, but there's no way to tell if that happened.) > > > > I was not able to locate this from the specification. Could you please point > > out where this statement is from? > > "Enhanced Host Controller Interface Specification for Universal Serial > Bus", rev 1.0 (2002), p. 110: > > Transaction Error (XactErr). Timeout, data CRC failure, etc. The CErr > field is decremented and the XactErr bit in the Status field is set to > a one. The complete split transaction is immediately retried (if Cerr > is non-zero). If there is not enough time in the micro-frame to > complete the retry and the endpoint is an IN, or CErr is decremented to > a zero from a one, the queue is halted. If there is not enough time in > the micro-frame to complete the retry and the endpoint is an OUT and > CErr is not zero, then this state is exited (i.e. return to Do Start > Split). This results in a retry of the entire OUT split transaction, at > the next poll period. Refer to Chapter 11 Hubs (specifically the > section full- and low-speed Interrupts) in the USB Specification > Revision 2.0 for detailed requirements on why these errors must be > immediately retried. • > There it is, thank you. > > Could the way to tell if "end of microframe" happened, be what is done > here > > - check for MMF error bit and if CERR has not been decreased? > > No, because the "end of microframe" situation happens when the host > controller is handling a transaction error, whereas MMF gets set when > the host controller detects an error on the host. Ok, I see. > > > > > diff --git a/drivers/usb/host/ehci-q.c b/drivers/usb/host/ehci-q.c > > > > index 3276304..da7fd12 100644 > > > > --- a/drivers/usb/host/ehci-q.c > > > > +++ b/drivers/usb/host/ehci-q.c > > > > @@ -206,8 +206,9 @@ static int qtd_copy_status ( > > > > if (token & QTD_STS_BABBLE) { > > > > /* FIXME "must" disable babbling > > > device's port too */ > > > > status = -EOVERFLOW; > > > > - /* CERR nonzero + halt --> stall */ > > > > - } else if (QTD_CERR(token)) { > > > > + /* CERR nonzero and less than > > > EHCI_TUNE_CERR + halt --> stall. > > > > + This handles situation where stall comes after > > > an error. */ > > > > > > This comment doesn't make sense. Who cares whether a stall comes > after > > > an error or not? It's still a stall and should be reported. > > > > This was basically a comment trying to answer to this commit: > > ba516de332c0 USB: EHCI: check for STALL before other errors > > > > "The existing code doesn't do this properly, because it tests for MMF > > (Missed MicroFrame) and DBE (Data Buffer Error) before testing the > > retry counter. Thus, if a transaction gets either MMF or DBE the > > corresponding flag is set and the transaction is retried. If the > > second attempt receives a STALL then -EPIPE is the correct return > > value. But the existing code would see the MMF or DBE flag instead > > and return -EPROTO, -ENOSR, or -ECOMM." > > > > The comment tries to explain that it will not revert the fix made in the > > commit ba516de332c0. > > Okay, I get it. You're trying to rely on the strange behavior of the > MMF bit. > > I'm not sure this is a good idea. Suppose MMF gets set for some other > reason (a genuine error on the host) and then the transaction gets a > STALL on the next retry. Since host errors don't decrement CERR, your > patch would cause the driver to return -EPROTO instead of -EPIPE. Hmm. I originally understood from the documentation that the MMF does decrement the CERR. I followed the EHCI specification table 3-16, the row about the CERR. There are listed the errors that do decrement the CERR and which do not decrement. Data buffer error, stall, babble and no error (of course) do not decrement. Transaction errors do decrement the CERR. As there was no mention about MMF I have been in the impression that it is included to the transaction errors of the table. Now I was able to find from the documentation you pointed out (in your next comment below) that your statement is correct. EHCI specification, table 4-13 contains Interrupt IN / OUT conditions which prove your point. > > > > > + } else if (QTD_CERR(token) && > > > QTD_CERR(token) < EHCI_TUNE_CERR) { > > > > status = -EPIPE; > > > > > > If an error occurs and the transaction is retried and the retry gets a > > stall, then > > > the final status should be -EPIPE, not something else. > > > > This is how the RFC also works. If the transaction has been retried and gets > > stall then -EPIPE is returned. > > Or if there are no errors but there is a stall then -EPIPE is returned. > > > > The only difference in this patch in comparison to the existing > > implementation is that if there is an error but the > > transaction has not been retried it is not interpret as a stall even if > > there is a halt. > > Sometimes that will be the right behavior and other times it won't. > However, it looks like there may be a way to tell which situation we > are in. Setting the MMF bit will cause the queue to halt immediately > if the transaction is IN, but not if it is OUT (see Table 4-13 in the > EHCI spec). Thank you, this Table 4-13 was a vital piece of documentation that I either missed or failed to read carefully enough. Now this could explain why the queue is halted with MMF set, even if it would not be a stall. Well done! > > So if CERR == EHCI_TUNE_CERR and the QTD_PID != 1 (not IN) then we > should return -EPIPE, as the existing code does. But if QTD_PID == 1 > then the code should continue, as your patch does -- with one > difference: Put the additional check for EHCI_TUNE_CERR between the > tests for DBE and XACT instead of after XACT (because XACT would > decrement CERR whereas DBE wouldn't). Good point, I agree. > > Can you make that change and test it? Sure, I have made the change and test it as soon as possible. > > Alan Stern Erkka Talvitie ^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected 2019-12-04 8:55 ` Erkka Talvitie @ 2019-12-04 13:18 ` Erkka Talvitie 2019-12-04 14:24 ` Alan Stern 0 siblings, 1 reply; 16+ messages in thread From: Erkka Talvitie @ 2019-12-04 13:18 UTC (permalink / raw) To: 'Alan Stern'; +Cc: gregkh, linux-usb, claus.baumgartner > -----Original Message----- > From: Erkka Talvitie <erkka.talvitie@vincit.fi> > Sent: keskiviikko 4. joulukuuta 2019 10.55 > To: 'Alan Stern' <stern@rowland.harvard.edu> > Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org; > claus.baumgartner@med.ge.com > Subject: RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is > disconnected > > Thank you for the comments. > > > -----Original Message----- > > From: Alan Stern <stern@rowland.harvard.edu> > > Sent: tiistai 3. joulukuuta 2019 21.01 > > To: Erkka Talvitie <erkka.talvitie@vincit.fi> > > Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org; > > claus.baumgartner@med.ge.com > > Subject: RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is > > disconnected > > > > On Tue, 3 Dec 2019, Erkka Talvitie wrote: > > > > > Thank you for the comments. > > > > > > > -----Original Message----- > > > > From: Alan Stern <stern@rowland.harvard.edu> > > > > Sent: maanantai 2. joulukuuta 2019 21.44 > > > > To: Erkka Talvitie <erkka.talvitie@vincit.fi> > > > > Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org; > > > > claus.baumgartner@med.ge.com > > > > Subject: Re: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub > > > > is disconnected > > > > > > > > On Fri, 29 Nov 2019, Erkka Talvitie wrote: > > > > > > > > > When disconnecting a USB hub that has some child device(s) > > > > > connected to it (such as a USB mouse), then the stack tries to > > > > > clear halt and reset device(s) which are _already_ physically > > disconnected. > > > > > > > > That behavior is understandable. The kernel doesn't know that the > > > > device has been disconnected until it can process the notification > > > > from an > > > upstream > > > > hub, and it can't process that notification while it's trying to > > > > reset the > > > device. > > > > > > > > > > Ok. I was thinking that in this use case , it should not be trying > > > to clear the halt (and reset the device when the clear halt fails). > > > And this behavior was altered by this RFC. > > > > Actually, the situation is a little different from what I described > > above. When you unplug the high-speed hub, the kernel doesn't know > > the hub has been disconnected until it receives a notification from > > the upstream hub. The kernel checks for those notifications at > > roughly 250-ms intervals, so it can take up to that long before the kernel > realizes the high-speed hub is gone. > > Until that time, the kernel will keep trying to reset and communicate > > with the hub and the devices that were attached to it. > > > > You can see this in the logs that you posted in your original report. > > In each case, the "cannot reset" and -71 errors lasted for less than > > 250 ms. > > > > I just tried doing the same experiment on my PC (which does use all > > Intel hardware and an EHCI controller). Here's the output from when I > > unplugged the high-speed hub: > > > > [ 6321.245528] usb 1-1.4: clear tt 4 (00a0) error -71 [ 6321.250903] > > usb 1-1.4- > > port4: cannot reset (err = -71) [ 6321.255155] usb 1-1.4-port4: cannot > > reset (err = -71) [ 6321.259403] usb 1-1.4-port4: cannot reset (err = > > -71) [ 6321.263657] usb 1-1.4-port4: cannot reset (err = -71) [ > > 6321.267905] usb 1- > > 1.4-port4: cannot reset (err = -71) [ 6321.267910] usb 1-1.4-port4: > > Cannot enable. Maybe the USB cable is bad? > > [ 6321.272155] usb 1-1.4-port4: cannot disable (err = -71) [ > > 6321.276405] usb > > 1-1.4-port4: cannot reset (err = -71) [ 6321.280653] usb 1-1.4-port4: > > cannot reset (err = -71) [ 6321.284905] usb 1-1.4-port4: cannot reset > > (err = -71) [ 6321.289155] usb 1-1.4-port4: cannot reset (err = -71) [ > > 6321.293403] usb 1- > > 1.4-port4: cannot reset (err = -71) [ 6321.293407] usb 1-1.4-port4: > > Cannot enable. Maybe the USB cable is bad? > > [ 6321.297656] usb 1-1.4-port4: cannot disable (err = -71) [ > > 6321.301904] usb > > 1-1.4-port4: cannot reset (err = -71) [ 6321.306152] usb 1-1.4-port4: > > cannot reset (err = -71) [ 6321.310402] usb 1-1.4-port4: cannot reset > > (err = -71) [ 6321.314653] usb 1-1.4-port4: cannot reset (err = -71) [ > > 6321.318904] usb 1- > > 1.4-port4: cannot reset (err = -71) [ 6321.318908] usb 1-1.4-port4: > > Cannot enable. Maybe the USB cable is bad? > > [ 6321.323154] usb 1-1.4-port4: cannot disable (err = -71) [ > > 6321.327404] usb > > 1-1.4-port4: cannot reset (err = -71) [ 6321.331651] usb 1-1.4-port4: > > cannot reset (err = -71) [ 6321.335902] usb 1-1.4-port4: cannot reset > > (err = -71) [ 6321.340155] usb 1-1.4-port4: cannot reset (err = -71) [ > > 6321.344402] usb 1- > > 1.4-port4: cannot reset (err = -71) [ 6321.344406] usb 1-1.4-port4: > > Cannot enable. Maybe the USB cable is bad? > > [ 6321.348652] usb 1-1.4-port4: cannot disable (err = -71) [ > > 6321.352904] usb > > 1-1.4-port4: cannot disable (err = -71) [ 6321.357154] hub 1-1.4:1.0: > > hub_ext_port_status failed (err = -71) [ 6321.437000] usb 1-1.4: USB > > disconnect, device number 9 [ 6321.437010] usb 1-1.4.4: USB > > disconnect, device number 10 > > > > As you can see, the time interval runs from 6321.245 to 6321.437, > > roughly 192 ms < 250 ms. This is the expected behavior. > > > > I did not try to check whether the MMF bit got set or what value CERR had. > > > > Ok, makes sense. Thank you for testing this with your setup. > > > > > > But as in this use case the CERR has not been decreased yet there is > > > error bit active (MMF) do you see it is still correct to interpret > > > it as a stall (even when the halt bit is set)? > > > > See below. > > > > > I have tried to find out more details about our EHCI controller > > > version, but I have only found out those CPU versions. It might help > > > in a search whether this could be a HW issue. > > > > > > > > > The specification [2] is not clear about error priorities, but > > > > > since there is no explicit error bit for the stall, it is > > > > > assumed to be lower priority than other errors. > > > > > > > > On the contrary, the specification is very clear. Since > > > > transaction > > > errors cause > > > > CERR to be decremented until it reaches 0, a nonzero value for > > > > CERR > > means > > > > the endpoint was halted for some other reason. And the only other > > reason > > > > is a stall. (Or end of the microframe, but there's no way to tell > > > > if that > > > > happened.) > > > > > > I see your point. EHCI specification states that babble is a serious > > > error and it will also cause the halt. The babble error bit is > > > checked first. But the specification does not say about order of the > > > other errors or about > > what > > > to do if there is an error, no retries executed, yet a halt (stall). > > > For example should the XactErr be checked before the MMF. > > > > I think the order doesn't matter. In fact, it's possible that both > > errors occurred, since the transaction gets retried multiple times. > > True as the error bits are "sticky". > > For example XactErr in EHCI specification [2] table 3-16: > " If the host controller sets this bit to a one, then it remains a one for the > duration of the transfer." > > > > > > >(Or end of the microframe, but there's no way to tell if that > > > >happened.) > > > > > > I was not able to locate this from the specification. Could you > > > please point out where this statement is from? > > > > "Enhanced Host Controller Interface Specification for Universal Serial > > Bus", rev 1.0 (2002), p. 110: > > > > Transaction Error (XactErr). Timeout, data CRC failure, etc. The CErr > > field is decremented and the XactErr bit in the Status field is set to > > a one. The complete split transaction is immediately retried (if Cerr > > is non-zero). If there is not enough time in the micro-frame to > > complete the retry and the endpoint is an IN, or CErr is decremented > > to a zero from a one, the queue is halted. If there is not enough time > > in the micro-frame to complete the retry and the endpoint is an OUT > > and CErr is not zero, then this state is exited (i.e. return to Do > > Start Split). This results in a retry of the entire OUT split > > transaction, at the next poll period. Refer to Chapter 11 Hubs > > (specifically the section full- and low-speed Interrupts) in the USB > > Specification Revision 2.0 for detailed requirements on why these > > errors must be immediately retried. • > > > > There it is, thank you. > > > > Could the way to tell if "end of microframe" happened, be what is > > > done > > here > > > - check for MMF error bit and if CERR has not been decreased? > > > > No, because the "end of microframe" situation happens when the host > > controller is handling a transaction error, whereas MMF gets set when > > the host controller detects an error on the host. > > Ok, I see. > > > > > > > > diff --git a/drivers/usb/host/ehci-q.c > > > > > b/drivers/usb/host/ehci-q.c index 3276304..da7fd12 100644 > > > > > --- a/drivers/usb/host/ehci-q.c > > > > > +++ b/drivers/usb/host/ehci-q.c > > > > > @@ -206,8 +206,9 @@ static int qtd_copy_status ( > > > > > if (token & QTD_STS_BABBLE) { > > > > > /* FIXME "must" disable babbling > > > > device's port too */ > > > > > status = -EOVERFLOW; > > > > > - /* CERR nonzero + halt --> stall */ > > > > > - } else if (QTD_CERR(token)) { > > > > > + /* CERR nonzero and less than > > > > EHCI_TUNE_CERR + halt --> stall. > > > > > + This handles situation where stall comes after > > > > an error. */ > > > > > > > > This comment doesn't make sense. Who cares whether a stall comes > > after > > > > an error or not? It's still a stall and should be reported. > > > > > > This was basically a comment trying to answer to this commit: > > > ba516de332c0 USB: EHCI: check for STALL before other errors > > > > > > "The existing code doesn't do this properly, because it tests for MMF > > > (Missed MicroFrame) and DBE (Data Buffer Error) before testing the > > > retry counter. Thus, if a transaction gets either MMF or DBE the > > > corresponding flag is set and the transaction is retried. If the > > > second attempt receives a STALL then -EPIPE is the correct return > > > value. But the existing code would see the MMF or DBE flag instead > > > and return -EPROTO, -ENOSR, or -ECOMM." > > > > > > The comment tries to explain that it will not revert the fix made in > > > the commit ba516de332c0. > > > > Okay, I get it. You're trying to rely on the strange behavior of the > > MMF bit. > > > > I'm not sure this is a good idea. Suppose MMF gets set for some other > > reason (a genuine error on the host) and then the transaction gets a > > STALL on the next retry. Since host errors don't decrement CERR, your > > patch would cause the driver to return -EPROTO instead of -EPIPE. > > Hmm. I originally understood from the documentation that the MMF does > decrement the CERR. I followed the EHCI specification table 3-16, the row > about the CERR. > There are listed the errors that do decrement the CERR and which do not > decrement. Data buffer error, stall, babble and no error (of course) do not > decrement. > Transaction errors do decrement the CERR. As there was no mention about > MMF I have been in the impression that it is included to the transaction > errors of the table. > > Now I was able to find from the documentation you pointed out (in your > next comment below) that your statement is correct. EHCI specification, > table 4-13 contains Interrupt IN / OUT conditions which prove your point. > > > > > > > > + } else if (QTD_CERR(token) && > > > > QTD_CERR(token) < EHCI_TUNE_CERR) { > > > > > status = -EPIPE; > > > > > > > > If an error occurs and the transaction is retried and the retry > > > > gets a > > > stall, then > > > > the final status should be -EPIPE, not something else. > > > > > > This is how the RFC also works. If the transaction has been retried > > > and gets stall then -EPIPE is returned. > > > Or if there are no errors but there is a stall then -EPIPE is returned. > > > > > > The only difference in this patch in comparison to the existing > > > implementation is that if there is an error but the transaction has > > > not been retried it is not interpret as a stall even if there is a > > > halt. > > > > Sometimes that will be the right behavior and other times it won't. > > However, it looks like there may be a way to tell which situation we > > are in. Setting the MMF bit will cause the queue to halt immediately > > if the transaction is IN, but not if it is OUT (see Table 4-13 in the > > EHCI spec). > > Thank you, this Table 4-13 was a vital piece of documentation that I either > missed or failed to read carefully enough. > Now this could explain why the queue is halted with MMF set, even if it > would not be a stall. Well done! > > > > > So if CERR == EHCI_TUNE_CERR and the QTD_PID != 1 (not IN) then we > > should return -EPIPE, as the existing code does. But if QTD_PID == 1 > > then the code should continue, as your patch does -- with one > > difference: Put the additional check for EHCI_TUNE_CERR between the > > tests for DBE and XACT instead of after XACT (because XACT would > > decrement CERR whereas DBE wouldn't). > > Good point, I agree. > > > > > Can you make that change and test it? > > Sure, I have made the change and test it as soon as possible. I am not actually totally sure if I understood you correctly, but I tested a change where the first stall check is like this (PID_CODE_IN is defined as 1): - } else if (QTD_CERR(token)) { + } else if (QTD_CERR(token) && (QTD_PID (token) != PID_CODE_IN)) { status = -EPIPE; And the second stall check (now between DBE and XACT): + } else if (QTD_CERR(token)) { + status = -EPIPE; Is this what you meant? Please correct me if I am wrong. Anyways with these changes the issue does not reproduce. > > > > > Alan Stern > > Erkka Talvitie > Erkka Talvitie ^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected 2019-12-04 13:18 ` Erkka Talvitie @ 2019-12-04 14:24 ` Alan Stern 2019-12-04 14:37 ` Erkka Talvitie 0 siblings, 1 reply; 16+ messages in thread From: Alan Stern @ 2019-12-04 14:24 UTC (permalink / raw) To: Erkka Talvitie; +Cc: gregkh, linux-usb, claus.baumgartner On Wed, 4 Dec 2019, Erkka Talvitie wrote: > > > So if CERR == EHCI_TUNE_CERR and the QTD_PID != 1 (not IN) then we > > > should return -EPIPE, as the existing code does. But if QTD_PID == 1 > > > then the code should continue, as your patch does -- with one > > > difference: Put the additional check for EHCI_TUNE_CERR between the > > > tests for DBE and XACT instead of after XACT (because XACT would > > > decrement CERR whereas DBE wouldn't). > > > > Good point, I agree. > > > > > > > > Can you make that change and test it? > > > > Sure, I have made the change and test it as soon as possible. > > I am not actually totally sure if I understood you correctly, but I tested a change where the first stall check is like this (PID_CODE_IN is defined as 1): > > - } else if (QTD_CERR(token)) { > + } else if (QTD_CERR(token) && (QTD_PID (token) != PID_CODE_IN)) { > status = -EPIPE; > > And the second stall check (now between DBE and XACT): > + } else if (QTD_CERR(token)) { > + status = -EPIPE; > > Is this what you meant? Please correct me if I am wrong. Actually, what I meant for the first part was: } else if (QTD_CERR(token) && (QTD_CERR(token) < EHCI_TUNE_CERR || QTD_PID(token) != PID_CODE_IN)) { And of course there should be a comment before this line, explaining what it does. By the way, the accepted format for multi-line comments is: /* * Blah blah blah * Blah blah blah */ The second part of the patch looks okay (but again, with a comment). > Anyways with these changes the issue does not reproduce. Very good. Alan Stern ^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected 2019-12-04 14:24 ` Alan Stern @ 2019-12-04 14:37 ` Erkka Talvitie 2019-12-05 10:35 ` Erkka Talvitie 0 siblings, 1 reply; 16+ messages in thread From: Erkka Talvitie @ 2019-12-04 14:37 UTC (permalink / raw) To: 'Alan Stern'; +Cc: gregkh, linux-usb, claus.baumgartner > -----Original Message----- > From: Alan Stern <stern@rowland.harvard.edu> > Sent: keskiviikko 4. joulukuuta 2019 16.24 > To: Erkka Talvitie <erkka.talvitie@vincit.fi> > Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org; > claus.baumgartner@med.ge.com > Subject: RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is > disconnected > > On Wed, 4 Dec 2019, Erkka Talvitie wrote: > > > > > So if CERR == EHCI_TUNE_CERR and the QTD_PID != 1 (not IN) then we > > > > should return -EPIPE, as the existing code does. But if QTD_PID > > > > == 1 then the code should continue, as your patch does -- with one > > > > difference: Put the additional check for EHCI_TUNE_CERR between > > > > the tests for DBE and XACT instead of after XACT (because XACT > > > > would decrement CERR whereas DBE wouldn't). > > > > > > Good point, I agree. > > > > > > > > > > > Can you make that change and test it? > > > > > > Sure, I have made the change and test it as soon as possible. > > > > I am not actually totally sure if I understood you correctly, but I tested a > change where the first stall check is like this (PID_CODE_IN is defined as 1): > > > > - } else if (QTD_CERR(token)) { > > + } else if (QTD_CERR(token) && (QTD_PID (token) != > > + PID_CODE_IN)) { > > status = -EPIPE; > > > > And the second stall check (now between DBE and XACT): > > + } else if (QTD_CERR(token)) { > > + status = -EPIPE; > > > > Is this what you meant? Please correct me if I am wrong. > > Actually, what I meant for the first part was: > > } else if (QTD_CERR(token) && > (QTD_CERR(token) > < EHCI_TUNE_CERR || > QTD_PID(token) != > PID_CODE_IN)) { Ok, now I understand the change. Good. > > And of course there should be a comment before this line, explaining what it > does. By the way, the accepted format for multi-line comments > is: > > /* > * Blah blah blah > * Blah blah blah > */ > Thanks for the information. I noticed that my comments were different than the existing ones in the file and I was already about to change my comments to match the existing style. > The second part of the patch looks okay (but again, with a comment). Yes, I will add the comment here also. > > > Anyways with these changes the issue does not reproduce. > > Very good. I will do the changes and re-test. > > Alan Stern Erkka Talvitie ^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected 2019-12-04 14:37 ` Erkka Talvitie @ 2019-12-05 10:35 ` Erkka Talvitie 2019-12-05 14:37 ` Alan Stern 0 siblings, 1 reply; 16+ messages in thread From: Erkka Talvitie @ 2019-12-05 10:35 UTC (permalink / raw) To: 'Alan Stern'; +Cc: gregkh, linux-usb, claus.baumgartner > -----Original Message----- > From: Erkka Talvitie <erkka.talvitie@vincit.fi> > Sent: keskiviikko 4. joulukuuta 2019 16.38 > To: 'Alan Stern' <stern@rowland.harvard.edu> > Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org; > claus.baumgartner@med.ge.com > Subject: RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is > disconnected > > > > > -----Original Message----- > > From: Alan Stern <stern@rowland.harvard.edu> > > Sent: keskiviikko 4. joulukuuta 2019 16.24 > > To: Erkka Talvitie <erkka.talvitie@vincit.fi> > > Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org; > > claus.baumgartner@med.ge.com > > Subject: RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is > > disconnected > > > > On Wed, 4 Dec 2019, Erkka Talvitie wrote: > > > > > > > So if CERR == EHCI_TUNE_CERR and the QTD_PID != 1 (not IN) then > > > > > we should return -EPIPE, as the existing code does. But if > > > > > QTD_PID == 1 then the code should continue, as your patch does > > > > > -- with one > > > > > difference: Put the additional check for EHCI_TUNE_CERR between > > > > > the tests for DBE and XACT instead of after XACT (because XACT > > > > > would decrement CERR whereas DBE wouldn't). > > > > > > > > Good point, I agree. > > > > > > > > > > > > > > Can you make that change and test it? > > > > > > > > Sure, I have made the change and test it as soon as possible. > > > > > > I am not actually totally sure if I understood you correctly, but I > tested a > > change where the first stall check is like this (PID_CODE_IN is > > defined as > 1): > > > > > > - } else if (QTD_CERR(token)) { > > > + } else if (QTD_CERR(token) && (QTD_PID (token) != > > > + PID_CODE_IN)) { > > > status = -EPIPE; > > > > > > And the second stall check (now between DBE and XACT): > > > + } else if (QTD_CERR(token)) { > > > + status = -EPIPE; > > > > > > Is this what you meant? Please correct me if I am wrong. > > > > Actually, what I meant for the first part was: > > > > } else if (QTD_CERR(token) && > > (QTD_CERR(token) > > < EHCI_TUNE_CERR || > > QTD_PID(token) != > > PID_CODE_IN)) { > > Ok, now I understand the change. Good. I tested this change and the issue did not reproduce. However when I was writing the comments for the patch I started to think what happens in this following scenario: The PID Code is IN. 1. First there will be XACT, the CERR is decremented, let's say from 3 to 2 and the host controller executes a retry. 2. On this next try there will happen the condition mentioned in the Table 4-13 of the EHCI specification so that the MMF is set and the queue is halted (because it is IN). 3. To my understanding now the execution enters to this first stall check if, as CERR is > 0 and CERR < EHCI_TUNE_CERR. 4. The -EPIPE (stall) is returned when actually the queue was halted due to MMF - not stall - and the -EPROTO should be returned. Is my logic correct or am I missing something? If you agree with me then I would like to present you a bit more bold (in a sense of amount of refactoring) RFC. In high level this another RFC separates 1. error check and 2. stall check. For me this approach is a bit easier to understand from the code. Or then please propose another solution. If you do not agree with this scenario then never mind the above suggestion about RFC doing more refactoring. > > > > > And of course there should be a comment before this line, explaining > > what > it > > does. By the way, the accepted format for multi-line comments > > is: > > > > /* > > * Blah blah blah > > * Blah blah blah > > */ > > > > Thanks for the information. I noticed that my comments were different than > the existing ones in the file and I was already about to change my comments > to match the existing style. > > > The second part of the patch looks okay (but again, with a comment). > > Yes, I will add the comment here also. > > > > > > Anyways with these changes the issue does not reproduce. > > > > Very good. > > I will do the changes and re-test. > > > > > Alan Stern > > Erkka Talvitie Erkka Talvitie ^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected 2019-12-05 10:35 ` Erkka Talvitie @ 2019-12-05 14:37 ` Alan Stern 2019-12-05 15:00 ` Erkka Talvitie 0 siblings, 1 reply; 16+ messages in thread From: Alan Stern @ 2019-12-05 14:37 UTC (permalink / raw) To: Erkka Talvitie; +Cc: gregkh, linux-usb, claus.baumgartner On Thu, 5 Dec 2019, Erkka Talvitie wrote: > I tested this change and the issue did not reproduce. > > However when I was writing the comments for the patch I started to think > what happens in this following scenario: > > The PID Code is IN. > > 1. First there will be XACT, the CERR is decremented, let's say from 3 to 2 > and the host controller executes a retry. > 2. On this next try there will happen the condition mentioned in the Table > 4-13 of the EHCI specification so that the MMF is set and the queue is > halted (because it is IN). > 3. To my understanding now the execution enters to this first stall check > if, as CERR is > 0 and CERR < EHCI_TUNE_CERR. > 4. The -EPIPE (stall) is returned when actually the queue was halted due to > MMF - not stall - and the -EPROTO should be returned. > > Is my logic correct or am I missing something? The same thought had occurred to me. > If you agree with me then I would like to present you a bit more bold (in a > sense of amount of refactoring) RFC. In high level this another RFC > separates 1. error check and 2. stall check. For me this approach is a bit > easier to understand from the code. Or then please propose another > solution. I was going to suggest: Just check for MMF and PID == IN before checking for STALL. Everything else can remain the way it is. Alan Stern ^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected 2019-12-05 14:37 ` Alan Stern @ 2019-12-05 15:00 ` Erkka Talvitie 2019-12-09 9:57 ` Erkka Talvitie 0 siblings, 1 reply; 16+ messages in thread From: Erkka Talvitie @ 2019-12-05 15:00 UTC (permalink / raw) To: 'Alan Stern'; +Cc: gregkh, linux-usb, claus.baumgartner > -----Original Message----- > From: Alan Stern <stern@rowland.harvard.edu> > Sent: torstai 5. joulukuuta 2019 16.38 > To: Erkka Talvitie <erkka.talvitie@vincit.fi> > Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org; > claus.baumgartner@med.ge.com > Subject: RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is > disconnected > > On Thu, 5 Dec 2019, Erkka Talvitie wrote: > > > I tested this change and the issue did not reproduce. > > > > However when I was writing the comments for the patch I started to > > think what happens in this following scenario: > > > > The PID Code is IN. > > > > 1. First there will be XACT, the CERR is decremented, let's say from 3 > > to 2 and the host controller executes a retry. > > 2. On this next try there will happen the condition mentioned in the > > Table > > 4-13 of the EHCI specification so that the MMF is set and the queue is > > halted (because it is IN). > > 3. To my understanding now the execution enters to this first stall > > check if, as CERR is > 0 and CERR < EHCI_TUNE_CERR. > > 4. The -EPIPE (stall) is returned when actually the queue was halted > > due to MMF - not stall - and the -EPROTO should be returned. > > > > Is my logic correct or am I missing something? > > The same thought had occurred to me. > > > If you agree with me then I would like to present you a bit more bold > > (in a sense of amount of refactoring) RFC. In high level this another > > RFC separates 1. error check and 2. stall check. For me this approach > > is a bit easier to understand from the code. Or then please propose > > another solution. > > I was going to suggest: Just check for MMF and PID == IN before checking for > STALL. Everything else can remain the way it is. Ok, just to double check: I take the existing driver code (I will not apply the earlier RFC on top of that) and apply one more check before the original stall check (that is): } else if (QTD_CERR(token)) { The check that I will add is checking MMF bit && PID == IN, and this check comes right after the babble check, right? Good, seems like a simple change. Yet I still prefer to test the change. Unfortunately that goes to the next week as we have a national holiday tomorrow. I will get back to you most likely on Monday. > > Alan Stern > Erkka Talvitie ^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected 2019-12-05 15:00 ` Erkka Talvitie @ 2019-12-09 9:57 ` Erkka Talvitie 2019-12-09 15:24 ` Alan Stern 0 siblings, 1 reply; 16+ messages in thread From: Erkka Talvitie @ 2019-12-09 9:57 UTC (permalink / raw) To: 'Alan Stern'; +Cc: gregkh, linux-usb, claus.baumgartner > -----Original Message----- > From: Erkka Talvitie <erkka.talvitie@vincit.fi> > Sent: torstai 5. joulukuuta 2019 17.00 > To: 'Alan Stern' <stern@rowland.harvard.edu> > Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org; > claus.baumgartner@med.ge.com > Subject: RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is > disconnected > > > > > -----Original Message----- > > From: Alan Stern <stern@rowland.harvard.edu> > > Sent: torstai 5. joulukuuta 2019 16.38 > > To: Erkka Talvitie <erkka.talvitie@vincit.fi> > > Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org; > > claus.baumgartner@med.ge.com > > Subject: RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is > > disconnected > > > > On Thu, 5 Dec 2019, Erkka Talvitie wrote: > > > > > I tested this change and the issue did not reproduce. > > > > > > However when I was writing the comments for the patch I started to > > > think what happens in this following scenario: > > > > > > The PID Code is IN. > > > > > > 1. First there will be XACT, the CERR is decremented, let's say from > > > 3 to 2 and the host controller executes a retry. > > > 2. On this next try there will happen the condition mentioned in the > > > Table > > > 4-13 of the EHCI specification so that the MMF is set and the queue > > > is halted (because it is IN). > > > 3. To my understanding now the execution enters to this first stall > > > check if, as CERR is > 0 and CERR < EHCI_TUNE_CERR. > > > 4. The -EPIPE (stall) is returned when actually the queue was halted > > > due to MMF - not stall - and the -EPROTO should be returned. > > > > > > Is my logic correct or am I missing something? > > > > The same thought had occurred to me. > > > > > If you agree with me then I would like to present you a bit more > > > bold (in a sense of amount of refactoring) RFC. In high level this > > > another RFC separates 1. error check and 2. stall check. For me this > > > approach is a bit easier to understand from the code. Or then please > > > propose another solution. > > > > I was going to suggest: Just check for MMF and PID == IN before > > checking > for > > STALL. Everything else can remain the way it is. > > Ok, just to double check: > > I take the existing driver code (I will not apply the earlier RFC on top of > that) and apply one more check before the original stall check (that is): > } else if (QTD_CERR(token)) { > > The check that I will add is checking MMF bit && PID == IN, and this check > comes right after the babble check, right? > > Good, seems like a simple change. Yet I still prefer to test the change. > Unfortunately that goes to the next week as we have a national holiday > tomorrow. > I will get back to you most likely on Monday. I tested this change and it removes the error messages from the output. > > > > > Alan Stern > > > Erkka Talvitie Erkka Talvitie ^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected 2019-12-09 9:57 ` Erkka Talvitie @ 2019-12-09 15:24 ` Alan Stern 2019-12-10 6:31 ` Erkka Talvitie 0 siblings, 1 reply; 16+ messages in thread From: Alan Stern @ 2019-12-09 15:24 UTC (permalink / raw) To: Erkka Talvitie; +Cc: gregkh, linux-usb, claus.baumgartner On Mon, 9 Dec 2019, Erkka Talvitie wrote: > > Ok, just to double check: > > > > I take the existing driver code (I will not apply the earlier RFC on top > of > > that) and apply one more check before the original stall check (that is): > > } else if (QTD_CERR(token)) { > > > > The check that I will add is checking MMF bit && PID == IN, and this check > > comes right after the babble check, right? > > > > Good, seems like a simple change. Yet I still prefer to test the change. > > Unfortunately that goes to the next week as we have a national holiday > > tomorrow. > > I will get back to you most likely on Monday. > > I tested this change and it removes the error messages from the output. Great! Okay, feel free to submit a new patch. Alan Stern ^ permalink raw reply [flat|nested] 16+ messages in thread
* RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected 2019-12-09 15:24 ` Alan Stern @ 2019-12-10 6:31 ` Erkka Talvitie 0 siblings, 0 replies; 16+ messages in thread From: Erkka Talvitie @ 2019-12-10 6:31 UTC (permalink / raw) To: 'Alan Stern'; +Cc: gregkh, linux-usb, claus.baumgartner > -----Original Message----- > From: Alan Stern <stern@rowland.harvard.edu> > Sent: maanantai 9. joulukuuta 2019 17.25 > To: Erkka Talvitie <erkka.talvitie@vincit.fi> > Cc: gregkh@linuxfoundation.org; linux-usb@vger.kernel.org; > claus.baumgartner@med.ge.com > Subject: RE: [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is > disconnected > > On Mon, 9 Dec 2019, Erkka Talvitie wrote: > > > > Ok, just to double check: > > > > > > I take the existing driver code (I will not apply the earlier RFC on > > > top > > of > > > that) and apply one more check before the original stall check (that is): > > > } else if (QTD_CERR(token)) { > > > > > > The check that I will add is checking MMF bit && PID == IN, and this > > > check comes right after the babble check, right? > > > > > > Good, seems like a simple change. Yet I still prefer to test the change. > > > Unfortunately that goes to the next week as we have a national > > > holiday tomorrow. > > > I will get back to you most likely on Monday. > > > > I tested this change and it removes the error messages from the output. > > Great! Okay, feel free to submit a new patch. Thank you, a new patch was submitted. > > Alan Stern Erkka Talvitie ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2019-12-10 6:31 UTC | newest] Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-11-29 14:08 [RFCv1 1/1] USB: EHCI: Do not return -EPIPE when hub is disconnected Erkka Talvitie 2019-12-02 19:43 ` Alan Stern 2019-12-03 9:38 ` Erkka Talvitie 2019-12-03 10:54 ` Erkka Talvitie 2019-12-03 13:35 ` Erkka Talvitie 2019-12-03 19:01 ` Alan Stern 2019-12-04 8:55 ` Erkka Talvitie 2019-12-04 13:18 ` Erkka Talvitie 2019-12-04 14:24 ` Alan Stern 2019-12-04 14:37 ` Erkka Talvitie 2019-12-05 10:35 ` Erkka Talvitie 2019-12-05 14:37 ` Alan Stern 2019-12-05 15:00 ` Erkka Talvitie 2019-12-09 9:57 ` Erkka Talvitie 2019-12-09 15:24 ` Alan Stern 2019-12-10 6:31 ` Erkka Talvitie
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.