* kernel locks due to USB I/O @ 2020-11-10 19:20 Alberto Sentieri 2020-11-10 20:51 ` Alan Stern 0 siblings, 1 reply; 17+ messages in thread From: Alberto Sentieri @ 2020-11-10 19:20 UTC (permalink / raw) To: Greg Kroah-Hartman, linux-usb I’ve seen many kernel locks caused by a particular user-level application. After the kernel locks, there is no report left in the machine, neither in the logs. These locks have to do with USB input and output. The objective of this email is to get guidance about how to collect more data related to the locks. Follows a description of the problem. I manage a few remote machines installed at a manufacturing facility, which run Ubuntu 18.04. For months I had seen unexpected kernel locks, which I could not explain. By locks I mean that the machine completely dies. The graphical screen and keyboard freezes. I cannot ping or connect through ssh during the locks. The only way of making the machine come back is through a “pull the plug”. After rebooting I cannot find anything meaningful about the lock in the logs. The machine is a good quality one with a 6-core Xeon, 32 GB ECC memory (and the application is using about 1GB). Exact the same problem happens in two identical machines, one running kernel 5.0.0-37 generic and the other running kernel 5.3.0-62-generic. A few days ago I was able to create a sequence of events that produce the locks in a couple of minutes. These events have to do with USB 2.0 interrupt I/O on USB devices connected at 12 Mbits/s and the frequency URBs are submitted and reaped . It is necessary to have at least 36 devices connected to reproduce the problem easily, which I cannot do from where I am. The machines are in a country other than the one I live, and my physical access to them is not possible due to COVID-19 restrictions. There is no special USB drivers installed. However, there is a NVIDIA manufacturer driver installed, which I installed using the Ubuntu regular tools for non-free software. All USB I/O is done by a regular user opening /dev/bus/usb/xxx/xxx (the device group is set to the user group by udev). Each set of 18 USB devices is connected to a 10-Amp.-power-supply powered HUB. Each hub has its own USB 2.0 root, I mean, I installed multiple USB 2.0 PCI express expansion cards, and only one port of each expansion card is used for each HUB. The protocol to talk to any of the 36 devices is pretty simple. It uses USB interrupt frames. A 64-byte frame is sent to the device (request packet). I use ioctl (USBDEVFS_SUBMITURB). The file descriptor is monitored by epoll and when an answer comes back, the response packet (another 64-byte interrupt packet) is recovered by ioctl (USBDEVFS_REAPURBNDELAY). Then a 64-byte packet (confirmation packet) is sent through USBDEVFS_SUBMITURB. This sequence happens once every few seconds and the delay between the three packets is just a couple of milliseconds. All process of dealing with the 36 devices is in a unique thread, under the same epoll loop. So if I synchronize all 36 devices, I mean, I try to talk to all them basically at the same time, the kernel will lock in about 2 minutes or less. By “at the same time” I mean to submit the URBs for the request packet around the same time for all of them, and then sit there, waiting for the proper epoll wake-up to deal with the state machine (response and confirmation packets). However, if I lock a semaphore before sending the request packet for one device, and only unlock after reaping the URB I used to send the confirmation packet, it ran for ate least 72 hours without problems. So, one device at a time (using basically the same software plus the semaphore) does not cause the kernel lock. My point is that simple ioctl calls to USB devices should not break the kernel. I need help to address the kernel issue. The problem is difficult to reproduce at my office because it needs many devices connected to it, which are available only in a place I do not have physical access to, due to COVID-19 travel restrictions. My guess is that, for a regular user, this bug rarely manifests itself and it may be there for a long time. I would like to figure out exactly where the problem is and I am looking for your guidance to get more information about it. Thanks, Alberto Sentieri # cat /proc/version Linux version 5.3.0-62-generic (buildd@lcy01-amd64-004) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #56~18.04.1-Ubuntu SMP Wed Jun 24 16:17:03 UTC 2020 # free -h (while running the application) total used free shared buff/cache available Mem: 31G 860M 29G 7.1M 1.3G 29G Swap: 15G 0B 15G # lspci 00:00.0 Host bridge: Intel Corporation Sky Lake-E DMI3 Registers (rev 04) 00:04.0 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04) 00:04.1 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04) 00:04.2 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04) 00:04.3 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04) 00:04.4 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04) 00:04.5 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04) 00:04.6 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04) 00:04.7 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04) 00:05.0 System peripheral: Intel Corporation Sky Lake-E MM/Vt-d Configuration Registers (rev 04) 00:05.2 System peripheral: Intel Corporation Device 2025 (rev 04) 00:05.4 PIC: Intel Corporation Device 2026 (rev 04) 00:08.0 System peripheral: Intel Corporation Sky Lake-E Ubox Registers (rev 04) 00:08.1 Performance counters: Intel Corporation Sky Lake-E Ubox Registers (rev 04) 00:08.2 System peripheral: Intel Corporation Sky Lake-E Ubox Registers (rev 04) 00:14.0 USB controller: Intel Corporation 200 Series/Z370 Chipset Family USB 3.0 xHCI Controller 00:14.2 Signal processing controller: Intel Corporation 200 Series PCH Thermal Subsystem 00:16.0 Communication controller: Intel Corporation 200 Series PCH CSME HECI #1 00:17.0 RAID bus controller: Intel Corporation C600/X79 series chipset SATA RAID Controller 00:1b.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #21 (rev f0) 00:1c.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #3 (rev f0) 00:1c.4 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #5 (rev f0) 00:1d.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #9 (rev f0) 00:1f.0 ISA bridge: Intel Corporation C422 Chipset LPC/eSPI Controller 00:1f.2 Memory controller: Intel Corporation 200 Series/Z370 Chipset Family Power Management Controller 00:1f.3 Audio device: Intel Corporation 200 Series PCH HD Audio 00:1f.4 SMBus: Intel Corporation 200 Series/Z370 Chipset Family SMBus Controller 00:1f.5 Non-VGA unclassified device: Intel Corporation 200 Series/Z370 Chipset Family SPI Controller 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-LM 01:00.0 PCI bridge: Pericom Semiconductor PI7C9X111SL PCIe-to-PCI Reversible Bridge (rev 02) 02:04.0 USB controller: VIA Technologies, Inc. VT82xx/62xx UHCI USB 1.1 Controller (rev 62) 02:04.1 USB controller: VIA Technologies, Inc. VT82xx/62xx UHCI USB 1.1 Controller (rev 62) 02:04.2 USB controller: VIA Technologies, Inc. USB 2.0 (rev 65) 03:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03) 05:00.0 PCI bridge: Pericom Semiconductor PI7C9X111SL PCIe-to-PCI Reversible Bridge (rev 02) 06:04.0 USB controller: VIA Technologies, Inc. VT82xx/62xx UHCI USB 1.1 Controller (rev 62) 06:04.1 USB controller: VIA Technologies, Inc. VT82xx/62xx UHCI USB 1.1 Controller (rev 62) 06:04.2 USB controller: VIA Technologies, Inc. USB 2.0 (rev 65) 14:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A (rev 04) 14:05.0 System peripheral: Intel Corporation Device 2034 (rev 04) 14:05.2 System peripheral: Intel Corporation Sky Lake-E RAS Configuration Registers (rev 04) 14:05.4 PIC: Intel Corporation Device 2036 (rev 04) 14:08.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:08.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:08.2 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:08.3 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:08.4 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:08.5 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:08.6 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:08.7 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:09.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:09.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:0e.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:0e.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:0e.2 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:0e.3 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:0e.4 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:0e.5 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:0e.6 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:0e.7 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:0f.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:0f.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:1d.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:1d.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:1d.2 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:1d.3 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:1e.0 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04) 14:1e.1 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04) 14:1e.2 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04) 14:1e.3 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04) 14:1e.4 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04) 14:1e.5 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04) 14:1e.6 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04) 15:00.0 PCI bridge: Pericom Semiconductor PI7C9X111SL PCIe-to-PCI Reversible Bridge (rev 02) 16:04.0 USB controller: VIA Technologies, Inc. VT82xx/62xx UHCI USB 1.1 Controller (rev 62) 16:04.1 USB controller: VIA Technologies, Inc. VT82xx/62xx UHCI USB 1.1 Controller (rev 62) 16:04.2 USB controller: VIA Technologies, Inc. USB 2.0 (rev 65) 20:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A (rev 04) 20:05.0 System peripheral: Intel Corporation Device 2034 (rev 04) 20:05.2 System peripheral: Intel Corporation Sky Lake-E RAS Configuration Registers (rev 04) 20:05.4 PIC: Intel Corporation Device 2036 (rev 04) 20:08.0 System peripheral: Intel Corporation Device 2066 (rev 04) 20:09.0 System peripheral: Intel Corporation Device 2066 (rev 04) 20:0a.0 System peripheral: Intel Corporation Device 2040 (rev 04) 20:0a.1 System peripheral: Intel Corporation Device 2041 (rev 04) 20:0a.2 System peripheral: Intel Corporation Device 2042 (rev 04) 20:0a.3 System peripheral: Intel Corporation Device 2043 (rev 04) 20:0a.4 System peripheral: Intel Corporation Device 2044 (rev 04) 20:0a.5 System peripheral: Intel Corporation Device 2045 (rev 04) 20:0a.6 System peripheral: Intel Corporation Device 2046 (rev 04) 20:0a.7 System peripheral: Intel Corporation Device 2047 (rev 04) 20:0b.0 System peripheral: Intel Corporation Device 2048 (rev 04) 20:0b.1 System peripheral: Intel Corporation Device 2049 (rev 04) 20:0b.2 System peripheral: Intel Corporation Device 204a (rev 04) 20:0b.3 System peripheral: Intel Corporation Device 204b (rev 04) 20:0c.0 System peripheral: Intel Corporation Device 2040 (rev 04) 20:0c.1 System peripheral: Intel Corporation Device 2041 (rev 04) 20:0c.2 System peripheral: Intel Corporation Device 2042 (rev 04) 20:0c.3 System peripheral: Intel Corporation Device 2043 (rev 04) 20:0c.4 System peripheral: Intel Corporation Device 2044 (rev 04) 20:0c.5 System peripheral: Intel Corporation Device 2045 (rev 04) 20:0c.6 System peripheral: Intel Corporation Device 2046 (rev 04) 20:0c.7 System peripheral: Intel Corporation Device 2047 (rev 04) 20:0d.0 System peripheral: Intel Corporation Device 2048 (rev 04) 20:0d.1 System peripheral: Intel Corporation Device 2049 (rev 04) 20:0d.2 System peripheral: Intel Corporation Device 204a (rev 04) 20:0d.3 System peripheral: Intel Corporation Device 204b (rev 04) 21:00.0 VGA compatible controller: NVIDIA Corporation GP107GL [Quadro P400] (rev a1) 21:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1) 2c:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A (rev 04) 2c:02.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port C (rev 04) 2c:03.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port D (rev 04) 2c:05.0 System peripheral: Intel Corporation Device 2034 (rev 04) 2c:05.2 System peripheral: Intel Corporation Sky Lake-E RAS Configuration Registers (rev 04) 2c:05.4 PIC: Intel Corporation Device 2036 (rev 04) 2c:12.0 Performance counters: Intel Corporation Sky Lake-E M3KTI Registers (rev 04) 2c:12.1 Performance counters: Intel Corporation Sky Lake-E M3KTI Registers (rev 04) 2c:12.2 System peripheral: Intel Corporation Sky Lake-E M3KTI Registers (rev 04) 2c:15.0 System peripheral: Intel Corporation Sky Lake-E M2PCI Registers (rev 04) 2c:16.0 System peripheral: Intel Corporation Sky Lake-E M2PCI Registers (rev 04) 2c:16.4 System peripheral: Intel Corporation Sky Lake-E M2PCI Registers (rev 04) 2c:17.0 System peripheral: Intel Corporation Sky Lake-E M2PCI Registers (rev 04) 2d:00.0 PCI bridge: Pericom Semiconductor PI7C9X111SL PCIe-to-PCI Reversible Bridge (rev 02) 2e:04.0 USB controller: VIA Technologies, Inc. VT82xx/62xx UHCI USB 1.1 Controller (rev 62) 2e:04.1 USB controller: VIA Technologies, Inc. VT82xx/62xx UHCI USB 1.1 Controller (rev 62) 2e:04.2 USB controller: VIA Technologies, Inc. USB 2.0 (rev 65) ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel locks due to USB I/O 2020-11-10 19:20 kernel locks due to USB I/O Alberto Sentieri @ 2020-11-10 20:51 ` Alan Stern 2020-11-10 23:42 ` Alberto Sentieri 0 siblings, 1 reply; 17+ messages in thread From: Alan Stern @ 2020-11-10 20:51 UTC (permalink / raw) To: Alberto Sentieri; +Cc: Greg Kroah-Hartman, linux-usb On Tue, Nov 10, 2020 at 02:20:50PM -0500, Alberto Sentieri wrote: > I’ve seen many kernel locks caused by a particular user-level application. > After the kernel locks, there is no report left in the machine, neither in > the logs. These locks have to do with USB input and output. > > The objective of this email is to get guidance about how to collect more > data related to the locks. > > Follows a description of the problem. > > I manage a few remote machines installed at a manufacturing facility, which > run Ubuntu 18.04. For months I had seen unexpected kernel locks, which I > could not explain. By locks I mean that the machine completely dies. The > graphical screen and keyboard freezes. I cannot ping or connect through ssh > during the locks. The only way of making the machine come back is through a > “pull the plug”. After rebooting I cannot find anything meaningful about the > lock in the logs. The machine is a good quality one with a 6-core Xeon, 32 > GB ECC memory (and the application is using about 1GB). Exact the same > problem happens in two identical machines, one running kernel 5.0.0-37 > generic and the other running kernel 5.3.0-62-generic. Can you update either machine to a 5.9 kernel? > A few days ago I was able to create a sequence of events that produce the > locks in a couple of minutes. These events have to do with USB 2.0 interrupt > I/O on USB devices connected at 12 Mbits/s and the frequency URBs are > submitted and reaped . It is necessary to have at least 36 devices connected > to reproduce the problem easily, which I cannot do from where I am. The > machines are in a country other than the one I live, and my physical access > to them is not possible due to COVID-19 restrictions. > > There is no special USB drivers installed. However, there is a NVIDIA > manufacturer driver installed, which I installed using the Ubuntu regular > tools for non-free software. All USB I/O is done by a regular user opening > /dev/bus/usb/xxx/xxx (the device group is set to the user group by udev). > > Each set of 18 USB devices is connected to a 10-Amp.-power-supply powered > HUB. Each hub has its own USB 2.0 root, I mean, I installed multiple USB 2.0 > PCI express expansion cards, and only one port of each expansion card is > used for each HUB. > > The protocol to talk to any of the 36 devices is pretty simple. It uses USB > interrupt frames. A 64-byte frame is sent to the device (request packet). I > use ioctl (USBDEVFS_SUBMITURB). The file descriptor is monitored by epoll > and when an answer comes back, the response packet (another 64-byte > interrupt packet) is recovered by ioctl (USBDEVFS_REAPURBNDELAY). Then a > 64-byte packet (confirmation packet) is sent through USBDEVFS_SUBMITURB. > This sequence happens once every few seconds and the delay between the three > packets is just a couple of milliseconds. All process of dealing with the 36 > devices is in a unique thread, under the same epoll loop. This sentence is ambiguous. Do you mean there is a single unique thread which talks to all 36 devices? Or do you mean there is a separate unique thread for each device (so 36 threads)? > So if I synchronize all 36 devices, I mean, I try to talk to all them > basically at the same time, the kernel will lock in about 2 minutes or less. > By “at the same time” I mean to submit the URBs for the request packet > around the same time for all of them, and then sit there, waiting for the > proper epoll wake-up to deal with the state machine (response and > confirmation packets). > > However, if I lock a semaphore before sending the request packet for one > device, and only unlock after reaping the URB I used to send the > confirmation packet, it ran for ate least 72 hours without problems. So, one > device at a time (using basically the same software plus the semaphore) does > not cause the kernel lock. > > My point is that simple ioctl calls to USB devices should not break the > kernel. I need help to address the kernel issue. The problem is difficult to > reproduce at my office because it needs many devices connected to it, which > are available only in a place I do not have physical access to, due to > COVID-19 travel restrictions. > > My guess is that, for a regular user, this bug rarely manifests itself and > it may be there for a long time. > > I would like to figure out exactly where the problem is and I am looking for > your guidance to get more information about it. You could try using a network console. Or have someone who is on-site take a picture of the computer screen when a crash occurs. Alan Stern ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel locks due to USB I/O 2020-11-10 20:51 ` Alan Stern @ 2020-11-10 23:42 ` Alberto Sentieri 2020-11-11 7:51 ` Greg Kroah-Hartman 2020-11-11 15:51 ` Alan Stern 0 siblings, 2 replies; 17+ messages in thread From: Alberto Sentieri @ 2020-11-10 23:42 UTC (permalink / raw) To: Alan Stern; +Cc: Greg Kroah-Hartman, linux-usb 1) The current Ubuntu Kernel is 5.4.0-53. Do you want me to upgrade it to 5.9, from kernel.org? Or is there a Ubuntu 5.9 package that I can use? It would be easy to do it If there is a Ubuntu package with 5.9, which I would install and, after the tests, uninstall. 2) Why do you believe that 5.9 would solve the problem? I am asking that because I cannot change the production machine for a test if I cannot go back to the original state. There is always a risk involved. 3) It is one single thread dealing with all 36 devices. Each device has its own co-routine (not preemptive), but all co-routines are executed by a unique thread. 4) By network console, do you mean ssh? It dies as well when it locks. The screen is the regular GNOME3 screen and nothing can be seen there. Every time it locks they send a picture, and I cannot see anything meaningful there. I am thinking about disabling GNOME3, but I need their blessing for that. Thanks, Alberto On 11/10/20 3:51 PM, Alan Stern wrote: > On Tue, Nov 10, 2020 at 02:20:50PM -0500, Alberto Sentieri wrote: >> I’ve seen many kernel locks caused by a particular user-level application. >> After the kernel locks, there is no report left in the machine, neither in >> the logs. These locks have to do with USB input and output. >> >> The objective of this email is to get guidance about how to collect more >> data related to the locks. >> >> Follows a description of the problem. >> >> I manage a few remote machines installed at a manufacturing facility, which >> run Ubuntu 18.04. For months I had seen unexpected kernel locks, which I >> could not explain. By locks I mean that the machine completely dies. The >> graphical screen and keyboard freezes. I cannot ping or connect through ssh >> during the locks. The only way of making the machine come back is through a >> “pull the plug”. After rebooting I cannot find anything meaningful about the >> lock in the logs. The machine is a good quality one with a 6-core Xeon, 32 >> GB ECC memory (and the application is using about 1GB). Exact the same >> problem happens in two identical machines, one running kernel 5.0.0-37 >> generic and the other running kernel 5.3.0-62-generic. > Can you update either machine to a 5.9 kernel? > >> A few days ago I was able to create a sequence of events that produce the >> locks in a couple of minutes. These events have to do with USB 2.0 interrupt >> I/O on USB devices connected at 12 Mbits/s and the frequency URBs are >> submitted and reaped . It is necessary to have at least 36 devices connected >> to reproduce the problem easily, which I cannot do from where I am. The >> machines are in a country other than the one I live, and my physical access >> to them is not possible due to COVID-19 restrictions. >> >> There is no special USB drivers installed. However, there is a NVIDIA >> manufacturer driver installed, which I installed using the Ubuntu regular >> tools for non-free software. All USB I/O is done by a regular user opening >> /dev/bus/usb/xxx/xxx (the device group is set to the user group by udev). >> >> Each set of 18 USB devices is connected to a 10-Amp.-power-supply powered >> HUB. Each hub has its own USB 2.0 root, I mean, I installed multiple USB 2.0 >> PCI express expansion cards, and only one port of each expansion card is >> used for each HUB. >> >> The protocol to talk to any of the 36 devices is pretty simple. It uses USB >> interrupt frames. A 64-byte frame is sent to the device (request packet). I >> use ioctl (USBDEVFS_SUBMITURB). The file descriptor is monitored by epoll >> and when an answer comes back, the response packet (another 64-byte >> interrupt packet) is recovered by ioctl (USBDEVFS_REAPURBNDELAY). Then a >> 64-byte packet (confirmation packet) is sent through USBDEVFS_SUBMITURB. >> This sequence happens once every few seconds and the delay between the three >> packets is just a couple of milliseconds. All process of dealing with the 36 >> devices is in a unique thread, under the same epoll loop. > This sentence is ambiguous. Do you mean there is a single unique thread > which talks to all 36 devices? Or do you mean there is a separate > unique thread for each device (so 36 threads)? > >> So if I synchronize all 36 devices, I mean, I try to talk to all them >> basically at the same time, the kernel will lock in about 2 minutes or less. >> By “at the same time” I mean to submit the URBs for the request packet >> around the same time for all of them, and then sit there, waiting for the >> proper epoll wake-up to deal with the state machine (response and >> confirmation packets). >> >> However, if I lock a semaphore before sending the request packet for one >> device, and only unlock after reaping the URB I used to send the >> confirmation packet, it ran for ate least 72 hours without problems. So, one >> device at a time (using basically the same software plus the semaphore) does >> not cause the kernel lock. >> >> My point is that simple ioctl calls to USB devices should not break the >> kernel. I need help to address the kernel issue. The problem is difficult to >> reproduce at my office because it needs many devices connected to it, which >> are available only in a place I do not have physical access to, due to >> COVID-19 travel restrictions. >> >> My guess is that, for a regular user, this bug rarely manifests itself and >> it may be there for a long time. >> >> I would like to figure out exactly where the problem is and I am looking for >> your guidance to get more information about it. > You could try using a network console. Or have someone who is on-site > take a picture of the computer screen when a crash occurs. > > Alan Stern ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel locks due to USB I/O 2020-11-10 23:42 ` Alberto Sentieri @ 2020-11-11 7:51 ` Greg Kroah-Hartman 2020-11-11 15:51 ` Alan Stern 1 sibling, 0 replies; 17+ messages in thread From: Greg Kroah-Hartman @ 2020-11-11 7:51 UTC (permalink / raw) To: Alberto Sentieri; +Cc: Alan Stern, linux-usb On Tue, Nov 10, 2020 at 06:42:17PM -0500, Alberto Sentieri wrote: > 1) The current Ubuntu Kernel is 5.4.0-53. Do you want me to upgrade it to > 5.9, from kernel.org? Or is there a Ubuntu 5.9 package that I can use? It > would be easy to do it If there is a Ubuntu package with 5.9, which I would > install and, after the tests, uninstall. There might be a Ubuntu package, we don't know. The issue is that the Ubuntu kernel you are using is an old one, and we (the community) have no idea what is in there. We can help you if you use a kernel.org release, otherwise you really should get support from Ubuntu as you are relying on them for this type of thing (and probably paying them for support as you are using it in a corporate setting and relying on it, right?) > 2) Why do you believe that 5.9 would solve the problem? I am asking that > because I cannot change the production machine for a test if I cannot go > back to the original state. There is always a risk involved. You should always be able to choose a different kernel version at boot time, no need to ever get rid of your older one in case something goes wrong. There has been lots of changes in the past year of kernel development, as always, from 5.4 to 5.9, so something that we did then might have resolved this. Maybe not, but we don't know if you can't try as to try to remember what we did a year ago is hard, as I am sure you can imagine :) > 3) It is one single thread dealing with all 36 devices. Each device has its > own co-routine (not preemptive), but all co-routines are executed by a > unique thread. Are you sure that these threads are not just getting stalled somehow when a response doesn't come back properly and are not timing out, which your userspace co-routines should be doing if the bus stalls, right? One thing to note, USB hubs are almost never tested with that many devices at once, so you might have to upgrade to one that says it really will work with that many devices, and provide enough power to them. We have seen more low-power issues on USB hubs and connections over the years to know that this is a real issue, and one that the OS can't do much about, but can affect it. > 4) By network console, do you mean ssh? It dies as well when it locks. The > screen is the regular GNOME3 screen and nothing can be seen there. Every > time it locks they send a picture, and I cannot see anything meaningful > there. I am thinking about disabling GNOME3, but I need their blessing for > that. No, netconsole is something else, see the kernel documentation for the full details. thanks, greg k-h ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel locks due to USB I/O 2020-11-10 23:42 ` Alberto Sentieri 2020-11-11 7:51 ` Greg Kroah-Hartman @ 2020-11-11 15:51 ` Alan Stern 2020-11-11 19:31 ` Alberto Sentieri 2020-11-16 16:53 ` Alberto Sentieri 1 sibling, 2 replies; 17+ messages in thread From: Alan Stern @ 2020-11-11 15:51 UTC (permalink / raw) To: Alberto Sentieri; +Cc: Greg Kroah-Hartman, linux-usb On Tue, Nov 10, 2020 at 06:42:17PM -0500, Alberto Sentieri wrote: > 1) The current Ubuntu Kernel is 5.4.0-53. Do you want me to upgrade it to > 5.9, from kernel.org? Or is there a Ubuntu 5.9 package that I can use? It > would be easy to do it If there is a Ubuntu package with 5.9, which I would > install and, after the tests, uninstall. If you want to know what Ubuntu packages are available, you should ask on an Ubuntu mailing list instead of the linux-usb mailing list. > 2) Why do you believe that 5.9 would solve the problem? I am asking that > because I cannot change the production machine for a test if I cannot go > back to the original state. There is always a risk involved. We do not believe that 5.9 will solve the problem -- we have no reason to believe this -- but we could be wrong. In any case it is always best to test with the most up-to-date software available, and 5.9 is the version closest to what we are working on now. > 3) It is one single thread dealing with all 36 devices. Each device has its > own co-routine (not preemptive), but all co-routines are executed by a > unique thread. If everything runs within a single thread, how can adding a semaphore or mutex make any difference? > 4) By network console, do you mean ssh? It dies as well when it locks. The > screen is the regular GNOME3 screen and nothing can be seen there. Every > time it locks they send a picture, and I cannot see anything meaningful > there. I am thinking about disabling GNOME3, but I need their blessing for > that. See https://www.kernel.org/doc/Documentation/networking/netconsole.txt for instructions on netconsole. And when you use it for testing, be sure to set the console log level to a high value. Alan Stern ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel locks due to USB I/O 2020-11-11 15:51 ` Alan Stern @ 2020-11-11 19:31 ` Alberto Sentieri 2020-11-16 16:53 ` Alberto Sentieri 1 sibling, 0 replies; 17+ messages in thread From: Alberto Sentieri @ 2020-11-11 19:31 UTC (permalink / raw) To: Alan Stern; +Cc: Greg Kroah-Hartman, linux-usb On 11/11/20 10:51 AM, Alan Stern wrote: > On Tue, Nov 10, 2020 at 06:42:17PM -0500, Alberto Sentieri wrote: >> 1) The current Ubuntu Kernel is 5.4.0-53. Do you want me to upgrade it to >> 5.9, from kernel.org? Or is there a Ubuntu 5.9 package that I can use? It >> would be easy to do it If there is a Ubuntu package with 5.9, which I would >> install and, after the tests, uninstall. > If you want to know what Ubuntu packages are available, you should ask > on an Ubuntu mailing list instead of the linux-usb mailing list. I am sorry to be asking about Ubuntu. For some reason I imagined I was exchanging emails with Ubuntu guys, but now I understand that you are from the kernel.org. > >> 2) Why do you believe that 5.9 would solve the problem? I am asking that >> because I cannot change the production machine for a test if I cannot go >> back to the original state. There is always a risk involved. > We do not believe that 5.9 will solve the problem -- we have no reason > to believe this -- but we could be wrong. In any case it is always > best to test with the most up-to-date software available, and 5.9 is the > version closest to what we are working on now. I will try kernel 5.9 soon... > >> 3) It is one single thread dealing with all 36 devices. Each device has its >> own co-routine (not preemptive), but all co-routines are executed by a >> unique thread. > If everything runs within a single thread, how can adding a semaphore > or mutex make any difference? The semaphore will block a co-routine, not a thread. It is not the type of semaphore C programmers are used to. So, before the introduction of the semaphore, a sequence like that would happen: request packet device 1 URB submit request packet device 2 URB submit ... request packet device 36 URB submit wait on epoll after submitting 36 URBs, one for each device. reap URBs, receive response packets, send confirmation packets (basically run the state machine, each device has its own state) After the semaphore, a sequence like that would happen: lock the semaphore send request packet device 1 (URB submit) wait on epoll reap URB with device 1 response packet submit URB with device 1 confirmation wait on epoll reap URB submitted on last step unlock the semaphore Now go to the next device, which was waiting on the co-routine semaphore. The main difference is that I would not submit 36 URBs to 36 different devices at the same time. The submission of 36 URBs would make the devices start responding as soon as they get ready and receive a pool. > >> 4) By network console, do you mean ssh? It dies as well when it locks. The >> screen is the regular GNOME3 screen and nothing can be seen there. Every >> time it locks they send a picture, and I cannot see anything meaningful >> there. I am thinking about disabling GNOME3, but I need their blessing for >> that. > See https://www.kernel.org/doc/Documentation/networking/netconsole.txt > for instructions on netconsole. And when you use it for testing, be > sure to set the console log level to a high value. > > Alan Stern I will try kernel 5.9. However, it seems that the 5.3 kernel gets lost when too many submits / reaps start happening very close to each other. Thanks, Alberto Sentieri ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel locks due to USB I/O 2020-11-11 15:51 ` Alan Stern 2020-11-11 19:31 ` Alberto Sentieri @ 2020-11-16 16:53 ` Alberto Sentieri 2020-11-16 17:06 ` Alan Stern 1 sibling, 1 reply; 17+ messages in thread From: Alberto Sentieri @ 2020-11-16 16:53 UTC (permalink / raw) To: Alan Stern; +Cc: Greg Kroah-Hartman, linux-usb The objective of this email is to report the current status of my findings. I loaded netconsole on both machines I was having problems with. I tried 3 times on the machine with kernel 5.0.0-37 and twice with on the machine with kernel 5.3.0-62. Each attempt consisted of running the program which lock the kernel until it locked (about 3 minutes after stating the program). The referred program had the "semphore code" commented out. Nothing was sent to netconsole on all the 5 attempts I made when the kernel locked. Just to be clear about my use of netconsole, before loading the netconsole kernel module, I ran "dmesg -n 8". When netconsole module was loaded I could clearly see about 9 message lines on the computer receiving the netconsole messages telling me that netconsole was loaded (and how it was configured), so no doubts about the correct netconsole setup. The "netconsole server" was a machine on the same local network. My next attempt will be to compile kernel 5.9, as you suggest, and try it. Thanks, Alberto Sentieri On 11/11/20 10:51 AM, Alan Stern wrote: > On Tue, Nov 10, 2020 at 06:42:17PM -0500, Alberto Sentieri wrote: >> 1) The current Ubuntu Kernel is 5.4.0-53. Do you want me to upgrade it to >> 5.9, from kernel.org? Or is there a Ubuntu 5.9 package that I can use? It >> would be easy to do it If there is a Ubuntu package with 5.9, which I would >> install and, after the tests, uninstall. > If you want to know what Ubuntu packages are available, you should ask > on an Ubuntu mailing list instead of the linux-usb mailing list. > >> 2) Why do you believe that 5.9 would solve the problem? I am asking that >> because I cannot change the production machine for a test if I cannot go >> back to the original state. There is always a risk involved. > We do not believe that 5.9 will solve the problem -- we have no reason > to believe this -- but we could be wrong. In any case it is always > best to test with the most up-to-date software available, and 5.9 is the > version closest to what we are working on now. > >> 3) It is one single thread dealing with all 36 devices. Each device has its >> own co-routine (not preemptive), but all co-routines are executed by a >> unique thread. > If everything runs within a single thread, how can adding a semaphore > or mutex make any difference? > >> 4) By network console, do you mean ssh? It dies as well when it locks. The >> screen is the regular GNOME3 screen and nothing can be seen there. Every >> time it locks they send a picture, and I cannot see anything meaningful >> there. I am thinking about disabling GNOME3, but I need their blessing for >> that. > See https://www.kernel.org/doc/Documentation/networking/netconsole.txt > for instructions on netconsole. And when you use it for testing, be > sure to set the console log level to a high value. > > Alan Stern ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel locks due to USB I/O 2020-11-16 16:53 ` Alberto Sentieri @ 2020-11-16 17:06 ` Alan Stern 2020-11-16 18:42 ` Alberto Sentieri 0 siblings, 1 reply; 17+ messages in thread From: Alan Stern @ 2020-11-16 17:06 UTC (permalink / raw) To: Alberto Sentieri; +Cc: Greg Kroah-Hartman, linux-usb On Mon, Nov 16, 2020 at 11:53:38AM -0500, Alberto Sentieri wrote: > The objective of this email is to report the current status of my findings. > > I loaded netconsole on both machines I was having problems with. I tried 3 > times on the machine with kernel 5.0.0-37 and twice with on the machine with > kernel 5.3.0-62. Each attempt consisted of running the program which lock > the kernel until it locked (about 3 minutes after stating the program). The > referred program had the "semphore code" commented out. Nothing was sent to > netconsole on all the 5 attempts I made when the kernel locked. > > Just to be clear about my use of netconsole, before loading the netconsole > kernel module, I ran "dmesg -n 8". When netconsole module was loaded I could > clearly see about 9 message lines on the computer receiving the netconsole > messages telling me that netconsole was loaded (and how it was configured), > so no doubts about the correct netconsole setup. The "netconsole server" was > a machine on the same local network. > > My next attempt will be to compile kernel 5.9, as you suggest, and try it. While this may not generate any useful information, one way to create a bunch of log output while running your test is to set the usbfs_snoop module parameter for usbcore to true. You can also enable dynamic debugging for usbcore, although in a stable environment like yours it probably won't produce much output. Alan Stern ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel locks due to USB I/O 2020-11-16 17:06 ` Alan Stern @ 2020-11-16 18:42 ` Alberto Sentieri 2020-11-19 17:22 ` Alan Stern 0 siblings, 1 reply; 17+ messages in thread From: Alberto Sentieri @ 2020-11-16 18:42 UTC (permalink / raw) To: Alan Stern; +Cc: Greg Kroah-Hartman, linux-usb Alan, I am not sure if that would give you any clue, but I enabled usbfs_snoop in the kernel and I got URB error -108 (endpoint shutdown) just before the kernel lock. Just after starting the program, I got something like that: [ 174.325977] usb 1-1.1.1: opened by process 2961: ate [ 174.326006] usb 1-1.1.1: usbdev_do_ioctl: GETDRIVER [ 174.326015] usb 1-1.1.1: usbdev_do_ioctl: IOCTL [ 174.326334] usb 1-1.1.1: usbdev_do_ioctl: CLAIMINTERFACE [ 174.326371] usb 1-1.1.1: usbdev_do_ioctl: SETINTERFACE [ 174.326582] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB [ 174.326592] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 [ 174.326617] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB [ 174.326623] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 [ 174.326629] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB [ 174.326635] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 [ 174.326640] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB [ 174.326645] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 [ 174.326652] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB [ 174.326657] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 [ 174.326663] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB [ 174.326668] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 [ 174.326674] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB [ 174.326679] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 [ 174.326685] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB [ 174.326691] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 [ 174.326696] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB [ 174.326702] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 [ 174.326708] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB [ 174.326713] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 [ 174.326718] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB [ 174.326723] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 [ 174.326729] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB [ 174.326734] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 [ 174.326740] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB [ 174.326745] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 [ 174.326751] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB [ 174.326756] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 [ 174.326762] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB [ 174.326767] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 [ 174.326773] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB [ 174.326783] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 [ 174.327565] usb 1-1.3.1: opened by process 2961: ate [ 174.327584] usb 1-1.3.1: usbdev_do_ioctl: GETDRIVER [ 174.327593] usb 1-1.3.1: usbdev_do_ioctl: IOCTL [ 174.327855] usb 1-1.3.1: usbdev_do_ioctl: CLAIMINTERFACE [ 174.327881] usb 1-1.3.1: usbdev_do_ioctl: SETINTERFACE [ 174.328180] usb 1-1.3.1: usbdev_do_ioctl: SUBMITURB [ 174.328188] usb 1-1.3.1: userurb 0000000000000000, ep1 int-in, length 64 [ 174.328205] usb 1-1.3.1: usbdev_do_ioctl: SUBMITURB During the program execution, I got many lines like that: [ 178.537958] usb 1-1.4.1: urb complete [ 178.537975] usb 1-1.4.1: userurb pK-error, ep1 int-out, actual_length 64 status 0 [ 178.538000] usb 1-1.2.1: urb complete [ 178.538006] usb 1-1.2.1: userurb pK-error, ep1 int-out, actual_length 64 status 0 I am not sure if pK-error has a special meaning there. However, just before the lock, I got this sequence: [ 387.326795] usb 1-1.5.3.1: usbdev_do_ioctl: SUBMITURB [ 387.326801] usb 1-1.5.3.1: userurb 0000000000000000, ep1 int-in, length 64 [ 387.327531] usb 1-1.5.3.1: urb complete [ 387.327539] usb 1-1.5.3.1: userurb pK-error, ep1 int-out, actual_length 64 status 0 [ 387.327582] usb 1-1.5.3.1: usbdev_do_ioctl: REAPURBNDELAY [ 387.327590] usb 1-1.5.3.1: reap 0000000000000000 [ 387.327598] usb 1-1.5.3.1: usbdev_do_ioctl: REAPURBNDELAY [ 388.109400] usb 3-1.5.7.1: usbdev_do_ioctl: REAPURBNDELAY [ 388.110032] usb 3-1.6.7.1: usbdev_do_ioctl: REAPURBNDELAY [ 388.110169] usb 3-1.5.5.1: usbdev_do_ioctl: REAPURBNDELAY [ 388.110605] usb 3-1.6.5.1: usbdev_do_ioctl: REAPURBNDELAY [ 388.111394] usb 3-1.1.1: usbdev_do_ioctl: REAPURBNDELAY [ 388.403278] usb 3-1.6.4.1: usbdev_do_ioctl: REAPURBNDELAY [ 388.404011] usb 3-1.5.1.1: usbdev_do_ioctl: REAPURBNDELAY [ 388.404344] usb 3-1.6.3.1: usbdev_do_ioctl: REAPURBNDELAY [ 388.404649] usb 3-1.4.1: usbdev_do_ioctl: REAPURBNDELAY [ 388.404790] usb 3-1.3.1: usbdev_do_ioctl: REAPURBNDELAY [ 388.405283] usb 3-1.5.3.1: usbdev_do_ioctl: REAPURBNDELAY [ 388.407828] usb 3-1.6.1.1: usbdev_do_ioctl: REAPURBNDELAY [ 388.408771] usb 3-1.5.4.1: usbdev_do_ioctl: REAPURBNDELAY [ 388.409155] usb 3-1.5.2.1: usbdev_do_ioctl: REAPURBNDELAY [ 388.409452] usb 3-1.5.6.1: usbdev_do_ioctl: REAPURBNDELAY [ 388.409743] usb 3-1.6.6.1: usbdev_do_ioctl: REAPURBNDELAY [ 388.409848] usb 3-1.2.1: usbdev_do_ioctl: REAPURBNDELAY [ 388.410840] usb 3-1.6.2.1: usbdev_do_ioctl: REAPURBNDELAY [ 389.612597] usb 3-1.5.7.1: opened by process 2961: ate [ 389.612622] usb 3-1.5.7.1: usbdev_do_ioctl: GETDRIVER [ 389.612631] usb 3-1.5.7.1: usbdev_do_ioctl: IOCTL [ 389.614965] usb 3-1.5.7.1: urb complete [ 389.614981] usb 3-1.5.7.1: userurb pK-error, ep1 int-in, actual_length 0 status -108 [ 389.614989] usb 3-1.5.7.1: urb complete [ 389.614994] usb 3-1.5.7.1: userurb pK-error, ep1 int-in, actual_length 0 status -108 [ 389.614999] usb 3-1.5.7.1: urb complete [ 389.615003] usb 3-1.5.7.1: userurb pK-error, ep1 int-in, actual_length 0 status -108 [ 389.615008] usb 3-1.5.7.1: urb complete [ 389.615013] usb 3-1.5.7.1: userurb pK-error, ep1 int-in, actual_length 0 status -108 [ 389.615018] usb 3-1.5.7.1: urb complete [ 389.615022] usb 3-1.5.7.1: userurb pK-error, ep1 int-in, actual_length 0 status -108 [ 389.615027] usb 3-1.5.7.1: urb complete [ 389.615032] usb 3-1.5.7.1: userurb pK-error, ep1 int-in, actual_length 0 status -108 [ 389.615037] usb 3-1.5.7.1: urb complete [ 389.615042] usb 3-1.5.7.1: userurb pK-error, ep1 int-in, actual_length 0 status -108 [ 389.615047] usb 3-1.5.7.1: urb complete [ 389.615051] usb 3-1.5.7.1: userurb pK-error, ep1 int-in, actual_length 0 status -108 [ 389.615056] usb 3-1.5.7.1: urb complete [ 389.615060] usb 3-1.5.7.1: userurb pK-error, ep1 int-in, actual_length 0 status -108 [ 389.615065] usb 3-1.5.7.1: urb complete [ 389.615070] usb 3-1.5.7.1: userurb pK-error, ep1 int-in, actual_length 0 status -108 [ 389.615074] usb 3-1.5.7.1: urb complete [ 389.615079] usb 3-1.5.7.1: userurb pK-error, ep1 int-in, actual_length 0 status -108 [ 389.615084] usb 3-1.5.7.1: urb complete [ 389.615089] usb 3-1.5.7.1: userurb pK-error, ep1 int-in, actual_length 0 status -108 [ 389.615094] usb 3-1.5.7.1: urb complete [ 389.615098] usb 3-1.5.7.1: userurb pK-error, ep1 int-in, actual_length 0 status -108 [ 389.615103] usb 3-1.5.7.1: urb complete [ 389.615108] usb 3-1.5.7.1: userurb pK-error, ep1 int-in, actual_length 0 status -108 [ 389.615112] usb 3-1.5.7.1: urb complete [ 389.615117] usb 3-1.5.7.1: userurb pK-error, ep1 int-in, actual_length 0 status -108 [ 389.615121] usb 3-1.5.7.1: urb complete [ 389.615126] usb 3-1.5.7.1: userurb pK-error, ep1 int-in, actual_length 0 status -108 [ 389.616823] usb 3-1.5.7.1: urb complete [ 389.616839] usb 3-1.5.7.1: userurb pK-error, ep1 int-out, actual_length 0 status -108 [ 389.616926] usb 3-1.5.7.1: usbdev_do_ioctl: CLAIMINTERFACE [ 389.616964] usb 3-1.5.7.1: usbdev_do_ioctl: SETINTERFACE As you can see, it locked in 215 seconds, in this test. Do you see anything special in this sequence? Thanks, Alberto Sentieri On 11/16/20 12:06 PM, Alan Stern wrote: > On Mon, Nov 16, 2020 at 11:53:38AM -0500, Alberto Sentieri wrote: >> The objective of this email is to report the current status of my findings. >> >> I loaded netconsole on both machines I was having problems with. I tried 3 >> times on the machine with kernel 5.0.0-37 and twice with on the machine with >> kernel 5.3.0-62. Each attempt consisted of running the program which lock >> the kernel until it locked (about 3 minutes after stating the program). The >> referred program had the "semphore code" commented out. Nothing was sent to >> netconsole on all the 5 attempts I made when the kernel locked. >> >> Just to be clear about my use of netconsole, before loading the netconsole >> kernel module, I ran "dmesg -n 8". When netconsole module was loaded I could >> clearly see about 9 message lines on the computer receiving the netconsole >> messages telling me that netconsole was loaded (and how it was configured), >> so no doubts about the correct netconsole setup. The "netconsole server" was >> a machine on the same local network. >> >> My next attempt will be to compile kernel 5.9, as you suggest, and try it. > While this may not generate any useful information, one way to create a > bunch of log output while running your test is to set the usbfs_snoop > module parameter for usbcore to true. > > You can also enable dynamic debugging for usbcore, although in a stable > environment like yours it probably won't produce much output. > > Alan Stern ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel locks due to USB I/O 2020-11-16 18:42 ` Alberto Sentieri @ 2020-11-19 17:22 ` Alan Stern 2020-11-19 18:50 ` Alberto Sentieri 2020-11-19 19:21 ` Alberto Sentieri 0 siblings, 2 replies; 17+ messages in thread From: Alan Stern @ 2020-11-19 17:22 UTC (permalink / raw) To: Alberto Sentieri; +Cc: Greg Kroah-Hartman, linux-usb On Mon, Nov 16, 2020 at 01:42:46PM -0500, Alberto Sentieri wrote: > Alan, > > I am not sure if that would give you any clue, but I enabled usbfs_snoop in > the kernel and I got URB error -108 (endpoint shutdown) just before the > kernel lock. > > Just after starting the program, I got something like that: > > [ 174.325977] usb 1-1.1.1: opened by process 2961: ate > [ 174.326006] usb 1-1.1.1: usbdev_do_ioctl: GETDRIVER > [ 174.326015] usb 1-1.1.1: usbdev_do_ioctl: IOCTL > [ 174.326334] usb 1-1.1.1: usbdev_do_ioctl: CLAIMINTERFACE > [ 174.326371] usb 1-1.1.1: usbdev_do_ioctl: SETINTERFACE > [ 174.326582] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB > [ 174.326592] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 > [ 174.326617] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB > [ 174.326623] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 > [ 174.326629] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB > [ 174.326635] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 > [ 174.326640] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB > [ 174.326645] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 > [ 174.326652] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB > [ 174.326657] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 > [ 174.326663] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB > [ 174.326668] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 > [ 174.326674] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB > [ 174.326679] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 > [ 174.326685] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB > [ 174.326691] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 > [ 174.326696] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB > [ 174.326702] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 > [ 174.326708] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB > [ 174.326713] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 > [ 174.326718] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB > [ 174.326723] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 > [ 174.326729] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB > [ 174.326734] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 > [ 174.326740] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB > [ 174.326745] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 > [ 174.326751] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB > [ 174.326756] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 > [ 174.326762] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB > [ 174.326767] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 > [ 174.326773] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB > [ 174.326783] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 > [ 174.327565] usb 1-1.3.1: opened by process 2961: ate > [ 174.327584] usb 1-1.3.1: usbdev_do_ioctl: GETDRIVER > [ 174.327593] usb 1-1.3.1: usbdev_do_ioctl: IOCTL > [ 174.327855] usb 1-1.3.1: usbdev_do_ioctl: CLAIMINTERFACE > [ 174.327881] usb 1-1.3.1: usbdev_do_ioctl: SETINTERFACE > [ 174.328180] usb 1-1.3.1: usbdev_do_ioctl: SUBMITURB > [ 174.328188] usb 1-1.3.1: userurb 0000000000000000, ep1 int-in, length 64 > [ 174.328205] usb 1-1.3.1: usbdev_do_ioctl: SUBMITURB Can you post the output from "lsusb -t" for your test system? The device numbers aren't what I expected to see. Not wrong, just unexpected. Apart from that, this looks like normal initialization. Why does the program submit 16 URBs all for the same endpoint without waiting? Wouldn't two or three be enough? > During the program execution, I got many lines like that: > > [ 178.537958] usb 1-1.4.1: urb complete > [ 178.537975] usb 1-1.4.1: userurb pK-error, ep1 int-out, > actual_length 64 status 0 > [ 178.538000] usb 1-1.2.1: urb complete > [ 178.538006] usb 1-1.2.1: userurb pK-error, ep1 int-out, > actual_length 64 status 0 > > I am not sure if pK-error has a special meaning there. It's an error in the debugging output. I have just submitted a patch to fix the error (see <https://marc.info/?l=linux-usb&m=160580539818074&w=2>). In any case, it should be irrelevant for your problem. > However, just before the lock, I got this sequence: > > [ 387.326795] usb 1-1.5.3.1: usbdev_do_ioctl: SUBMITURB > [ 387.326801] usb 1-1.5.3.1: userurb 0000000000000000, ep1 int-in, length > 64 > [ 387.327531] usb 1-1.5.3.1: urb complete > [ 387.327539] usb 1-1.5.3.1: userurb pK-error, ep1 int-out, > actual_length 64 status 0 > [ 387.327582] usb 1-1.5.3.1: usbdev_do_ioctl: REAPURBNDELAY > [ 387.327590] usb 1-1.5.3.1: reap 0000000000000000 > [ 387.327598] usb 1-1.5.3.1: usbdev_do_ioctl: REAPURBNDELAY > [ 388.109400] usb 3-1.5.7.1: usbdev_do_ioctl: REAPURBNDELAY > [ 388.110032] usb 3-1.6.7.1: usbdev_do_ioctl: REAPURBNDELAY > [ 388.110169] usb 3-1.5.5.1: usbdev_do_ioctl: REAPURBNDELAY > [ 388.110605] usb 3-1.6.5.1: usbdev_do_ioctl: REAPURBNDELAY > [ 388.111394] usb 3-1.1.1: usbdev_do_ioctl: REAPURBNDELAY > [ 388.403278] usb 3-1.6.4.1: usbdev_do_ioctl: REAPURBNDELAY > [ 388.404011] usb 3-1.5.1.1: usbdev_do_ioctl: REAPURBNDELAY > [ 388.404344] usb 3-1.6.3.1: usbdev_do_ioctl: REAPURBNDELAY > [ 388.404649] usb 3-1.4.1: usbdev_do_ioctl: REAPURBNDELAY > [ 388.404790] usb 3-1.3.1: usbdev_do_ioctl: REAPURBNDELAY > [ 388.405283] usb 3-1.5.3.1: usbdev_do_ioctl: REAPURBNDELAY > [ 388.407828] usb 3-1.6.1.1: usbdev_do_ioctl: REAPURBNDELAY > [ 388.408771] usb 3-1.5.4.1: usbdev_do_ioctl: REAPURBNDELAY > [ 388.409155] usb 3-1.5.2.1: usbdev_do_ioctl: REAPURBNDELAY > [ 388.409452] usb 3-1.5.6.1: usbdev_do_ioctl: REAPURBNDELAY > [ 388.409743] usb 3-1.6.6.1: usbdev_do_ioctl: REAPURBNDELAY > [ 388.409848] usb 3-1.2.1: usbdev_do_ioctl: REAPURBNDELAY > [ 388.410840] usb 3-1.6.2.1: usbdev_do_ioctl: REAPURBNDELAY This rapid sequence of reap requests with no completions or actual reaps is odd. Is this the sort of thing you expect your program to do? Maybe you can add some debugging output to the program. Also, can you turn on dynamic debugging for usbcore before starting the test? echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control > [ 389.612597] usb 3-1.5.7.1: opened by process 2961: ate > [ 389.612622] usb 3-1.5.7.1: usbdev_do_ioctl: GETDRIVER > [ 389.612631] usb 3-1.5.7.1: usbdev_do_ioctl: IOCTL > [ 389.614965] usb 3-1.5.7.1: urb complete > [ 389.614981] usb 3-1.5.7.1: userurb pK-error, ep1 int-in, > actual_length 0 status -108 > [ 389.614989] usb 3-1.5.7.1: urb complete > [ 389.614994] usb 3-1.5.7.1: userurb pK-error, ep1 int-in, > actual_length 0 status -108 The -108 erroro codes are ESHUTDOWN. They indicate that the device was disconnected or something like that, although I don't see any other indication of this in the log. > [ 389.616926] usb 3-1.5.7.1: usbdev_do_ioctl: CLAIMINTERFACE > [ 389.616964] usb 3-1.5.7.1: usbdev_do_ioctl: SETINTERFACE And if the device had been disconnected, these calls would have failed. > As you can see, it locked in 215 seconds, in this test. Do you see anything > special in this sequence? No, but several things are puzzling. But there's nothing that indicates any reason for the system to freeze. Alan Stern ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel locks due to USB I/O 2020-11-19 17:22 ` Alan Stern @ 2020-11-19 18:50 ` Alberto Sentieri 2020-11-19 20:01 ` Alan Stern 2020-11-19 19:21 ` Alberto Sentieri 1 sibling, 1 reply; 17+ messages in thread From: Alberto Sentieri @ 2020-11-19 18:50 UTC (permalink / raw) To: Alan Stern; +Cc: Greg Kroah-Hartman, linux-usb Alan, I have to get the proper blessing to do more tests, and I will work on them. Meanwhile, my comments are below. Thanks for your attention, Alberto On 11/19/20 12:22 PM, Alan Stern wrote: > On Mon, Nov 16, 2020 at 01:42:46PM -0500, Alberto Sentieri wrote: >> Alan, >> >> I am not sure if that would give you any clue, but I enabled usbfs_snoop in >> the kernel and I got URB error -108 (endpoint shutdown) just before the >> kernel lock. >> >> Just after starting the program, I got something like that: >> >> [ 174.325977] usb 1-1.1.1: opened by process 2961: ate >> [ 174.326006] usb 1-1.1.1: usbdev_do_ioctl: GETDRIVER >> [ 174.326015] usb 1-1.1.1: usbdev_do_ioctl: IOCTL >> [ 174.326334] usb 1-1.1.1: usbdev_do_ioctl: CLAIMINTERFACE >> [ 174.326371] usb 1-1.1.1: usbdev_do_ioctl: SETINTERFACE >> [ 174.326582] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB >> [ 174.326592] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.326617] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB >> [ 174.326623] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.326629] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB >> [ 174.326635] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.326640] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB >> [ 174.326645] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.326652] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB >> [ 174.326657] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.326663] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB >> [ 174.326668] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.326674] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB >> [ 174.326679] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.326685] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB >> [ 174.326691] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.326696] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB >> [ 174.326702] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.326708] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB >> [ 174.326713] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.326718] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB >> [ 174.326723] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.326729] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB >> [ 174.326734] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.326740] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB >> [ 174.326745] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.326751] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB >> [ 174.326756] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.326762] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB >> [ 174.326767] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.326773] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB >> [ 174.326783] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.327565] usb 1-1.3.1: opened by process 2961: ate >> [ 174.327584] usb 1-1.3.1: usbdev_do_ioctl: GETDRIVER >> [ 174.327593] usb 1-1.3.1: usbdev_do_ioctl: IOCTL >> [ 174.327855] usb 1-1.3.1: usbdev_do_ioctl: CLAIMINTERFACE >> [ 174.327881] usb 1-1.3.1: usbdev_do_ioctl: SETINTERFACE >> [ 174.328180] usb 1-1.3.1: usbdev_do_ioctl: SUBMITURB >> [ 174.328188] usb 1-1.3.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.328205] usb 1-1.3.1: usbdev_do_ioctl: SUBMITURB > Can you post the output from "lsusb -t" for your test system? The > device numbers aren't what I expected to see. Not wrong, just > unexpected. I will run lsusb -t when I get access to the system. But the numbers are correct. Each device has a 2-port USB hub which is connected to the USB bus, just before the micro-controller. That is the reason for all of them ending in .1. The micro-controller is connected to port 1 of the 2-port hub. > > Apart from that, this looks like normal initialization. Why does the > program submit 16 URBs all for the same endpoint without waiting? > Wouldn't two or three be enough? I am using simple commands (to the device) and the responses fit in a 64-byte HID packet. But there are more complex commands, which may return many packets. All submitted input URBs are there because, in some cases, the worker main loop can take many milliseconds to start reaping the URBs after they arrive. >> During the program execution, I got many lines like that: >> >> [ 178.537958] usb 1-1.4.1: urb complete >> [ 178.537975] usb 1-1.4.1: userurb pK-error, ep1 int-out, >> actual_length 64 status 0 >> [ 178.538000] usb 1-1.2.1: urb complete >> [ 178.538006] usb 1-1.2.1: userurb pK-error, ep1 int-out, >> actual_length 64 status 0 >> >> I am not sure if pK-error has a special meaning there. > It's an error in the debugging output. I have just submitted a patch > to fix the error (see > <https://marc.info/?l=linux-usb&m=160580539818074&w=2>). In any case, > it should be irrelevant for your problem. Thanks. >> However, just before the lock, I got this sequence: >> >> [ 387.326795] usb 1-1.5.3.1: usbdev_do_ioctl: SUBMITURB >> [ 387.326801] usb 1-1.5.3.1: userurb 0000000000000000, ep1 int-in, length >> 64 >> [ 387.327531] usb 1-1.5.3.1: urb complete >> [ 387.327539] usb 1-1.5.3.1: userurb pK-error, ep1 int-out, >> actual_length 64 status 0 >> [ 387.327582] usb 1-1.5.3.1: usbdev_do_ioctl: REAPURBNDELAY >> [ 387.327590] usb 1-1.5.3.1: reap 0000000000000000 >> [ 387.327598] usb 1-1.5.3.1: usbdev_do_ioctl: REAPURBNDELAY >> [ 388.109400] usb 3-1.5.7.1: usbdev_do_ioctl: REAPURBNDELAY >> [ 388.110032] usb 3-1.6.7.1: usbdev_do_ioctl: REAPURBNDELAY >> [ 388.110169] usb 3-1.5.5.1: usbdev_do_ioctl: REAPURBNDELAY >> [ 388.110605] usb 3-1.6.5.1: usbdev_do_ioctl: REAPURBNDELAY >> [ 388.111394] usb 3-1.1.1: usbdev_do_ioctl: REAPURBNDELAY >> [ 388.403278] usb 3-1.6.4.1: usbdev_do_ioctl: REAPURBNDELAY >> [ 388.404011] usb 3-1.5.1.1: usbdev_do_ioctl: REAPURBNDELAY >> [ 388.404344] usb 3-1.6.3.1: usbdev_do_ioctl: REAPURBNDELAY >> [ 388.404649] usb 3-1.4.1: usbdev_do_ioctl: REAPURBNDELAY >> [ 388.404790] usb 3-1.3.1: usbdev_do_ioctl: REAPURBNDELAY >> [ 388.405283] usb 3-1.5.3.1: usbdev_do_ioctl: REAPURBNDELAY >> [ 388.407828] usb 3-1.6.1.1: usbdev_do_ioctl: REAPURBNDELAY >> [ 388.408771] usb 3-1.5.4.1: usbdev_do_ioctl: REAPURBNDELAY >> [ 388.409155] usb 3-1.5.2.1: usbdev_do_ioctl: REAPURBNDELAY >> [ 388.409452] usb 3-1.5.6.1: usbdev_do_ioctl: REAPURBNDELAY >> [ 388.409743] usb 3-1.6.6.1: usbdev_do_ioctl: REAPURBNDELAY >> [ 388.409848] usb 3-1.2.1: usbdev_do_ioctl: REAPURBNDELAY >> [ 388.410840] usb 3-1.6.2.1: usbdev_do_ioctl: REAPURBNDELAY > This rapid sequence of reap requests with no completions or actual > reaps is odd. Is this the sort of thing you expect your program to > do? Maybe you can add some debugging output to the program. What apparently happened before that (but unfortunately I did not capture it this time) is that the particular device return errno -71 in some URBs, without de-enumerating. However, I am only seeing userurbs with status 0 in my the netconsole capture, before the ones with status -108 (just before the lock). So, if error -71 happened, all URBs are reaped and the devices are then closed. Usually I see error -71 in the program log happening for a few devices at the same time. After closing the interface, the program opens it again. The return from open should be a valid file descriptor, otherwise it would fail. After opening the device successfully it starts submitting urbs, and error -108 is coming back, telling me that the device just opened is not open. It just came to my attention now: maybe something to be investigated is what the kernel does if a interface is closed due to a URB error and immediately opened again. > Also, can you turn on dynamic debugging for usbcore before starting > the test? > > echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control > >> [ 389.612597] usb 3-1.5.7.1: opened by process 2961: ate >> [ 389.612622] usb 3-1.5.7.1: usbdev_do_ioctl: GETDRIVER >> [ 389.612631] usb 3-1.5.7.1: usbdev_do_ioctl: IOCTL >> [ 389.614965] usb 3-1.5.7.1: urb complete >> [ 389.614981] usb 3-1.5.7.1: userurb pK-error, ep1 int-in, >> actual_length 0 status -108 >> [ 389.614989] usb 3-1.5.7.1: urb complete >> [ 389.614994] usb 3-1.5.7.1: userurb pK-error, ep1 int-in, >> actual_length 0 status -108 > The -108 erroro codes are ESHUTDOWN. They indicate that the device > was disconnected or something like that, although I don't see any > other indication of this in the log. > >> [ 389.616926] usb 3-1.5.7.1: usbdev_do_ioctl: CLAIMINTERFACE >> [ 389.616964] usb 3-1.5.7.1: usbdev_do_ioctl: SETINTERFACE > And if the device had been disconnected, these calls would have failed. > >> As you can see, it locked in 215 seconds, in this test. Do you see anything >> special in this sequence? > No, but several things are puzzling. But there's nothing that > indicates any reason for the system to freeze. > > Alan Stern ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel locks due to USB I/O 2020-11-19 18:50 ` Alberto Sentieri @ 2020-11-19 20:01 ` Alan Stern [not found] ` <4f8f545e-4846-45e0-b8f8-5c73876b150a@tripolho.com> 0 siblings, 1 reply; 17+ messages in thread From: Alan Stern @ 2020-11-19 20:01 UTC (permalink / raw) To: Alberto Sentieri; +Cc: Greg Kroah-Hartman, linux-usb On Thu, Nov 19, 2020 at 01:50:13PM -0500, Alberto Sentieri wrote: > > > However, just before the lock, I got this sequence: > > > > > > [ 387.326795] usb 1-1.5.3.1: usbdev_do_ioctl: SUBMITURB > > > [ 387.326801] usb 1-1.5.3.1: userurb 0000000000000000, ep1 int-in, length > > > 64 > > > [ 387.327531] usb 1-1.5.3.1: urb complete > > > [ 387.327539] usb 1-1.5.3.1: userurb pK-error, ep1 int-out, > > > actual_length 64 status 0 > > > [ 387.327582] usb 1-1.5.3.1: usbdev_do_ioctl: REAPURBNDELAY > > > [ 387.327590] usb 1-1.5.3.1: reap 0000000000000000 > > > [ 387.327598] usb 1-1.5.3.1: usbdev_do_ioctl: REAPURBNDELAY > > > [ 388.109400] usb 3-1.5.7.1: usbdev_do_ioctl: REAPURBNDELAY > > > [ 388.110032] usb 3-1.6.7.1: usbdev_do_ioctl: REAPURBNDELAY > > > [ 388.110169] usb 3-1.5.5.1: usbdev_do_ioctl: REAPURBNDELAY > > > [ 388.110605] usb 3-1.6.5.1: usbdev_do_ioctl: REAPURBNDELAY > > > [ 388.111394] usb 3-1.1.1: usbdev_do_ioctl: REAPURBNDELAY > > > [ 388.403278] usb 3-1.6.4.1: usbdev_do_ioctl: REAPURBNDELAY > > > [ 388.404011] usb 3-1.5.1.1: usbdev_do_ioctl: REAPURBNDELAY > > > [ 388.404344] usb 3-1.6.3.1: usbdev_do_ioctl: REAPURBNDELAY > > > [ 388.404649] usb 3-1.4.1: usbdev_do_ioctl: REAPURBNDELAY > > > [ 388.404790] usb 3-1.3.1: usbdev_do_ioctl: REAPURBNDELAY > > > [ 388.405283] usb 3-1.5.3.1: usbdev_do_ioctl: REAPURBNDELAY > > > [ 388.407828] usb 3-1.6.1.1: usbdev_do_ioctl: REAPURBNDELAY > > > [ 388.408771] usb 3-1.5.4.1: usbdev_do_ioctl: REAPURBNDELAY > > > [ 388.409155] usb 3-1.5.2.1: usbdev_do_ioctl: REAPURBNDELAY > > > [ 388.409452] usb 3-1.5.6.1: usbdev_do_ioctl: REAPURBNDELAY > > > [ 388.409743] usb 3-1.6.6.1: usbdev_do_ioctl: REAPURBNDELAY > > > [ 388.409848] usb 3-1.2.1: usbdev_do_ioctl: REAPURBNDELAY > > > [ 388.410840] usb 3-1.6.2.1: usbdev_do_ioctl: REAPURBNDELAY > > This rapid sequence of reap requests with no completions or actual > > reaps is odd. Is this the sort of thing you expect your program to > > do? Maybe you can add some debugging output to the program. > What apparently happened before that (but unfortunately I did not capture it > this time) is that the particular device return errno -71 in some URBs, > without de-enumerating. That's actually quite bad. -71 (-EPROTO) indicates a serious error in the USB protocol. Most often it means that the kernel did not receive a reply packet from the device -- this could be caused by the device's firmware crashing or a USB cable failing. In a well-functioning system it should never occur. > However, I am only seeing userurbs with status 0 in > my the netconsole capture, before the ones with status -108 (just before the > lock). So, if error -71 happened, all URBs are reaped and the devices are > then closed. Usually I see error -71 in the program log happening for a few > devices at the same time. That probably means there's something wrong with a hub they are all connected to. But of course, a problem in a hub wouldn't explain why the whole system freezes. > After closing the interface, the program opens it > again. The return from open should be a valid file descriptor, otherwise it > would fail. After opening the device successfully it starts submitting urbs, > and error -108 is coming back, telling me that the device just opened is not > open. It means that the device is no longer present. Perhaps it was logically (if not physically) disconnected after the open but before the URBs were submitted. > It just came to my attention now: maybe something to be investigated is what > the kernel does if a interface is closed due to a URB error and immediately > opened again. In itself, that should work perfectly. But whatever caused the USB error could go on to cause other things to go wrong. It might help to see a usbmon trace. You merely have to send the data from /sys/kernel/debug/usb/usbmon/1u (or 3u, or even both -- the number refers to the USB bus being monitored) through a UDP pipe, using netcat or something similar, to another computer which could safely store the output. Alan Stern ^ permalink raw reply [flat|nested] 17+ messages in thread
[parent not found: <4f8f545e-4846-45e0-b8f8-5c73876b150a@tripolho.com>]
[parent not found: <20201119225144.GA590990@rowland.harvard.edu>]
[parent not found: <3df90f9d-0af2-2aaa-9853-966f99e961a4@tripolho.com>]
* Re: kernel locks due to USB I/O [not found] ` <3df90f9d-0af2-2aaa-9853-966f99e961a4@tripolho.com> @ 2020-12-14 17:18 ` Alan Stern 2020-12-16 22:14 ` Alberto Sentieri 0 siblings, 1 reply; 17+ messages in thread From: Alan Stern @ 2020-12-14 17:18 UTC (permalink / raw) To: Alberto Sentieri; +Cc: Greg Kroah-Hartman, linux-usb On Mon, Dec 14, 2020 at 09:43:25AM -0500, Alberto Sentieri wrote: > Alan, > > I was finally able to assemble (on my desk) a similar environment which can > reproduce the problem. It is locking with only 18 USB devices, which are > being accessed in higher frequency than on the regular application (once > every 3 seconds instead of once every 15 seconds). I was able to reproduce > locks with kernel version 5.9.8, which I compiled myself. > > Now a brief description of what I did. > > Each access I referred to on the first paragraph was composed of 3 64-byte > USB interrupt packets, which are: a request the computer sends, a response > the device sends back, and a confirmation that the computer send to finish > the cycle, which would be repeated in about 3 seconds. > > The computer motherboard I am using has a UART (apparently a 16550) and I > could enable a serial console using: console=tty1 console=ttyS0,115200. I > also would like to point out that ipv6.disable=1 is the other kernel > argument I am using. The serial console was working properly and all its > output was being captured by another computer during the locks. My lab > machine has the same USB expansion card model being used at the production > environment and that was the one being used during the experiments. Though > my lab machine has some extra PCI USB cards installed, they were not being > using during the locks. > > I did the experiment just a few times during the weekend, after receiving > the last parts I needed on Saturday. > > So, with kernel 5.4.0-53-generic, which is one of the kernels released with > Ubuntu, the locked happened in just a couple of minutes. I tried that twice. > With kernel 5.9.8, the lock took more time to happen. Initially I though it > would not happen, because I'd observed the machine running for about 15 > minutes and everything seemed normal. However after a couple of hours I > found the machine locked. I also would like to point out that I’ve been > using this machine for more than 6 months and that it had never locked on > me, so that excludes any hardware problem. > > Unfortunately on the 3 times it locked (2 with 5.4.0 and one with 5.9.8) > nothing was printed on the serial console. > > I will be working on this during the day and I will try to enable some of > the debugging tools you suggested in your previous emails. > > Any comment on this will be highly appreciated. We'll have to wait and see what the tests and debugging tools reveal. As for the comment about the hardware being reliable -- in fact, you can't be certain of this. It may be that your test is stressing the hardware in a way that it never experienced during the prior six months. Alan Stern ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel locks due to USB I/O 2020-12-14 17:18 ` Alan Stern @ 2020-12-16 22:14 ` Alberto Sentieri 0 siblings, 0 replies; 17+ messages in thread From: Alberto Sentieri @ 2020-12-16 22:14 UTC (permalink / raw) To: Alan Stern; +Cc: Greg Kroah-Hartman, linux-usb This is just a quick status update. I am just checking if it rings any bell. I am still doing more tests. The machine where all these problems are happening is a HP Z4 G4, as well as the other two in production. Until this morning it had BIOS v01.76 dated 04/22/2019. Today I update the BIOS to v02.63 dated 10/28/2020. After updating the BIOS, the behavior definitively changed. Now both kernels (5.4.0-53-generic and 5.9.8) reboot unexpectedly instead of locking. After re-enabling the console, expecting to capture something, I could not see a lock or reboot. However the serial console, after displaying the whole boot, locked, and I cannot use it to login. Alberto Sentieri On 12/14/20 12:18 PM, Alan Stern wrote: > We'll have to wait and see what the tests and debugging tools reveal. > As for the comment about the hardware being reliable -- in fact, you > can't be certain of this. It may be that your test is stressing the > hardware in a way that it never experienced during the prior six > months. > > Alan Stern ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel locks due to USB I/O 2020-11-19 17:22 ` Alan Stern 2020-11-19 18:50 ` Alberto Sentieri @ 2020-11-19 19:21 ` Alberto Sentieri 2020-11-19 19:43 ` Alan Stern 1 sibling, 1 reply; 17+ messages in thread From: Alberto Sentieri @ 2020-11-19 19:21 UTC (permalink / raw) To: Alan Stern; +Cc: Greg Kroah-Hartman, linux-usb lsusb -t in a similar configuration I use for development (it has just 6 device, and not 36): $ lsusb -t /: Bus 16.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 5000M /: Bus 15.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 480M /: Bus 14.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 5000M /: Bus 13.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 480M /: Bus 12.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 5000M /: Bus 11.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 480M /: Bus 10.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 5000M /: Bus 09.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 480M /: Bus 08.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/10p, 5000M |__ Port 8: Dev 2, If 0, Class=Hub, Driver=hub/4p, 5000M /: Bus 07.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/16p, 480M |__ Port 6: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M /: Bus 06.Port 1: Dev 1, Class=root_hub, Driver=uhci_hcd/2p, 12M /: Bus 05.Port 1: Dev 1, Class=root_hub, Driver=uhci_hcd/2p, 12M /: Bus 04.Port 1: Dev 1, Class=root_hub, Driver=ohci-pci/2p, 12M /: Bus 03.Port 1: Dev 1, Class=root_hub, Driver=ohci-pci/3p, 12M /: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/4p, 480M |__ Port 3: Dev 5, If 0, Class=Hub, Driver=hub/7p, 480M |__ Port 1: Dev 6, If 0, Class=Hub, Driver=hub/2p, 480M |__ Port 1: Dev 8, If 0, Class=Human Interface Device, Driver=usbhid, 12M |__ Port 2: Dev 7, If 0, Class=Hub, Driver=hub/2p, 480M |__ Port 1: Dev 10, If 0, Class=Human Interface Device, Driver=usbhid, 12M |__ Port 4: Dev 9, If 0, Class=Hub, Driver=hub/2p, 480M |__ Port 1: Dev 12, If 0, Class=Human Interface Device, Driver=usbhid, 12M |__ Port 5: Dev 11, If 0, Class=Hub, Driver=hub/7p, 480M |__ Port 6: Dev 13, If 0, Class=Hub, Driver=hub/7p, 480M |__ Port 6: Dev 15, If 0, Class=Hub, Driver=hub/2p, 480M |__ Port 1: Dev 17, If 0, Class=Human Interface Device, Driver=usbhid, 12M |__ Port 7: Dev 16, If 0, Class=Hub, Driver=hub/2p, 480M |__ Port 1: Dev 18, If 0, Class=Human Interface Device, Driver=usbhid, 12M |__ Port 7: Dev 14, If 0, Class=Human Interface Device, Driver=usbhid, 12M /: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/5p, 480M And lspci (I believe we are using the VIA Technologies, Inc controller at the site where we are having locks): $ lspci 00:00.0 Host bridge: Intel Corporation Sky Lake-E DMI3 Registers (rev 04) 00:04.0 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04) 00:04.1 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04) 00:04.2 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04) 00:04.3 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04) 00:04.4 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04) 00:04.5 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04) 00:04.6 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04) 00:04.7 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04) 00:05.0 System peripheral: Intel Corporation Sky Lake-E MM/Vt-d Configuration Registers (rev 04) 00:05.2 System peripheral: Intel Corporation Sky Lake-E RAS (rev 04) 00:05.4 PIC: Intel Corporation Sky Lake-E IOAPIC (rev 04) 00:08.0 System peripheral: Intel Corporation Sky Lake-E Ubox Registers (rev 04) 00:08.1 Performance counters: Intel Corporation Sky Lake-E Ubox Registers (rev 04) 00:08.2 System peripheral: Intel Corporation Sky Lake-E Ubox Registers (rev 04) 00:14.0 USB controller: Intel Corporation 200 Series/Z370 Chipset Family USB 3.0 xHCI Controller 00:14.2 Signal processing controller: Intel Corporation 200 Series PCH Thermal Subsystem 00:17.0 RAID bus controller: Intel Corporation C600/X79 series chipset SATA RAID Controller 00:1b.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #21 (rev f0) 00:1c.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #3 (rev f0) 00:1c.4 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #5 (rev f0) 00:1d.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #9 (rev f0) 00:1f.0 ISA bridge: Intel Corporation C422 Chipset LPC/eSPI Controller 00:1f.2 Memory controller: Intel Corporation 200 Series/Z370 Chipset Family Power Management Controller 00:1f.3 Audio device: Intel Corporation 200 Series PCH HD Audio 00:1f.4 SMBus: Intel Corporation 200 Series/Z370 Chipset Family SMBus Controller 00:1f.5 Non-VGA unclassified device: Intel Corporation 200 Series/Z370 Chipset Family SPI Controller 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-LM 01:00.0 PCI bridge: Texas Instruments XIO2000(A)/XIO2200A PCI Express-to-PCI Bridge (rev 03) 02:01.0 USB controller: NEC Corporation OHCI USB Controller (rev 43) 02:01.1 USB controller: NEC Corporation OHCI USB Controller (rev 43) 02:01.2 USB controller: NEC Corporation uPD72010x USB 2.0 Controller (rev 04) 03:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03) 05:00.0 PCI bridge: PLX Technology, Inc. PEX 8609 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch with DMA (rev ba) 05:00.1 System peripheral: PLX Technology, Inc. PEX 8609 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch with DMA (rev ba) 06:01.0 PCI bridge: PLX Technology, Inc. PEX 8609 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch with DMA (rev ba) 06:05.0 PCI bridge: PLX Technology, Inc. PEX 8609 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch with DMA (rev ba) 06:07.0 PCI bridge: PLX Technology, Inc. PEX 8609 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch with DMA (rev ba) 06:09.0 PCI bridge: PLX Technology, Inc. PEX 8609 8-lane, 8-Port PCI Express Gen 2 (5.0 GT/s) Switch with DMA (rev ba) 07:00.0 USB controller: ASMedia Technology Inc. ASM1042A USB 3.0 Host Controller 08:00.0 USB controller: ASMedia Technology Inc. ASM1042A USB 3.0 Host Controller 09:00.0 USB controller: ASMedia Technology Inc. ASM1042A USB 3.0 Host Controller 0a:00.0 USB controller: ASMedia Technology Inc. ASM1042A USB 3.0 Host Controller 14:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A (rev 04) 14:05.0 System peripheral: Intel Corporation Sky Lake-E VT-d (rev 04) 14:05.2 System peripheral: Intel Corporation Sky Lake-E RAS Configuration Registers (rev 04) 14:05.4 PIC: Intel Corporation Sky Lake-E IOxAPIC Configuration Registers (rev 04) 14:08.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:08.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:08.2 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:08.3 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:08.4 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:08.5 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:08.6 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:08.7 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:09.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:09.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:0e.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:0e.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:0e.2 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:0e.3 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:0e.4 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:0e.5 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:0e.6 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:0e.7 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:0f.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:0f.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:1d.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:1d.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:1d.2 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:1d.3 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04) 14:1e.0 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04) 14:1e.1 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04) 14:1e.2 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04) 14:1e.3 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04) 14:1e.4 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04) 14:1e.5 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04) 14:1e.6 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04) 15:00.0 PCI bridge: Pericom Semiconductor PI7C9X111SL PCIe-to-PCI Reversible Bridge (rev 02) 16:04.0 USB controller: VIA Technologies, Inc. VT82xx/62xx UHCI USB 1.1 Controller (rev 62) 16:04.1 USB controller: VIA Technologies, Inc. VT82xx/62xx UHCI USB 1.1 Controller (rev 62) 16:04.2 USB controller: VIA Technologies, Inc. USB 2.0 (rev 65) 20:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A (rev 04) 20:05.0 System peripheral: Intel Corporation Sky Lake-E VT-d (rev 04) 20:05.2 System peripheral: Intel Corporation Sky Lake-E RAS Configuration Registers (rev 04) 20:05.4 PIC: Intel Corporation Sky Lake-E IOxAPIC Configuration Registers (rev 04) 20:08.0 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04) 20:09.0 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04) 20:0a.0 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04) 20:0a.1 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04) 20:0a.2 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04) 20:0a.3 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04) 20:0a.4 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04) 20:0a.5 System peripheral: Intel Corporation Sky Lake-E LM Channel 1 (rev 04) 20:0a.6 System peripheral: Intel Corporation Sky Lake-E LMS Channel 1 (rev 04) 20:0a.7 System peripheral: Intel Corporation Sky Lake-E LMDP Channel 1 (rev 04) 20:0b.0 System peripheral: Intel Corporation Sky Lake-E DECS Channel 2 (rev 04) 20:0b.1 System peripheral: Intel Corporation Sky Lake-E LM Channel 2 (rev 04) 20:0b.2 System peripheral: Intel Corporation Sky Lake-E LMS Channel 2 (rev 04) 20:0b.3 System peripheral: Intel Corporation Sky Lake-E LMDP Channel 2 (rev 04) 20:0c.0 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04) 20:0c.1 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04) 20:0c.2 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04) 20:0c.3 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04) 20:0c.4 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04) 20:0c.5 System peripheral: Intel Corporation Sky Lake-E LM Channel 1 (rev 04) 20:0c.6 System peripheral: Intel Corporation Sky Lake-E LMS Channel 1 (rev 04) 20:0c.7 System peripheral: Intel Corporation Sky Lake-E LMDP Channel 1 (rev 04) 20:0d.0 System peripheral: Intel Corporation Sky Lake-E DECS Channel 2 (rev 04) 20:0d.1 System peripheral: Intel Corporation Sky Lake-E LM Channel 2 (rev 04) 20:0d.2 System peripheral: Intel Corporation Sky Lake-E LMS Channel 2 (rev 04) 20:0d.3 System peripheral: Intel Corporation Sky Lake-E LMDP Channel 2 (rev 04) 21:00.0 VGA compatible controller: NVIDIA Corporation GP107GL [Quadro P400] (rev a1) 21:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1) 2c:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A (rev 04) 2c:02.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port C (rev 04) 2c:03.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port D (rev 04) 2c:05.0 System peripheral: Intel Corporation Sky Lake-E VT-d (rev 04) 2c:05.2 System peripheral: Intel Corporation Sky Lake-E RAS Configuration Registers (rev 04) 2c:05.4 PIC: Intel Corporation Sky Lake-E IOxAPIC Configuration Registers (rev 04) 2c:12.0 Performance counters: Intel Corporation Sky Lake-E M3KTI Registers (rev 04) 2c:12.1 Performance counters: Intel Corporation Sky Lake-E M3KTI Registers (rev 04) 2c:12.2 System peripheral: Intel Corporation Sky Lake-E M3KTI Registers (rev 04) 2c:15.0 System peripheral: Intel Corporation Sky Lake-E M2PCI Registers (rev 04) 2c:16.0 System peripheral: Intel Corporation Sky Lake-E M2PCI Registers (rev 04) 2c:16.4 System peripheral: Intel Corporation Sky Lake-E M2PCI Registers (rev 04) 2c:17.0 System peripheral: Intel Corporation Sky Lake-E M2PCI Registers (rev 04) On 11/19/20 12:22 PM, Alan Stern wrote: > On Mon, Nov 16, 2020 at 01:42:46PM -0500, Alberto Sentieri wrote: >> Alan, >> >> I am not sure if that would give you any clue, but I enabled usbfs_snoop in >> the kernel and I got URB error -108 (endpoint shutdown) just before the >> kernel lock. >> >> Just after starting the program, I got something like that: >> >> [ 174.325977] usb 1-1.1.1: opened by process 2961: ate >> [ 174.326006] usb 1-1.1.1: usbdev_do_ioctl: GETDRIVER >> [ 174.326015] usb 1-1.1.1: usbdev_do_ioctl: IOCTL >> [ 174.326334] usb 1-1.1.1: usbdev_do_ioctl: CLAIMINTERFACE >> [ 174.326371] usb 1-1.1.1: usbdev_do_ioctl: SETINTERFACE >> [ 174.326582] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB >> [ 174.326592] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.326617] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB >> [ 174.326623] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.326629] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB >> [ 174.326635] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.326640] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB >> [ 174.326645] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.326652] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB >> [ 174.326657] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.326663] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB >> [ 174.326668] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.326674] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB >> [ 174.326679] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.326685] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB >> [ 174.326691] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.326696] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB >> [ 174.326702] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.326708] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB >> [ 174.326713] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.326718] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB >> [ 174.326723] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.326729] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB >> [ 174.326734] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.326740] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB >> [ 174.326745] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.326751] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB >> [ 174.326756] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.326762] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB >> [ 174.326767] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.326773] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB >> [ 174.326783] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.327565] usb 1-1.3.1: opened by process 2961: ate >> [ 174.327584] usb 1-1.3.1: usbdev_do_ioctl: GETDRIVER >> [ 174.327593] usb 1-1.3.1: usbdev_do_ioctl: IOCTL >> [ 174.327855] usb 1-1.3.1: usbdev_do_ioctl: CLAIMINTERFACE >> [ 174.327881] usb 1-1.3.1: usbdev_do_ioctl: SETINTERFACE >> [ 174.328180] usb 1-1.3.1: usbdev_do_ioctl: SUBMITURB >> [ 174.328188] usb 1-1.3.1: userurb 0000000000000000, ep1 int-in, length 64 >> [ 174.328205] usb 1-1.3.1: usbdev_do_ioctl: SUBMITURB > Can you post the output from "lsusb -t" for your test system? The > device numbers aren't what I expected to see. Not wrong, just > unexpected. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel locks due to USB I/O 2020-11-19 19:21 ` Alberto Sentieri @ 2020-11-19 19:43 ` Alan Stern 2020-11-19 22:14 ` Alberto Sentieri 0 siblings, 1 reply; 17+ messages in thread From: Alan Stern @ 2020-11-19 19:43 UTC (permalink / raw) To: Alberto Sentieri; +Cc: Greg Kroah-Hartman, linux-usb On Thu, Nov 19, 2020 at 02:21:47PM -0500, Alberto Sentieri wrote: > lsusb -t in a similar configuration I use for development (it has just 6 > device, and not 36): > > $ lsusb -t > /: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/4p, 480M > |__ Port 3: Dev 5, If 0, Class=Hub, Driver=hub/7p, 480M > |__ Port 1: Dev 6, If 0, Class=Hub, Driver=hub/2p, 480M > |__ Port 1: Dev 8, If 0, Class=Human Interface Device, > Driver=usbhid, 12M > |__ Port 2: Dev 7, If 0, Class=Hub, Driver=hub/2p, 480M > |__ Port 1: Dev 10, If 0, Class=Human Interface Device, > Driver=usbhid, 12M > |__ Port 4: Dev 9, If 0, Class=Hub, Driver=hub/2p, 480M > |__ Port 1: Dev 12, If 0, Class=Human Interface Device, > Driver=usbhid, 12M > |__ Port 5: Dev 11, If 0, Class=Hub, Driver=hub/7p, 480M > |__ Port 6: Dev 13, If 0, Class=Hub, Driver=hub/7p, 480M > |__ Port 6: Dev 15, If 0, Class=Hub, Driver=hub/2p, 480M > |__ Port 1: Dev 17, If 0, Class=Human Interface Device, > Driver=usbhid, 12M > |__ Port 7: Dev 16, If 0, Class=Hub, Driver=hub/2p, 480M > |__ Port 1: Dev 18, If 0, Class=Human Interface Device, > Driver=usbhid, 12M > |__ Port 7: Dev 14, If 0, Class=Human Interface Device, > Driver=usbhid, 12M Previously you said that each HID microcontroller is connected to port 1 of a two-port hub. But that clearly isn't true for device 14 in the listing above. What happened there? Alan Stern ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel locks due to USB I/O 2020-11-19 19:43 ` Alan Stern @ 2020-11-19 22:14 ` Alberto Sentieri 0 siblings, 0 replies; 17+ messages in thread From: Alberto Sentieri @ 2020-11-19 22:14 UTC (permalink / raw) To: Alan Stern; +Cc: Greg Kroah-Hartman, linux-usb On 11/19/20 2:43 PM, Alan Stern wrote: > On Thu, Nov 19, 2020 at 02:21:47PM -0500, Alberto Sentieri wrote: >> lsusb -t in a similar configuration I use for development (it has just 6 >> device, and not 36): >> >> $ lsusb -t >> /: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/4p, 480M >> |__ Port 3: Dev 5, If 0, Class=Hub, Driver=hub/7p, 480M >> |__ Port 1: Dev 6, If 0, Class=Hub, Driver=hub/2p, 480M >> |__ Port 1: Dev 8, If 0, Class=Human Interface Device, >> Driver=usbhid, 12M >> |__ Port 2: Dev 7, If 0, Class=Hub, Driver=hub/2p, 480M >> |__ Port 1: Dev 10, If 0, Class=Human Interface Device, >> Driver=usbhid, 12M >> |__ Port 4: Dev 9, If 0, Class=Hub, Driver=hub/2p, 480M >> |__ Port 1: Dev 12, If 0, Class=Human Interface Device, >> Driver=usbhid, 12M >> |__ Port 5: Dev 11, If 0, Class=Hub, Driver=hub/7p, 480M >> |__ Port 6: Dev 13, If 0, Class=Hub, Driver=hub/7p, 480M >> |__ Port 6: Dev 15, If 0, Class=Hub, Driver=hub/2p, 480M >> |__ Port 1: Dev 17, If 0, Class=Human Interface Device, >> Driver=usbhid, 12M >> |__ Port 7: Dev 16, If 0, Class=Hub, Driver=hub/2p, 480M >> |__ Port 1: Dev 18, If 0, Class=Human Interface Device, >> Driver=usbhid, 12M >> |__ Port 7: Dev 14, If 0, Class=Human Interface Device, >> Driver=usbhid, 12M > Previously you said that each HID microcontroller is connected to port 1 > of a two-port hub. But that clearly isn't true for device 14 in the > listing above. What happened there? The program never talks to that device. It does not try to open it. The program has a list of valid interfaces, and if one is not in the list it will not be part of the game. Really I sent you a list of 5 devices, because one was disconnect when I ran lsusb. This would be the list with the 6 devices: /: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/4p, 480M |__ Port 3: Dev 53, If 0, Class=Hub, Driver=hub/7p, 480M |__ Port 1: Dev 54, If 0, Class=Hub, Driver=hub/2p, 480M |__ Port 1: Dev 56, If 0, Class=Human Interface Device, Driver=, 12M |__ Port 2: Dev 55, If 0, Class=Hub, Driver=hub/2p, 480M |__ Port 1: Dev 58, If 0, Class=Human Interface Device, Driver=, 12M |__ Port 3: Dev 57, If 0, Class=Hub, Driver=hub/2p, 480M |__ Port 1: Dev 60, If 0, Class=Human Interface Device, Driver=, 12M |__ Port 4: Dev 59, If 0, Class=Hub, Driver=hub/2p, 480M |__ Port 1: Dev 62, If 0, Class=Human Interface Device, Driver=, 12M |__ Port 5: Dev 61, If 0, Class=Hub, Driver=hub/7p, 480M |__ Port 6: Dev 63, If 0, Class=Hub, Driver=hub/7p, 480M |__ Port 6: Dev 69, If 0, Class=Hub, Driver=hub/2p, 480M |__ Port 1: Dev 70, If 0, Class=Human Interface Device, Driver=, 12M |__ Port 7: Dev 66, If 0, Class=Hub, Driver=hub/2p, 480M |__ Port 1: Dev 68, If 0, Class=Human Interface Device, Driver=, 12M |__ Port 7: Dev 64, If 0, Class=Human Interface Device, Driver=usbhid, 12M Again, device 64 (previous device 14) is not part of the game. I compiled kernel 5.9.8, which is running, and have other kernels as well. I will try to reproduce the problem in my lab. I will let you know as soon as I have more information. If I were able to reproduce the problem, I will try to use a Beagle USB analyzer at the same time the problem occurs. Thanks, Alberto > Alan Stern > ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2020-12-16 22:15 UTC | newest] Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-11-10 19:20 kernel locks due to USB I/O Alberto Sentieri 2020-11-10 20:51 ` Alan Stern 2020-11-10 23:42 ` Alberto Sentieri 2020-11-11 7:51 ` Greg Kroah-Hartman 2020-11-11 15:51 ` Alan Stern 2020-11-11 19:31 ` Alberto Sentieri 2020-11-16 16:53 ` Alberto Sentieri 2020-11-16 17:06 ` Alan Stern 2020-11-16 18:42 ` Alberto Sentieri 2020-11-19 17:22 ` Alan Stern 2020-11-19 18:50 ` Alberto Sentieri 2020-11-19 20:01 ` Alan Stern [not found] ` <4f8f545e-4846-45e0-b8f8-5c73876b150a@tripolho.com> [not found] ` <20201119225144.GA590990@rowland.harvard.edu> [not found] ` <3df90f9d-0af2-2aaa-9853-966f99e961a4@tripolho.com> 2020-12-14 17:18 ` Alan Stern 2020-12-16 22:14 ` Alberto Sentieri 2020-11-19 19:21 ` Alberto Sentieri 2020-11-19 19:43 ` Alan Stern 2020-11-19 22:14 ` Alberto Sentieri
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.