* kernel locks due to USB I/O
@ 2020-11-10 19:20 Alberto Sentieri
2020-11-10 20:51 ` Alan Stern
0 siblings, 1 reply; 17+ messages in thread
From: Alberto Sentieri @ 2020-11-10 19:20 UTC (permalink / raw)
To: Greg Kroah-Hartman, linux-usb
I’ve seen many kernel locks caused by a particular user-level
application. After the kernel locks, there is no report left in the
machine, neither in the logs. These locks have to do with USB input and
output.
The objective of this email is to get guidance about how to collect more
data related to the locks.
Follows a description of the problem.
I manage a few remote machines installed at a manufacturing facility,
which run Ubuntu 18.04. For months I had seen unexpected kernel locks,
which I could not explain. By locks I mean that the machine completely
dies. The graphical screen and keyboard freezes. I cannot ping or
connect through ssh during the locks. The only way of making the machine
come back is through a “pull the plug”. After rebooting I cannot find
anything meaningful about the lock in the logs. The machine is a good
quality one with a 6-core Xeon, 32 GB ECC memory (and the application is
using about 1GB). Exact the same problem happens in two identical
machines, one running kernel 5.0.0-37 generic and the other running
kernel 5.3.0-62-generic.
A few days ago I was able to create a sequence of events that produce
the locks in a couple of minutes. These events have to do with USB 2.0
interrupt I/O on USB devices connected at 12 Mbits/s and the frequency
URBs are submitted and reaped . It is necessary to have at least 36
devices connected to reproduce the problem easily, which I cannot do
from where I am. The machines are in a country other than the one I
live, and my physical access to them is not possible due to COVID-19
restrictions.
There is no special USB drivers installed. However, there is a NVIDIA
manufacturer driver installed, which I installed using the Ubuntu
regular tools for non-free software. All USB I/O is done by a regular
user opening /dev/bus/usb/xxx/xxx (the device group is set to the user
group by udev).
Each set of 18 USB devices is connected to a 10-Amp.-power-supply
powered HUB. Each hub has its own USB 2.0 root, I mean, I installed
multiple USB 2.0 PCI express expansion cards, and only one port of each
expansion card is used for each HUB.
The protocol to talk to any of the 36 devices is pretty simple. It uses
USB interrupt frames. A 64-byte frame is sent to the device (request
packet). I use ioctl (USBDEVFS_SUBMITURB). The file descriptor is
monitored by epoll and when an answer comes back, the response packet
(another 64-byte interrupt packet) is recovered by ioctl
(USBDEVFS_REAPURBNDELAY). Then a 64-byte packet (confirmation packet) is
sent through USBDEVFS_SUBMITURB. This sequence happens once every few
seconds and the delay between the three packets is just a couple of
milliseconds. All process of dealing with the 36 devices is in a unique
thread, under the same epoll loop.
So if I synchronize all 36 devices, I mean, I try to talk to all them
basically at the same time, the kernel will lock in about 2 minutes or
less. By “at the same time” I mean to submit the URBs for the request
packet around the same time for all of them, and then sit there, waiting
for the proper epoll wake-up to deal with the state machine (response
and confirmation packets).
However, if I lock a semaphore before sending the request packet for one
device, and only unlock after reaping the URB I used to send the
confirmation packet, it ran for ate least 72 hours without problems. So,
one device at a time (using basically the same software plus the
semaphore) does not cause the kernel lock.
My point is that simple ioctl calls to USB devices should not break the
kernel. I need help to address the kernel issue. The problem is
difficult to reproduce at my office because it needs many devices
connected to it, which are available only in a place I do not have
physical access to, due to COVID-19 travel restrictions.
My guess is that, for a regular user, this bug rarely manifests itself
and it may be there for a long time.
I would like to figure out exactly where the problem is and I am looking
for your guidance to get more information about it.
Thanks,
Alberto Sentieri
# cat /proc/version
Linux version 5.3.0-62-generic (buildd@lcy01-amd64-004) (gcc version
7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #56~18.04.1-Ubuntu SMP Wed Jun 24
16:17:03 UTC 2020
# free -h (while running the application)
total used free shared buff/cache
available
Mem: 31G 860M 29G 7.1M 1.3G
29G
Swap: 15G 0B 15G
# lspci
00:00.0 Host bridge: Intel Corporation Sky Lake-E DMI3 Registers (rev 04)
00:04.0 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers
(rev 04)
00:04.1 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers
(rev 04)
00:04.2 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers
(rev 04)
00:04.3 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers
(rev 04)
00:04.4 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers
(rev 04)
00:04.5 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers
(rev 04)
00:04.6 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers
(rev 04)
00:04.7 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers
(rev 04)
00:05.0 System peripheral: Intel Corporation Sky Lake-E MM/Vt-d
Configuration Registers (rev 04)
00:05.2 System peripheral: Intel Corporation Device 2025 (rev 04)
00:05.4 PIC: Intel Corporation Device 2026 (rev 04)
00:08.0 System peripheral: Intel Corporation Sky Lake-E Ubox Registers
(rev 04)
00:08.1 Performance counters: Intel Corporation Sky Lake-E Ubox
Registers (rev 04)
00:08.2 System peripheral: Intel Corporation Sky Lake-E Ubox Registers
(rev 04)
00:14.0 USB controller: Intel Corporation 200 Series/Z370 Chipset Family
USB 3.0 xHCI Controller
00:14.2 Signal processing controller: Intel Corporation 200 Series PCH
Thermal Subsystem
00:16.0 Communication controller: Intel Corporation 200 Series PCH CSME
HECI #1
00:17.0 RAID bus controller: Intel Corporation C600/X79 series chipset
SATA RAID Controller
00:1b.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root
Port #21 (rev f0)
00:1c.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root
Port #3 (rev f0)
00:1c.4 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root
Port #5 (rev f0)
00:1d.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root
Port #9 (rev f0)
00:1f.0 ISA bridge: Intel Corporation C422 Chipset LPC/eSPI Controller
00:1f.2 Memory controller: Intel Corporation 200 Series/Z370 Chipset
Family Power Management Controller
00:1f.3 Audio device: Intel Corporation 200 Series PCH HD Audio
00:1f.4 SMBus: Intel Corporation 200 Series/Z370 Chipset Family SMBus
Controller
00:1f.5 Non-VGA unclassified device: Intel Corporation 200 Series/Z370
Chipset Family SPI Controller
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2)
I219-LM
01:00.0 PCI bridge: Pericom Semiconductor PI7C9X111SL PCIe-to-PCI
Reversible Bridge (rev 02)
02:04.0 USB controller: VIA Technologies, Inc. VT82xx/62xx UHCI USB 1.1
Controller (rev 62)
02:04.1 USB controller: VIA Technologies, Inc. VT82xx/62xx UHCI USB 1.1
Controller (rev 62)
02:04.2 USB controller: VIA Technologies, Inc. USB 2.0 (rev 65)
03:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network
Connection (rev 03)
05:00.0 PCI bridge: Pericom Semiconductor PI7C9X111SL PCIe-to-PCI
Reversible Bridge (rev 02)
06:04.0 USB controller: VIA Technologies, Inc. VT82xx/62xx UHCI USB 1.1
Controller (rev 62)
06:04.1 USB controller: VIA Technologies, Inc. VT82xx/62xx UHCI USB 1.1
Controller (rev 62)
06:04.2 USB controller: VIA Technologies, Inc. USB 2.0 (rev 65)
14:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A
(rev 04)
14:05.0 System peripheral: Intel Corporation Device 2034 (rev 04)
14:05.2 System peripheral: Intel Corporation Sky Lake-E RAS
Configuration Registers (rev 04)
14:05.4 PIC: Intel Corporation Device 2036 (rev 04)
14:08.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:08.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:08.2 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:08.3 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:08.4 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:08.5 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:08.6 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:08.7 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:09.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:09.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:0e.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:0e.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:0e.2 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:0e.3 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:0e.4 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:0e.5 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:0e.6 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:0e.7 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:0f.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:0f.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:1d.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:1d.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:1d.2 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:1d.3 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:1e.0 System peripheral: Intel Corporation Sky Lake-E PCU Registers
(rev 04)
14:1e.1 System peripheral: Intel Corporation Sky Lake-E PCU Registers
(rev 04)
14:1e.2 System peripheral: Intel Corporation Sky Lake-E PCU Registers
(rev 04)
14:1e.3 System peripheral: Intel Corporation Sky Lake-E PCU Registers
(rev 04)
14:1e.4 System peripheral: Intel Corporation Sky Lake-E PCU Registers
(rev 04)
14:1e.5 System peripheral: Intel Corporation Sky Lake-E PCU Registers
(rev 04)
14:1e.6 System peripheral: Intel Corporation Sky Lake-E PCU Registers
(rev 04)
15:00.0 PCI bridge: Pericom Semiconductor PI7C9X111SL PCIe-to-PCI
Reversible Bridge (rev 02)
16:04.0 USB controller: VIA Technologies, Inc. VT82xx/62xx UHCI USB 1.1
Controller (rev 62)
16:04.1 USB controller: VIA Technologies, Inc. VT82xx/62xx UHCI USB 1.1
Controller (rev 62)
16:04.2 USB controller: VIA Technologies, Inc. USB 2.0 (rev 65)
20:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A
(rev 04)
20:05.0 System peripheral: Intel Corporation Device 2034 (rev 04)
20:05.2 System peripheral: Intel Corporation Sky Lake-E RAS
Configuration Registers (rev 04)
20:05.4 PIC: Intel Corporation Device 2036 (rev 04)
20:08.0 System peripheral: Intel Corporation Device 2066 (rev 04)
20:09.0 System peripheral: Intel Corporation Device 2066 (rev 04)
20:0a.0 System peripheral: Intel Corporation Device 2040 (rev 04)
20:0a.1 System peripheral: Intel Corporation Device 2041 (rev 04)
20:0a.2 System peripheral: Intel Corporation Device 2042 (rev 04)
20:0a.3 System peripheral: Intel Corporation Device 2043 (rev 04)
20:0a.4 System peripheral: Intel Corporation Device 2044 (rev 04)
20:0a.5 System peripheral: Intel Corporation Device 2045 (rev 04)
20:0a.6 System peripheral: Intel Corporation Device 2046 (rev 04)
20:0a.7 System peripheral: Intel Corporation Device 2047 (rev 04)
20:0b.0 System peripheral: Intel Corporation Device 2048 (rev 04)
20:0b.1 System peripheral: Intel Corporation Device 2049 (rev 04)
20:0b.2 System peripheral: Intel Corporation Device 204a (rev 04)
20:0b.3 System peripheral: Intel Corporation Device 204b (rev 04)
20:0c.0 System peripheral: Intel Corporation Device 2040 (rev 04)
20:0c.1 System peripheral: Intel Corporation Device 2041 (rev 04)
20:0c.2 System peripheral: Intel Corporation Device 2042 (rev 04)
20:0c.3 System peripheral: Intel Corporation Device 2043 (rev 04)
20:0c.4 System peripheral: Intel Corporation Device 2044 (rev 04)
20:0c.5 System peripheral: Intel Corporation Device 2045 (rev 04)
20:0c.6 System peripheral: Intel Corporation Device 2046 (rev 04)
20:0c.7 System peripheral: Intel Corporation Device 2047 (rev 04)
20:0d.0 System peripheral: Intel Corporation Device 2048 (rev 04)
20:0d.1 System peripheral: Intel Corporation Device 2049 (rev 04)
20:0d.2 System peripheral: Intel Corporation Device 204a (rev 04)
20:0d.3 System peripheral: Intel Corporation Device 204b (rev 04)
21:00.0 VGA compatible controller: NVIDIA Corporation GP107GL [Quadro
P400] (rev a1)
21:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio
Controller (rev a1)
2c:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A
(rev 04)
2c:02.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port C
(rev 04)
2c:03.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port D
(rev 04)
2c:05.0 System peripheral: Intel Corporation Device 2034 (rev 04)
2c:05.2 System peripheral: Intel Corporation Sky Lake-E RAS
Configuration Registers (rev 04)
2c:05.4 PIC: Intel Corporation Device 2036 (rev 04)
2c:12.0 Performance counters: Intel Corporation Sky Lake-E M3KTI
Registers (rev 04)
2c:12.1 Performance counters: Intel Corporation Sky Lake-E M3KTI
Registers (rev 04)
2c:12.2 System peripheral: Intel Corporation Sky Lake-E M3KTI Registers
(rev 04)
2c:15.0 System peripheral: Intel Corporation Sky Lake-E M2PCI Registers
(rev 04)
2c:16.0 System peripheral: Intel Corporation Sky Lake-E M2PCI Registers
(rev 04)
2c:16.4 System peripheral: Intel Corporation Sky Lake-E M2PCI Registers
(rev 04)
2c:17.0 System peripheral: Intel Corporation Sky Lake-E M2PCI Registers
(rev 04)
2d:00.0 PCI bridge: Pericom Semiconductor PI7C9X111SL PCIe-to-PCI
Reversible Bridge (rev 02)
2e:04.0 USB controller: VIA Technologies, Inc. VT82xx/62xx UHCI USB 1.1
Controller (rev 62)
2e:04.1 USB controller: VIA Technologies, Inc. VT82xx/62xx UHCI USB 1.1
Controller (rev 62)
2e:04.2 USB controller: VIA Technologies, Inc. USB 2.0 (rev 65)
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel locks due to USB I/O
2020-11-10 19:20 kernel locks due to USB I/O Alberto Sentieri
@ 2020-11-10 20:51 ` Alan Stern
2020-11-10 23:42 ` Alberto Sentieri
0 siblings, 1 reply; 17+ messages in thread
From: Alan Stern @ 2020-11-10 20:51 UTC (permalink / raw)
To: Alberto Sentieri; +Cc: Greg Kroah-Hartman, linux-usb
On Tue, Nov 10, 2020 at 02:20:50PM -0500, Alberto Sentieri wrote:
> I’ve seen many kernel locks caused by a particular user-level application.
> After the kernel locks, there is no report left in the machine, neither in
> the logs. These locks have to do with USB input and output.
>
> The objective of this email is to get guidance about how to collect more
> data related to the locks.
>
> Follows a description of the problem.
>
> I manage a few remote machines installed at a manufacturing facility, which
> run Ubuntu 18.04. For months I had seen unexpected kernel locks, which I
> could not explain. By locks I mean that the machine completely dies. The
> graphical screen and keyboard freezes. I cannot ping or connect through ssh
> during the locks. The only way of making the machine come back is through a
> “pull the plug”. After rebooting I cannot find anything meaningful about the
> lock in the logs. The machine is a good quality one with a 6-core Xeon, 32
> GB ECC memory (and the application is using about 1GB). Exact the same
> problem happens in two identical machines, one running kernel 5.0.0-37
> generic and the other running kernel 5.3.0-62-generic.
Can you update either machine to a 5.9 kernel?
> A few days ago I was able to create a sequence of events that produce the
> locks in a couple of minutes. These events have to do with USB 2.0 interrupt
> I/O on USB devices connected at 12 Mbits/s and the frequency URBs are
> submitted and reaped . It is necessary to have at least 36 devices connected
> to reproduce the problem easily, which I cannot do from where I am. The
> machines are in a country other than the one I live, and my physical access
> to them is not possible due to COVID-19 restrictions.
>
> There is no special USB drivers installed. However, there is a NVIDIA
> manufacturer driver installed, which I installed using the Ubuntu regular
> tools for non-free software. All USB I/O is done by a regular user opening
> /dev/bus/usb/xxx/xxx (the device group is set to the user group by udev).
>
> Each set of 18 USB devices is connected to a 10-Amp.-power-supply powered
> HUB. Each hub has its own USB 2.0 root, I mean, I installed multiple USB 2.0
> PCI express expansion cards, and only one port of each expansion card is
> used for each HUB.
>
> The protocol to talk to any of the 36 devices is pretty simple. It uses USB
> interrupt frames. A 64-byte frame is sent to the device (request packet). I
> use ioctl (USBDEVFS_SUBMITURB). The file descriptor is monitored by epoll
> and when an answer comes back, the response packet (another 64-byte
> interrupt packet) is recovered by ioctl (USBDEVFS_REAPURBNDELAY). Then a
> 64-byte packet (confirmation packet) is sent through USBDEVFS_SUBMITURB.
> This sequence happens once every few seconds and the delay between the three
> packets is just a couple of milliseconds. All process of dealing with the 36
> devices is in a unique thread, under the same epoll loop.
This sentence is ambiguous. Do you mean there is a single unique thread
which talks to all 36 devices? Or do you mean there is a separate
unique thread for each device (so 36 threads)?
> So if I synchronize all 36 devices, I mean, I try to talk to all them
> basically at the same time, the kernel will lock in about 2 minutes or less.
> By “at the same time” I mean to submit the URBs for the request packet
> around the same time for all of them, and then sit there, waiting for the
> proper epoll wake-up to deal with the state machine (response and
> confirmation packets).
>
> However, if I lock a semaphore before sending the request packet for one
> device, and only unlock after reaping the URB I used to send the
> confirmation packet, it ran for ate least 72 hours without problems. So, one
> device at a time (using basically the same software plus the semaphore) does
> not cause the kernel lock.
>
> My point is that simple ioctl calls to USB devices should not break the
> kernel. I need help to address the kernel issue. The problem is difficult to
> reproduce at my office because it needs many devices connected to it, which
> are available only in a place I do not have physical access to, due to
> COVID-19 travel restrictions.
>
> My guess is that, for a regular user, this bug rarely manifests itself and
> it may be there for a long time.
>
> I would like to figure out exactly where the problem is and I am looking for
> your guidance to get more information about it.
You could try using a network console. Or have someone who is on-site
take a picture of the computer screen when a crash occurs.
Alan Stern
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel locks due to USB I/O
2020-11-10 20:51 ` Alan Stern
@ 2020-11-10 23:42 ` Alberto Sentieri
2020-11-11 7:51 ` Greg Kroah-Hartman
2020-11-11 15:51 ` Alan Stern
0 siblings, 2 replies; 17+ messages in thread
From: Alberto Sentieri @ 2020-11-10 23:42 UTC (permalink / raw)
To: Alan Stern; +Cc: Greg Kroah-Hartman, linux-usb
1) The current Ubuntu Kernel is 5.4.0-53. Do you want me to upgrade it
to 5.9, from kernel.org? Or is there a Ubuntu 5.9 package that I can
use? It would be easy to do it If there is a Ubuntu package with 5.9,
which I would install and, after the tests, uninstall.
2) Why do you believe that 5.9 would solve the problem? I am asking that
because I cannot change the production machine for a test if I cannot go
back to the original state. There is always a risk involved.
3) It is one single thread dealing with all 36 devices. Each device has
its own co-routine (not preemptive), but all co-routines are executed by
a unique thread.
4) By network console, do you mean ssh? It dies as well when it locks.
The screen is the regular GNOME3 screen and nothing can be seen there.
Every time it locks they send a picture, and I cannot see anything
meaningful there. I am thinking about disabling GNOME3, but I need their
blessing for that.
Thanks,
Alberto
On 11/10/20 3:51 PM, Alan Stern wrote:
> On Tue, Nov 10, 2020 at 02:20:50PM -0500, Alberto Sentieri wrote:
>> I’ve seen many kernel locks caused by a particular user-level application.
>> After the kernel locks, there is no report left in the machine, neither in
>> the logs. These locks have to do with USB input and output.
>>
>> The objective of this email is to get guidance about how to collect more
>> data related to the locks.
>>
>> Follows a description of the problem.
>>
>> I manage a few remote machines installed at a manufacturing facility, which
>> run Ubuntu 18.04. For months I had seen unexpected kernel locks, which I
>> could not explain. By locks I mean that the machine completely dies. The
>> graphical screen and keyboard freezes. I cannot ping or connect through ssh
>> during the locks. The only way of making the machine come back is through a
>> “pull the plug”. After rebooting I cannot find anything meaningful about the
>> lock in the logs. The machine is a good quality one with a 6-core Xeon, 32
>> GB ECC memory (and the application is using about 1GB). Exact the same
>> problem happens in two identical machines, one running kernel 5.0.0-37
>> generic and the other running kernel 5.3.0-62-generic.
> Can you update either machine to a 5.9 kernel?
>
>> A few days ago I was able to create a sequence of events that produce the
>> locks in a couple of minutes. These events have to do with USB 2.0 interrupt
>> I/O on USB devices connected at 12 Mbits/s and the frequency URBs are
>> submitted and reaped . It is necessary to have at least 36 devices connected
>> to reproduce the problem easily, which I cannot do from where I am. The
>> machines are in a country other than the one I live, and my physical access
>> to them is not possible due to COVID-19 restrictions.
>>
>> There is no special USB drivers installed. However, there is a NVIDIA
>> manufacturer driver installed, which I installed using the Ubuntu regular
>> tools for non-free software. All USB I/O is done by a regular user opening
>> /dev/bus/usb/xxx/xxx (the device group is set to the user group by udev).
>>
>> Each set of 18 USB devices is connected to a 10-Amp.-power-supply powered
>> HUB. Each hub has its own USB 2.0 root, I mean, I installed multiple USB 2.0
>> PCI express expansion cards, and only one port of each expansion card is
>> used for each HUB.
>>
>> The protocol to talk to any of the 36 devices is pretty simple. It uses USB
>> interrupt frames. A 64-byte frame is sent to the device (request packet). I
>> use ioctl (USBDEVFS_SUBMITURB). The file descriptor is monitored by epoll
>> and when an answer comes back, the response packet (another 64-byte
>> interrupt packet) is recovered by ioctl (USBDEVFS_REAPURBNDELAY). Then a
>> 64-byte packet (confirmation packet) is sent through USBDEVFS_SUBMITURB.
>> This sequence happens once every few seconds and the delay between the three
>> packets is just a couple of milliseconds. All process of dealing with the 36
>> devices is in a unique thread, under the same epoll loop.
> This sentence is ambiguous. Do you mean there is a single unique thread
> which talks to all 36 devices? Or do you mean there is a separate
> unique thread for each device (so 36 threads)?
>
>> So if I synchronize all 36 devices, I mean, I try to talk to all them
>> basically at the same time, the kernel will lock in about 2 minutes or less.
>> By “at the same time” I mean to submit the URBs for the request packet
>> around the same time for all of them, and then sit there, waiting for the
>> proper epoll wake-up to deal with the state machine (response and
>> confirmation packets).
>>
>> However, if I lock a semaphore before sending the request packet for one
>> device, and only unlock after reaping the URB I used to send the
>> confirmation packet, it ran for ate least 72 hours without problems. So, one
>> device at a time (using basically the same software plus the semaphore) does
>> not cause the kernel lock.
>>
>> My point is that simple ioctl calls to USB devices should not break the
>> kernel. I need help to address the kernel issue. The problem is difficult to
>> reproduce at my office because it needs many devices connected to it, which
>> are available only in a place I do not have physical access to, due to
>> COVID-19 travel restrictions.
>>
>> My guess is that, for a regular user, this bug rarely manifests itself and
>> it may be there for a long time.
>>
>> I would like to figure out exactly where the problem is and I am looking for
>> your guidance to get more information about it.
> You could try using a network console. Or have someone who is on-site
> take a picture of the computer screen when a crash occurs.
>
> Alan Stern
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel locks due to USB I/O
2020-11-10 23:42 ` Alberto Sentieri
@ 2020-11-11 7:51 ` Greg Kroah-Hartman
2020-11-11 15:51 ` Alan Stern
1 sibling, 0 replies; 17+ messages in thread
From: Greg Kroah-Hartman @ 2020-11-11 7:51 UTC (permalink / raw)
To: Alberto Sentieri; +Cc: Alan Stern, linux-usb
On Tue, Nov 10, 2020 at 06:42:17PM -0500, Alberto Sentieri wrote:
> 1) The current Ubuntu Kernel is 5.4.0-53. Do you want me to upgrade it to
> 5.9, from kernel.org? Or is there a Ubuntu 5.9 package that I can use? It
> would be easy to do it If there is a Ubuntu package with 5.9, which I would
> install and, after the tests, uninstall.
There might be a Ubuntu package, we don't know.
The issue is that the Ubuntu kernel you are using is an old one, and we
(the community) have no idea what is in there. We can help you if you
use a kernel.org release, otherwise you really should get support from
Ubuntu as you are relying on them for this type of thing (and probably
paying them for support as you are using it in a corporate setting and
relying on it, right?)
> 2) Why do you believe that 5.9 would solve the problem? I am asking that
> because I cannot change the production machine for a test if I cannot go
> back to the original state. There is always a risk involved.
You should always be able to choose a different kernel version at boot
time, no need to ever get rid of your older one in case something goes
wrong.
There has been lots of changes in the past year of kernel development,
as always, from 5.4 to 5.9, so something that we did then might have
resolved this. Maybe not, but we don't know if you can't try as to try
to remember what we did a year ago is hard, as I am sure you can imagine
:)
> 3) It is one single thread dealing with all 36 devices. Each device has its
> own co-routine (not preemptive), but all co-routines are executed by a
> unique thread.
Are you sure that these threads are not just getting stalled somehow
when a response doesn't come back properly and are not timing out, which
your userspace co-routines should be doing if the bus stalls, right?
One thing to note, USB hubs are almost never tested with that many
devices at once, so you might have to upgrade to one that says it really
will work with that many devices, and provide enough power to them. We
have seen more low-power issues on USB hubs and connections over the
years to know that this is a real issue, and one that the OS can't do
much about, but can affect it.
> 4) By network console, do you mean ssh? It dies as well when it locks. The
> screen is the regular GNOME3 screen and nothing can be seen there. Every
> time it locks they send a picture, and I cannot see anything meaningful
> there. I am thinking about disabling GNOME3, but I need their blessing for
> that.
No, netconsole is something else, see the kernel documentation for the
full details.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel locks due to USB I/O
2020-11-10 23:42 ` Alberto Sentieri
2020-11-11 7:51 ` Greg Kroah-Hartman
@ 2020-11-11 15:51 ` Alan Stern
2020-11-11 19:31 ` Alberto Sentieri
2020-11-16 16:53 ` Alberto Sentieri
1 sibling, 2 replies; 17+ messages in thread
From: Alan Stern @ 2020-11-11 15:51 UTC (permalink / raw)
To: Alberto Sentieri; +Cc: Greg Kroah-Hartman, linux-usb
On Tue, Nov 10, 2020 at 06:42:17PM -0500, Alberto Sentieri wrote:
> 1) The current Ubuntu Kernel is 5.4.0-53. Do you want me to upgrade it to
> 5.9, from kernel.org? Or is there a Ubuntu 5.9 package that I can use? It
> would be easy to do it If there is a Ubuntu package with 5.9, which I would
> install and, after the tests, uninstall.
If you want to know what Ubuntu packages are available, you should ask
on an Ubuntu mailing list instead of the linux-usb mailing list.
> 2) Why do you believe that 5.9 would solve the problem? I am asking that
> because I cannot change the production machine for a test if I cannot go
> back to the original state. There is always a risk involved.
We do not believe that 5.9 will solve the problem -- we have no reason
to believe this -- but we could be wrong. In any case it is always
best to test with the most up-to-date software available, and 5.9 is the
version closest to what we are working on now.
> 3) It is one single thread dealing with all 36 devices. Each device has its
> own co-routine (not preemptive), but all co-routines are executed by a
> unique thread.
If everything runs within a single thread, how can adding a semaphore
or mutex make any difference?
> 4) By network console, do you mean ssh? It dies as well when it locks. The
> screen is the regular GNOME3 screen and nothing can be seen there. Every
> time it locks they send a picture, and I cannot see anything meaningful
> there. I am thinking about disabling GNOME3, but I need their blessing for
> that.
See https://www.kernel.org/doc/Documentation/networking/netconsole.txt
for instructions on netconsole. And when you use it for testing, be
sure to set the console log level to a high value.
Alan Stern
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel locks due to USB I/O
2020-11-11 15:51 ` Alan Stern
@ 2020-11-11 19:31 ` Alberto Sentieri
2020-11-16 16:53 ` Alberto Sentieri
1 sibling, 0 replies; 17+ messages in thread
From: Alberto Sentieri @ 2020-11-11 19:31 UTC (permalink / raw)
To: Alan Stern; +Cc: Greg Kroah-Hartman, linux-usb
On 11/11/20 10:51 AM, Alan Stern wrote:
> On Tue, Nov 10, 2020 at 06:42:17PM -0500, Alberto Sentieri wrote:
>> 1) The current Ubuntu Kernel is 5.4.0-53. Do you want me to upgrade it to
>> 5.9, from kernel.org? Or is there a Ubuntu 5.9 package that I can use? It
>> would be easy to do it If there is a Ubuntu package with 5.9, which I would
>> install and, after the tests, uninstall.
> If you want to know what Ubuntu packages are available, you should ask
> on an Ubuntu mailing list instead of the linux-usb mailing list.
I am sorry to be asking about Ubuntu. For some reason I imagined I was
exchanging emails with Ubuntu guys, but now I understand that you are
from the kernel.org.
>
>> 2) Why do you believe that 5.9 would solve the problem? I am asking that
>> because I cannot change the production machine for a test if I cannot go
>> back to the original state. There is always a risk involved.
> We do not believe that 5.9 will solve the problem -- we have no reason
> to believe this -- but we could be wrong. In any case it is always
> best to test with the most up-to-date software available, and 5.9 is the
> version closest to what we are working on now.
I will try kernel 5.9 soon...
>
>> 3) It is one single thread dealing with all 36 devices. Each device has its
>> own co-routine (not preemptive), but all co-routines are executed by a
>> unique thread.
> If everything runs within a single thread, how can adding a semaphore
> or mutex make any difference?
The semaphore will block a co-routine, not a thread. It is not the type
of semaphore C programmers are used to. So, before the introduction of
the semaphore, a sequence like that would happen:
request packet device 1 URB submit
request packet device 2 URB submit
...
request packet device 36 URB submit
wait on epoll after submitting 36 URBs, one for each device.
reap URBs, receive response packets, send confirmation packets
(basically run the state machine, each device has its own state)
After the semaphore, a sequence like that would happen:
lock the semaphore
send request packet device 1 (URB submit)
wait on epoll
reap URB with device 1 response packet
submit URB with device 1 confirmation
wait on epoll
reap URB submitted on last step
unlock the semaphore
Now go to the next device, which was waiting on the co-routine semaphore.
The main difference is that I would not submit 36 URBs to 36 different
devices at the same time. The submission of 36 URBs would make the
devices start responding as soon as they get ready and receive a pool.
>
>> 4) By network console, do you mean ssh? It dies as well when it locks. The
>> screen is the regular GNOME3 screen and nothing can be seen there. Every
>> time it locks they send a picture, and I cannot see anything meaningful
>> there. I am thinking about disabling GNOME3, but I need their blessing for
>> that.
> See https://www.kernel.org/doc/Documentation/networking/netconsole.txt
> for instructions on netconsole. And when you use it for testing, be
> sure to set the console log level to a high value.
>
> Alan Stern
I will try kernel 5.9. However, it seems that the 5.3 kernel gets lost
when too many submits / reaps start happening very close to each other.
Thanks,
Alberto Sentieri
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel locks due to USB I/O
2020-11-11 15:51 ` Alan Stern
2020-11-11 19:31 ` Alberto Sentieri
@ 2020-11-16 16:53 ` Alberto Sentieri
2020-11-16 17:06 ` Alan Stern
1 sibling, 1 reply; 17+ messages in thread
From: Alberto Sentieri @ 2020-11-16 16:53 UTC (permalink / raw)
To: Alan Stern; +Cc: Greg Kroah-Hartman, linux-usb
The objective of this email is to report the current status of my findings.
I loaded netconsole on both machines I was having problems with. I tried
3 times on the machine with kernel 5.0.0-37 and twice with on the
machine with kernel 5.3.0-62. Each attempt consisted of running the
program which lock the kernel until it locked (about 3 minutes after
stating the program). The referred program had the "semphore code"
commented out. Nothing was sent to netconsole on all the 5 attempts I
made when the kernel locked.
Just to be clear about my use of netconsole, before loading the
netconsole kernel module, I ran "dmesg -n 8". When netconsole module was
loaded I could clearly see about 9 message lines on the computer
receiving the netconsole messages telling me that netconsole was loaded
(and how it was configured), so no doubts about the correct netconsole
setup. The "netconsole server" was a machine on the same local network.
My next attempt will be to compile kernel 5.9, as you suggest, and try it.
Thanks,
Alberto Sentieri
On 11/11/20 10:51 AM, Alan Stern wrote:
> On Tue, Nov 10, 2020 at 06:42:17PM -0500, Alberto Sentieri wrote:
>> 1) The current Ubuntu Kernel is 5.4.0-53. Do you want me to upgrade it to
>> 5.9, from kernel.org? Or is there a Ubuntu 5.9 package that I can use? It
>> would be easy to do it If there is a Ubuntu package with 5.9, which I would
>> install and, after the tests, uninstall.
> If you want to know what Ubuntu packages are available, you should ask
> on an Ubuntu mailing list instead of the linux-usb mailing list.
>
>> 2) Why do you believe that 5.9 would solve the problem? I am asking that
>> because I cannot change the production machine for a test if I cannot go
>> back to the original state. There is always a risk involved.
> We do not believe that 5.9 will solve the problem -- we have no reason
> to believe this -- but we could be wrong. In any case it is always
> best to test with the most up-to-date software available, and 5.9 is the
> version closest to what we are working on now.
>
>> 3) It is one single thread dealing with all 36 devices. Each device has its
>> own co-routine (not preemptive), but all co-routines are executed by a
>> unique thread.
> If everything runs within a single thread, how can adding a semaphore
> or mutex make any difference?
>
>> 4) By network console, do you mean ssh? It dies as well when it locks. The
>> screen is the regular GNOME3 screen and nothing can be seen there. Every
>> time it locks they send a picture, and I cannot see anything meaningful
>> there. I am thinking about disabling GNOME3, but I need their blessing for
>> that.
> See https://www.kernel.org/doc/Documentation/networking/netconsole.txt
> for instructions on netconsole. And when you use it for testing, be
> sure to set the console log level to a high value.
>
> Alan Stern
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel locks due to USB I/O
2020-11-16 16:53 ` Alberto Sentieri
@ 2020-11-16 17:06 ` Alan Stern
2020-11-16 18:42 ` Alberto Sentieri
0 siblings, 1 reply; 17+ messages in thread
From: Alan Stern @ 2020-11-16 17:06 UTC (permalink / raw)
To: Alberto Sentieri; +Cc: Greg Kroah-Hartman, linux-usb
On Mon, Nov 16, 2020 at 11:53:38AM -0500, Alberto Sentieri wrote:
> The objective of this email is to report the current status of my findings.
>
> I loaded netconsole on both machines I was having problems with. I tried 3
> times on the machine with kernel 5.0.0-37 and twice with on the machine with
> kernel 5.3.0-62. Each attempt consisted of running the program which lock
> the kernel until it locked (about 3 minutes after stating the program). The
> referred program had the "semphore code" commented out. Nothing was sent to
> netconsole on all the 5 attempts I made when the kernel locked.
>
> Just to be clear about my use of netconsole, before loading the netconsole
> kernel module, I ran "dmesg -n 8". When netconsole module was loaded I could
> clearly see about 9 message lines on the computer receiving the netconsole
> messages telling me that netconsole was loaded (and how it was configured),
> so no doubts about the correct netconsole setup. The "netconsole server" was
> a machine on the same local network.
>
> My next attempt will be to compile kernel 5.9, as you suggest, and try it.
While this may not generate any useful information, one way to create a
bunch of log output while running your test is to set the usbfs_snoop
module parameter for usbcore to true.
You can also enable dynamic debugging for usbcore, although in a stable
environment like yours it probably won't produce much output.
Alan Stern
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel locks due to USB I/O
2020-11-16 17:06 ` Alan Stern
@ 2020-11-16 18:42 ` Alberto Sentieri
2020-11-19 17:22 ` Alan Stern
0 siblings, 1 reply; 17+ messages in thread
From: Alberto Sentieri @ 2020-11-16 18:42 UTC (permalink / raw)
To: Alan Stern; +Cc: Greg Kroah-Hartman, linux-usb
Alan,
I am not sure if that would give you any clue, but I enabled usbfs_snoop
in the kernel and I got URB error -108 (endpoint shutdown) just before
the kernel lock.
Just after starting the program, I got something like that:
[ 174.325977] usb 1-1.1.1: opened by process 2961: ate
[ 174.326006] usb 1-1.1.1: usbdev_do_ioctl: GETDRIVER
[ 174.326015] usb 1-1.1.1: usbdev_do_ioctl: IOCTL
[ 174.326334] usb 1-1.1.1: usbdev_do_ioctl: CLAIMINTERFACE
[ 174.326371] usb 1-1.1.1: usbdev_do_ioctl: SETINTERFACE
[ 174.326582] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
[ 174.326592] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
[ 174.326617] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
[ 174.326623] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
[ 174.326629] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
[ 174.326635] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
[ 174.326640] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
[ 174.326645] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
[ 174.326652] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
[ 174.326657] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
[ 174.326663] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
[ 174.326668] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
[ 174.326674] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
[ 174.326679] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
[ 174.326685] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
[ 174.326691] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
[ 174.326696] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
[ 174.326702] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
[ 174.326708] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
[ 174.326713] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
[ 174.326718] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
[ 174.326723] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
[ 174.326729] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
[ 174.326734] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
[ 174.326740] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
[ 174.326745] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
[ 174.326751] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
[ 174.326756] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
[ 174.326762] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
[ 174.326767] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
[ 174.326773] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
[ 174.326783] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
[ 174.327565] usb 1-1.3.1: opened by process 2961: ate
[ 174.327584] usb 1-1.3.1: usbdev_do_ioctl: GETDRIVER
[ 174.327593] usb 1-1.3.1: usbdev_do_ioctl: IOCTL
[ 174.327855] usb 1-1.3.1: usbdev_do_ioctl: CLAIMINTERFACE
[ 174.327881] usb 1-1.3.1: usbdev_do_ioctl: SETINTERFACE
[ 174.328180] usb 1-1.3.1: usbdev_do_ioctl: SUBMITURB
[ 174.328188] usb 1-1.3.1: userurb 0000000000000000, ep1 int-in, length 64
[ 174.328205] usb 1-1.3.1: usbdev_do_ioctl: SUBMITURB
During the program execution, I got many lines like that:
[ 178.537958] usb 1-1.4.1: urb complete
[ 178.537975] usb 1-1.4.1: userurb pK-error, ep1 int-out,
actual_length 64 status 0
[ 178.538000] usb 1-1.2.1: urb complete
[ 178.538006] usb 1-1.2.1: userurb pK-error, ep1 int-out,
actual_length 64 status 0
I am not sure if pK-error has a special meaning there.
However, just before the lock, I got this sequence:
[ 387.326795] usb 1-1.5.3.1: usbdev_do_ioctl: SUBMITURB
[ 387.326801] usb 1-1.5.3.1: userurb 0000000000000000, ep1 int-in,
length 64
[ 387.327531] usb 1-1.5.3.1: urb complete
[ 387.327539] usb 1-1.5.3.1: userurb pK-error, ep1 int-out,
actual_length 64 status 0
[ 387.327582] usb 1-1.5.3.1: usbdev_do_ioctl: REAPURBNDELAY
[ 387.327590] usb 1-1.5.3.1: reap 0000000000000000
[ 387.327598] usb 1-1.5.3.1: usbdev_do_ioctl: REAPURBNDELAY
[ 388.109400] usb 3-1.5.7.1: usbdev_do_ioctl: REAPURBNDELAY
[ 388.110032] usb 3-1.6.7.1: usbdev_do_ioctl: REAPURBNDELAY
[ 388.110169] usb 3-1.5.5.1: usbdev_do_ioctl: REAPURBNDELAY
[ 388.110605] usb 3-1.6.5.1: usbdev_do_ioctl: REAPURBNDELAY
[ 388.111394] usb 3-1.1.1: usbdev_do_ioctl: REAPURBNDELAY
[ 388.403278] usb 3-1.6.4.1: usbdev_do_ioctl: REAPURBNDELAY
[ 388.404011] usb 3-1.5.1.1: usbdev_do_ioctl: REAPURBNDELAY
[ 388.404344] usb 3-1.6.3.1: usbdev_do_ioctl: REAPURBNDELAY
[ 388.404649] usb 3-1.4.1: usbdev_do_ioctl: REAPURBNDELAY
[ 388.404790] usb 3-1.3.1: usbdev_do_ioctl: REAPURBNDELAY
[ 388.405283] usb 3-1.5.3.1: usbdev_do_ioctl: REAPURBNDELAY
[ 388.407828] usb 3-1.6.1.1: usbdev_do_ioctl: REAPURBNDELAY
[ 388.408771] usb 3-1.5.4.1: usbdev_do_ioctl: REAPURBNDELAY
[ 388.409155] usb 3-1.5.2.1: usbdev_do_ioctl: REAPURBNDELAY
[ 388.409452] usb 3-1.5.6.1: usbdev_do_ioctl: REAPURBNDELAY
[ 388.409743] usb 3-1.6.6.1: usbdev_do_ioctl: REAPURBNDELAY
[ 388.409848] usb 3-1.2.1: usbdev_do_ioctl: REAPURBNDELAY
[ 388.410840] usb 3-1.6.2.1: usbdev_do_ioctl: REAPURBNDELAY
[ 389.612597] usb 3-1.5.7.1: opened by process 2961: ate
[ 389.612622] usb 3-1.5.7.1: usbdev_do_ioctl: GETDRIVER
[ 389.612631] usb 3-1.5.7.1: usbdev_do_ioctl: IOCTL
[ 389.614965] usb 3-1.5.7.1: urb complete
[ 389.614981] usb 3-1.5.7.1: userurb pK-error, ep1 int-in,
actual_length 0 status -108
[ 389.614989] usb 3-1.5.7.1: urb complete
[ 389.614994] usb 3-1.5.7.1: userurb pK-error, ep1 int-in,
actual_length 0 status -108
[ 389.614999] usb 3-1.5.7.1: urb complete
[ 389.615003] usb 3-1.5.7.1: userurb pK-error, ep1 int-in,
actual_length 0 status -108
[ 389.615008] usb 3-1.5.7.1: urb complete
[ 389.615013] usb 3-1.5.7.1: userurb pK-error, ep1 int-in,
actual_length 0 status -108
[ 389.615018] usb 3-1.5.7.1: urb complete
[ 389.615022] usb 3-1.5.7.1: userurb pK-error, ep1 int-in,
actual_length 0 status -108
[ 389.615027] usb 3-1.5.7.1: urb complete
[ 389.615032] usb 3-1.5.7.1: userurb pK-error, ep1 int-in,
actual_length 0 status -108
[ 389.615037] usb 3-1.5.7.1: urb complete
[ 389.615042] usb 3-1.5.7.1: userurb pK-error, ep1 int-in,
actual_length 0 status -108
[ 389.615047] usb 3-1.5.7.1: urb complete
[ 389.615051] usb 3-1.5.7.1: userurb pK-error, ep1 int-in,
actual_length 0 status -108
[ 389.615056] usb 3-1.5.7.1: urb complete
[ 389.615060] usb 3-1.5.7.1: userurb pK-error, ep1 int-in,
actual_length 0 status -108
[ 389.615065] usb 3-1.5.7.1: urb complete
[ 389.615070] usb 3-1.5.7.1: userurb pK-error, ep1 int-in,
actual_length 0 status -108
[ 389.615074] usb 3-1.5.7.1: urb complete
[ 389.615079] usb 3-1.5.7.1: userurb pK-error, ep1 int-in,
actual_length 0 status -108
[ 389.615084] usb 3-1.5.7.1: urb complete
[ 389.615089] usb 3-1.5.7.1: userurb pK-error, ep1 int-in,
actual_length 0 status -108
[ 389.615094] usb 3-1.5.7.1: urb complete
[ 389.615098] usb 3-1.5.7.1: userurb pK-error, ep1 int-in,
actual_length 0 status -108
[ 389.615103] usb 3-1.5.7.1: urb complete
[ 389.615108] usb 3-1.5.7.1: userurb pK-error, ep1 int-in,
actual_length 0 status -108
[ 389.615112] usb 3-1.5.7.1: urb complete
[ 389.615117] usb 3-1.5.7.1: userurb pK-error, ep1 int-in,
actual_length 0 status -108
[ 389.615121] usb 3-1.5.7.1: urb complete
[ 389.615126] usb 3-1.5.7.1: userurb pK-error, ep1 int-in,
actual_length 0 status -108
[ 389.616823] usb 3-1.5.7.1: urb complete
[ 389.616839] usb 3-1.5.7.1: userurb pK-error, ep1 int-out,
actual_length 0 status -108
[ 389.616926] usb 3-1.5.7.1: usbdev_do_ioctl: CLAIMINTERFACE
[ 389.616964] usb 3-1.5.7.1: usbdev_do_ioctl: SETINTERFACE
As you can see, it locked in 215 seconds, in this test. Do you see
anything special in this sequence?
Thanks,
Alberto Sentieri
On 11/16/20 12:06 PM, Alan Stern wrote:
> On Mon, Nov 16, 2020 at 11:53:38AM -0500, Alberto Sentieri wrote:
>> The objective of this email is to report the current status of my findings.
>>
>> I loaded netconsole on both machines I was having problems with. I tried 3
>> times on the machine with kernel 5.0.0-37 and twice with on the machine with
>> kernel 5.3.0-62. Each attempt consisted of running the program which lock
>> the kernel until it locked (about 3 minutes after stating the program). The
>> referred program had the "semphore code" commented out. Nothing was sent to
>> netconsole on all the 5 attempts I made when the kernel locked.
>>
>> Just to be clear about my use of netconsole, before loading the netconsole
>> kernel module, I ran "dmesg -n 8". When netconsole module was loaded I could
>> clearly see about 9 message lines on the computer receiving the netconsole
>> messages telling me that netconsole was loaded (and how it was configured),
>> so no doubts about the correct netconsole setup. The "netconsole server" was
>> a machine on the same local network.
>>
>> My next attempt will be to compile kernel 5.9, as you suggest, and try it.
> While this may not generate any useful information, one way to create a
> bunch of log output while running your test is to set the usbfs_snoop
> module parameter for usbcore to true.
>
> You can also enable dynamic debugging for usbcore, although in a stable
> environment like yours it probably won't produce much output.
>
> Alan Stern
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel locks due to USB I/O
2020-11-16 18:42 ` Alberto Sentieri
@ 2020-11-19 17:22 ` Alan Stern
2020-11-19 18:50 ` Alberto Sentieri
2020-11-19 19:21 ` Alberto Sentieri
0 siblings, 2 replies; 17+ messages in thread
From: Alan Stern @ 2020-11-19 17:22 UTC (permalink / raw)
To: Alberto Sentieri; +Cc: Greg Kroah-Hartman, linux-usb
On Mon, Nov 16, 2020 at 01:42:46PM -0500, Alberto Sentieri wrote:
> Alan,
>
> I am not sure if that would give you any clue, but I enabled usbfs_snoop in
> the kernel and I got URB error -108 (endpoint shutdown) just before the
> kernel lock.
>
> Just after starting the program, I got something like that:
>
> [ 174.325977] usb 1-1.1.1: opened by process 2961: ate
> [ 174.326006] usb 1-1.1.1: usbdev_do_ioctl: GETDRIVER
> [ 174.326015] usb 1-1.1.1: usbdev_do_ioctl: IOCTL
> [ 174.326334] usb 1-1.1.1: usbdev_do_ioctl: CLAIMINTERFACE
> [ 174.326371] usb 1-1.1.1: usbdev_do_ioctl: SETINTERFACE
> [ 174.326582] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
> [ 174.326592] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
> [ 174.326617] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
> [ 174.326623] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
> [ 174.326629] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
> [ 174.326635] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
> [ 174.326640] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
> [ 174.326645] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
> [ 174.326652] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
> [ 174.326657] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
> [ 174.326663] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
> [ 174.326668] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
> [ 174.326674] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
> [ 174.326679] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
> [ 174.326685] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
> [ 174.326691] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
> [ 174.326696] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
> [ 174.326702] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
> [ 174.326708] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
> [ 174.326713] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
> [ 174.326718] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
> [ 174.326723] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
> [ 174.326729] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
> [ 174.326734] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
> [ 174.326740] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
> [ 174.326745] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
> [ 174.326751] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
> [ 174.326756] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
> [ 174.326762] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
> [ 174.326767] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
> [ 174.326773] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
> [ 174.326783] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
> [ 174.327565] usb 1-1.3.1: opened by process 2961: ate
> [ 174.327584] usb 1-1.3.1: usbdev_do_ioctl: GETDRIVER
> [ 174.327593] usb 1-1.3.1: usbdev_do_ioctl: IOCTL
> [ 174.327855] usb 1-1.3.1: usbdev_do_ioctl: CLAIMINTERFACE
> [ 174.327881] usb 1-1.3.1: usbdev_do_ioctl: SETINTERFACE
> [ 174.328180] usb 1-1.3.1: usbdev_do_ioctl: SUBMITURB
> [ 174.328188] usb 1-1.3.1: userurb 0000000000000000, ep1 int-in, length 64
> [ 174.328205] usb 1-1.3.1: usbdev_do_ioctl: SUBMITURB
Can you post the output from "lsusb -t" for your test system? The
device numbers aren't what I expected to see. Not wrong, just
unexpected.
Apart from that, this looks like normal initialization. Why does the
program submit 16 URBs all for the same endpoint without waiting?
Wouldn't two or three be enough?
> During the program execution, I got many lines like that:
>
> [ 178.537958] usb 1-1.4.1: urb complete
> [ 178.537975] usb 1-1.4.1: userurb pK-error, ep1 int-out,
> actual_length 64 status 0
> [ 178.538000] usb 1-1.2.1: urb complete
> [ 178.538006] usb 1-1.2.1: userurb pK-error, ep1 int-out,
> actual_length 64 status 0
>
> I am not sure if pK-error has a special meaning there.
It's an error in the debugging output. I have just submitted a patch
to fix the error (see
<https://marc.info/?l=linux-usb&m=160580539818074&w=2>). In any case,
it should be irrelevant for your problem.
> However, just before the lock, I got this sequence:
>
> [ 387.326795] usb 1-1.5.3.1: usbdev_do_ioctl: SUBMITURB
> [ 387.326801] usb 1-1.5.3.1: userurb 0000000000000000, ep1 int-in, length
> 64
> [ 387.327531] usb 1-1.5.3.1: urb complete
> [ 387.327539] usb 1-1.5.3.1: userurb pK-error, ep1 int-out,
> actual_length 64 status 0
> [ 387.327582] usb 1-1.5.3.1: usbdev_do_ioctl: REAPURBNDELAY
> [ 387.327590] usb 1-1.5.3.1: reap 0000000000000000
> [ 387.327598] usb 1-1.5.3.1: usbdev_do_ioctl: REAPURBNDELAY
> [ 388.109400] usb 3-1.5.7.1: usbdev_do_ioctl: REAPURBNDELAY
> [ 388.110032] usb 3-1.6.7.1: usbdev_do_ioctl: REAPURBNDELAY
> [ 388.110169] usb 3-1.5.5.1: usbdev_do_ioctl: REAPURBNDELAY
> [ 388.110605] usb 3-1.6.5.1: usbdev_do_ioctl: REAPURBNDELAY
> [ 388.111394] usb 3-1.1.1: usbdev_do_ioctl: REAPURBNDELAY
> [ 388.403278] usb 3-1.6.4.1: usbdev_do_ioctl: REAPURBNDELAY
> [ 388.404011] usb 3-1.5.1.1: usbdev_do_ioctl: REAPURBNDELAY
> [ 388.404344] usb 3-1.6.3.1: usbdev_do_ioctl: REAPURBNDELAY
> [ 388.404649] usb 3-1.4.1: usbdev_do_ioctl: REAPURBNDELAY
> [ 388.404790] usb 3-1.3.1: usbdev_do_ioctl: REAPURBNDELAY
> [ 388.405283] usb 3-1.5.3.1: usbdev_do_ioctl: REAPURBNDELAY
> [ 388.407828] usb 3-1.6.1.1: usbdev_do_ioctl: REAPURBNDELAY
> [ 388.408771] usb 3-1.5.4.1: usbdev_do_ioctl: REAPURBNDELAY
> [ 388.409155] usb 3-1.5.2.1: usbdev_do_ioctl: REAPURBNDELAY
> [ 388.409452] usb 3-1.5.6.1: usbdev_do_ioctl: REAPURBNDELAY
> [ 388.409743] usb 3-1.6.6.1: usbdev_do_ioctl: REAPURBNDELAY
> [ 388.409848] usb 3-1.2.1: usbdev_do_ioctl: REAPURBNDELAY
> [ 388.410840] usb 3-1.6.2.1: usbdev_do_ioctl: REAPURBNDELAY
This rapid sequence of reap requests with no completions or actual
reaps is odd. Is this the sort of thing you expect your program to
do? Maybe you can add some debugging output to the program.
Also, can you turn on dynamic debugging for usbcore before starting
the test?
echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
> [ 389.612597] usb 3-1.5.7.1: opened by process 2961: ate
> [ 389.612622] usb 3-1.5.7.1: usbdev_do_ioctl: GETDRIVER
> [ 389.612631] usb 3-1.5.7.1: usbdev_do_ioctl: IOCTL
> [ 389.614965] usb 3-1.5.7.1: urb complete
> [ 389.614981] usb 3-1.5.7.1: userurb pK-error, ep1 int-in,
> actual_length 0 status -108
> [ 389.614989] usb 3-1.5.7.1: urb complete
> [ 389.614994] usb 3-1.5.7.1: userurb pK-error, ep1 int-in,
> actual_length 0 status -108
The -108 erroro codes are ESHUTDOWN. They indicate that the device
was disconnected or something like that, although I don't see any
other indication of this in the log.
> [ 389.616926] usb 3-1.5.7.1: usbdev_do_ioctl: CLAIMINTERFACE
> [ 389.616964] usb 3-1.5.7.1: usbdev_do_ioctl: SETINTERFACE
And if the device had been disconnected, these calls would have failed.
> As you can see, it locked in 215 seconds, in this test. Do you see anything
> special in this sequence?
No, but several things are puzzling. But there's nothing that
indicates any reason for the system to freeze.
Alan Stern
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel locks due to USB I/O
2020-11-19 17:22 ` Alan Stern
@ 2020-11-19 18:50 ` Alberto Sentieri
2020-11-19 20:01 ` Alan Stern
2020-11-19 19:21 ` Alberto Sentieri
1 sibling, 1 reply; 17+ messages in thread
From: Alberto Sentieri @ 2020-11-19 18:50 UTC (permalink / raw)
To: Alan Stern; +Cc: Greg Kroah-Hartman, linux-usb
Alan,
I have to get the proper blessing to do more tests, and I will work on them.
Meanwhile, my comments are below.
Thanks for your attention,
Alberto
On 11/19/20 12:22 PM, Alan Stern wrote:
> On Mon, Nov 16, 2020 at 01:42:46PM -0500, Alberto Sentieri wrote:
>> Alan,
>>
>> I am not sure if that would give you any clue, but I enabled usbfs_snoop in
>> the kernel and I got URB error -108 (endpoint shutdown) just before the
>> kernel lock.
>>
>> Just after starting the program, I got something like that:
>>
>> [ 174.325977] usb 1-1.1.1: opened by process 2961: ate
>> [ 174.326006] usb 1-1.1.1: usbdev_do_ioctl: GETDRIVER
>> [ 174.326015] usb 1-1.1.1: usbdev_do_ioctl: IOCTL
>> [ 174.326334] usb 1-1.1.1: usbdev_do_ioctl: CLAIMINTERFACE
>> [ 174.326371] usb 1-1.1.1: usbdev_do_ioctl: SETINTERFACE
>> [ 174.326582] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.326592] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.326617] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.326623] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.326629] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.326635] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.326640] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.326645] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.326652] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.326657] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.326663] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.326668] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.326674] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.326679] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.326685] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.326691] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.326696] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.326702] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.326708] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.326713] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.326718] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.326723] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.326729] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.326734] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.326740] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.326745] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.326751] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.326756] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.326762] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.326767] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.326773] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.326783] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.327565] usb 1-1.3.1: opened by process 2961: ate
>> [ 174.327584] usb 1-1.3.1: usbdev_do_ioctl: GETDRIVER
>> [ 174.327593] usb 1-1.3.1: usbdev_do_ioctl: IOCTL
>> [ 174.327855] usb 1-1.3.1: usbdev_do_ioctl: CLAIMINTERFACE
>> [ 174.327881] usb 1-1.3.1: usbdev_do_ioctl: SETINTERFACE
>> [ 174.328180] usb 1-1.3.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.328188] usb 1-1.3.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.328205] usb 1-1.3.1: usbdev_do_ioctl: SUBMITURB
> Can you post the output from "lsusb -t" for your test system? The
> device numbers aren't what I expected to see. Not wrong, just
> unexpected.
I will run lsusb -t when I get access to the system. But the numbers are
correct. Each device has a 2-port USB hub which is connected to the USB
bus, just before the micro-controller. That is the reason for all of
them ending in .1. The micro-controller is connected to port 1 of the
2-port hub.
>
> Apart from that, this looks like normal initialization. Why does the
> program submit 16 URBs all for the same endpoint without waiting?
> Wouldn't two or three be enough?
I am using simple commands (to the device) and the responses fit in a
64-byte HID packet. But there are more complex commands, which may
return many packets. All submitted input URBs are there because, in some
cases, the worker main loop can take many milliseconds to start reaping
the URBs after they arrive.
>> During the program execution, I got many lines like that:
>>
>> [ 178.537958] usb 1-1.4.1: urb complete
>> [ 178.537975] usb 1-1.4.1: userurb pK-error, ep1 int-out,
>> actual_length 64 status 0
>> [ 178.538000] usb 1-1.2.1: urb complete
>> [ 178.538006] usb 1-1.2.1: userurb pK-error, ep1 int-out,
>> actual_length 64 status 0
>>
>> I am not sure if pK-error has a special meaning there.
> It's an error in the debugging output. I have just submitted a patch
> to fix the error (see
> <https://marc.info/?l=linux-usb&m=160580539818074&w=2>). In any case,
> it should be irrelevant for your problem.
Thanks.
>> However, just before the lock, I got this sequence:
>>
>> [ 387.326795] usb 1-1.5.3.1: usbdev_do_ioctl: SUBMITURB
>> [ 387.326801] usb 1-1.5.3.1: userurb 0000000000000000, ep1 int-in, length
>> 64
>> [ 387.327531] usb 1-1.5.3.1: urb complete
>> [ 387.327539] usb 1-1.5.3.1: userurb pK-error, ep1 int-out,
>> actual_length 64 status 0
>> [ 387.327582] usb 1-1.5.3.1: usbdev_do_ioctl: REAPURBNDELAY
>> [ 387.327590] usb 1-1.5.3.1: reap 0000000000000000
>> [ 387.327598] usb 1-1.5.3.1: usbdev_do_ioctl: REAPURBNDELAY
>> [ 388.109400] usb 3-1.5.7.1: usbdev_do_ioctl: REAPURBNDELAY
>> [ 388.110032] usb 3-1.6.7.1: usbdev_do_ioctl: REAPURBNDELAY
>> [ 388.110169] usb 3-1.5.5.1: usbdev_do_ioctl: REAPURBNDELAY
>> [ 388.110605] usb 3-1.6.5.1: usbdev_do_ioctl: REAPURBNDELAY
>> [ 388.111394] usb 3-1.1.1: usbdev_do_ioctl: REAPURBNDELAY
>> [ 388.403278] usb 3-1.6.4.1: usbdev_do_ioctl: REAPURBNDELAY
>> [ 388.404011] usb 3-1.5.1.1: usbdev_do_ioctl: REAPURBNDELAY
>> [ 388.404344] usb 3-1.6.3.1: usbdev_do_ioctl: REAPURBNDELAY
>> [ 388.404649] usb 3-1.4.1: usbdev_do_ioctl: REAPURBNDELAY
>> [ 388.404790] usb 3-1.3.1: usbdev_do_ioctl: REAPURBNDELAY
>> [ 388.405283] usb 3-1.5.3.1: usbdev_do_ioctl: REAPURBNDELAY
>> [ 388.407828] usb 3-1.6.1.1: usbdev_do_ioctl: REAPURBNDELAY
>> [ 388.408771] usb 3-1.5.4.1: usbdev_do_ioctl: REAPURBNDELAY
>> [ 388.409155] usb 3-1.5.2.1: usbdev_do_ioctl: REAPURBNDELAY
>> [ 388.409452] usb 3-1.5.6.1: usbdev_do_ioctl: REAPURBNDELAY
>> [ 388.409743] usb 3-1.6.6.1: usbdev_do_ioctl: REAPURBNDELAY
>> [ 388.409848] usb 3-1.2.1: usbdev_do_ioctl: REAPURBNDELAY
>> [ 388.410840] usb 3-1.6.2.1: usbdev_do_ioctl: REAPURBNDELAY
> This rapid sequence of reap requests with no completions or actual
> reaps is odd. Is this the sort of thing you expect your program to
> do? Maybe you can add some debugging output to the program.
What apparently happened before that (but unfortunately I did not
capture it this time) is that the particular device return errno -71 in
some URBs, without de-enumerating. However, I am only seeing userurbs
with status 0 in my the netconsole capture, before the ones with status
-108 (just before the lock). So, if error -71 happened, all URBs are
reaped and the devices are then closed. Usually I see error -71 in the
program log happening for a few devices at the same time. After closing
the interface, the program opens it again. The return from open should
be a valid file descriptor, otherwise it would fail. After opening the
device successfully it starts submitting urbs, and error -108 is coming
back, telling me that the device just opened is not open.
It just came to my attention now: maybe something to be investigated is
what the kernel does if a interface is closed due to a URB error and
immediately opened again.
> Also, can you turn on dynamic debugging for usbcore before starting
> the test?
>
> echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
>
>> [ 389.612597] usb 3-1.5.7.1: opened by process 2961: ate
>> [ 389.612622] usb 3-1.5.7.1: usbdev_do_ioctl: GETDRIVER
>> [ 389.612631] usb 3-1.5.7.1: usbdev_do_ioctl: IOCTL
>> [ 389.614965] usb 3-1.5.7.1: urb complete
>> [ 389.614981] usb 3-1.5.7.1: userurb pK-error, ep1 int-in,
>> actual_length 0 status -108
>> [ 389.614989] usb 3-1.5.7.1: urb complete
>> [ 389.614994] usb 3-1.5.7.1: userurb pK-error, ep1 int-in,
>> actual_length 0 status -108
> The -108 erroro codes are ESHUTDOWN. They indicate that the device
> was disconnected or something like that, although I don't see any
> other indication of this in the log.
>
>> [ 389.616926] usb 3-1.5.7.1: usbdev_do_ioctl: CLAIMINTERFACE
>> [ 389.616964] usb 3-1.5.7.1: usbdev_do_ioctl: SETINTERFACE
> And if the device had been disconnected, these calls would have failed.
>
>> As you can see, it locked in 215 seconds, in this test. Do you see anything
>> special in this sequence?
> No, but several things are puzzling. But there's nothing that
> indicates any reason for the system to freeze.
>
> Alan Stern
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel locks due to USB I/O
2020-11-19 17:22 ` Alan Stern
2020-11-19 18:50 ` Alberto Sentieri
@ 2020-11-19 19:21 ` Alberto Sentieri
2020-11-19 19:43 ` Alan Stern
1 sibling, 1 reply; 17+ messages in thread
From: Alberto Sentieri @ 2020-11-19 19:21 UTC (permalink / raw)
To: Alan Stern; +Cc: Greg Kroah-Hartman, linux-usb
lsusb -t in a similar configuration I use for development (it has just 6
device, and not 36):
$ lsusb -t
/: Bus 16.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 5000M
/: Bus 15.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 480M
/: Bus 14.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 5000M
/: Bus 13.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 480M
/: Bus 12.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 5000M
/: Bus 11.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 480M
/: Bus 10.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 5000M
/: Bus 09.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 480M
/: Bus 08.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/10p, 5000M
|__ Port 8: Dev 2, If 0, Class=Hub, Driver=hub/4p, 5000M
/: Bus 07.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/16p, 480M
|__ Port 6: Dev 2, If 0, Class=Hub, Driver=hub/4p, 480M
/: Bus 06.Port 1: Dev 1, Class=root_hub, Driver=uhci_hcd/2p, 12M
/: Bus 05.Port 1: Dev 1, Class=root_hub, Driver=uhci_hcd/2p, 12M
/: Bus 04.Port 1: Dev 1, Class=root_hub, Driver=ohci-pci/2p, 12M
/: Bus 03.Port 1: Dev 1, Class=root_hub, Driver=ohci-pci/3p, 12M
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/4p, 480M
|__ Port 3: Dev 5, If 0, Class=Hub, Driver=hub/7p, 480M
|__ Port 1: Dev 6, If 0, Class=Hub, Driver=hub/2p, 480M
|__ Port 1: Dev 8, If 0, Class=Human Interface Device,
Driver=usbhid, 12M
|__ Port 2: Dev 7, If 0, Class=Hub, Driver=hub/2p, 480M
|__ Port 1: Dev 10, If 0, Class=Human Interface Device,
Driver=usbhid, 12M
|__ Port 4: Dev 9, If 0, Class=Hub, Driver=hub/2p, 480M
|__ Port 1: Dev 12, If 0, Class=Human Interface Device,
Driver=usbhid, 12M
|__ Port 5: Dev 11, If 0, Class=Hub, Driver=hub/7p, 480M
|__ Port 6: Dev 13, If 0, Class=Hub, Driver=hub/7p, 480M
|__ Port 6: Dev 15, If 0, Class=Hub, Driver=hub/2p, 480M
|__ Port 1: Dev 17, If 0, Class=Human Interface Device,
Driver=usbhid, 12M
|__ Port 7: Dev 16, If 0, Class=Hub, Driver=hub/2p, 480M
|__ Port 1: Dev 18, If 0, Class=Human Interface Device,
Driver=usbhid, 12M
|__ Port 7: Dev 14, If 0, Class=Human Interface Device,
Driver=usbhid, 12M
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/5p, 480M
And lspci (I believe we are using the VIA Technologies, Inc controller
at the site where we are having locks):
$ lspci
00:00.0 Host bridge: Intel Corporation Sky Lake-E DMI3 Registers (rev 04)
00:04.0 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers
(rev 04)
00:04.1 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers
(rev 04)
00:04.2 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers
(rev 04)
00:04.3 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers
(rev 04)
00:04.4 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers
(rev 04)
00:04.5 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers
(rev 04)
00:04.6 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers
(rev 04)
00:04.7 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers
(rev 04)
00:05.0 System peripheral: Intel Corporation Sky Lake-E MM/Vt-d
Configuration Registers (rev 04)
00:05.2 System peripheral: Intel Corporation Sky Lake-E RAS (rev 04)
00:05.4 PIC: Intel Corporation Sky Lake-E IOAPIC (rev 04)
00:08.0 System peripheral: Intel Corporation Sky Lake-E Ubox Registers
(rev 04)
00:08.1 Performance counters: Intel Corporation Sky Lake-E Ubox
Registers (rev 04)
00:08.2 System peripheral: Intel Corporation Sky Lake-E Ubox Registers
(rev 04)
00:14.0 USB controller: Intel Corporation 200 Series/Z370 Chipset Family
USB 3.0 xHCI Controller
00:14.2 Signal processing controller: Intel Corporation 200 Series PCH
Thermal Subsystem
00:17.0 RAID bus controller: Intel Corporation C600/X79 series chipset
SATA RAID Controller
00:1b.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root
Port #21 (rev f0)
00:1c.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root
Port #3 (rev f0)
00:1c.4 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root
Port #5 (rev f0)
00:1d.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root
Port #9 (rev f0)
00:1f.0 ISA bridge: Intel Corporation C422 Chipset LPC/eSPI Controller
00:1f.2 Memory controller: Intel Corporation 200 Series/Z370 Chipset
Family Power Management Controller
00:1f.3 Audio device: Intel Corporation 200 Series PCH HD Audio
00:1f.4 SMBus: Intel Corporation 200 Series/Z370 Chipset Family SMBus
Controller
00:1f.5 Non-VGA unclassified device: Intel Corporation 200 Series/Z370
Chipset Family SPI Controller
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2)
I219-LM
01:00.0 PCI bridge: Texas Instruments XIO2000(A)/XIO2200A PCI
Express-to-PCI Bridge (rev 03)
02:01.0 USB controller: NEC Corporation OHCI USB Controller (rev 43)
02:01.1 USB controller: NEC Corporation OHCI USB Controller (rev 43)
02:01.2 USB controller: NEC Corporation uPD72010x USB 2.0 Controller
(rev 04)
03:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network
Connection (rev 03)
05:00.0 PCI bridge: PLX Technology, Inc. PEX 8609 8-lane, 8-Port PCI
Express Gen 2 (5.0 GT/s) Switch with DMA (rev ba)
05:00.1 System peripheral: PLX Technology, Inc. PEX 8609 8-lane, 8-Port
PCI Express Gen 2 (5.0 GT/s) Switch with DMA (rev ba)
06:01.0 PCI bridge: PLX Technology, Inc. PEX 8609 8-lane, 8-Port PCI
Express Gen 2 (5.0 GT/s) Switch with DMA (rev ba)
06:05.0 PCI bridge: PLX Technology, Inc. PEX 8609 8-lane, 8-Port PCI
Express Gen 2 (5.0 GT/s) Switch with DMA (rev ba)
06:07.0 PCI bridge: PLX Technology, Inc. PEX 8609 8-lane, 8-Port PCI
Express Gen 2 (5.0 GT/s) Switch with DMA (rev ba)
06:09.0 PCI bridge: PLX Technology, Inc. PEX 8609 8-lane, 8-Port PCI
Express Gen 2 (5.0 GT/s) Switch with DMA (rev ba)
07:00.0 USB controller: ASMedia Technology Inc. ASM1042A USB 3.0 Host
Controller
08:00.0 USB controller: ASMedia Technology Inc. ASM1042A USB 3.0 Host
Controller
09:00.0 USB controller: ASMedia Technology Inc. ASM1042A USB 3.0 Host
Controller
0a:00.0 USB controller: ASMedia Technology Inc. ASM1042A USB 3.0 Host
Controller
14:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A
(rev 04)
14:05.0 System peripheral: Intel Corporation Sky Lake-E VT-d (rev 04)
14:05.2 System peripheral: Intel Corporation Sky Lake-E RAS
Configuration Registers (rev 04)
14:05.4 PIC: Intel Corporation Sky Lake-E IOxAPIC Configuration
Registers (rev 04)
14:08.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:08.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:08.2 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:08.3 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:08.4 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:08.5 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:08.6 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:08.7 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:09.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:09.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:0e.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:0e.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:0e.2 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:0e.3 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:0e.4 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:0e.5 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:0e.6 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:0e.7 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:0f.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:0f.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:1d.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:1d.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:1d.2 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:1d.3 System peripheral: Intel Corporation Sky Lake-E CHA Registers
(rev 04)
14:1e.0 System peripheral: Intel Corporation Sky Lake-E PCU Registers
(rev 04)
14:1e.1 System peripheral: Intel Corporation Sky Lake-E PCU Registers
(rev 04)
14:1e.2 System peripheral: Intel Corporation Sky Lake-E PCU Registers
(rev 04)
14:1e.3 System peripheral: Intel Corporation Sky Lake-E PCU Registers
(rev 04)
14:1e.4 System peripheral: Intel Corporation Sky Lake-E PCU Registers
(rev 04)
14:1e.5 System peripheral: Intel Corporation Sky Lake-E PCU Registers
(rev 04)
14:1e.6 System peripheral: Intel Corporation Sky Lake-E PCU Registers
(rev 04)
15:00.0 PCI bridge: Pericom Semiconductor PI7C9X111SL PCIe-to-PCI
Reversible Bridge (rev 02)
16:04.0 USB controller: VIA Technologies, Inc. VT82xx/62xx UHCI USB 1.1
Controller (rev 62)
16:04.1 USB controller: VIA Technologies, Inc. VT82xx/62xx UHCI USB 1.1
Controller (rev 62)
16:04.2 USB controller: VIA Technologies, Inc. USB 2.0 (rev 65)
20:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A
(rev 04)
20:05.0 System peripheral: Intel Corporation Sky Lake-E VT-d (rev 04)
20:05.2 System peripheral: Intel Corporation Sky Lake-E RAS
Configuration Registers (rev 04)
20:05.4 PIC: Intel Corporation Sky Lake-E IOxAPIC Configuration
Registers (rev 04)
20:08.0 System peripheral: Intel Corporation Sky Lake-E Integrated
Memory Controller (rev 04)
20:09.0 System peripheral: Intel Corporation Sky Lake-E Integrated
Memory Controller (rev 04)
20:0a.0 System peripheral: Intel Corporation Sky Lake-E Integrated
Memory Controller (rev 04)
20:0a.1 System peripheral: Intel Corporation Sky Lake-E Integrated
Memory Controller (rev 04)
20:0a.2 System peripheral: Intel Corporation Sky Lake-E Integrated
Memory Controller (rev 04)
20:0a.3 System peripheral: Intel Corporation Sky Lake-E Integrated
Memory Controller (rev 04)
20:0a.4 System peripheral: Intel Corporation Sky Lake-E Integrated
Memory Controller (rev 04)
20:0a.5 System peripheral: Intel Corporation Sky Lake-E LM Channel 1
(rev 04)
20:0a.6 System peripheral: Intel Corporation Sky Lake-E LMS Channel 1
(rev 04)
20:0a.7 System peripheral: Intel Corporation Sky Lake-E LMDP Channel 1
(rev 04)
20:0b.0 System peripheral: Intel Corporation Sky Lake-E DECS Channel 2
(rev 04)
20:0b.1 System peripheral: Intel Corporation Sky Lake-E LM Channel 2
(rev 04)
20:0b.2 System peripheral: Intel Corporation Sky Lake-E LMS Channel 2
(rev 04)
20:0b.3 System peripheral: Intel Corporation Sky Lake-E LMDP Channel 2
(rev 04)
20:0c.0 System peripheral: Intel Corporation Sky Lake-E Integrated
Memory Controller (rev 04)
20:0c.1 System peripheral: Intel Corporation Sky Lake-E Integrated
Memory Controller (rev 04)
20:0c.2 System peripheral: Intel Corporation Sky Lake-E Integrated
Memory Controller (rev 04)
20:0c.3 System peripheral: Intel Corporation Sky Lake-E Integrated
Memory Controller (rev 04)
20:0c.4 System peripheral: Intel Corporation Sky Lake-E Integrated
Memory Controller (rev 04)
20:0c.5 System peripheral: Intel Corporation Sky Lake-E LM Channel 1
(rev 04)
20:0c.6 System peripheral: Intel Corporation Sky Lake-E LMS Channel 1
(rev 04)
20:0c.7 System peripheral: Intel Corporation Sky Lake-E LMDP Channel 1
(rev 04)
20:0d.0 System peripheral: Intel Corporation Sky Lake-E DECS Channel 2
(rev 04)
20:0d.1 System peripheral: Intel Corporation Sky Lake-E LM Channel 2
(rev 04)
20:0d.2 System peripheral: Intel Corporation Sky Lake-E LMS Channel 2
(rev 04)
20:0d.3 System peripheral: Intel Corporation Sky Lake-E LMDP Channel 2
(rev 04)
21:00.0 VGA compatible controller: NVIDIA Corporation GP107GL [Quadro
P400] (rev a1)
21:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio
Controller (rev a1)
2c:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A
(rev 04)
2c:02.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port C
(rev 04)
2c:03.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port D
(rev 04)
2c:05.0 System peripheral: Intel Corporation Sky Lake-E VT-d (rev 04)
2c:05.2 System peripheral: Intel Corporation Sky Lake-E RAS
Configuration Registers (rev 04)
2c:05.4 PIC: Intel Corporation Sky Lake-E IOxAPIC Configuration
Registers (rev 04)
2c:12.0 Performance counters: Intel Corporation Sky Lake-E M3KTI
Registers (rev 04)
2c:12.1 Performance counters: Intel Corporation Sky Lake-E M3KTI
Registers (rev 04)
2c:12.2 System peripheral: Intel Corporation Sky Lake-E M3KTI Registers
(rev 04)
2c:15.0 System peripheral: Intel Corporation Sky Lake-E M2PCI Registers
(rev 04)
2c:16.0 System peripheral: Intel Corporation Sky Lake-E M2PCI Registers
(rev 04)
2c:16.4 System peripheral: Intel Corporation Sky Lake-E M2PCI Registers
(rev 04)
2c:17.0 System peripheral: Intel Corporation Sky Lake-E M2PCI Registers
(rev 04)
On 11/19/20 12:22 PM, Alan Stern wrote:
> On Mon, Nov 16, 2020 at 01:42:46PM -0500, Alberto Sentieri wrote:
>> Alan,
>>
>> I am not sure if that would give you any clue, but I enabled usbfs_snoop in
>> the kernel and I got URB error -108 (endpoint shutdown) just before the
>> kernel lock.
>>
>> Just after starting the program, I got something like that:
>>
>> [ 174.325977] usb 1-1.1.1: opened by process 2961: ate
>> [ 174.326006] usb 1-1.1.1: usbdev_do_ioctl: GETDRIVER
>> [ 174.326015] usb 1-1.1.1: usbdev_do_ioctl: IOCTL
>> [ 174.326334] usb 1-1.1.1: usbdev_do_ioctl: CLAIMINTERFACE
>> [ 174.326371] usb 1-1.1.1: usbdev_do_ioctl: SETINTERFACE
>> [ 174.326582] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.326592] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.326617] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.326623] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.326629] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.326635] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.326640] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.326645] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.326652] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.326657] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.326663] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.326668] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.326674] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.326679] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.326685] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.326691] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.326696] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.326702] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.326708] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.326713] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.326718] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.326723] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.326729] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.326734] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.326740] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.326745] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.326751] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.326756] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.326762] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.326767] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.326773] usb 1-1.1.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.326783] usb 1-1.1.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.327565] usb 1-1.3.1: opened by process 2961: ate
>> [ 174.327584] usb 1-1.3.1: usbdev_do_ioctl: GETDRIVER
>> [ 174.327593] usb 1-1.3.1: usbdev_do_ioctl: IOCTL
>> [ 174.327855] usb 1-1.3.1: usbdev_do_ioctl: CLAIMINTERFACE
>> [ 174.327881] usb 1-1.3.1: usbdev_do_ioctl: SETINTERFACE
>> [ 174.328180] usb 1-1.3.1: usbdev_do_ioctl: SUBMITURB
>> [ 174.328188] usb 1-1.3.1: userurb 0000000000000000, ep1 int-in, length 64
>> [ 174.328205] usb 1-1.3.1: usbdev_do_ioctl: SUBMITURB
> Can you post the output from "lsusb -t" for your test system? The
> device numbers aren't what I expected to see. Not wrong, just
> unexpected.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel locks due to USB I/O
2020-11-19 19:21 ` Alberto Sentieri
@ 2020-11-19 19:43 ` Alan Stern
2020-11-19 22:14 ` Alberto Sentieri
0 siblings, 1 reply; 17+ messages in thread
From: Alan Stern @ 2020-11-19 19:43 UTC (permalink / raw)
To: Alberto Sentieri; +Cc: Greg Kroah-Hartman, linux-usb
On Thu, Nov 19, 2020 at 02:21:47PM -0500, Alberto Sentieri wrote:
> lsusb -t in a similar configuration I use for development (it has just 6
> device, and not 36):
>
> $ lsusb -t
> /: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/4p, 480M
> |__ Port 3: Dev 5, If 0, Class=Hub, Driver=hub/7p, 480M
> |__ Port 1: Dev 6, If 0, Class=Hub, Driver=hub/2p, 480M
> |__ Port 1: Dev 8, If 0, Class=Human Interface Device,
> Driver=usbhid, 12M
> |__ Port 2: Dev 7, If 0, Class=Hub, Driver=hub/2p, 480M
> |__ Port 1: Dev 10, If 0, Class=Human Interface Device,
> Driver=usbhid, 12M
> |__ Port 4: Dev 9, If 0, Class=Hub, Driver=hub/2p, 480M
> |__ Port 1: Dev 12, If 0, Class=Human Interface Device,
> Driver=usbhid, 12M
> |__ Port 5: Dev 11, If 0, Class=Hub, Driver=hub/7p, 480M
> |__ Port 6: Dev 13, If 0, Class=Hub, Driver=hub/7p, 480M
> |__ Port 6: Dev 15, If 0, Class=Hub, Driver=hub/2p, 480M
> |__ Port 1: Dev 17, If 0, Class=Human Interface Device,
> Driver=usbhid, 12M
> |__ Port 7: Dev 16, If 0, Class=Hub, Driver=hub/2p, 480M
> |__ Port 1: Dev 18, If 0, Class=Human Interface Device,
> Driver=usbhid, 12M
> |__ Port 7: Dev 14, If 0, Class=Human Interface Device,
> Driver=usbhid, 12M
Previously you said that each HID microcontroller is connected to port 1
of a two-port hub. But that clearly isn't true for device 14 in the
listing above. What happened there?
Alan Stern
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel locks due to USB I/O
2020-11-19 18:50 ` Alberto Sentieri
@ 2020-11-19 20:01 ` Alan Stern
[not found] ` <4f8f545e-4846-45e0-b8f8-5c73876b150a@tripolho.com>
0 siblings, 1 reply; 17+ messages in thread
From: Alan Stern @ 2020-11-19 20:01 UTC (permalink / raw)
To: Alberto Sentieri; +Cc: Greg Kroah-Hartman, linux-usb
On Thu, Nov 19, 2020 at 01:50:13PM -0500, Alberto Sentieri wrote:
> > > However, just before the lock, I got this sequence:
> > >
> > > [ 387.326795] usb 1-1.5.3.1: usbdev_do_ioctl: SUBMITURB
> > > [ 387.326801] usb 1-1.5.3.1: userurb 0000000000000000, ep1 int-in, length
> > > 64
> > > [ 387.327531] usb 1-1.5.3.1: urb complete
> > > [ 387.327539] usb 1-1.5.3.1: userurb pK-error, ep1 int-out,
> > > actual_length 64 status 0
> > > [ 387.327582] usb 1-1.5.3.1: usbdev_do_ioctl: REAPURBNDELAY
> > > [ 387.327590] usb 1-1.5.3.1: reap 0000000000000000
> > > [ 387.327598] usb 1-1.5.3.1: usbdev_do_ioctl: REAPURBNDELAY
> > > [ 388.109400] usb 3-1.5.7.1: usbdev_do_ioctl: REAPURBNDELAY
> > > [ 388.110032] usb 3-1.6.7.1: usbdev_do_ioctl: REAPURBNDELAY
> > > [ 388.110169] usb 3-1.5.5.1: usbdev_do_ioctl: REAPURBNDELAY
> > > [ 388.110605] usb 3-1.6.5.1: usbdev_do_ioctl: REAPURBNDELAY
> > > [ 388.111394] usb 3-1.1.1: usbdev_do_ioctl: REAPURBNDELAY
> > > [ 388.403278] usb 3-1.6.4.1: usbdev_do_ioctl: REAPURBNDELAY
> > > [ 388.404011] usb 3-1.5.1.1: usbdev_do_ioctl: REAPURBNDELAY
> > > [ 388.404344] usb 3-1.6.3.1: usbdev_do_ioctl: REAPURBNDELAY
> > > [ 388.404649] usb 3-1.4.1: usbdev_do_ioctl: REAPURBNDELAY
> > > [ 388.404790] usb 3-1.3.1: usbdev_do_ioctl: REAPURBNDELAY
> > > [ 388.405283] usb 3-1.5.3.1: usbdev_do_ioctl: REAPURBNDELAY
> > > [ 388.407828] usb 3-1.6.1.1: usbdev_do_ioctl: REAPURBNDELAY
> > > [ 388.408771] usb 3-1.5.4.1: usbdev_do_ioctl: REAPURBNDELAY
> > > [ 388.409155] usb 3-1.5.2.1: usbdev_do_ioctl: REAPURBNDELAY
> > > [ 388.409452] usb 3-1.5.6.1: usbdev_do_ioctl: REAPURBNDELAY
> > > [ 388.409743] usb 3-1.6.6.1: usbdev_do_ioctl: REAPURBNDELAY
> > > [ 388.409848] usb 3-1.2.1: usbdev_do_ioctl: REAPURBNDELAY
> > > [ 388.410840] usb 3-1.6.2.1: usbdev_do_ioctl: REAPURBNDELAY
> > This rapid sequence of reap requests with no completions or actual
> > reaps is odd. Is this the sort of thing you expect your program to
> > do? Maybe you can add some debugging output to the program.
> What apparently happened before that (but unfortunately I did not capture it
> this time) is that the particular device return errno -71 in some URBs,
> without de-enumerating.
That's actually quite bad. -71 (-EPROTO) indicates a serious error in
the USB protocol. Most often it means that the kernel did not receive
a reply packet from the device -- this could be caused by the device's
firmware crashing or a USB cable failing. In a well-functioning
system it should never occur.
> However, I am only seeing userurbs with status 0 in
> my the netconsole capture, before the ones with status -108 (just before the
> lock). So, if error -71 happened, all URBs are reaped and the devices are
> then closed. Usually I see error -71 in the program log happening for a few
> devices at the same time.
That probably means there's something wrong with a hub they are all
connected to. But of course, a problem in a hub wouldn't explain why
the whole system freezes.
> After closing the interface, the program opens it
> again. The return from open should be a valid file descriptor, otherwise it
> would fail. After opening the device successfully it starts submitting urbs,
> and error -108 is coming back, telling me that the device just opened is not
> open.
It means that the device is no longer present. Perhaps it was
logically (if not physically) disconnected after the open but before
the URBs were submitted.
> It just came to my attention now: maybe something to be investigated is what
> the kernel does if a interface is closed due to a URB error and immediately
> opened again.
In itself, that should work perfectly. But whatever caused the USB
error could go on to cause other things to go wrong.
It might help to see a usbmon trace. You merely have to send the data
from /sys/kernel/debug/usb/usbmon/1u (or 3u, or even both -- the number
refers to the USB bus being monitored) through a UDP pipe, using netcat
or something similar, to another computer which could safely store the
output.
Alan Stern
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel locks due to USB I/O
2020-11-19 19:43 ` Alan Stern
@ 2020-11-19 22:14 ` Alberto Sentieri
0 siblings, 0 replies; 17+ messages in thread
From: Alberto Sentieri @ 2020-11-19 22:14 UTC (permalink / raw)
To: Alan Stern; +Cc: Greg Kroah-Hartman, linux-usb
On 11/19/20 2:43 PM, Alan Stern wrote:
> On Thu, Nov 19, 2020 at 02:21:47PM -0500, Alberto Sentieri wrote:
>> lsusb -t in a similar configuration I use for development (it has just 6
>> device, and not 36):
>>
>> $ lsusb -t
>> /: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/4p, 480M
>> |__ Port 3: Dev 5, If 0, Class=Hub, Driver=hub/7p, 480M
>> |__ Port 1: Dev 6, If 0, Class=Hub, Driver=hub/2p, 480M
>> |__ Port 1: Dev 8, If 0, Class=Human Interface Device,
>> Driver=usbhid, 12M
>> |__ Port 2: Dev 7, If 0, Class=Hub, Driver=hub/2p, 480M
>> |__ Port 1: Dev 10, If 0, Class=Human Interface Device,
>> Driver=usbhid, 12M
>> |__ Port 4: Dev 9, If 0, Class=Hub, Driver=hub/2p, 480M
>> |__ Port 1: Dev 12, If 0, Class=Human Interface Device,
>> Driver=usbhid, 12M
>> |__ Port 5: Dev 11, If 0, Class=Hub, Driver=hub/7p, 480M
>> |__ Port 6: Dev 13, If 0, Class=Hub, Driver=hub/7p, 480M
>> |__ Port 6: Dev 15, If 0, Class=Hub, Driver=hub/2p, 480M
>> |__ Port 1: Dev 17, If 0, Class=Human Interface Device,
>> Driver=usbhid, 12M
>> |__ Port 7: Dev 16, If 0, Class=Hub, Driver=hub/2p, 480M
>> |__ Port 1: Dev 18, If 0, Class=Human Interface Device,
>> Driver=usbhid, 12M
>> |__ Port 7: Dev 14, If 0, Class=Human Interface Device,
>> Driver=usbhid, 12M
> Previously you said that each HID microcontroller is connected to port 1
> of a two-port hub. But that clearly isn't true for device 14 in the
> listing above. What happened there?
The program never talks to that device. It does not try to open it. The
program has a list of valid interfaces, and if one is not in the list it
will not be part of the game. Really I sent you a list of 5 devices,
because one was disconnect when I ran lsusb.
This would be the list with the 6 devices:
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/4p, 480M
|__ Port 3: Dev 53, If 0, Class=Hub, Driver=hub/7p, 480M
|__ Port 1: Dev 54, If 0, Class=Hub, Driver=hub/2p, 480M
|__ Port 1: Dev 56, If 0, Class=Human Interface Device,
Driver=, 12M
|__ Port 2: Dev 55, If 0, Class=Hub, Driver=hub/2p, 480M
|__ Port 1: Dev 58, If 0, Class=Human Interface Device,
Driver=, 12M
|__ Port 3: Dev 57, If 0, Class=Hub, Driver=hub/2p, 480M
|__ Port 1: Dev 60, If 0, Class=Human Interface Device,
Driver=, 12M
|__ Port 4: Dev 59, If 0, Class=Hub, Driver=hub/2p, 480M
|__ Port 1: Dev 62, If 0, Class=Human Interface Device,
Driver=, 12M
|__ Port 5: Dev 61, If 0, Class=Hub, Driver=hub/7p, 480M
|__ Port 6: Dev 63, If 0, Class=Hub, Driver=hub/7p, 480M
|__ Port 6: Dev 69, If 0, Class=Hub, Driver=hub/2p, 480M
|__ Port 1: Dev 70, If 0, Class=Human Interface Device,
Driver=, 12M
|__ Port 7: Dev 66, If 0, Class=Hub, Driver=hub/2p, 480M
|__ Port 1: Dev 68, If 0, Class=Human Interface Device,
Driver=, 12M
|__ Port 7: Dev 64, If 0, Class=Human Interface Device,
Driver=usbhid, 12M
Again, device 64 (previous device 14) is not part of the game.
I compiled kernel 5.9.8, which is running, and have other kernels as
well. I will try to reproduce the problem in my lab. I will let you know
as soon as I have more information. If I were able to reproduce the
problem, I will try to use a Beagle USB analyzer at the same time the
problem occurs.
Thanks,
Alberto
> Alan Stern
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel locks due to USB I/O
[not found] ` <3df90f9d-0af2-2aaa-9853-966f99e961a4@tripolho.com>
@ 2020-12-14 17:18 ` Alan Stern
2020-12-16 22:14 ` Alberto Sentieri
0 siblings, 1 reply; 17+ messages in thread
From: Alan Stern @ 2020-12-14 17:18 UTC (permalink / raw)
To: Alberto Sentieri; +Cc: Greg Kroah-Hartman, linux-usb
On Mon, Dec 14, 2020 at 09:43:25AM -0500, Alberto Sentieri wrote:
> Alan,
>
> I was finally able to assemble (on my desk) a similar environment which can
> reproduce the problem. It is locking with only 18 USB devices, which are
> being accessed in higher frequency than on the regular application (once
> every 3 seconds instead of once every 15 seconds). I was able to reproduce
> locks with kernel version 5.9.8, which I compiled myself.
>
> Now a brief description of what I did.
>
> Each access I referred to on the first paragraph was composed of 3 64-byte
> USB interrupt packets, which are: a request the computer sends, a response
> the device sends back, and a confirmation that the computer send to finish
> the cycle, which would be repeated in about 3 seconds.
>
> The computer motherboard I am using has a UART (apparently a 16550) and I
> could enable a serial console using: console=tty1 console=ttyS0,115200. I
> also would like to point out that ipv6.disable=1 is the other kernel
> argument I am using. The serial console was working properly and all its
> output was being captured by another computer during the locks. My lab
> machine has the same USB expansion card model being used at the production
> environment and that was the one being used during the experiments. Though
> my lab machine has some extra PCI USB cards installed, they were not being
> using during the locks.
>
> I did the experiment just a few times during the weekend, after receiving
> the last parts I needed on Saturday.
>
> So, with kernel 5.4.0-53-generic, which is one of the kernels released with
> Ubuntu, the locked happened in just a couple of minutes. I tried that twice.
> With kernel 5.9.8, the lock took more time to happen. Initially I though it
> would not happen, because I'd observed the machine running for about 15
> minutes and everything seemed normal. However after a couple of hours I
> found the machine locked. I also would like to point out that I’ve been
> using this machine for more than 6 months and that it had never locked on
> me, so that excludes any hardware problem.
>
> Unfortunately on the 3 times it locked (2 with 5.4.0 and one with 5.9.8)
> nothing was printed on the serial console.
>
> I will be working on this during the day and I will try to enable some of
> the debugging tools you suggested in your previous emails.
>
> Any comment on this will be highly appreciated.
We'll have to wait and see what the tests and debugging tools reveal.
As for the comment about the hardware being reliable -- in fact, you
can't be certain of this. It may be that your test is stressing the
hardware in a way that it never experienced during the prior six
months.
Alan Stern
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: kernel locks due to USB I/O
2020-12-14 17:18 ` Alan Stern
@ 2020-12-16 22:14 ` Alberto Sentieri
0 siblings, 0 replies; 17+ messages in thread
From: Alberto Sentieri @ 2020-12-16 22:14 UTC (permalink / raw)
To: Alan Stern; +Cc: Greg Kroah-Hartman, linux-usb
This is just a quick status update. I am just checking if it rings any
bell. I am still doing more tests.
The machine where all these problems are happening is a HP Z4 G4, as
well as the other two in production. Until this morning it had BIOS
v01.76 dated 04/22/2019. Today I update the BIOS to v02.63 dated 10/28/2020.
After updating the BIOS, the behavior definitively changed. Now both
kernels (5.4.0-53-generic and 5.9.8) reboot unexpectedly instead of
locking. After re-enabling the console, expecting to capture something,
I could not see a lock or reboot. However the serial console, after
displaying the whole boot, locked, and I cannot use it to login.
Alberto Sentieri
On 12/14/20 12:18 PM, Alan Stern wrote:
> We'll have to wait and see what the tests and debugging tools reveal.
> As for the comment about the hardware being reliable -- in fact, you
> can't be certain of this. It may be that your test is stressing the
> hardware in a way that it never experienced during the prior six
> months.
>
> Alan Stern
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2020-12-16 22:15 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-10 19:20 kernel locks due to USB I/O Alberto Sentieri
2020-11-10 20:51 ` Alan Stern
2020-11-10 23:42 ` Alberto Sentieri
2020-11-11 7:51 ` Greg Kroah-Hartman
2020-11-11 15:51 ` Alan Stern
2020-11-11 19:31 ` Alberto Sentieri
2020-11-16 16:53 ` Alberto Sentieri
2020-11-16 17:06 ` Alan Stern
2020-11-16 18:42 ` Alberto Sentieri
2020-11-19 17:22 ` Alan Stern
2020-11-19 18:50 ` Alberto Sentieri
2020-11-19 20:01 ` Alan Stern
[not found] ` <4f8f545e-4846-45e0-b8f8-5c73876b150a@tripolho.com>
[not found] ` <20201119225144.GA590990@rowland.harvard.edu>
[not found] ` <3df90f9d-0af2-2aaa-9853-966f99e961a4@tripolho.com>
2020-12-14 17:18 ` Alan Stern
2020-12-16 22:14 ` Alberto Sentieri
2020-11-19 19:21 ` Alberto Sentieri
2020-11-19 19:43 ` Alan Stern
2020-11-19 22:14 ` Alberto Sentieri
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.