From: Alan Stern <stern@rowland.harvard.edu>
To: Alberto Sentieri <22t@tripolho.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	linux-usb@vger.kernel.org
Subject: Re: kernel locks due to USB I/O
Date: Tue, 10 Nov 2020 15:51:14 -0500
Message-ID: <20201110205114.GB204624@rowland.harvard.edu>
In-Reply-To: <9428ae70-887e-b48b-f31c-f95d58f67c61@tripolho.com>

On Tue, Nov 10, 2020 at 02:20:50PM -0500, Alberto Sentieri wrote:
> I’ve seen many kernel locks caused by a particular user-level application.
> After the kernel locks up, nothing is left on the machine, not even in
> the logs. These locks have to do with USB input and output.
> 
> The objective of this email is to get guidance about how to collect more
> data related to the locks.
> 
> A description of the problem follows.
> 
> I manage a few remote machines installed at a manufacturing facility, which
> run Ubuntu 18.04. For months I have seen unexpected kernel locks, which I
> could not explain. By locks I mean that the machine completely dies: the
> graphical screen and keyboard freeze, and I cannot ping or connect through
> ssh during the locks. The only way to bring the machine back is to
> “pull the plug”. After rebooting I cannot find anything meaningful about the
> lock in the logs. The machine is a good-quality one with a 6-core Xeon and
> 32 GB of ECC memory (the application uses about 1 GB). Exactly the same
> problem happens on two identical machines, one running kernel
> 5.0.0-37-generic and the other running kernel 5.3.0-62-generic.

Can you update either machine to a 5.9 kernel?

> A few days ago I was able to create a sequence of events that produces the
> locks within a couple of minutes. These events have to do with USB 2.0
> interrupt I/O on USB devices connected at 12 Mbit/s, and with the frequency
> at which URBs are submitted and reaped. At least 36 devices must be
> connected to reproduce the problem easily, which I cannot do from where I
> am. The machines are in a different country from the one I live in, and
> physical access to them is not possible due to COVID-19 restrictions.
> 
> There are no special USB drivers installed. However, the proprietary NVIDIA
> driver is installed; I installed it using the regular Ubuntu tools for
> non-free software. All USB I/O is done by a regular user opening
> /dev/bus/usb/xxx/xxx (the device group is set to the user group by udev).
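
For reference, here is a minimal sketch in C (not the application's actual code) of opening such a node as a regular user. It assumes the usual udev naming /dev/bus/usb/BBB/DDD with zero-padded three-digit bus and device numbers; the helpers usbfs_node_path() and open_usbfs_node() are hypothetical names. O_RDWR is assumed on purpose: the usbfs poll handler reports completed URBs only on descriptors opened for writing.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>

/* Hypothetical helper: build "/dev/bus/usb/BBB/DDD" for the given bus and
 * device numbers. Returns 0 on success, -1 if buf is too small. */
int usbfs_node_path(char *buf, size_t len, unsigned bus, unsigned dev)
{
    assert(buf != NULL);
    int n = snprintf(buf, len, "/dev/bus/usb/%03u/%03u", bus, dev);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: open the node read-write. Returns a file descriptor,
 * or -1 (errno set by open(), or unset if the path did not fit). */
int open_usbfs_node(unsigned bus, unsigned dev)
{
    char path[32];
    if (usbfs_node_path(path, sizeof path, bus, dev) != 0)
        return -1;
    return open(path, O_RDWR);
}
```

For example, bus 1 and device 5 give /dev/bus/usb/001/005; udev's group assignment is what lets the non-root user open it.
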
> 
> Each set of 18 USB devices is connected to a hub powered by a 10 A power
> supply. Each hub has its own USB 2.0 root; that is, I installed multiple
> USB 2.0 PCI Express expansion cards, and each hub uses only one port of
> its own expansion card.
> 
> The protocol used to talk to any of the 36 devices is pretty simple. It uses
> USB interrupt transfers. A 64-byte frame (the request packet) is sent to the
> device using ioctl(USBDEVFS_SUBMITURB). The file descriptor is monitored by
> epoll, and when an answer comes back, the response packet (another 64-byte
> interrupt packet) is retrieved with ioctl(USBDEVFS_REAPURBNDELAY). Then a
> 64-byte packet (the confirmation packet) is sent through USBDEVFS_SUBMITURB.
> This sequence happens once every few seconds, and the delay between the
> three packets is just a couple of milliseconds. All processing for the 36
> devices is done in a unique thread, under the same epoll loop.

This sentence is ambiguous.  Do you mean there is a single unique thread 
which talks to all 36 devices?  Or do you mean there is a separate 
unique thread for each device (so 36 threads)?
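
The request/response/confirmation cycle described above maps onto the usbfs API roughly as follows. This is a sketch under assumptions, not the application's code: the helper names are hypothetical, error handling is minimal, and the endpoint addresses would come from the device's descriptors (not shown). usbfs marks the descriptor writable (EPOLLOUT) while at least one submitted URB has completed, and USBDEVFS_REAPURBNDELAY fails with EAGAIN once none are left, so each wake-up drains all completions.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/ioctl.h>
#include <linux/usbdevice_fs.h>

#define PKT_LEN 64 /* request, response and confirmation are all 64 bytes */

/* Fill in an interrupt URB. Bit 7 of the endpoint address (0x80) selects
 * IN; without it the URB is OUT. ctx comes back unchanged when reaped. */
void fill_interrupt_urb(struct usbdevfs_urb *urb, unsigned char ep,
                        unsigned char *buf, int len, void *ctx)
{
    assert(urb != NULL && buf != NULL && len > 0);
    memset(urb, 0, sizeof *urb);
    urb->type = USBDEVFS_URB_TYPE_INTERRUPT;
    urb->endpoint = ep;
    urb->buffer = buf;
    urb->buffer_length = len;
    urb->usercontext = ctx;
}

/* Queue the URB. It and its buffer must stay valid until it is reaped. */
int submit_urb(int fd, struct usbdevfs_urb *urb)
{
    return ioctl(fd, USBDEVFS_SUBMITURB, urb);
}

/* Watch for completed URBs (reported as EPOLLOUT). A disconnect should
 * surface as EPOLLHUP/EPOLLERR, which epoll always reports. */
int watch_device(int epfd, int fd, void *dev_state)
{
    struct epoll_event ev = { .events = EPOLLOUT, .data.ptr = dev_state };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Reap every completed URB without blocking and hand each to on_done.
 * Returns the number reaped, or -1 on an error other than EAGAIN
 * (for example ENODEV after a disconnect). */
int reap_all(int fd, void (*on_done)(struct usbdevfs_urb *urb))
{
    int n = 0;
    for (;;) {
        struct usbdevfs_urb *done = NULL;
        if (ioctl(fd, USBDEVFS_REAPURBNDELAY, &done) < 0)
            return errno == EAGAIN ? n : -1;
        on_done(done);
        n++;
    }
}
```

In such a loop, an EPOLLOUT wake-up for a device would call reap_all() on its descriptor; the callback can use urb->usercontext to find the per-device state machine and submit the next packet.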

> So if I synchronize all 36 devices, that is, I try to talk to all of them
> at basically the same time, the kernel will lock up in about 2 minutes or
> less. By “at the same time” I mean submitting the URBs for the request
> packets of all the devices at around the same time, and then waiting for
> the proper epoll wake-up to deal with the state machine (response and
> confirmation packets).
> 
> However, if I lock a semaphore before sending the request packet for one
> device, and only unlock it after reaping the URB I used to send the
> confirmation packet, it ran for at least 72 hours without problems. So,
> one device at a time (using basically the same software plus the
> semaphore) does not cause the kernel lock.
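
If everything really runs in one epoll thread (an assumption, given the ambiguity above), the one-device-at-a-time workaround does not need a real semaphore: a blocking semaphore wait in that thread would stall the whole loop. This hypothetical sketch uses a non-blocking token instead. A device starts its exchange only if it obtains the token, and gives it back once its confirmation URB has been reaped.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_OWNER (-1) /* token is free: no exchange in progress */

/* Hypothetical gate: at most one device may be between "request
 * submitted" and "confirmation reaped" at any moment. */
struct exchange_gate {
    int owner; /* index of the device holding the token, or NO_OWNER */
};

void gate_init(struct exchange_gate *g)
{
    g->owner = NO_OWNER;
}

/* Try to start an exchange for device dev. false means another device is
 * mid-exchange; retry on a later pass of the epoll loop. */
bool gate_try_acquire(struct exchange_gate *g, int dev)
{
    if (g->owner != NO_OWNER)
        return false;
    g->owner = dev;
    return true;
}

/* Call after the confirmation URB for dev has been reaped. */
void gate_release(struct exchange_gate *g, int dev)
{
    assert(g->owner == dev); /* only the holder may release the token */
    g->owner = NO_OWNER;
}
```

Each pass of the loop would call gate_try_acquire() for the next device due for its periodic exchange, submit the request only on success, and call gate_release() from the handler that reaps that device's confirmation URB.
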
> 
> My point is that simple ioctl calls to USB devices should not break the
> kernel. I need help addressing the kernel issue. The problem is difficult
> to reproduce at my office because it needs many devices connected, and
> those are available only in a place I cannot physically reach due to
> COVID-19 travel restrictions.
> 
> My guess is that, for a regular user, this bug rarely manifests itself,
> and it may have been there for a long time.
> 
> I would like to figure out exactly where the problem is and I am looking for
> your guidance to get more information about it.

You could try using a network console.  Or have someone who is on-site 
take a picture of the computer screen when a crash occurs.

Alan Stern


Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201110205114.GB204624@rowland.harvard.edu \
    --to=stern@rowland.harvard.edu \
    --cc=22t@tripolho.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-usb@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.