From: Rik van Riel <riel@surriel.com>
To: Mathias Nyman <mathias.nyman@linux.intel.com>,
linux-usb <linux-usb@vger.kernel.org>
Cc: alsa-devel@alsa-project.org,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Mathias Nyman <mathias.nyman@intel.com>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Jaroslav Kysela <perex@perex.cz>, Takashi Iwai <tiwai@suse.com>
Subject: Re: XHCI vs PCM2903B/PCM2904 part 2
Date: Wed, 20 May 2020 15:53:25 -0400 [thread overview]
Message-ID: <438394b419955a430d884a175a6eda0a99ce4d08.camel@surriel.com> (raw)
In-Reply-To: <296d1b03-fbc3-6107-950e-00d38633ea1f@linux.intel.com>
[-- Attachment #1: Type: text/plain, Size: 5293 bytes --]
On Wed, 2020-05-20 at 16:50 +0300, Mathias Nyman wrote:
> On 20.5.2020 14.26, Rik van Riel wrote:
> > After a few more weeks of digging, I have come to the tentative
> > conclusion that either the XHCI driver, or the USB sound driver,
> > or both, fail to handle USB errors correctly.
> >
> > I have some questions at the bottom, after a (brief-ish)
> > explanation
> > of exactly what seems to go wrong.
> >
> > TL;DR: arecord from a misbehaving device can hang forever
> > after a USB error, due to poll on /dev/snd/timer never returning.
> >
> > The details: under some mysterious circumstances, the PCM290x
> > family sound chips can send more data than expected during an
> > isochronous transfer, leading to a babble error. Those
> > circumstances seem to in part depend on the USB host controller
> > and/or the electrical environment, since the chips work just
> > fine for most people.
> >
> > Receiving data past the end of the isochronous transfer window
> > scheduled for a device results in the XHCI controller throwing
> > a babble error, which moves the endpoint into halted state.
> >
> > This is followed by the host controller software sending a
> > reset endpoint command, and moving the endpoint into stopped
> > state, as specified on pages 164-165 of the XHCI specification.
>
> Note that isoch endpoints should generate buffer overrun instead of
> babble detect error on TD babble conditions.
> See xHCI spec 6.4.5 additional note 115.
Maybe it should, but I'm hitting this printk in handle_tx_event:
case COMP_BABBLE_DETECTED_ERROR:
xhci_dbg(xhci, "Babble error for slot %u ep %u on
endpoint\n",
slot_id, ep_index);
status = -EOVERFLOW;
break;
Tracing in the sound driver suggests that packet belongs to
the sound data device (not the sync device), because the
URB with -EOVERFLOW status for the isochronous packet in that
URB is the last one seen there.
> Or maybe a frame babble could halt an isoc endpoint, see xhci
> 4.10.2.4.1
> but then you should see a port error and port going to disabled
> state.
>
> Any logs of this?
>
> mount -t debugfs none /sys/kernel/debug
> echo 'module xhci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
> echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
> echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
> echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
> < trigger the issue >
> Send output of dmesg
> Send content of /sys/kernel/debug/tracing/trace
dmesg:
https://drive.google.com/open?id=1S2Qc8lroqA5-RMukuLBLWFGx10vEjG-i
usbtrace:
https://drive.google.com/open?id=1cbLcOnAtQRW0Chgak6PNC0l4yJv__4uO
> > However, the USB sound driver seems to have no idea that this
> > error happened. The function retire_capture_urb looks at the
> > status of each isochronous frame, but seems to be under the
> > assumption that the sound device just keeps on running.
> >
> > The function snd_complete_urb seems to only detect that the
> > device is not running if usb_submit_urb returns a failure.
> >
> > err = usb_submit_urb(urb, GFP_ATOMIC);
> > if (err == 0)
> > return;
> >
> > usb_audio_err(ep->chip, "cannot submit urb (err = %d)\n",
> > err);
> >
> > if (ep->data_subs && ep->data_subs->pcm_substream) {
> > substream = ep->data_subs->pcm_substream;
> > snd_pcm_stop_xrun(substream);
> > }
> >
> > However, the XHCI driver will happily submit an URB to a
> > stopped device. Looking at the call trace usb_submit_urb ->
> > xhci_urb_enqueue -> xhci_queue_isoc_tx_prepare -> prepare_ring,
> > you can see this code:
> >
> > /* Make sure the endpoint has been added to xHC schedule */
> > switch (ep_state) {
> > ...
> > case EP_STATE_HALTED:
> > xhci_dbg(xhci, "WARN halted endpoint, queueing URB
> > anyway.\n");
> > case EP_STATE_STOPPED:
> > case EP_STATE_RUNNING:
> > break;
> >
> > This leads me to a few questions:
> > - should retire_capture_urb call snd_pcm_stop_xrun,
> > or another function like it, if it sees certain
> > errors in the iso frame in the URB?
> > - should snd_complete_urb do something with these
> > errors, too, in case they happen on the sync frames
> > and not the data frames?
> > - does the XHCI code need to ring the doorbell when
> > submitting an URB to a stopped device, or is it
> > always up to the higher-level driver to fully reset
> > the device before it can do anything useful?
>
> xhci driver should ring the doorbell.
>
> xhci_queue_isoc_tx()
> giveback_first_trb()
> xhci_ring_ep_doorbell()
>
> when we are talking about babble or transaction errors the device
> might be
> completely unaware of the situation. Device side of the endpoint is
> not necessarily halted.
> xHC host will only halt its internal endpoint state, and it needs a
> reset
> endopoint command from the xhci driver to clear the internal halt
> state.
Interesting. I had no idea the endpoint on each side
was controlled separately.
--
All Rights Reversed.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
next prev parent reply other threads:[~2020-05-20 19:53 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-05-20 11:26 XHCI vs PCM2903B/PCM2904 part 2 Rik van Riel
2020-05-20 11:41 ` Takashi Iwai
2020-05-20 13:50 ` Mathias Nyman
2020-05-20 19:53 ` Rik van Riel [this message]
2020-05-20 16:38 ` Alan Stern
2020-05-20 19:21 ` Rik van Riel
2020-05-20 20:34 ` Alan Stern
2020-05-21 3:45 ` Rik van Riel
2020-05-25 9:37 ` Mathias Nyman
2020-06-30 3:21 ` Rik van Riel
2020-06-30 3:55 ` Rik van Riel
[not found] ` <90D456E4-328F-49F7-99C4-D729E38FA04E@surriel.com>
2020-06-30 14:27 ` Mathias Nyman
2020-06-30 18:52 ` Rik van Riel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=438394b419955a430d884a175a6eda0a99ce4d08.camel@surriel.com \
--to=riel@surriel.com \
--cc=alsa-devel@alsa-project.org \
--cc=gregkh@linuxfoundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-usb@vger.kernel.org \
--cc=mathias.nyman@intel.com \
--cc=mathias.nyman@linux.intel.com \
--cc=perex@perex.cz \
--cc=tiwai@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).