All of lore.kernel.org
 help / color / mirror / Atom feed
From: Rik van Riel <riel@surriel.com>
To: Mathias Nyman <mathias.nyman@linux.intel.com>,
	linux-usb <linux-usb@vger.kernel.org>
Cc: alsa-devel@alsa-project.org,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Mathias Nyman <mathias.nyman@intel.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Jaroslav Kysela <perex@perex.cz>, Takashi Iwai <tiwai@suse.com>
Subject: Re: XHCI vs PCM2903B/PCM2904 part 2
Date: Wed, 20 May 2020 15:53:25 -0400	[thread overview]
Message-ID: <438394b419955a430d884a175a6eda0a99ce4d08.camel@surriel.com> (raw)
In-Reply-To: <296d1b03-fbc3-6107-950e-00d38633ea1f@linux.intel.com>

[-- Attachment #1: Type: text/plain, Size: 5293 bytes --]

On Wed, 2020-05-20 at 16:50 +0300, Mathias Nyman wrote:
> On 20.5.2020 14.26, Rik van Riel wrote:
> > After a few more weeks of digging, I have come to the tentative
> > conclusion that either the XHCI driver, or the USB sound driver,
> > or both, fail to handle USB errors correctly.
> > 
> > I have some questions at the bottom, after a (brief-ish)
> > explanation
> > of exactly what seems to go wrong.
> > 
> > TL;DR: arecord from a misbehaving device can hang forever
> > after a USB error, due to poll on /dev/snd/timer never returning.
> > 
> > The details: under some mysterious circumstances, the PCM290x
> > family sound chips can send more data than expected during an
> > isochronous transfer, leading to a babble error. Those
> > circumstances seem to in part depend on the USB host controller
> > and/or the electrical environment, since the chips work just
> > fine for most people.
> > 
> > Receiving data past the end of the isochronous transfer window
> > scheduled for a device results in the XHCI controller throwing
> > a babble error, which moves the endpoint into halted state.
> > 
> > This is followed by the host controller software sending a
> > reset endpoint command, and moving the endpoint into stopped
> > state, as specified on pages 164-165 of the XHCI specification.
> 
> Note that isoch endpoints should generate buffer overrun instead of
> babble detect error on TD babble conditions. 
> See xHCI spec 6.4.5 additional note 115.

Maybe it should, but I'm hitting this printk in handle_tx_event:

        case COMP_BABBLE_DETECTED_ERROR:
                xhci_dbg(xhci, "Babble error for slot %u ep %u on
endpoint\n",
                         slot_id, ep_index);
                status = -EOVERFLOW;
                break;

Tracing in the sound driver suggests that packet belongs to
the sound data device (not the sync device), because the
URB with -EOVERFLOW status for the isochronous packet in that
URB is the last one seen there.

> Or maybe a frame babble could halt an isoc endpoint, see xhci
> 4.10.2.4.1
> but then you should see a port error and port going to disabled
> state.
> 
> Any logs of this?
> 
> mount -t debugfs none /sys/kernel/debug
> echo 'module xhci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
> echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
> echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
> echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
> < trigger the issue >
> Send output of dmesg
> Send content of /sys/kernel/debug/tracing/trace

dmesg: 
https://drive.google.com/open?id=1S2Qc8lroqA5-RMukuLBLWFGx10vEjG-i

usbtrace: 
https://drive.google.com/open?id=1cbLcOnAtQRW0Chgak6PNC0l4yJv__4uO

> > However, the USB sound driver seems to have no idea that this
> > error happened. The function retire_capture_urb looks at the
> > status of each isochronous frame, but seems to be under the
> > assumption that the sound device just keeps on running.
> > 
> > The function snd_complete_urb seems to only detect that the
> > device is not running if usb_submit_urb returns a failure.
> > 
> >         err = usb_submit_urb(urb, GFP_ATOMIC);
> >         if (err == 0)
> >                 return;
> > 
> >         usb_audio_err(ep->chip, "cannot submit urb (err = %d)\n",
> > err);
> > 
> >         if (ep->data_subs && ep->data_subs->pcm_substream) {
> >                 substream = ep->data_subs->pcm_substream;
> >                 snd_pcm_stop_xrun(substream);
> >         }
> > 
> > However, the XHCI driver will happily submit an URB to a
> > stopped device. Looking at the call trace usb_submit_urb ->
> > xhci_urb_enqueue -> xhci_queue_isoc_tx_prepare -> prepare_ring,
> > you can see this code:
> > 
> >         /* Make sure the endpoint has been added to xHC schedule */
> >         switch (ep_state) {
> > ...
> >         case EP_STATE_HALTED:
> >                 xhci_dbg(xhci, "WARN halted endpoint, queueing URB
> > anyway.\n");
> >         case EP_STATE_STOPPED:
> >         case EP_STATE_RUNNING:
> >                 break;
> > 
> > This leads me to a few questions:
> > - should retire_capture_urb call snd_pcm_stop_xrun,
> >   or another function like it, if it sees certain
> >   errors in the iso frame in the URB?
> > - should snd_complete_urb do something with these
> >   errors, too, in case they happen on the sync frames
> >   and not the data frames?
> > - does the XHCI code need to ring the doorbell when
> >   submitting an URB to a stopped device, or is it
> >   always up to the higher-level driver to fully reset
> >   the device before it can do anything useful?
> 
> xhci driver should ring the doorbell.
> 
> xhci_queue_isoc_tx()
>   giveback_first_trb()
>     xhci_ring_ep_doorbell()
> 
> when we are talking about babble or transaction errors the device
> might be
> completely unaware of the situation. Device side of the endpoint is
> not necessarily halted.
> xHC host will only halt its internal endpoint state, and it needs a
> reset
> endopoint command from the xhci driver to clear the internal halt
> state. 

Interesting. I had no idea the endpoint on each side
was controlled separately.

-- 
All Rights Reversed.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

WARNING: multiple messages have this Message-ID (diff)
From: Rik van Riel <riel@surriel.com>
To: Mathias Nyman <mathias.nyman@linux.intel.com>,
	linux-usb <linux-usb@vger.kernel.org>
Cc: alsa-devel@alsa-project.org,
	Mathias Nyman <mathias.nyman@intel.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Takashi Iwai <tiwai@suse.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: XHCI vs PCM2903B/PCM2904 part 2
Date: Wed, 20 May 2020 15:53:25 -0400	[thread overview]
Message-ID: <438394b419955a430d884a175a6eda0a99ce4d08.camel@surriel.com> (raw)
In-Reply-To: <296d1b03-fbc3-6107-950e-00d38633ea1f@linux.intel.com>

[-- Attachment #1: Type: text/plain, Size: 5293 bytes --]

On Wed, 2020-05-20 at 16:50 +0300, Mathias Nyman wrote:
> On 20.5.2020 14.26, Rik van Riel wrote:
> > After a few more weeks of digging, I have come to the tentative
> > conclusion that either the XHCI driver, or the USB sound driver,
> > or both, fail to handle USB errors correctly.
> > 
> > I have some questions at the bottom, after a (brief-ish)
> > explanation
> > of exactly what seems to go wrong.
> > 
> > TL;DR: arecord from a misbehaving device can hang forever
> > after a USB error, due to poll on /dev/snd/timer never returning.
> > 
> > The details: under some mysterious circumstances, the PCM290x
> > family sound chips can send more data than expected during an
> > isochronous transfer, leading to a babble error. Those
> > circumstances seem to in part depend on the USB host controller
> > and/or the electrical environment, since the chips work just
> > fine for most people.
> > 
> > Receiving data past the end of the isochronous transfer window
> > scheduled for a device results in the XHCI controller throwing
> > a babble error, which moves the endpoint into halted state.
> > 
> > This is followed by the host controller software sending a
> > reset endpoint command, and moving the endpoint into stopped
> > state, as specified on pages 164-165 of the XHCI specification.
> 
> Note that isoch endpoints should generate buffer overrun instead of
> babble detect error on TD babble conditions. 
> See xHCI spec 6.4.5 additional note 115.

Maybe it should, but I'm hitting this printk in handle_tx_event:

        case COMP_BABBLE_DETECTED_ERROR:
                xhci_dbg(xhci, "Babble error for slot %u ep %u on
endpoint\n",
                         slot_id, ep_index);
                status = -EOVERFLOW;
                break;

Tracing in the sound driver suggests that packet belongs to
the sound data device (not the sync device), because the
URB with -EOVERFLOW status for the isochronous packet in that
URB is the last one seen there.

> Or maybe a frame babble could halt an isoc endpoint, see xhci
> 4.10.2.4.1
> but then you should see a port error and port going to disabled
> state.
> 
> Any logs of this?
> 
> mount -t debugfs none /sys/kernel/debug
> echo 'module xhci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
> echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
> echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
> echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
> < trigger the issue >
> Send output of dmesg
> Send content of /sys/kernel/debug/tracing/trace

dmesg: 
https://drive.google.com/open?id=1S2Qc8lroqA5-RMukuLBLWFGx10vEjG-i

usbtrace: 
https://drive.google.com/open?id=1cbLcOnAtQRW0Chgak6PNC0l4yJv__4uO

> > However, the USB sound driver seems to have no idea that this
> > error happened. The function retire_capture_urb looks at the
> > status of each isochronous frame, but seems to be under the
> > assumption that the sound device just keeps on running.
> > 
> > The function snd_complete_urb seems to only detect that the
> > device is not running if usb_submit_urb returns a failure.
> > 
> >         err = usb_submit_urb(urb, GFP_ATOMIC);
> >         if (err == 0)
> >                 return;
> > 
> >         usb_audio_err(ep->chip, "cannot submit urb (err = %d)\n",
> > err);
> > 
> >         if (ep->data_subs && ep->data_subs->pcm_substream) {
> >                 substream = ep->data_subs->pcm_substream;
> >                 snd_pcm_stop_xrun(substream);
> >         }
> > 
> > However, the XHCI driver will happily submit an URB to a
> > stopped device. Looking at the call trace usb_submit_urb ->
> > xhci_urb_enqueue -> xhci_queue_isoc_tx_prepare -> prepare_ring,
> > you can see this code:
> > 
> >         /* Make sure the endpoint has been added to xHC schedule */
> >         switch (ep_state) {
> > ...
> >         case EP_STATE_HALTED:
> >                 xhci_dbg(xhci, "WARN halted endpoint, queueing URB
> > anyway.\n");
> >         case EP_STATE_STOPPED:
> >         case EP_STATE_RUNNING:
> >                 break;
> > 
> > This leads me to a few questions:
> > - should retire_capture_urb call snd_pcm_stop_xrun,
> >   or another function like it, if it sees certain
> >   errors in the iso frame in the URB?
> > - should snd_complete_urb do something with these
> >   errors, too, in case they happen on the sync frames
> >   and not the data frames?
> > - does the XHCI code need to ring the doorbell when
> >   submitting an URB to a stopped device, or is it
> >   always up to the higher-level driver to fully reset
> >   the device before it can do anything useful?
> 
> xhci driver should ring the doorbell.
> 
> xhci_queue_isoc_tx()
>   giveback_first_trb()
>     xhci_ring_ep_doorbell()
> 
> when we are talking about babble or transaction errors the device
> might be
> completely unaware of the situation. Device side of the endpoint is
> not necessarily halted.
> xHC host will only halt its internal endpoint state, and it needs a
> reset
> endopoint command from the xhci driver to clear the internal halt
> state. 

Interesting. I had no idea the endpoint on each side
was controlled separately.

-- 
All Rights Reversed.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2020-05-20 19:53 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-20 11:26 XHCI vs PCM2903B/PCM2904 part 2 Rik van Riel
2020-05-20 11:26 ` Rik van Riel
2020-05-20 11:41 ` Takashi Iwai
2020-05-20 11:41   ` Takashi Iwai
2020-05-20 13:50 ` Mathias Nyman
2020-05-20 13:50   ` Mathias Nyman
2020-05-20 19:53   ` Rik van Riel [this message]
2020-05-20 19:53     ` Rik van Riel
2020-05-20 16:38 ` Alan Stern
2020-05-20 16:38   ` Alan Stern
2020-05-20 19:21   ` Rik van Riel
2020-05-20 19:21     ` Rik van Riel
2020-05-20 20:34     ` Alan Stern
2020-05-20 20:34       ` Alan Stern
2020-05-21  3:45       ` Rik van Riel
2020-05-21  3:45         ` Rik van Riel
2020-05-25  9:37         ` Mathias Nyman
2020-05-25  9:37           ` Mathias Nyman
2020-06-30  3:21           ` Rik van Riel
2020-06-30  3:55             ` Rik van Riel
     [not found]               ` <90D456E4-328F-49F7-99C4-D729E38FA04E@surriel.com>
2020-06-30 14:27                 ` Mathias Nyman
2020-06-30 14:27                   ` Mathias Nyman
2020-06-30 18:52                   ` Rik van Riel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=438394b419955a430d884a175a6eda0a99ce4d08.camel@surriel.com \
    --to=riel@surriel.com \
    --cc=alsa-devel@alsa-project.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=mathias.nyman@intel.com \
    --cc=mathias.nyman@linux.intel.com \
    --cc=perex@perex.cz \
    --cc=tiwai@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.