RE: Re: Re: Re: Re: Re: [syzbot] INFO: rcu detected stall in tx

All of lore.kernel.org
 help / color / mirror / Atom feed

* RE: Re: Re: Re: Re: Re: [syzbot] INFO: rcu detected stall in tx
@ 2021-05-19 16:14 Guido Kiener
  2021-05-19 17:35 ` Alan Stern
  2021-05-19 18:04 ` Re: Re: Re: Re: " Lee Jones
  0 siblings, 2 replies; 17+ messages in thread
From: Guido Kiener @ 2021-05-19 16:14 UTC (permalink / raw)
  To: Alan Stern, dave penkler
  Cc: Dmitry Vyukov, syzbot, Greg Kroah-Hartman, lee.jones, USB list,
	bp, dwmw, hpa, linux-kernel, luto, mingo, syzkaller-bugs, tglx,
	x86

> On Wed, May 19, 2021 at 10:48:29AM +0200, dave penkler wrote:
> > On Sat, 8 May 2021 at 16:29, Alan Stern <stern@rowland.harvard.edu> wrote:
> > >
> > > On Sat, May 08, 2021 at 10:14:41AM +0200, dave penkler wrote:
> > > > When the host driver detects a protocol error while processing an
> > > > URB it completes the URB with EPROTO status and marks the endpoint
> > > > as halted.
> > >
> > > Not true.  It does not mark the endpoint as halted, not unless it
> > > receives a STALL handshake from the device.  A STALL is not a
> > > protocol error.
> > >
> > > > When the class driver resubmits the URB and the if the host driver
> > > > finds the endpoint still marked as halted it should return EPIPE
> > > > status on the resubmitted URB
> > >
> > > Irrelevant.
> > Not at all. The point is that when an application is talking to an
> > instrument over the usbtmc driver, the underlying host controller and
> > its driver will detect and silence a babbling endpoint.
> 
> No, they won't.  That is, they will detect a babble error and return an error status, but
> they won't silence the endpoint.  What makes you think they will?

Maybe there is a misunderstanding. I guess that Dave wanted to propose:
"EPROTO is a link level issue and needs to be handled by the host driver.
When the host driver detects a protocol error while processing an
URB it SHOULD complete the URB with EPROTO status and SHOULD mark the endpoint
as halted."
Is this a realistic fix for all host drivers?

-Guido

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Re: Re: Re: Re: Re: [syzbot] INFO: rcu detected stall in tx
  2021-05-19 16:14 Re: Re: Re: Re: Re: [syzbot] INFO: rcu detected stall in tx Guido Kiener
@ 2021-05-19 17:35 ` Alan Stern
  2021-05-19 19:38   ` Thinh Nguyen
  2021-05-19 18:04 ` Re: Re: Re: Re: " Lee Jones
  1 sibling, 1 reply; 17+ messages in thread
From: Alan Stern @ 2021-05-19 17:35 UTC (permalink / raw)
  To: Guido Kiener
  Cc: dave penkler, Dmitry Vyukov, syzbot, Greg Kroah-Hartman,
	lee.jones, USB list, bp, dwmw, hpa, linux-kernel, luto, mingo,
	syzkaller-bugs, tglx, x86

On Wed, May 19, 2021 at 04:14:20PM +0000, Guido Kiener wrote:
> > On Wed, May 19, 2021 at 10:48:29AM +0200, dave penkler wrote:
> > > On Sat, 8 May 2021 at 16:29, Alan Stern <stern@rowland.harvard.edu> wrote:
> > > >
> > > > On Sat, May 08, 2021 at 10:14:41AM +0200, dave penkler wrote:
> > > > > When the host driver detects a protocol error while processing an
> > > > > URB it completes the URB with EPROTO status and marks the endpoint
> > > > > as halted.
> > > >
> > > > Not true.  It does not mark the endpoint as halted, not unless it
> > > > receives a STALL handshake from the device.  A STALL is not a
> > > > protocol error.
> > > >
> > > > > When the class driver resubmits the URB and the if the host driver
> > > > > finds the endpoint still marked as halted it should return EPIPE
> > > > > status on the resubmitted URB
> > > >
> > > > Irrelevant.
> > > Not at all. The point is that when an application is talking to an
> > > instrument over the usbtmc driver, the underlying host controller and
> > > its driver will detect and silence a babbling endpoint.
> > 
> > No, they won't.  That is, they will detect a babble error and return an error status, but
> > they won't silence the endpoint.  What makes you think they will?
> 
> Maybe there is a misunderstanding. I guess that Dave wanted to propose:
> "EPROTO is a link level issue and needs to be handled by the host driver.
> When the host driver detects a protocol error while processing an
> URB it SHOULD complete the URB with EPROTO status

The host controller drivers _do_ complete URBs with -EPROTO (or similar) 
status when a link-level error occurs...

> and SHOULD mark the endpoint
> as halted."

but they don't mark the endpoint as halted.  Even if they did, it 
wouldn't fix anything because the kernel allows URBs to be submitted to 
halted endpoints.  In fact, it doesn't even keep track of which 
endpoints are or are not halted.

> Is this a realistic fix for all host drivers?

No, it isn't.

An endpoint shouldn't be marked as halted unless it really is halted.  
Otherwise a driver might be tempted to clear the Halt feature, and 
some devices do not like to receive a Clear-Halt request for an endpoint 
that isn't halted.

What we could do is what you suggested earlier: Note the fact that the 
endpoint is in some sort of fault condition and disallow further 
communication with the endpoint until the fault condition has been 
cleared.  (It isn't entirely obvious exactly what actions should clear 
such a fault...  I guess resetting or re-enabling the endpoint, or 
resetting the entire device.)

Alan Stern

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Re: Re: Re: Re: Re: [syzbot] INFO: rcu detected stall in tx
  2021-05-19 16:14 Re: Re: Re: Re: Re: [syzbot] INFO: rcu detected stall in tx Guido Kiener
  2021-05-19 17:35 ` Alan Stern
@ 2021-05-19 18:04 ` Lee Jones
  1 sibling, 0 replies; 17+ messages in thread
From: Lee Jones @ 2021-05-19 18:04 UTC (permalink / raw)
  To: Guido Kiener
  Cc: Alan Stern, dave penkler, Dmitry Vyukov, syzbot,
	Greg Kroah-Hartman, USB list, bp, dwmw, hpa, linux-kernel, luto,
	mingo, syzkaller-bugs, tglx, x86

On Wed, 19 May 2021, Guido Kiener wrote:

> > On Wed, May 19, 2021 at 10:48:29AM +0200, dave penkler wrote:
> > > On Sat, 8 May 2021 at 16:29, Alan Stern <stern@rowland.harvard.edu> wrote:
> > > >
> > > > On Sat, May 08, 2021 at 10:14:41AM +0200, dave penkler wrote:
> > > > > When the host driver detects a protocol error while processing an
> > > > > URB it completes the URB with EPROTO status and marks the endpoint
> > > > > as halted.
> > > >
> > > > Not true.  It does not mark the endpoint as halted, not unless it
> > > > receives a STALL handshake from the device.  A STALL is not a
> > > > protocol error.
> > > >
> > > > > When the class driver resubmits the URB and the if the host driver
> > > > > finds the endpoint still marked as halted it should return EPIPE
> > > > > status on the resubmitted URB
> > > >
> > > > Irrelevant.
> > > Not at all. The point is that when an application is talking to an
> > > instrument over the usbtmc driver, the underlying host controller and
> > > its driver will detect and silence a babbling endpoint.
> > 
> > No, they won't.  That is, they will detect a babble error and return an error status, but
> > they won't silence the endpoint.  What makes you think they will?
> 
> Maybe there is a misunderstanding. I guess that Dave wanted to propose:
> "EPROTO is a link level issue and needs to be handled by the host driver.
> When the host driver detects a protocol error while processing an
> URB it SHOULD complete the URB with EPROTO status and SHOULD mark the endpoint
> as halted."
> Is this a realistic fix for all host drivers?
> 
> -Guido

Guido, would you mind taking a look at your mailer settings please?  I
now have >=7 threads running through my inbox with the same subject.
For some reason your mailer is insisting on creating a new one for
each of your replies.

It's also adding odd "re: re: re: ..." prefixes.

TIA

-- 
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [syzbot] INFO: rcu detected stall in tx
  2021-05-19 17:35 ` Alan Stern
@ 2021-05-19 19:38   ` Thinh Nguyen
  2021-05-20  2:01     ` Alan Stern
  0 siblings, 1 reply; 17+ messages in thread
From: Thinh Nguyen @ 2021-05-19 19:38 UTC (permalink / raw)
  To: Alan Stern, Guido Kiener
  Cc: dave penkler, Dmitry Vyukov, syzbot, Greg Kroah-Hartman,
	lee.jones, USB list, bp, dwmw, hpa, linux-kernel, luto, mingo,
	syzkaller-bugs, tglx, x86

Alan Stern wrote:
> On Wed, May 19, 2021 at 04:14:20PM +0000, Guido Kiener wrote:
>>> On Wed, May 19, 2021 at 10:48:29AM +0200, dave penkler wrote:
>>>> On Sat, 8 May 2021 at 16:29, Alan Stern <stern@rowland.harvard.edu> wrote:
>>>>>
>>>>> On Sat, May 08, 2021 at 10:14:41AM +0200, dave penkler wrote:
>>>>>> When the host driver detects a protocol error while processing an
>>>>>> URB it completes the URB with EPROTO status and marks the endpoint
>>>>>> as halted.
>>>>>
>>>>> Not true.  It does not mark the endpoint as halted, not unless it
>>>>> receives a STALL handshake from the device.  A STALL is not a
>>>>> protocol error.
>>>>>
>>>>>> When the class driver resubmits the URB and the if the host driver
>>>>>> finds the endpoint still marked as halted it should return EPIPE
>>>>>> status on the resubmitted URB
>>>>>
>>>>> Irrelevant.
>>>> Not at all. The point is that when an application is talking to an
>>>> instrument over the usbtmc driver, the underlying host controller and
>>>> its driver will detect and silence a babbling endpoint.
>>>
>>> No, they won't.  That is, they will detect a babble error and return an error status, but
>>> they won't silence the endpoint.  What makes you think they will?
>>
>> Maybe there is a misunderstanding. I guess that Dave wanted to propose:
>> "EPROTO is a link level issue and needs to be handled by the host driver.
>> When the host driver detects a protocol error while processing an
>> URB it SHOULD complete the URB with EPROTO status
> 
> The host controller drivers _do_ complete URBs with -EPROTO (or similar) 
> status when a link-level error occurs...
> 
>> and SHOULD mark the endpoint
>> as halted."
> 
> but they don't mark the endpoint as halted.  Even if they did, it 
> wouldn't fix anything because the kernel allows URBs to be submitted to 
> halted endpoints.  In fact, it doesn't even keep track of which 
> endpoints are or are not halted.
> 
>> Is this a realistic fix for all host drivers?
> 
> No, it isn't.
> 
> An endpoint shouldn't be marked as halted unless it really is halted.  
> Otherwise a driver might be tempted to clear the Halt feature, and 
> some devices do not like to receive a Clear-Halt request for an endpoint 
> that isn't halted.
> 
> What we could do is what you suggested earlier: Note the fact that the 
> endpoint is in some sort of fault condition and disallow further 
> communication with the endpoint until the fault condition has been 
> cleared.  (It isn't entirely obvious exactly what actions should clear 
> such a fault...  I guess resetting or re-enabling the endpoint, or 
> resetting the entire device.)
> 
> Alan Stern
> 

Hi Alan,

Sorry if this diverges from the thread, but I've been wondering whether
to add a change for this also.

For xHCI hosts, after transactions errors, the endpoint will enter
halted state. The driver will attempt a few soft-retries before giving
up. According to the xHCI spec (section 4.6.8), a host may send a
ClearFeature(endpoint_halt) to recover and restart the transfer (see
"reset a pipe" in xhci spec), and the class driver can handle this after
receiving something like -EPROTO from xhci.

However, as you've pointed out, some devices don't like
ClearFeature(ep_halt) and may not properly synchronize with the host on
where it should restart.

Some OS (such as Windows) do this. Not sure if we also want this?
Currently the recovery is just a timeout and a port reset from the class
driver, but the timeout is usually defaulted to a long time (e.g. 30
seconds for storage class driver).

Thanks,
Thinh

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [syzbot] INFO: rcu detected stall in tx
  2021-05-19 19:38   ` Thinh Nguyen
@ 2021-05-20  2:01     ` Alan Stern
  2021-05-20 20:30       ` Thinh Nguyen
  0 siblings, 1 reply; 17+ messages in thread
From: Alan Stern @ 2021-05-20  2:01 UTC (permalink / raw)
  To: Thinh Nguyen
  Cc: Guido Kiener, dave penkler, Dmitry Vyukov, syzbot,
	Greg Kroah-Hartman, lee.jones, USB list, bp, dwmw, hpa,
	linux-kernel, luto, mingo, syzkaller-bugs, tglx, x86

On Wed, May 19, 2021 at 07:38:52PM +0000, Thinh Nguyen wrote:
> Hi Alan,
> 
> Sorry if this diverges from the thread, but I've been wondering whether
> to add a change for this also.
> 
> For xHCI hosts, after transactions errors, the endpoint will enter
> halted state.

No.  You are misreading the xHCI spec.  Section 4.6.8 says:

	... the state of the associated Endpoint Context is set to 
	Halted...

Note this carefully.  It says "Endpoint Context", not "endpoint".

The endpoint is part of the device, whereas the endpoint context is part 
of the host controller.  The device doesn't know when a transaction 
error has occurred; consequently such errors do not affect the endpoint.  
The host controller does know, and consequently such errors do affect 
the endpoint context.

> The driver will attempt a few soft-retries before giving
> up. According to the xHCI spec (section 4.6.8), a host may send a
> ClearFeature(endpoint_halt) to recover and restart the transfer (see

Not quite.  The section of the spec you're talking about says:

	Software shall execute the following sequence to “reset a 
	pipe”....  Issue a ClearFeature(ENDPOINT_HALT) request to 
	device.

It does not say the host controller will do this; it says that software 
will do it.

> "reset a pipe" in xhci spec), and the class driver can handle this after
> receiving something like -EPROTO from xhci.
> 
> However, as you've pointed out, some devices don't like
> ClearFeature(ep_halt) and may not properly synchronize with the host on
> where it should restart.
> 
> Some OS (such as Windows) do this. Not sure if we also want this?

In general we should do the same thing as Windows does, because most 
hardware designers test their equipment on Windows systems but 
relatively few test on Linux systems.

> Currently the recovery is just a timeout and a port reset from the class

This depends on the driver.  Some perform no recovery at all.

> driver, but the timeout is usually defaulted to a long time (e.g. 30
> seconds for storage class driver).

That 30-second timeout in the mass-storage driver applies in situations 
where a command fails to complete, not in situations where it completes 
quickly but with a -EPROTO or -EPIPE error.

The fact is that only a small percentage of -EPROTO errors are 
recoverable.  Some of them can be handled by a port reset, which can be 
pretty awkward to perform but does occasionally work.  A lot of them 
occur because the USB cable has been unplugged; obviously there's no way 
to recover from that.  With only a few exceptions, the best and simplest 
approach is not to try to recover at all.

For the case in question (the syzbot bug report that started this 
thread), the class driver doesn't try to perform any recovery.  It just 
resubmits the URB, getting into a tight retry loop which consumes too 
much CPU time.  Simply giving up would be preferable.

Alan Stern

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [syzbot] INFO: rcu detected stall in tx
  2021-05-20  2:01     ` Alan Stern
@ 2021-05-20 20:30       ` Thinh Nguyen
  2021-05-20 21:05         ` Recovering from transaction errors [was: Re: [syzbot] INFO: rcu detected stall in tx] Alan Stern
  2021-05-24 15:18         ` [syzbot] INFO: rcu detected stall in tx Mathias Nyman
  0 siblings, 2 replies; 17+ messages in thread
From: Thinh Nguyen @ 2021-05-20 20:30 UTC (permalink / raw)
  To: Alan Stern, Thinh Nguyen, Mathias Nyman
  Cc: Guido Kiener, dave penkler, Dmitry Vyukov, syzbot,
	Greg Kroah-Hartman, lee.jones, USB list, bp, dwmw, hpa,
	linux-kernel, luto, mingo, syzkaller-bugs, tglx, x86

+Mathias

Alan Stern wrote:
> On Wed, May 19, 2021 at 07:38:52PM +0000, Thinh Nguyen wrote:
>> Hi Alan,
>>
>> Sorry if this diverges from the thread, but I've been wondering whether
>> to add a change for this also.
>>
>> For xHCI hosts, after transactions errors, the endpoint will enter
>> halted state.
> 
> No.  You are misreading the xHCI spec.  Section 4.6.8 says:
> 
> 	... the state of the associated Endpoint Context is set to 
> 	Halted...
> 
> Note this carefully.  It says "Endpoint Context", not "endpoint".
> 
> The endpoint is part of the device, whereas the endpoint context is part 
> of the host controller.  The device doesn't know when a transaction 
> error has occurred; consequently such errors do not affect the endpoint.  
> The host controller does know, and consequently such errors do affect 
> the endpoint context.
> 

You're right, my mistake here.

>> The driver will attempt a few soft-retries before giving
>> up. According to the xHCI spec (section 4.6.8), a host may send a
>> ClearFeature(endpoint_halt) to recover and restart the transfer (see
> 
> Not quite.  The section of the spec you're talking about says:
> 
> 	Software shall execute the following sequence to “reset a 
> 	pipe”....  Issue a ClearFeature(ENDPOINT_HALT) request to 
> 	device.
> 
> It does not say the host controller will do this; it says that software 
> will do it.

Sorry for being unclear. I meant from the class driver, see my next
sentence.

> 
>> "reset a pipe" in xhci spec), and the class driver can handle this after
>> receiving something like -EPROTO from xhci.
>>
>> However, as you've pointed out, some devices don't like
>> ClearFeature(ep_halt) and may not properly synchronize with the host on
>> where it should restart.
>>
>> Some OS (such as Windows) do this. Not sure if we also want this?
> 
> In general we should do the same thing as Windows does, because most 
> hardware designers test their equipment on Windows systems but 
> relatively few test on Linux systems.
> 
>> Currently the recovery is just a timeout and a port reset from the class
> 
> This depends on the driver.  Some perform no recovery at all.
> 
>> driver, but the timeout is usually defaulted to a long time (e.g. 30
>> seconds for storage class driver).
> 
> That 30-second timeout in the mass-storage driver applies in situations 
> where a command fails to complete, not in situations where it completes 
> quickly but with a -EPROTO or -EPIPE error.

Hm... looks like we have a couple of issues in the uas storage class
driver and the xhci driver.

We may need to fix that in the uas storage driver because it doesn't
seem to handle it. (check uas_data_cmplt() in uas.c).

As for the xhci driver, there maybe a case where the stream URB never
gets to complete because the transaction err_count is not properly
updated. The err_count for transaction error is stored in ep_ring, but
the xhci driver may not be able to lookup the correct ep_ring based on
TRB address for streams. There are cases for streams where the event
TRBs have their TRB pointer field cleared to '0' (xhci spec section
4.12.2). If the xhci driver doesn't see ep_ring for transaction error,
it automatically does a soft-retry. This is seen from one of our
testings that the driver was repeatedly doing soft-retry until the class
driver timed out.

Hi Mathias, maybe you have some comment on this? Thanks.

> 
> The fact is that only a small percentage of -EPROTO errors are 
> recoverable.  Some of them can be handled by a port reset, which can be 
> pretty awkward to perform but does occasionally work.  A lot of them 
> occur because the USB cable has been unplugged; obviously there's no way 
> to recover from that.  With only a few exceptions, the best and simplest 
> approach is not to try to recover at all.

If the cable is unplugged, then we should get a connection change event
and the driver can handle it properly.

Yes, it's probably simplest to do a port reset and let the transfer be
incomplete/corrupted. However, I think we should give
ClearFeature(ep_halt) some more thoughts as I think it can be a recovery
mechanism for storage class driver, even though that it may not be
foolproof.

> 
> For the case in question (the syzbot bug report that started this 
> thread), the class driver doesn't try to perform any recovery.  It just 
> resubmits the URB, getting into a tight retry loop which consumes too 
> much CPU time.  Simply giving up would be preferable.
> 
> Alan Stern
> 

I see. By giving up, you mean doing port reset right? Otherwise it needs
some other mechanism to synchronize with the device side.

Thanks,
Thinh

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Recovering from transaction errors [was: Re: [syzbot] INFO: rcu detected stall in tx]
  2021-05-20 20:30       ` Thinh Nguyen
@ 2021-05-20 21:05         ` Alan Stern
  2021-05-20 21:23           ` Thinh Nguyen
  2021-05-24 15:18         ` [syzbot] INFO: rcu detected stall in tx Mathias Nyman
  1 sibling, 1 reply; 17+ messages in thread
From: Alan Stern @ 2021-05-20 21:05 UTC (permalink / raw)
  To: Thinh Nguyen; +Cc: Mathias Nyman, Guido Kiener, dave penkler, USB list

On Thu, May 20, 2021 at 08:30:01PM +0000, Thinh Nguyen wrote:
> Hm... looks like we have a couple of issues in the uas storage class
> driver and the xhci driver.
> 
> We may need to fix that in the uas storage driver because it doesn't
> seem to handle it. (check uas_data_cmplt() in uas.c).

Hmmm.  I see that if there is an error, the driver assumes no data was 
transferred.  It shouldn't make that assumption; it should always use 
urb->actual_length.

It also doesn't indicate to the SCSI layer when a command gets an error 
and it doesn't try to do any recovery.  Of course, the SCSI error 
handler may initiate some recovery actions.

> As for the xhci driver, there maybe a case where the stream URB never
> gets to complete because the transaction err_count is not properly
> updated. The err_count for transaction error is stored in ep_ring, but
> the xhci driver may not be able to lookup the correct ep_ring based on
> TRB address for streams. There are cases for streams where the event
> TRBs have their TRB pointer field cleared to '0' (xhci spec section
> 4.12.2). If the xhci driver doesn't see ep_ring for transaction error,
> it automatically does a soft-retry. This is seen from one of our
> testings that the driver was repeatedly doing soft-retry until the class
> driver timed out.
> 
> Hi Mathias, maybe you have some comment on this? Thanks.
> 
> > 
> > The fact is that only a small percentage of -EPROTO errors are 
> > recoverable.  Some of them can be handled by a port reset, which can be 
> > pretty awkward to perform but does occasionally work.  A lot of them 
> > occur because the USB cable has been unplugged; obviously there's no way 
> > to recover from that.  With only a few exceptions, the best and simplest 
> > approach is not to try to recover at all.
> 
> If the cable is unplugged, then we should get a connection change event
> and the driver can handle it properly.

Yes -- unless the driver is in such a tight retry loop that the rest of 
the system never gets a chance to process the connection change event.  
I've seen bug reports where that happened.

> Yes, it's probably simplest to do a port reset and let the transfer be
> incomplete/corrupted. However, I think we should give
> ClearFeature(ep_halt) some more thoughts as I think it can be a recovery
> mechanism for storage class driver, even though that it may not be
> foolproof.

When you say storage class driver, which one are you talking about,
usb-storage or uas?  usb-storage already has a pretty robust recovery 
mechanism.

> > For the case in question (the syzbot bug report that started this 
> > thread), the class driver doesn't try to perform any recovery.  It just 
> > resubmits the URB, getting into a tight retry loop which consumes too 
> > much CPU time.  Simply giving up would be preferable.
> > 
> > Alan Stern
> > 
> 
> I see. By giving up, you mean doing port reset right? Otherwise it needs
> some other mechanism to synchronize with the device side.

No, I mean the driver should just stop communicating with the device.  
That's an appropriate action for lots of drivers.  If the user wants to 
re-synchronize with the device, he can unplug the USB cable and plug it 
back in again.

Alan Stern

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Recovering from transaction errors [was: Re: [syzbot] INFO: rcu detected stall in tx]
  2021-05-20 21:05         ` Recovering from transaction errors [was: Re: [syzbot] INFO: rcu detected stall in tx] Alan Stern
@ 2021-05-20 21:23           ` Thinh Nguyen
  2021-05-21  1:21             ` Alan Stern
  0 siblings, 1 reply; 17+ messages in thread
From: Thinh Nguyen @ 2021-05-20 21:23 UTC (permalink / raw)
  To: Alan Stern, Thinh Nguyen
  Cc: Mathias Nyman, Guido Kiener, dave penkler, USB list

Alan Stern wrote:
> On Thu, May 20, 2021 at 08:30:01PM +0000, Thinh Nguyen wrote:
>> Hm... looks like we have a couple of issues in the uas storage class
>> driver and the xhci driver.
>>
>> We may need to fix that in the uas storage driver because it doesn't
>> seem to handle it. (check uas_data_cmplt() in uas.c).
> 
> Hmmm.  I see that if there is an error, the driver assumes no data was 
> transferred.  It shouldn't make that assumption; it should always use 
> urb->actual_length.
> 
> It also doesn't indicate to the SCSI layer when a command gets an error 
> and it doesn't try to do any recovery.  Of course, the SCSI error 
> handler may initiate some recovery actions.

Right.

> 
>> As for the xhci driver, there maybe a case where the stream URB never
>> gets to complete because the transaction err_count is not properly
>> updated. The err_count for transaction error is stored in ep_ring, but
>> the xhci driver may not be able to lookup the correct ep_ring based on
>> TRB address for streams. There are cases for streams where the event
>> TRBs have their TRB pointer field cleared to '0' (xhci spec section
>> 4.12.2). If the xhci driver doesn't see ep_ring for transaction error,
>> it automatically does a soft-retry. This is seen from one of our
>> testings that the driver was repeatedly doing soft-retry until the class
>> driver timed out.
>>
>> Hi Mathias, maybe you have some comment on this? Thanks.
>>
>>>
>>> The fact is that only a small percentage of -EPROTO errors are 
>>> recoverable.  Some of them can be handled by a port reset, which can be 
>>> pretty awkward to perform but does occasionally work.  A lot of them 
>>> occur because the USB cable has been unplugged; obviously there's no way 
>>> to recover from that.  With only a few exceptions, the best and simplest 
>>> approach is not to try to recover at all.
>>
>> If the cable is unplugged, then we should get a connection change event
>> and the driver can handle it properly.
> 
> Yes -- unless the driver is in such a tight retry loop that the rest of 
> the system never gets a chance to process the connection change event.  
> I've seen bug reports where that happened.

I see. I'll keep that in mind, but it sounds like HW issue? The driver
handles retry base on events generated from the HW and the HW should
properly generate connection event and not get stuck in some loop.

> 
>> Yes, it's probably simplest to do a port reset and let the transfer be
>> incomplete/corrupted. However, I think we should give
>> ClearFeature(ep_halt) some more thoughts as I think it can be a recovery
>> mechanism for storage class driver, even though that it may not be
>> foolproof.
> 
> When you say storage class driver, which one are you talking about,
> usb-storage or uas?  usb-storage already has a pretty robust recovery 
> mechanism.

I mean for both.

> 
>>> For the case in question (the syzbot bug report that started this 
>>> thread), the class driver doesn't try to perform any recovery.  It just 
>>> resubmits the URB, getting into a tight retry loop which consumes too 
>>> much CPU time.  Simply giving up would be preferable.
>>>
>>> Alan Stern
>>>
>>
>> I see. By giving up, you mean doing port reset right? Otherwise it needs
>> some other mechanism to synchronize with the device side.
> 
> No, I mean the driver should just stop communicating with the device.  
> That's an appropriate action for lots of drivers.  If the user wants to 
> re-synchronize with the device, he can unplug the USB cable and plug it 
> back in again.
> 
> Alan Stern
> 

Ok. Would it be more difficult to automate this if it requires user
intervention? I assume syzbot doesn't want the user to do that.

Thanks,
Thinh

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Recovering from transaction errors [was: Re: [syzbot] INFO: rcu detected stall in tx]
  2021-05-20 21:23           ` Thinh Nguyen
@ 2021-05-21  1:21             ` Alan Stern
  2021-05-21  2:12               ` Thinh Nguyen
  0 siblings, 1 reply; 17+ messages in thread
From: Alan Stern @ 2021-05-21  1:21 UTC (permalink / raw)
  To: Thinh Nguyen; +Cc: Mathias Nyman, Guido Kiener, dave penkler, USB list

On Thu, May 20, 2021 at 09:23:57PM +0000, Thinh Nguyen wrote:
> Alan Stern wrote:
> >> If the cable is unplugged, then we should get a connection change event
> >> and the driver can handle it properly.
> > 
> > Yes -- unless the driver is in such a tight retry loop that the rest of 
> > the system never gets a chance to process the connection change event.  
> > I've seen bug reports where that happened.
> 
> I see. I'll keep that in mind, but it sounds like HW issue? The driver
> handles retry base on events generated from the HW and the HW should
> properly generate connection event and not get stuck in some loop.

The hardware _does_ generate disconnect events.  The problem is that the 
class driver doesn't react properly to transaction errors and thereby 
prevents the rest of the system from handling the disconnect events.  
It's a bug in the class driver, not in the hardware.

> >>> For the case in question (the syzbot bug report that started this 
> >>> thread), the class driver doesn't try to perform any recovery.  It just 
> >>> resubmits the URB, getting into a tight retry loop which consumes too 
> >>> much CPU time.  Simply giving up would be preferable.
> >>>
> >>> Alan Stern
> >>>
> >>
> >> I see. By giving up, you mean doing port reset right? Otherwise it needs
> >> some other mechanism to synchronize with the device side.
> > 
> > No, I mean the driver should just stop communicating with the device.  
> > That's an appropriate action for lots of drivers.  If the user wants to 
> > re-synchronize with the device, he can unplug the USB cable and plug it 
> > back in again.
> > 
> > Alan Stern
> > 
> 
> Ok. Would it be more difficult to automate this if it requires user
> intervention? I assume syzbot doesn't want the user to do that.

Difficult to automate what, exactly?  Unplugging the USB cable?  How 
could you possibly automate that?

At the moment, I think the best approach is Guido's suggestion to reject 
URBs submitted to endpoints that have gotten a transaction error, until 
the error status has somehow been cleared.  Is that what you would like 
to see automated?

Alan Stern

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Recovering from transaction errors [was: Re: [syzbot] INFO: rcu detected stall in tx]
  2021-05-21  1:21             ` Alan Stern
@ 2021-05-21  2:12               ` Thinh Nguyen
  2021-05-21 14:34                 ` Alan Stern
  0 siblings, 1 reply; 17+ messages in thread
From: Thinh Nguyen @ 2021-05-21  2:12 UTC (permalink / raw)
  To: Alan Stern, Thinh Nguyen
  Cc: Mathias Nyman, Guido Kiener, dave penkler, USB list

Alan Stern wrote:
> On Thu, May 20, 2021 at 09:23:57PM +0000, Thinh Nguyen wrote:
>> Alan Stern wrote:
>>>> If the cable is unplugged, then we should get a connection change event
>>>> and the driver can handle it properly.
>>>
>>> Yes -- unless the driver is in such a tight retry loop that the rest of 
>>> the system never gets a chance to process the connection change event.  
>>> I've seen bug reports where that happened.
>>
>> I see. I'll keep that in mind, but it sounds like HW issue? The driver
>> handles retry base on events generated from the HW and the HW should
>> properly generate connection event and not get stuck in some loop.
> 
> The hardware _does_ generate disconnect events.  The problem is that the 
> class driver doesn't react properly to transaction errors and thereby 
> prevents the rest of the system from handling the disconnect events.  
> It's a bug in the class driver, not in the hardware.

Ok. Got it.

> 
>>>>> For the case in question (the syzbot bug report that started this 
>>>>> thread), the class driver doesn't try to perform any recovery.  It just 
>>>>> resubmits the URB, getting into a tight retry loop which consumes too 
>>>>> much CPU time.  Simply giving up would be preferable.
>>>>>
>>>>> Alan Stern
>>>>>
>>>>
>>>> I see. By giving up, you mean doing port reset right? Otherwise it needs
>>>> some other mechanism to synchronize with the device side.
>>>
>>> No, I mean the driver should just stop communicating with the device.  
>>> That's an appropriate action for lots of drivers.  If the user wants to 
>>> re-synchronize with the device, he can unplug the USB cable and plug it 
>>> back in again.
>>>
>>> Alan Stern
>>>
>>
>> Ok. Would it be more difficult to automate this if it requires user
>> intervention? I assume syzbot doesn't want the user to do that.
> 
> Difficult to automate what, exactly?  Unplugging the USB cable?  How 
> could you possibly automate that?
> 
> At the moment, I think the best approach is Guido's suggestion to reject 
> URBs submitted to endpoints that have gotten a transaction error, until 
> the error status has somehow been cleared.  Is that what you would like 
> to see automated?
> 

First, just want to point out that I'm not familiar with syzbot. I was
just thinking if this issue occurs, and if the user wants to start a new
test, then she doesn't have to unplug+plug the device back and allow
some application to automatically trigger a new test after a failure.

BR,
Thinh

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Recovering from transaction errors [was: Re: [syzbot] INFO: rcu detected stall in tx]
  2021-05-21  2:12               ` Thinh Nguyen
@ 2021-05-21 14:34                 ` Alan Stern
  2021-05-21 20:00                   ` Thinh Nguyen
  0 siblings, 1 reply; 17+ messages in thread
From: Alan Stern @ 2021-05-21 14:34 UTC (permalink / raw)
  To: Thinh Nguyen; +Cc: Mathias Nyman, Guido Kiener, dave penkler, USB list

On Fri, May 21, 2021 at 02:12:43AM +0000, Thinh Nguyen wrote:
> Alan Stern wrote:
> > At the moment, I think the best approach is Guido's suggestion to reject 
> > URBs submitted to endpoints that have gotten a transaction error, until 
> > the error status has somehow been cleared.  Is that what you would like 
> > to see automated?
> > 
> 
> First, just want to point out that I'm not familiar with syzbot. I was
> just thinking if this issue occurs, and if the user wants to start a new
> test, then she doesn't have to unplug+plug the device back and allow
> some application to automatically trigger a new test after a failure.

If you're just talking about syzbot testing, this isn't a issue.  Syzbot 
doesn't test real devices; it tests emulated devices using a 
special-purpose userspace driver.

Alan Stern

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Recovering from transaction errors [was: Re: [syzbot] INFO: rcu detected stall in tx]
  2021-05-21 14:34                 ` Alan Stern
@ 2021-05-21 20:00                   ` Thinh Nguyen
  0 siblings, 0 replies; 17+ messages in thread
From: Thinh Nguyen @ 2021-05-21 20:00 UTC (permalink / raw)
  To: Alan Stern, Thinh Nguyen
  Cc: Mathias Nyman, Guido Kiener, dave penkler, USB list

Alan Stern wrote:
> On Fri, May 21, 2021 at 02:12:43AM +0000, Thinh Nguyen wrote:
>> Alan Stern wrote:
>>> At the moment, I think the best approach is Guido's suggestion to reject 
>>> URBs submitted to endpoints that have gotten a transaction error, until 
>>> the error status has somehow been cleared.  Is that what you would like 
>>> to see automated?
>>>
>>
>> First, just want to point out that I'm not familiar with syzbot. I was
>> just thinking if this issue occurs, and if the user wants to start a new
>> test, then she doesn't have to unplug+plug the device back and allow
>> some application to automatically trigger a new test after a failure.
> 
> If you're just talking about syzbot testing, this isn't a issue.  Syzbot 
> doesn't test real devices; it tests emulated devices using a 
> special-purpose userspace driver.
> 
> Alan Stern
> 

Ah I see. Thanks for the clarification.

BR,
Thinh

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [syzbot] INFO: rcu detected stall in tx
  2021-05-20 20:30       ` Thinh Nguyen
  2021-05-20 21:05         ` Recovering from transaction errors [was: Re: [syzbot] INFO: rcu detected stall in tx] Alan Stern
@ 2021-05-24 15:18         ` Mathias Nyman
  2021-05-24 18:55           ` Alan Stern
  1 sibling, 1 reply; 17+ messages in thread
From: Mathias Nyman @ 2021-05-24 15:18 UTC (permalink / raw)
  To: Thinh Nguyen, Alan Stern, Mathias Nyman
  Cc: Guido Kiener, dave penkler, Dmitry Vyukov, syzbot,
	Greg Kroah-Hartman, lee.jones, USB list, bp, dwmw, hpa,
	linux-kernel, luto, mingo, syzkaller-bugs, tglx, x86

On 20.5.2021 23.30, Thinh Nguyen wrote:
> +Mathias
> 
...

> Hm... looks like we have a couple of issues in the uas storage class
> driver and the xhci driver.
> 
> We may need to fix that in the uas storage driver because it doesn't
> seem to handle it. (check uas_data_cmplt() in uas.c).
> 
> As for the xhci driver, there maybe a case where the stream URB never
> gets to complete because the transaction err_count is not properly
> updated. The err_count for transaction error is stored in ep_ring, but
> the xhci driver may not be able to lookup the correct ep_ring based on
> TRB address for streams. There are cases for streams where the event
> TRBs have their TRB pointer field cleared to '0' (xhci spec section
> 4.12.2). If the xhci driver doesn't see ep_ring for transaction error,
> it automatically does a soft-retry. This is seen from one of our
> testings that the driver was repeatedly doing soft-retry until the class
> driver timed out.
> 
> Hi Mathias, maybe you have some comment on this? Thanks.

This is true, if TRB pointer is 0 then there is no retry limit for soft retry.
We should add one and prevent a loop. after e few soft resets we can end with a
hard reset to clear the host side endpoint halt.

We don't know the URB that was being tansferred during the error, and can't 
give it back with a proper error code.
In that sense we still end up waiting for a timeout and someone to cancel
the urb.

-Mathias

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [syzbot] INFO: rcu detected stall in tx
  2021-05-24 15:18         ` [syzbot] INFO: rcu detected stall in tx Mathias Nyman
@ 2021-05-24 18:55           ` Alan Stern
  2021-05-24 19:23             ` Thinh Nguyen
  0 siblings, 1 reply; 17+ messages in thread
From: Alan Stern @ 2021-05-24 18:55 UTC (permalink / raw)
  To: Mathias Nyman
  Cc: Thinh Nguyen, Mathias Nyman, Guido Kiener, dave penkler,
	Dmitry Vyukov, syzbot, Greg Kroah-Hartman, lee.jones, USB list,
	bp, dwmw, hpa, linux-kernel, luto, mingo, syzkaller-bugs, tglx,
	x86

On Mon, May 24, 2021 at 06:18:59PM +0300, Mathias Nyman wrote:
> On 20.5.2021 23.30, Thinh Nguyen wrote:
> > As for the xhci driver, there maybe a case where the stream URB never
> > gets to complete because the transaction err_count is not properly
> > updated. The err_count for transaction error is stored in ep_ring, but
> > the xhci driver may not be able to lookup the correct ep_ring based on
> > TRB address for streams. There are cases for streams where the event
> > TRBs have their TRB pointer field cleared to '0' (xhci spec section
> > 4.12.2). If the xhci driver doesn't see ep_ring for transaction error,
> > it automatically does a soft-retry. This is seen from one of our
> > testings that the driver was repeatedly doing soft-retry until the class
> > driver timed out.
> > 
> > Hi Mathias, maybe you have some comment on this? Thanks.
> 
> This is true, if TRB pointer is 0 then there is no retry limit for soft retry.
> We should add one and prevent a loop. after e few soft resets we can end with a
> hard reset to clear the host side endpoint halt.
> 
> We don't know the URB that was being tansferred during the error, and can't 
> give it back with a proper error code.
> In that sense we still end up waiting for a timeout and someone to cancel
> the urb.

That's not good.  There may not be a timeout; drivers expect transfers 
to complete with a failure, not to be retried indefinitely.

However, if you do know which endpoint/stream the error is connected to, 
you should be able to get the URB.  It will be the first one queued for 
that endpoint/stream.

Alan Stern

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [syzbot] INFO: rcu detected stall in tx
  2021-05-24 18:55           ` Alan Stern
@ 2021-05-24 19:23             ` Thinh Nguyen
  2021-05-24 22:16               ` Mathias Nyman
  0 siblings, 1 reply; 17+ messages in thread
From: Thinh Nguyen @ 2021-05-24 19:23 UTC (permalink / raw)
  To: Alan Stern, Mathias Nyman
  Cc: Thinh Nguyen, Mathias Nyman, Guido Kiener, dave penkler,
	Dmitry Vyukov, syzbot, Greg Kroah-Hartman, lee.jones, USB list,
	bp, dwmw, hpa, linux-kernel, luto, mingo, syzkaller-bugs, tglx,
	x86

Alan Stern wrote:
> On Mon, May 24, 2021 at 06:18:59PM +0300, Mathias Nyman wrote:
>> On 20.5.2021 23.30, Thinh Nguyen wrote:
>>> As for the xhci driver, there maybe a case where the stream URB never
>>> gets to complete because the transaction err_count is not properly
>>> updated. The err_count for transaction error is stored in ep_ring, but
>>> the xhci driver may not be able to lookup the correct ep_ring based on
>>> TRB address for streams. There are cases for streams where the event
>>> TRBs have their TRB pointer field cleared to '0' (xhci spec section
>>> 4.12.2). If the xhci driver doesn't see ep_ring for transaction error,
>>> it automatically does a soft-retry. This is seen from one of our
>>> testings that the driver was repeatedly doing soft-retry until the class
>>> driver timed out.
>>>
>>> Hi Mathias, maybe you have some comment on this? Thanks.
>>
>> This is true, if TRB pointer is 0 then there is no retry limit for soft retry.
>> We should add one and prevent a loop. after e few soft resets we can end with a
>> hard reset to clear the host side endpoint halt.
>>
>> We don't know the URB that was being tansferred during the error, and can't 
>> give it back with a proper error code.
>> In that sense we still end up waiting for a timeout and someone to cancel
>> the urb.
> 
> That's not good.  There may not be a timeout; drivers expect transfers 
> to complete with a failure, not to be retried indefinitely.
> 
> However, if you do know which endpoint/stream the error is connected to, 
> you should be able to get the URB.  It will be the first one queued for 
> that endpoint/stream.
> 

When the xhci can't recover a transfer with soft-retry, no outstanding
transfer can proceed/complete for the endpoint. If the TRB pointer is 0,
we just don't know which stream or endpoint ring it's for, but we know
all the outstanding URBs of an endpoint. Let's may as well return an
error status for all of them after a limited number of soft-retries.

BR,
Thinh

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [syzbot] INFO: rcu detected stall in tx
  2021-05-24 19:23             ` Thinh Nguyen
@ 2021-05-24 22:16               ` Mathias Nyman
  2021-05-24 22:48                 ` Thinh Nguyen
  0 siblings, 1 reply; 17+ messages in thread
From: Mathias Nyman @ 2021-05-24 22:16 UTC (permalink / raw)
  To: Thinh Nguyen, Alan Stern
  Cc: Mathias Nyman, Guido Kiener, dave penkler, Dmitry Vyukov, syzbot,
	Greg Kroah-Hartman, lee.jones, USB list, bp, dwmw, hpa,
	linux-kernel, luto, mingo, syzkaller-bugs, tglx, x86

On 24.5.2021 22.23, Thinh Nguyen wrote:
> Alan Stern wrote:
>> On Mon, May 24, 2021 at 06:18:59PM +0300, Mathias Nyman wrote:
>>> On 20.5.2021 23.30, Thinh Nguyen wrote:
>>>> As for the xhci driver, there maybe a case where the stream URB never
>>>> gets to complete because the transaction err_count is not properly
>>>> updated. The err_count for transaction error is stored in ep_ring, but
>>>> the xhci driver may not be able to lookup the correct ep_ring based on
>>>> TRB address for streams. There are cases for streams where the event
>>>> TRBs have their TRB pointer field cleared to '0' (xhci spec section
>>>> 4.12.2). If the xhci driver doesn't see ep_ring for transaction error,
>>>> it automatically does a soft-retry. This is seen from one of our
>>>> testings that the driver was repeatedly doing soft-retry until the class
>>>> driver timed out.
>>>>
>>>> Hi Mathias, maybe you have some comment on this? Thanks.
>>>
>>> This is true, if TRB pointer is 0 then there is no retry limit for soft retry.
>>> We should add one and prevent a loop. after e few soft resets we can end with a
>>> hard reset to clear the host side endpoint halt.
>>>
>>> We don't know the URB that was being tansferred during the error, and can't 
>>> give it back with a proper error code.
>>> In that sense we still end up waiting for a timeout and someone to cancel
>>> the urb.
>>
>> That's not good.  There may not be a timeout; drivers expect transfers 
>> to complete with a failure, not to be retried indefinitely.
>>
>> However, if you do know which endpoint/stream the error is connected to, 
>> you should be able to get the URB.  It will be the first one queued for 
>> that endpoint/stream.
>>
> 
> When the xhci can't recover a transfer with soft-retry, no outstanding
> transfer can proceed/complete for the endpoint. If the TRB pointer is 0,
> we just don't know which stream or endpoint ring it's for, but we know
> all the outstanding URBs of an endpoint. Let's may as well return an
> error status for all of them after a limited number of soft-retries.

We get the endpoint, but not the stream.

I guess we could walk through each stream of this endpoint, and return the 
first URB of every stream that has a pending URB.
xHCI spec claims to supports 65533 streams per endpoint, but in real life 
UAS probably only uses a few per endpoint?

-Mathias 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [syzbot] INFO: rcu detected stall in tx
  2021-05-24 22:16               ` Mathias Nyman
@ 2021-05-24 22:48                 ` Thinh Nguyen
  0 siblings, 0 replies; 17+ messages in thread
From: Thinh Nguyen @ 2021-05-24 22:48 UTC (permalink / raw)
  To: Mathias Nyman, Thinh Nguyen, Alan Stern
  Cc: Mathias Nyman, Guido Kiener, dave penkler, Dmitry Vyukov, syzbot,
	Greg Kroah-Hartman, lee.jones, USB list, bp, dwmw, hpa,
	linux-kernel, luto, mingo, syzkaller-bugs, tglx, x86

Mathias Nyman wrote:
> On 24.5.2021 22.23, Thinh Nguyen wrote:
>> Alan Stern wrote:
>>> On Mon, May 24, 2021 at 06:18:59PM +0300, Mathias Nyman wrote:
>>>> On 20.5.2021 23.30, Thinh Nguyen wrote:
>>>>> As for the xhci driver, there maybe a case where the stream URB never
>>>>> gets to complete because the transaction err_count is not properly
>>>>> updated. The err_count for transaction error is stored in ep_ring, but
>>>>> the xhci driver may not be able to lookup the correct ep_ring based on
>>>>> TRB address for streams. There are cases for streams where the event
>>>>> TRBs have their TRB pointer field cleared to '0' (xhci spec section
>>>>> 4.12.2). If the xhci driver doesn't see ep_ring for transaction error,
>>>>> it automatically does a soft-retry. This is seen from one of our
>>>>> testings that the driver was repeatedly doing soft-retry until the class
>>>>> driver timed out.
>>>>>
>>>>> Hi Mathias, maybe you have some comment on this? Thanks.
>>>>
>>>> This is true, if TRB pointer is 0 then there is no retry limit for soft retry.
>>>> We should add one and prevent a loop. after e few soft resets we can end with a
>>>> hard reset to clear the host side endpoint halt.
>>>>
>>>> We don't know the URB that was being tansferred during the error, and can't 
>>>> give it back with a proper error code.
>>>> In that sense we still end up waiting for a timeout and someone to cancel
>>>> the urb.
>>>
>>> That's not good.  There may not be a timeout; drivers expect transfers 
>>> to complete with a failure, not to be retried indefinitely.
>>>
>>> However, if you do know which endpoint/stream the error is connected to, 
>>> you should be able to get the URB.  It will be the first one queued for 
>>> that endpoint/stream.
>>>
>>
>> When the xhci can't recover a transfer with soft-retry, no outstanding
>> transfer can proceed/complete for the endpoint. If the TRB pointer is 0,
>> we just don't know which stream or endpoint ring it's for, but we know
>> all the outstanding URBs of an endpoint. Let's may as well return an
>> error status for all of them after a limited number of soft-retries.
> 
> We get the endpoint, but not the stream.

Right.

> 
> I guess we could walk through each stream of this endpoint, and return the 
> first URB of every stream that has a pending URB.
> xHCI spec claims to supports 65533 streams per endpoint, but in real life 
> UAS probably only uses a few per endpoint?
> 
> -Mathias 
> 

Typically UASP devices advertise to support up to 32 streams. We notice
that some newer builds of Windows OS has a bug (or intentional?) that it
rejects any device that uses more or less than 32 streams (probably a
bug) in the descriptor.

I think we only need to do this if we don't know which stream the event
belongs to. Otherwise, we can keep the old logic.

BR,
Thinh


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-05-24 22:49 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-19 16:14 Re: Re: Re: Re: Re: [syzbot] INFO: rcu detected stall in tx Guido Kiener
2021-05-19 17:35 ` Alan Stern
2021-05-19 19:38   ` Thinh Nguyen
2021-05-20  2:01     ` Alan Stern
2021-05-20 20:30       ` Thinh Nguyen
2021-05-20 21:05         ` Recovering from transaction errors [was: Re: [syzbot] INFO: rcu detected stall in tx] Alan Stern
2021-05-20 21:23           ` Thinh Nguyen
2021-05-21  1:21             ` Alan Stern
2021-05-21  2:12               ` Thinh Nguyen
2021-05-21 14:34                 ` Alan Stern
2021-05-21 20:00                   ` Thinh Nguyen
2021-05-24 15:18         ` [syzbot] INFO: rcu detected stall in tx Mathias Nyman
2021-05-24 18:55           ` Alan Stern
2021-05-24 19:23             ` Thinh Nguyen
2021-05-24 22:16               ` Mathias Nyman
2021-05-24 22:48                 ` Thinh Nguyen
2021-05-19 18:04 ` Re: Re: Re: Re: " Lee Jones

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.