All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] xhci: guard accesses to ep_state in xhci_endpoint_reset()
@ 2021-08-31 16:02 Phil Elwell
  2021-09-01  9:21 ` Mathias Nyman
  0 siblings, 1 reply; 4+ messages in thread
From: Phil Elwell @ 2021-08-31 16:02 UTC (permalink / raw)
  To: Mathias Nyman, Greg Kroah-Hartman, Jonathan Bell, linux-usb,
	linux-kernel

From: Jonathan Bell <jonathan@raspberrypi.com>

See https://github.com/raspberrypi/linux/issues/3981

Two read-modify-write cycles on ep->ep_state are not guarded by
xhci->lock. Fix these.

Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com>
---
 drivers/usb/host/xhci.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index f3dabd02382c2..902f410874e8e 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -3213,10 +3213,13 @@ static void xhci_endpoint_reset(struct usb_hcd *hcd,
 		return;
 
 	/* Bail out if toggle is already being cleared by a endpoint reset */
+	spin_lock_irqsave(&xhci->lock, flags);
 	if (ep->ep_state & EP_HARD_CLEAR_TOGGLE) {
 		ep->ep_state &= ~EP_HARD_CLEAR_TOGGLE;
+		spin_unlock_irqrestore(&xhci->lock, flags);
 		return;
 	}
+	spin_unlock_irqrestore(&xhci->lock, flags);
 	/* Only interrupt and bulk ep's use data toggle, USB2 spec 5.5.4-> */
 	if (usb_endpoint_xfer_control(&host_ep->desc) ||
 	    usb_endpoint_xfer_isoc(&host_ep->desc))
@@ -3302,8 +3305,10 @@ static void xhci_endpoint_reset(struct usb_hcd *hcd,
 	xhci_free_command(xhci, cfg_cmd);
 cleanup:
 	xhci_free_command(xhci, stop_cmd);
+	spin_lock_irqsave(&xhci->lock, flags);
 	if (ep->ep_state & EP_SOFT_CLEAR_TOGGLE)
 		ep->ep_state &= ~EP_SOFT_CLEAR_TOGGLE;
+	spin_unlock_irqrestore(&xhci->lock, flags);
 }
 
 static int xhci_check_streams_endpoint(struct xhci_hcd *xhci,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] xhci: guard accesses to ep_state in xhci_endpoint_reset()
  2021-08-31 16:02 [PATCH] xhci: guard accesses to ep_state in xhci_endpoint_reset() Phil Elwell
@ 2021-09-01  9:21 ` Mathias Nyman
  2021-09-01 13:15   ` Phil Elwell
  0 siblings, 1 reply; 4+ messages in thread
From: Mathias Nyman @ 2021-09-01  9:21 UTC (permalink / raw)
  To: Phil Elwell, Mathias Nyman, Greg Kroah-Hartman, Jonathan Bell,
	linux-usb, linux-kernel

On 31.8.2021 19.02, Phil Elwell wrote:
> From: Jonathan Bell <jonathan@raspberrypi.com>
> 
> See https://github.com/raspberrypi/linux/issues/3981

Thanks, so in a nutshell the issue looks something like:

[827586.220071] xhci_hcd 0000:01:00.0: WARN Cannot submit Set TR Deq Ptr
[827586.220087] xhci_hcd 0000:01:00.0: A Set TR Deq Ptr command is pending.
[827723.160680] INFO: task usb-storage:93 blocked for more than 122 seconds.

The blocked task is probably because xhci driver failed to give back the
URB after failing to submit a "Set TR Deq Ptr" command. This part should
be fixed in:
https://lore.kernel.org/r/20210820123503.2605901-4-mathias.nyman@linux.intel.com
which is currently in usb-next, and should be in 5.15-rc1 and future 5.12+ stable.

> 
> Two read-modify-write cycles on ep->ep_state are not guarded by
> xhci->lock. Fix these.
> 

This is probably one cause for the "Warn Cannot submit Set TR Deq Ptr A Set TR
Deq Ptr command is pending" message.
Another possibility is that with UAS and streams we have several transfer rings
per endpoint, meaning that if two TDs on separate stream rings on the same 
endpoint both stall, or are cancelled we could see this message.

The SET_DEQ_PENDING flag in ep->ep_state should probably be per ring, not per
endpoint. Then we also need a "rings_with_pending_set_deq" counter per endpoint
to keep track when all set_tr_deq commands complete, and we can restart the endpoint  

Anyway, my patch linked above together with this patch should make these errors
a lot more harmless.

Let me know if you can trigger the issue with both these patches applied.

I'll add your patch to the queue as well.

Thanks
-Mathias

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] xhci: guard accesses to ep_state in xhci_endpoint_reset()
  2021-09-01  9:21 ` Mathias Nyman
@ 2021-09-01 13:15   ` Phil Elwell
  2021-09-01 18:51     ` Jonathan Bell
  0 siblings, 1 reply; 4+ messages in thread
From: Phil Elwell @ 2021-09-01 13:15 UTC (permalink / raw)
  To: Mathias Nyman, Mathias Nyman, Greg Kroah-Hartman, Jonathan Bell,
	linux-usb, linux-kernel

Hi Mathias,

On 01/09/2021 10:21, Mathias Nyman wrote:
> On 31.8.2021 19.02, Phil Elwell wrote:
>> From: Jonathan Bell <jonathan@raspberrypi.com>
>>
>> See https://github.com/raspberrypi/linux/issues/3981
> 
> Thanks, so in a nutshell the issue looks something like:
> 
> [827586.220071] xhci_hcd 0000:01:00.0: WARN Cannot submit Set TR Deq Ptr
> [827586.220087] xhci_hcd 0000:01:00.0: A Set TR Deq Ptr command is pending.
> [827723.160680] INFO: task usb-storage:93 blocked for more than 122 seconds.
> 
> The blocked task is probably because xhci driver failed to give back the
> URB after failing to submit a "Set TR Deq Ptr" command. This part should
> be fixed in:
> https://lore.kernel.org/r/20210820123503.2605901-4-mathias.nyman@linux.intel.com
> which is currently in usb-next, and should be in 5.15-rc1 and future 5.12+ stable.
> 
>>
>> Two read-modify-write cycles on ep->ep_state are not guarded by
>> xhci->lock. Fix these.
>>
> 
> This is probably one cause for the "Warn Cannot submit Set TR Deq Ptr A Set TR
> Deq Ptr command is pending" message.
> Another possibility is that with UAS and streams we have several transfer rings
> per endpoint, meaning that if two TDs on separate stream rings on the same
> endpoint both stall, or are cancelled we could see this message.
> 
> The SET_DEQ_PENDING flag in ep->ep_state should probably be per ring, not per
> endpoint. Then we also need a "rings_with_pending_set_deq" counter per endpoint
> to keep track when all set_tr_deq commands complete, and we can restart the endpoint

Jonathan, the author of the patch, may give some detailed feedback on these 
statements when he has a moment - "Well, sort of... it's complicated" was the 
summary.

> Anyway, my patch linked above together with this patch should make these errors
> a lot more harmless.

Yes, I think that's true. We have a downstream patch to warn about a pending
Set TR Deq Ptr command but proceed anyway, allowing systems to recover, but with 
the additional spin lock usage our users are reporting no failures _and_ no
warnings.

> Let me know if you can trigger the issue with both these patches applied.

We've not tried your patch yet.

> I'll add your patch to the queue as well.

Many thanks,

Phil

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] xhci: guard accesses to ep_state in xhci_endpoint_reset()
  2021-09-01 13:15   ` Phil Elwell
@ 2021-09-01 18:51     ` Jonathan Bell
  0 siblings, 0 replies; 4+ messages in thread
From: Jonathan Bell @ 2021-09-01 18:51 UTC (permalink / raw)
  To: Phil Elwell
  Cc: Mathias Nyman, Mathias Nyman, Greg Kroah-Hartman, linux-usb,
	linux-kernel

On Wed, 1 Sept 2021 at 14:15, Phil Elwell <phil@raspberrypi.com> wrote:
>
> Hi Mathias,
>
> On 01/09/2021 10:21, Mathias Nyman wrote:
> > On 31.8.2021 19.02, Phil Elwell wrote:
> >> From: Jonathan Bell <jonathan@raspberrypi.com>
> >>
> >> See https://github.com/raspberrypi/linux/issues/3981
> >
> > Thanks, so in a nutshell the issue looks something like:
> >
> > [827586.220071] xhci_hcd 0000:01:00.0: WARN Cannot submit Set TR Deq Ptr
> > [827586.220087] xhci_hcd 0000:01:00.0: A Set TR Deq Ptr command is pending.
> > [827723.160680] INFO: task usb-storage:93 blocked for more than 122 seconds.
> >
> > The blocked task is probably because xhci driver failed to give back the
> > URB after failing to submit a "Set TR Deq Ptr" command. This part should
> > be fixed in:
> > https://lore.kernel.org/r/20210820123503.2605901-4-mathias.nyman@linux.intel.com
> > which is currently in usb-next, and should be in 5.15-rc1 and future 5.12+ stable.
> >
> >>
> >> Two read-modify-write cycles on ep->ep_state are not guarded by
> >> xhci->lock. Fix these.
> >>
> >
> > This is probably one cause for the "Warn Cannot submit Set TR Deq Ptr A Set TR
> > Deq Ptr command is pending" message.
> > Another possibility is that with UAS and streams we have several transfer rings
> > per endpoint, meaning that if two TDs on separate stream rings on the same
> > endpoint both stall, or are cancelled we could see this message.
> >
> > The SET_DEQ_PENDING flag in ep->ep_state should probably be per ring, not per
> > endpoint. Then we also need a "rings_with_pending_set_deq" counter per endpoint
> > to keep track when all set_tr_deq commands complete, and we can restart the endpoint
>
> Jonathan, the author of the patch, may give some detailed feedback on these
> statements when he has a moment - "Well, sort of... it's complicated" was the
> summary.

The bug in this case was apparent only on usb-storage devices, so no streams.
The testcase in the linked github issue repeatedly invoked smartctl on
a device that didn't
support that particular ATA command so stalled the endpoint.

The unguarded read-modify-write raced with the completion of the Set
TR Deq command
being executed by the controller, and the SET_DEQ_PENDING flag would
get overwritten
with an invalid '1'.

I've not investigated how the xhci driver responds to stalls on stream
endpoints in detail
as I've not seen any user reports of uas active and this error message
appearing, but in
a wider context, getting a stall on a streaming endpoint on a Pi 4
seems to be generally
fatal until the device receives a reset and I've not determined which
of the HCD, controller,
or device is stuck.

Thanks
Jonathan

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-09-01 18:52 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-31 16:02 [PATCH] xhci: guard accesses to ep_state in xhci_endpoint_reset() Phil Elwell
2021-09-01  9:21 ` Mathias Nyman
2021-09-01 13:15   ` Phil Elwell
2021-09-01 18:51     ` Jonathan Bell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.