stable.vger.kernel.org archive mirror
* [PATCH 1/2] xhci: make sure TRB is fully written before giving it to the controller
       [not found] <20210115161907.2875631-1-mathias.nyman@linux.intel.com>
@ 2021-01-15 16:19 ` Mathias Nyman
  2021-01-15 16:40   ` Sergei Shtylyov
  2021-01-15 16:19 ` [PATCH 2/2] xhci: tegra: Delay for disabling LFPS detector Mathias Nyman
  1 sibling, 1 reply; 7+ messages in thread
From: Mathias Nyman @ 2021-01-15 16:19 UTC (permalink / raw)
  To: gregkh; +Cc: linux-usb, Mathias Nyman, stable, Ross Zwisler

Once the command ring doorbell is rung the xHC controller will parse all
command TRBs on the command ring that have the cycle bit set properly.

If the driver just started writing the next command TRB to the ring when
hardware finished the previous TRB, then HW might fetch an incomplete TRB
as long as its cycle bit is set correctly.

A command TRB is 16 bytes (128 bits) long.
Driver writes the command TRB in four 32 bit chunks, with the chunk
containing the cycle bit last. This does however not guarantee that
chunks actually get written in that order.

This was detected in stress testing when canceling URBs with several
connected USB devices.
Two consecutive "Set TR Dequeue pointer" commands got queued right
after each other, and the second one was only partially written when
the controller parsed it, causing the dequeue pointer to be set
to bogus values. This was seen as error messages:

"Mismatch between completed Set TR Deq Ptr command & xHCI internal state"

Solution is to add a write memory barrier before writing the cycle bit.

Cc: <stable@vger.kernel.org>
Tested-by: Ross Zwisler <zwisler@google.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci-ring.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 5677b81c0915..cf0c93a90200 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2931,6 +2931,8 @@ static void queue_trb(struct xhci_hcd *xhci, struct xhci_ring *ring,
 	trb->field[0] = cpu_to_le32(field1);
 	trb->field[1] = cpu_to_le32(field2);
 	trb->field[2] = cpu_to_le32(field3);
+	/* make sure TRB is fully written before giving it to the controller */
+	wmb();
 	trb->field[3] = cpu_to_le32(field4);
 
 	trace_xhci_queue_trb(ring, trb);
-- 
2.25.1
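
A minimal userspace sketch of the ordering this patch enforces, assuming
C11 atomics as a stand-in for the kernel primitives. This is not the
driver code: struct trb_model, TRB_CYCLE_MODEL and both function names
are hypothetical, and atomic_thread_fence(memory_order_release) is only
a rough analogue of wmb()/dma_wmb(). The point is that the three payload
dwords are ordered before the dword that carries the cycle bit.

```c
#include <stdatomic.h>
#include <stdint.h>

#define TRB_CYCLE_MODEL 0x1u	/* hypothetical: cycle bit in the last dword */

/* Hypothetical model of a 16-byte TRB: three payload dwords plus the
 * control dword that carries the cycle bit. */
struct trb_model {
	uint32_t payload[3];
	_Atomic uint32_t control;
};

/* Fill the payload first, then publish by storing the control dword. */
static void queue_trb_model(struct trb_model *trb, uint32_t f1, uint32_t f2,
			    uint32_t f3, uint32_t f4)
{
	trb->payload[0] = f1;
	trb->payload[1] = f2;
	trb->payload[2] = f3;
	/* Stand-in for the barrier the patch adds: the payload stores above
	 * may not be reordered after the control-dword store below. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&trb->control, f4, memory_order_relaxed);
}

/* Consumer side of the model: returns 1 when the cycle bit matches the
 * consumer's expected cycle state, 0 otherwise. */
static int trb_model_ready(struct trb_model *trb, uint32_t cycle_state)
{
	uint32_t control = atomic_load_explicit(&trb->control,
						memory_order_acquire);

	return (control & TRB_CYCLE_MODEL) == cycle_state;
}
```

In this model a consumer that sees trb_model_ready() return 1 and then
reads payload[] is guaranteed to see the values written before the fence.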



* [PATCH 2/2] xhci: tegra: Delay for disabling LFPS detector
       [not found] <20210115161907.2875631-1-mathias.nyman@linux.intel.com>
  2021-01-15 16:19 ` [PATCH 1/2] xhci: make sure TRB is fully written before giving it to the controller Mathias Nyman
@ 2021-01-15 16:19 ` Mathias Nyman
  1 sibling, 0 replies; 7+ messages in thread
From: Mathias Nyman @ 2021-01-15 16:19 UTC (permalink / raw)
  To: gregkh; +Cc: linux-usb, JC Kuo, stable, Mathias Nyman

From: JC Kuo <jckuo@nvidia.com>

Occasionally, we are seeing some SuperSpeed devices resume right after
being directed to U3. This commit adds a 500us delay to ensure the LFPS
detector is disabled before sending ACK to firmware.

[   16.099363] tegra-xusb 70090000.usb: entering ELPG
[   16.104343] tegra-xusb 70090000.usb: 2-1 isn't suspended: 0x0c001203
[   16.114576] tegra-xusb 70090000.usb: not all ports suspended: -16
[   16.120789] tegra-xusb 70090000.usb: entering ELPG failed

The register write passes through a few flop stages of the 32KHz clock domain.
An NVIDIA ASIC designer reviewed the RTL and suggested a 500us delay.

Cc: stable@vger.kernel.org
Signed-off-by: JC Kuo <jckuo@nvidia.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci-tegra.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/usb/host/xhci-tegra.c b/drivers/usb/host/xhci-tegra.c
index 934be1686352..50bb91b6a4b8 100644
--- a/drivers/usb/host/xhci-tegra.c
+++ b/drivers/usb/host/xhci-tegra.c
@@ -623,6 +623,13 @@ static void tegra_xusb_mbox_handle(struct tegra_xusb *tegra,
 								     enable);
 			if (err < 0)
 				break;
+
+			/*
+			 * wait 500us for LFPS detector to be disabled before
+			 * sending ACK
+			 */
+			if (!enable)
+				usleep_range(500, 1000);
 		}
 
 		if (err < 0) {
-- 
2.25.1
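
Rough arithmetic behind the 500us figure, as a sketch only. The commit
message says just "32KHz", so the exact rate is an assumption here; both
32000 Hz and 32768 Hz are plausible readings. The helper name
cycles_in_delay() is hypothetical.

```c
#include <stdint.h>

/* Whole clock cycles that fit into delay_us microseconds at clock_hz. */
static uint64_t cycles_in_delay(uint64_t delay_us, uint64_t clock_hz)
{
	return delay_us * clock_hz / 1000000u;
}
```

Under either assumed rate, cycles_in_delay(500, ...) comes out at 16 whole
cycles, which leaves room for the "few flop stages" the commit message
mentions.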



* Re: [PATCH 1/2] xhci: make sure TRB is fully written before giving it to the controller
  2021-01-15 16:19 ` [PATCH 1/2] xhci: make sure TRB is fully written before giving it to the controller Mathias Nyman
@ 2021-01-15 16:40   ` Sergei Shtylyov
  2021-01-15 16:50     ` David Laight
  0 siblings, 1 reply; 7+ messages in thread
From: Sergei Shtylyov @ 2021-01-15 16:40 UTC (permalink / raw)
  To: Mathias Nyman, gregkh; +Cc: linux-usb, stable, Ross Zwisler

On 1/15/21 7:19 PM, Mathias Nyman wrote:

> @@ -2931,6 +2931,8 @@ static void queue_trb(struct xhci_hcd *xhci, struct xhci_ring *ring,
>  	trb->field[0] = cpu_to_le32(field1);
>  	trb->field[1] = cpu_to_le32(field2);
>  	trb->field[2] = cpu_to_le32(field3);
> +	/* make sure TRB is fully written before giving it to the controller */
> +	wmb();

   Have you tried the lighter barrier, dma_wmb()? IIRC, it exists for these exact cases...

>  	trb->field[3] = cpu_to_le32(field4);
>  
>  	trace_xhci_queue_trb(ring, trb);

MBR, Sergei


* RE: [PATCH 1/2] xhci: make sure TRB is fully written before giving it to the controller
  2021-01-15 16:40   ` Sergei Shtylyov
@ 2021-01-15 16:50     ` David Laight
  2021-01-15 17:21       ` Sergei Shtylyov
  0 siblings, 1 reply; 7+ messages in thread
From: David Laight @ 2021-01-15 16:50 UTC (permalink / raw)
  To: 'Sergei Shtylyov', Mathias Nyman, gregkh
  Cc: linux-usb, stable, Ross Zwisler

From: Sergei Shtylyov
> Sent: 15 January 2021 16:40
> 
> On 1/15/21 7:19 PM, Mathias Nyman wrote:
> 
> > @@ -2931,6 +2931,8 @@ static void queue_trb(struct xhci_hcd *xhci, struct xhci_ring *ring,
> >  	trb->field[0] = cpu_to_le32(field1);
> >  	trb->field[1] = cpu_to_le32(field2);
> >  	trb->field[2] = cpu_to_le32(field3);
> > +	/* make sure TRB is fully written before giving it to the controller */
> > +	wmb();
> 
>    Have you tried the lighter barrier, dma_wmb()? IIRC, it exists for these exact cases...

Isn't dma_wmb() needed between the last memory write and the io_write to the doorbell?
Here we need to ensure the two memory writes aren't re-ordered.
Apart from alpha, isn't a barrier() likely to be enough for that?
It is worth checking that the failing compiles didn't have the writes reordered.

	David

> 
> >  	trb->field[3] = cpu_to_le32(field4);
> >
> >  	trace_xhci_queue_trb(ring, trb);
> 
> MBR, Sergei


* Re: [PATCH 1/2] xhci: make sure TRB is fully written before giving it to the controller
  2021-01-15 16:50     ` David Laight
@ 2021-01-15 17:21       ` Sergei Shtylyov
  2021-01-18 12:07         ` Mathias Nyman
  0 siblings, 1 reply; 7+ messages in thread
From: Sergei Shtylyov @ 2021-01-15 17:21 UTC (permalink / raw)
  To: David Laight, Mathias Nyman, gregkh; +Cc: linux-usb, stable, Ross Zwisler

On 1/15/21 7:50 PM, David Laight wrote:
> From: Sergei Shtylyov
>> Sent: 15 January 2021 16:40
>>
>> On 1/15/21 7:19 PM, Mathias Nyman wrote:
>>
>>> @@ -2931,6 +2931,8 @@ static void queue_trb(struct xhci_hcd *xhci, struct xhci_ring *ring,
>>>  	trb->field[0] = cpu_to_le32(field1);
>>>  	trb->field[1] = cpu_to_le32(field2);
>>>  	trb->field[2] = cpu_to_le32(field3);
>>> +	/* make sure TRB is fully written before giving it to the controller */
>>> +	wmb();
>>
>>    Have you tried the lighter barrier, dma_wmb()? IIRC, it exists for these exact cases...
> 
> Isn't dma_wmb() needed between the last memory write and the io_write to the doorbell?

   No.

> Here we need to ensure the two memory writes aren't re-ordered.

   No, we need all 3 ring memory writes to be ordered such that they all happen before the 4th
write. It's no wonder this bug hasn't been noticed before -- x86 has strong write ordering
unlike ARM/etc.

> Apart from alpha isn't a barrier() likely to be enough for that.

   Not sure -- we don't have any barriers before the equivalents of a doorbell write
in e.g. the Renesas Ether driver.

> It is worth checking that the failing compiles didn't have the writes reordered.

  The writes are reordered not because of the compiler -- the read/write reordering is a
CPU feature (on at least non-x86). :-)

> 	David
> 
>>
>>>  	trb->field[3] = cpu_to_le32(field4);
>>>
>>>  	trace_xhci_queue_trb(ring, trb);

MBR, Sergei
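
The compiler-versus-CPU distinction discussed above can be sketched in
userspace C11, under the assumption that atomic_signal_fence() plays the
role of the kernel's barrier() and atomic_thread_fence() the role of a
write barrier. These are analogies, not the kernel implementations, and
all function names below are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Compiler-only barrier: stops the compiler from moving memory accesses
 * across it, but emits no hardware fence instruction. */
static inline void compiler_only_barrier(void)
{
	atomic_signal_fence(memory_order_seq_cst);
}

/* Store-ordering fence: constrains the compiler and, on weakly ordered
 * CPUs, also emits whatever instruction is needed to order the stores. */
static inline void store_ordering_fence(void)
{
	atomic_thread_fence(memory_order_release);
}

/* Write the data, then set the flag that tells an observer it is valid. */
static void write_then_publish(uint32_t *data, _Atomic uint32_t *flag,
			       uint32_t value)
{
	*data = value;
	store_ordering_fence();	/* compiler_only_barrier() would not order
				 * the two stores on a weakly ordered CPU */
	atomic_store_explicit(flag, 1, memory_order_relaxed);
}
```

On x86 a release fence typically compiles to no instruction at all, which
fits the remark that x86's strong write ordering hid the problem.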


* Re: [PATCH 1/2] xhci: make sure TRB is fully written before giving it to the controller
  2021-01-15 17:21       ` Sergei Shtylyov
@ 2021-01-18 12:07         ` Mathias Nyman
  2021-01-19 23:25           ` Ross Zwisler
  0 siblings, 1 reply; 7+ messages in thread
From: Mathias Nyman @ 2021-01-18 12:07 UTC (permalink / raw)
  To: Sergei Shtylyov, David Laight, gregkh; +Cc: linux-usb, stable, Ross Zwisler

On 15.1.2021 19.21, Sergei Shtylyov wrote:
> On 1/15/21 7:50 PM, David Laight wrote:
>> From: Sergei Shtylyov
>>> Sent: 15 January 2021 16:40
>>>
>>> On 1/15/21 7:19 PM, Mathias Nyman wrote:
>>>
>>>> @@ -2931,6 +2931,8 @@ static void queue_trb(struct xhci_hcd *xhci, struct xhci_ring *ring,
>>>>  	trb->field[0] = cpu_to_le32(field1);
>>>>  	trb->field[1] = cpu_to_le32(field2);
>>>>  	trb->field[2] = cpu_to_le32(field3);
>>>> +	/* make sure TRB is fully written before giving it to the controller */
>>>> +	wmb();
>>>
>>>    Have you tried the lighter barrier, dma_wmb()? IIRC, it exists for these exact cases...
>>

True, good point, dma_wmb() should be enough here.
In fact most other wmb()s in xhci could be turned into dma_wmb().

Looks like Greg already picked this so maybe a later patch to usb-next that does this
wmb() -> dma_wmb() optimization where possible.

>> Isn't dma_wmb() needed between the last memory write and the io_write to the doorbell?
> 
>    No.

Transfer TRBs already have a wmb() in giveback_first_trb(),
so no need in that case.

For command TRBs it's unlikely but not impossible.
The issue we are solving here is the xHC controller parsing two commands after a doorbell ring.
The first one was the intended, properly written command. The second was an out-of-order,
partially written command. The driver hadn't even rung the doorbell for the second command yet.

There are a couple of operations between the last TRB memory write and the command doorbell ring.
A wmb() in that place would solve a case where the memory write is so out of order and delayed
that the xHC controller reads and reacts to the doorbell ring, and reads the command ring
before the memory write to the command ring is done. Unlikely but not impossible.

No such issues seen so far, but maybe a dma_wmb() in xhci_ring_cmd_db() wouldn't hurt.

-Mathias
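
A hedged userspace model of the extra barrier suggested for
xhci_ring_cmd_db(): order every earlier ring write before the doorbell
write. This is not the driver's code. struct cmd_ring_model, the doorbell
value and the function names are hypothetical, the doorbell is an ordinary
atomic variable rather than an MMIO register, and a C11 release fence only
approximates dma_wmb().

```c
#include <stdatomic.h>
#include <stdint.h>

#define DOORBELL_VALUE_MODEL 1u	/* hypothetical doorbell value */

struct cmd_ring_model {
	uint32_t trb[4];		/* one command TRB in ring memory */
	_Atomic uint32_t doorbell;	/* stand-in for the doorbell register */
};

/* Write a complete command TRB into the ring model. */
static void write_cmd_trb_model(struct cmd_ring_model *ring,
				const uint32_t fields[4])
{
	for (int i = 0; i < 4; i++)
		ring->trb[i] = fields[i];
}

/* Ring the doorbell only after all earlier ring writes are ordered. */
static void ring_cmd_doorbell_model(struct cmd_ring_model *ring)
{
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&ring->doorbell, DOORBELL_VALUE_MODEL,
			      memory_order_relaxed);
}
```

An observer that reads the doorbell with acquire semantics and sees
DOORBELL_VALUE_MODEL is then guaranteed to see the full TRB as well.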


* Re: [PATCH 1/2] xhci: make sure TRB is fully written before giving it to the controller
  2021-01-18 12:07         ` Mathias Nyman
@ 2021-01-19 23:25           ` Ross Zwisler
  0 siblings, 0 replies; 7+ messages in thread
From: Ross Zwisler @ 2021-01-19 23:25 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: Sergei Shtylyov, David Laight, gregkh, linux-usb, stable

On Mon, Jan 18, 2021 at 5:05 AM Mathias Nyman
<mathias.nyman@linux.intel.com> wrote:
<>
> True, good point, dma_wmb() should be enough here.
> In fact most other wmb()s in xhci could be turned into dma_wmb().

FWIW I've confirmed in my testing that dma_wmb() does indeed also
solve the problem.
