All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 1/2] usb: dwc3: gadget: Prevent losing events in event cache
@ 2017-05-12  0:26 Thinh Nguyen
  2017-05-12  0:26 ` [PATCH v2 2/2] usb: gadget: f_mass_storage: Serialize wake and sleep execution Thinh Nguyen
  0 siblings, 1 reply; 5+ messages in thread
From: Thinh Nguyen @ 2017-05-12  0:26 UTC (permalink / raw)
  To: Felipe Balbi, linux-usb; +Cc: John Youn, stable

The dwc3 driver can overwite its previous events if its top-half IRQ
handler (TH) gets invoked again before processing the events in the
cache. We see this as a hang in the file transfer and the host will
attempt to reset the device. TH gets the event count and deasserts the
interrupt line by writing DWC3_GEVNTSIZ_INTMASK to DWC3_GEVNTSIZ. If
there's a new event coming between reading the event count and interrupt
deassertion, dwc3 will lose previous pending events. More generally, we
will see 0 event count, which should not affect anything.

This shouldn't be possible in the current dwc3 implementation. However,
through testing and reading the PCIe trace, the TH occasionally still
gets invoked one more time after HW interrupt deassertion. (With PCIe
legacy interrupts, TH is called repeatedly as long as the interrupt line
is asserted). We suspect that there is a small detection delay in the
SW.

To avoid this issue, Check DWC3_EVENT_PENDING flag to determine if the
events are processed in the bottom-half IRQ handler. If not, return
IRQ_HANDLED and don't process new event.

Cc: stable@vger.kernel.org
Signed-off-by: Thinh Nguyen <thinhn@synopsys.com>
---
 drivers/usb/dwc3/gadget.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 6f6f0b3be3ad..6ae998aee837 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -3026,6 +3026,15 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
 		return IRQ_HANDLED;
 	}
 
+	/*
+	 * With PCIe legacy interrupt, test shows that top-half irq handler can
+	 * be called again after HW interrupt deassertion. Check if bottom-half
+	 * irq event handler completes before caching new event to prevent
+	 * losing events.
+	 */
+	if (evt->flags & DWC3_EVENT_PENDING)
+		return IRQ_HANDLED;
+
 	count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
 	count &= DWC3_GEVNTCOUNT_MASK;
 	if (!count)
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v2 2/2] usb: gadget: f_mass_storage: Serialize wake and sleep execution
  2017-05-12  0:26 [PATCH v2 1/2] usb: dwc3: gadget: Prevent losing events in event cache Thinh Nguyen
@ 2017-05-12  0:26 ` Thinh Nguyen
  2017-05-16  8:09   ` Felipe Balbi
  2017-05-18 22:26   ` Thinh Nguyen
  0 siblings, 2 replies; 5+ messages in thread
From: Thinh Nguyen @ 2017-05-12  0:26 UTC (permalink / raw)
  To: Felipe Balbi, linux-usb, Philipp Gesang, Ingo Molnar
  Cc: John Youn, Alan Stern, stable

f_mass_storage has a memorry barrier issue with the sleep and wake
functions that can cause a deadlock. This results in intermittent hangs
during MSC file transfer. The host will reset the device after receiving
no response to resume the transfer. This issue is seen when dwc3 is
processing 2 transfer-in-progress events at the same time, invoking
completion handlers for CSW and CBW. Also this issue occurs depending on
the system timing and latency.

To increase the chance to hit this issue, you can force dwc3 driver to
wait and process those 2 events at once by adding a small delay (~100us)
in dwc3_check_event_buf() whenever the request is for CSW and read the
event count again. Avoid debugging with printk and ftrace as extra
delays and memory barrier will mask this issue.

Scenario which can lead to failure:
-----------------------------------
1) The main thread sleeps and waits for the next command in
   get_next_command().
2) bulk_in_complete() wakes up main thread for CSW.
3) bulk_out_complete() tries to wake up the running main thread for CBW.
4) thread_wakeup_needed is not loaded with correct value in
   sleep_thread().
5) Main thread goes to sleep again.

The pattern is shown below. Note the 2 critical variables.
 * common->thread_wakeup_needed
 * bh->state

	CPU 0 (sleep_thread)		CPU 1 (wakeup_thread)
	==============================  ===============================

					bh->state = BH_STATE_FULL;
					smp_wmb();
	thread_wakeup_needed = 0;	thread_wakeup_needed = 1;
	smp_rmb();
	if (bh->state != BH_STATE_FULL)
		sleep again ...

As pointed out by Alan Stern, this is an R-pattern issue. The issue can
be seen when there are two wakeups in quick succession. The
thread_wakeup_needed can be overwritten in sleep_thread, and the read of
the bh->state maybe reordered before the write to thread_wakeup_needed.

This patch applies full memory barrier smp_mb() in both sleep_thread()
and wakeup_thread() to ensure the order which the thread_wakeup_needed
and bh->state are written and loaded.

However, a better solution in the future would be to use wait_queue
method that takes care of managing memory barrier between waker and
waiter.

Cc: stable@vger.kernel.org
Signed-off-by: Thinh Nguyen <thinhn@synopsys.com>
---
 drivers/usb/gadget/function/f_mass_storage.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/gadget/function/f_mass_storage.c b/drivers/usb/gadget/function/f_mass_storage.c
index 4c8aacc232c0..74d57d6994da 100644
--- a/drivers/usb/gadget/function/f_mass_storage.c
+++ b/drivers/usb/gadget/function/f_mass_storage.c
@@ -396,7 +396,11 @@ static int fsg_set_halt(struct fsg_dev *fsg, struct usb_ep *ep)
 /* Caller must hold fsg->lock */
 static void wakeup_thread(struct fsg_common *common)
 {
-	smp_wmb();	/* ensure the write of bh->state is complete */
+	/*
+	 * Ensure the reading of thread_wakeup_needed
+	 * and the writing of bh->state are completed
+	 */
+	smp_mb();
 	/* Tell the main thread that something has happened */
 	common->thread_wakeup_needed = 1;
 	if (common->thread_task)
@@ -627,7 +631,12 @@ static int sleep_thread(struct fsg_common *common, bool can_freeze)
 	}
 	__set_current_state(TASK_RUNNING);
 	common->thread_wakeup_needed = 0;
-	smp_rmb();	/* ensure the latest bh->state is visible */
+
+	/*
+	 * Ensure the writing of thread_wakeup_needed
+	 * and the reading of bh->state are completed
+	 */
+	smp_mb();
 	return rc;
 }
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 2/2] usb: gadget: f_mass_storage: Serialize wake and sleep execution
  2017-05-12  0:26 ` [PATCH v2 2/2] usb: gadget: f_mass_storage: Serialize wake and sleep execution Thinh Nguyen
@ 2017-05-16  8:09   ` Felipe Balbi
  2017-05-16 13:58     ` Alan Stern
  2017-05-18 22:26   ` Thinh Nguyen
  1 sibling, 1 reply; 5+ messages in thread
From: Felipe Balbi @ 2017-05-16  8:09 UTC (permalink / raw)
  To: Thinh Nguyen, linux-usb, Philipp Gesang, Ingo Molnar
  Cc: John Youn, Alan Stern, stable

[-- Attachment #1: Type: text/plain, Size: 2548 bytes --]


Hi,

Thinh Nguyen <Thinh.Nguyen@synopsys.com> writes:
> f_mass_storage has a memorry barrier issue with the sleep and wake
> functions that can cause a deadlock. This results in intermittent hangs
> during MSC file transfer. The host will reset the device after receiving
> no response to resume the transfer. This issue is seen when dwc3 is
> processing 2 transfer-in-progress events at the same time, invoking
> completion handlers for CSW and CBW. Also this issue occurs depending on
> the system timing and latency.
>
> To increase the chance to hit this issue, you can force dwc3 driver to
> wait and process those 2 events at once by adding a small delay (~100us)
> in dwc3_check_event_buf() whenever the request is for CSW and read the
> event count again. Avoid debugging with printk and ftrace as extra
> delays and memory barrier will mask this issue.
>
> Scenario which can lead to failure:
> -----------------------------------
> 1) The main thread sleeps and waits for the next command in
>    get_next_command().
> 2) bulk_in_complete() wakes up main thread for CSW.
> 3) bulk_out_complete() tries to wake up the running main thread for CBW.
> 4) thread_wakeup_needed is not loaded with correct value in
>    sleep_thread().
> 5) Main thread goes to sleep again.
>
> The pattern is shown below. Note the 2 critical variables.
>  * common->thread_wakeup_needed
>  * bh->state
>
> 	CPU 0 (sleep_thread)		CPU 1 (wakeup_thread)
> 	==============================  ===============================
>
> 					bh->state = BH_STATE_FULL;
> 					smp_wmb();
> 	thread_wakeup_needed = 0;	thread_wakeup_needed = 1;
> 	smp_rmb();
> 	if (bh->state != BH_STATE_FULL)
> 		sleep again ...
>
> As pointed out by Alan Stern, this is an R-pattern issue. The issue can
> be seen when there are two wakeups in quick succession. The
> thread_wakeup_needed can be overwritten in sleep_thread, and the read of
> the bh->state maybe reordered before the write to thread_wakeup_needed.
>
> This patch applies full memory barrier smp_mb() in both sleep_thread()
> and wakeup_thread() to ensure the order which the thread_wakeup_needed
> and bh->state are written and loaded.
>
> However, a better solution in the future would be to use wait_queue
> method that takes care of managing memory barrier between waker and
> waiter.
>
> Cc: stable@vger.kernel.org
> Signed-off-by: Thinh Nguyen <thinhn@synopsys.com>

Alan, just to make sure you're okay with this patch. Can I have your
acked-by?

-- 
balbi

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 2/2] usb: gadget: f_mass_storage: Serialize wake and sleep execution
  2017-05-16  8:09   ` Felipe Balbi
@ 2017-05-16 13:58     ` Alan Stern
  0 siblings, 0 replies; 5+ messages in thread
From: Alan Stern @ 2017-05-16 13:58 UTC (permalink / raw)
  To: Felipe Balbi
  Cc: Thinh Nguyen, linux-usb, Philipp Gesang, Ingo Molnar, John Youn, stable

On Tue, 16 May 2017, Felipe Balbi wrote:

> Hi,
> 
> Thinh Nguyen <Thinh.Nguyen@synopsys.com> writes:
> > f_mass_storage has a memorry barrier issue with the sleep and wake
> > functions that can cause a deadlock. This results in intermittent hangs
> > during MSC file transfer. The host will reset the device after receiving
> > no response to resume the transfer. This issue is seen when dwc3 is
> > processing 2 transfer-in-progress events at the same time, invoking
> > completion handlers for CSW and CBW. Also this issue occurs depending on
> > the system timing and latency.
> >
> > To increase the chance to hit this issue, you can force dwc3 driver to
> > wait and process those 2 events at once by adding a small delay (~100us)
> > in dwc3_check_event_buf() whenever the request is for CSW and read the
> > event count again. Avoid debugging with printk and ftrace as extra
> > delays and memory barrier will mask this issue.
> >
> > Scenario which can lead to failure:
> > -----------------------------------
> > 1) The main thread sleeps and waits for the next command in
> >    get_next_command().
> > 2) bulk_in_complete() wakes up main thread for CSW.
> > 3) bulk_out_complete() tries to wake up the running main thread for CBW.
> > 4) thread_wakeup_needed is not loaded with correct value in
> >    sleep_thread().
> > 5) Main thread goes to sleep again.
> >
> > The pattern is shown below. Note the 2 critical variables.
> >  * common->thread_wakeup_needed
> >  * bh->state
> >
> > 	CPU 0 (sleep_thread)		CPU 1 (wakeup_thread)
> > 	==============================  ===============================
> >
> > 					bh->state = BH_STATE_FULL;
> > 					smp_wmb();
> > 	thread_wakeup_needed = 0;	thread_wakeup_needed = 1;
> > 	smp_rmb();
> > 	if (bh->state != BH_STATE_FULL)
> > 		sleep again ...
> >
> > As pointed out by Alan Stern, this is an R-pattern issue. The issue can
> > be seen when there are two wakeups in quick succession. The
> > thread_wakeup_needed can be overwritten in sleep_thread, and the read of
> > the bh->state maybe reordered before the write to thread_wakeup_needed.
> >
> > This patch applies full memory barrier smp_mb() in both sleep_thread()
> > and wakeup_thread() to ensure the order which the thread_wakeup_needed
> > and bh->state are written and loaded.
> >
> > However, a better solution in the future would be to use wait_queue
> > method that takes care of managing memory barrier between waker and
> > waiter.
> >
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Thinh Nguyen <thinhn@synopsys.com>
> 
> Alan, just to make sure you're okay with this patch. Can I have your
> acked-by?

Acked-by: Alan Stern <stern@rowland.harvard.edu>

This patch is suitable for the -stable kernels.  However, going forward 
the patches I sent for testing will be a better solution overall (and 
obviously they will clash with this one).

Alan STern

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2 2/2] usb: gadget: f_mass_storage: Serialize wake and sleep execution
  2017-05-12  0:26 ` [PATCH v2 2/2] usb: gadget: f_mass_storage: Serialize wake and sleep execution Thinh Nguyen
  2017-05-16  8:09   ` Felipe Balbi
@ 2017-05-18 22:26   ` Thinh Nguyen
  1 sibling, 0 replies; 5+ messages in thread
From: Thinh Nguyen @ 2017-05-18 22:26 UTC (permalink / raw)
  To: Felipe Balbi
  Cc: linux-usb, Philipp Gesang, Ingo Molnar, John Youn, Alan Stern, stable

Hi Felipe,

On 5/11/2017 5:27 PM, Thinh Nguyen wrote:
> f_mass_storage has a memorry barrier issue with the sleep and wake
> functions that can cause a deadlock. This results in intermittent hangs
> during MSC file transfer. The host will reset the device after receiving
> no response to resume the transfer. This issue is seen when dwc3 is
> processing 2 transfer-in-progress events at the same time, invoking
> completion handlers for CSW and CBW. Also this issue occurs depending on
> the system timing and latency.
> 
> To increase the chance to hit this issue, you can force dwc3 driver to
> wait and process those 2 events at once by adding a small delay (~100us)
> in dwc3_check_event_buf() whenever the request is for CSW and read the
> event count again. Avoid debugging with printk and ftrace as extra
> delays and memory barrier will mask this issue.
> 
> Scenario which can lead to failure:
> -----------------------------------
> 1) The main thread sleeps and waits for the next command in
>     get_next_command().
> 2) bulk_in_complete() wakes up main thread for CSW.
> 3) bulk_out_complete() tries to wake up the running main thread for CBW.
> 4) thread_wakeup_needed is not loaded with correct value in
>     sleep_thread().
> 5) Main thread goes to sleep again.
> 
> The pattern is shown below. Note the 2 critical variables.
>   * common->thread_wakeup_needed
>   * bh->state
> 
> 	CPU 0 (sleep_thread)		CPU 1 (wakeup_thread)
> 	==============================  ===============================
> 
> 					bh->state = BH_STATE_FULL;
> 					smp_wmb();
> 	thread_wakeup_needed = 0;	thread_wakeup_needed = 1;
> 	smp_rmb();
> 	if (bh->state != BH_STATE_FULL)
> 		sleep again ...
> 
> As pointed out by Alan Stern, this is an R-pattern issue. The issue can
> be seen when there are two wakeups in quick succession. The
> thread_wakeup_needed can be overwritten in sleep_thread, and the read of
> the bh->state maybe reordered before the write to thread_wakeup_needed.
> 
> This patch applies full memory barrier smp_mb() in both sleep_thread()
> and wakeup_thread() to ensure the order which the thread_wakeup_needed
> and bh->state are written and loaded.
> 
> However, a better solution in the future would be to use wait_queue
> method that takes care of managing memory barrier between waker and
> waiter.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Thinh Nguyen <thinhn@synopsys.com>
> ---

Will this patch go on the rc?

Thanks,
Thinh

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-05-18 22:26 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-12  0:26 [PATCH v2 1/2] usb: dwc3: gadget: Prevent losing events in event cache Thinh Nguyen
2017-05-12  0:26 ` [PATCH v2 2/2] usb: gadget: f_mass_storage: Serialize wake and sleep execution Thinh Nguyen
2017-05-16  8:09   ` Felipe Balbi
2017-05-16 13:58     ` Alan Stern
2017-05-18 22:26   ` Thinh Nguyen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.