All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 1/2] usb: dwc2: host: Don't retry NAKed transactions right away
@ 2017-10-30 17:08 Douglas Anderson
  2017-10-30 17:08 ` [PATCH v3 2/2] usb: dwc2: host: Convert hcd_queue to timer_setup Douglas Anderson
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Douglas Anderson @ 2017-10-30 17:08 UTC (permalink / raw)
  To: balbi, johnyoun
  Cc: stefan.wahren, amstan, linux-rockchip, gregkh, johan, eric, mka,
	john.stultz, linux-rpi-kernel, jwerner, Douglas Anderson, stable,
	linux-usb, linux-kernel

On rk3288-veyron devices on Chrome OS it was found that plugging in an
Arduino-based USB device could cause the system to lockup, especially
if the CPU Frequency was at one of the slower operating points (like
100 MHz / 200 MHz).

Upon tracing, I found that the following was happening:
* The USB device (full speed) was connected to a high speed hub and
  then to the rk3288.  Thus, we were dealing with split transactions,
  which is all handled in software on dwc2.
* Userspace was initiating a BULK IN transfer
* When we sent the SSPLIT (to start the split transaction), we got an
  ACK.  Good.  Then we issued the CSPLIT.
* When we sent the CSPLIT, we got back a NAK.  We immediately (from
  the interrupt handler) started to retry and sent another SSPLIT.
* The device kept NAKing our CSPLIT, so we kept ping-ponging between
  sending a SSPLIT and a CSPLIT, each time sending from the interrupt
  handler.
* The handling of the interrupts was (because of the low CPU speed and
  the inefficiency of the dwc2 interrupt handler) was actually taking
  _longer_ than it took the other side to send the ACK/NAK.  Thus we
  were _always_ in the USB interrupt routine.
* The fact that USB interrupts were always going off was preventing
  other things from happening in the system.  This included preventing
  the system from being able to transition to a higher CPU frequency.

As I understand it, there is no requirement to retry super quickly
after a NAK, we just have to retry sometime in the future.  Thus one
solution to the above is to just add a delay between getting a NAK and
retrying the transmission.  If this delay is sufficiently long to get
out of the interrupt routine then the rest of the system will be able
to make forward progress.  Even a 25 us delay would probably be
enough, but we'll be extra conservative and try to delay 1 ms (the
exact amount depends on HZ and the accuracy of the jiffy and how close
the current jiffy is to ticking, but could be as much as 20 ms or as
little as 1 ms).

Presumably adding a delay like this could impact the USB throughput,
so we only add the delay with repeated NAKs.

NOTE: Upon further testing of a pl2303 serial adapter, I found that
this fix may help with problems there.  Specifically I found that the
pl2303 serial adapters tend to respond with a NAK when they have
nothing to say and thus we end with this same sequence.

Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: stable@vger.kernel.org
Reviewed-by: Julius Werner <jwerner@chromium.org>
Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
---

Changes in v3:
- Add tested-by for Stefan Wahren
- Sent to Felipe Balbi as candiate to land this.
- Add Cc for stable (it's always been broken so go as far is as easy)

Changes in v2:
- Address http://crosreview.com/737520 feedback

 drivers/usb/dwc2/core.h      |  1 +
 drivers/usb/dwc2/hcd.c       |  7 ++++
 drivers/usb/dwc2/hcd.h       |  9 +++++
 drivers/usb/dwc2/hcd_intr.c  | 20 +++++++++++
 drivers/usb/dwc2/hcd_queue.c | 81 +++++++++++++++++++++++++++++++++++++++++---
 5 files changed, 114 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h
index 8367d4f985c1..c0807e558726 100644
--- a/drivers/usb/dwc2/core.h
+++ b/drivers/usb/dwc2/core.h
@@ -962,6 +962,7 @@ struct dwc2_hsotg {
 	} flags;
 
 	struct list_head non_periodic_sched_inactive;
+	struct list_head non_periodic_sched_waiting;
 	struct list_head non_periodic_sched_active;
 	struct list_head *non_periodic_qh_ptr;
 	struct list_head periodic_sched_inactive;
diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
index c2631145f404..ed92ae78dcf9 100644
--- a/drivers/usb/dwc2/hcd.c
+++ b/drivers/usb/dwc2/hcd.c
@@ -653,6 +653,10 @@ static void dwc2_dump_channel_info(struct dwc2_hsotg *hsotg,
 	list_for_each_entry(qh, &hsotg->non_periodic_sched_inactive,
 			    qh_list_entry)
 		dev_dbg(hsotg->dev, "    %p\n", qh);
+	dev_dbg(hsotg->dev, "  NP waiting sched:\n");
+	list_for_each_entry(qh, &hsotg->non_periodic_sched_waiting,
+			    qh_list_entry)
+		dev_dbg(hsotg->dev, "    %p\n", qh);
 	dev_dbg(hsotg->dev, "  NP active sched:\n");
 	list_for_each_entry(qh, &hsotg->non_periodic_sched_active,
 			    qh_list_entry)
@@ -1812,6 +1816,7 @@ static void dwc2_qh_list_free(struct dwc2_hsotg *hsotg,
 static void dwc2_kill_all_urbs(struct dwc2_hsotg *hsotg)
 {
 	dwc2_kill_urbs_in_qh_list(hsotg, &hsotg->non_periodic_sched_inactive);
+	dwc2_kill_urbs_in_qh_list(hsotg, &hsotg->non_periodic_sched_waiting);
 	dwc2_kill_urbs_in_qh_list(hsotg, &hsotg->non_periodic_sched_active);
 	dwc2_kill_urbs_in_qh_list(hsotg, &hsotg->periodic_sched_inactive);
 	dwc2_kill_urbs_in_qh_list(hsotg, &hsotg->periodic_sched_ready);
@@ -4989,6 +4994,7 @@ static void dwc2_hcd_free(struct dwc2_hsotg *hsotg)
 
 	/* Free memory for QH/QTD lists */
 	dwc2_qh_list_free(hsotg, &hsotg->non_periodic_sched_inactive);
+	dwc2_qh_list_free(hsotg, &hsotg->non_periodic_sched_waiting);
 	dwc2_qh_list_free(hsotg, &hsotg->non_periodic_sched_active);
 	dwc2_qh_list_free(hsotg, &hsotg->periodic_sched_inactive);
 	dwc2_qh_list_free(hsotg, &hsotg->periodic_sched_ready);
@@ -5151,6 +5157,7 @@ int dwc2_hcd_init(struct dwc2_hsotg *hsotg)
 
 	/* Initialize the non-periodic schedule */
 	INIT_LIST_HEAD(&hsotg->non_periodic_sched_inactive);
+	INIT_LIST_HEAD(&hsotg->non_periodic_sched_waiting);
 	INIT_LIST_HEAD(&hsotg->non_periodic_sched_active);
 
 	/* Initialize the periodic schedule */
diff --git a/drivers/usb/dwc2/hcd.h b/drivers/usb/dwc2/hcd.h
index 11c3c145b793..af4f4360e8b7 100644
--- a/drivers/usb/dwc2/hcd.h
+++ b/drivers/usb/dwc2/hcd.h
@@ -313,12 +313,16 @@ struct dwc2_hs_transfer_time {
  *                      descriptor and indicates original XferSize value for the
  *                      descriptor
  * @unreserve_timer:    Timer for releasing periodic reservation.
+ * @wait_timer:         Timer used to wait before re-queuing.
  * @dwc2_tt:            Pointer to our tt info (or NULL if no tt).
  * @ttport:             Port number within our tt.
  * @tt_buffer_dirty     True if clear_tt_buffer_complete is pending
  * @unreserve_pending:  True if we planned to unreserve but haven't yet.
  * @schedule_low_speed: True if we have a low/full speed component (either the
  *			host is in low/full speed mode or do_split).
+ * @want_wait:          We should wait before re-queuing; only matters for non-
+ *                      periodic transfers and is ignored for periodic ones.
+ * @wait_timer_cancel:  Set to true to cancel the wait_timer.
  *
  * A Queue Head (QH) holds the static characteristics of an endpoint and
  * maintains a list of transfers (QTDs) for that endpoint. A QH structure may
@@ -353,11 +357,14 @@ struct dwc2_qh {
 	u32 desc_list_sz;
 	u32 *n_bytes;
 	struct timer_list unreserve_timer;
+	struct timer_list wait_timer;
 	struct dwc2_tt *dwc_tt;
 	int ttport;
 	unsigned tt_buffer_dirty:1;
 	unsigned unreserve_pending:1;
 	unsigned schedule_low_speed:1;
+	unsigned want_wait:1;
+	unsigned wait_timer_cancel:1;
 };
 
 /**
@@ -388,6 +395,7 @@ struct dwc2_qh {
  * @n_desc:             Number of DMA descriptors for this QTD
  * @isoc_frame_index_last: Last activated frame (packet) index, used in
  *                      descriptor DMA mode only
+ * @num_naks:           Number of NAKs received on this QTD.
  * @urb:                URB for this transfer
  * @qh:                 Queue head for this QTD
  * @qtd_list_entry:     For linking to the QH's list of QTDs
@@ -418,6 +426,7 @@ struct dwc2_qtd {
 	u8 error_count;
 	u8 n_desc;
 	u16 isoc_frame_index_last;
+	u16 num_naks;
 	struct dwc2_hcd_urb *urb;
 	struct dwc2_qh *qh;
 	struct list_head qtd_list_entry;
diff --git a/drivers/usb/dwc2/hcd_intr.c b/drivers/usb/dwc2/hcd_intr.c
index 28a8210710b1..aad3abd2ddf2 100644
--- a/drivers/usb/dwc2/hcd_intr.c
+++ b/drivers/usb/dwc2/hcd_intr.c
@@ -52,6 +52,12 @@
 #include "core.h"
 #include "hcd.h"
 
+/*
+ * If we get this many NAKs on a split transaction we'll slow down
+ * retransmission.  A 1 here means delay after the first NAK.
+ */
+#define DWC2_NAKS_BEFORE_DELAY		3
+
 /* This function is for debug only */
 static void dwc2_track_missed_sofs(struct dwc2_hsotg *hsotg)
 {
@@ -1200,11 +1206,25 @@ static void dwc2_hc_nak_intr(struct dwc2_hsotg *hsotg,
 	/*
 	 * Handle NAK for IN/OUT SSPLIT/CSPLIT transfers, bulk, control, and
 	 * interrupt. Re-start the SSPLIT transfer.
+	 *
+	 * Normally for non-periodic transfers we'll retry right away, but to
+	 * avoid interrupt storms we'll wait before retrying if we've got
+	 * several NAKs. If we didn't do this we'd retry directly from the
+	 * interrupt handler and could end up quickly getting another
+	 * interrupt (another NAK), which we'd retry.
+	 *
+	 * Note that in DMA mode software only gets involved to re-send NAKed
+	 * transfers for split transactions, so we only need to apply this
+	 * delaying logic when handling splits. In non-DMA mode presumably we
+	 * might want a similar delay if someone can demonstrate this problem
+	 * affects that code path too.
 	 */
 	if (chan->do_split) {
 		if (chan->complete_split)
 			qtd->error_count = 0;
 		qtd->complete_split = 0;
+		qtd->num_naks++;
+		qtd->qh->want_wait = qtd->num_naks >= DWC2_NAKS_BEFORE_DELAY;
 		dwc2_halt_channel(hsotg, chan, qtd, DWC2_HC_XFER_NAK);
 		goto handle_nak_done;
 	}
diff --git a/drivers/usb/dwc2/hcd_queue.c b/drivers/usb/dwc2/hcd_queue.c
index 3ae8b1bbaa55..bea0aadd756e 100644
--- a/drivers/usb/dwc2/hcd_queue.c
+++ b/drivers/usb/dwc2/hcd_queue.c
@@ -57,6 +57,9 @@
 /* Wait this long before releasing periodic reservation */
 #define DWC2_UNRESERVE_DELAY (msecs_to_jiffies(5))
 
+/* If we get a NAK, wait this long before retrying */
+#define DWC2_RETRY_WAIT_DELAY (msecs_to_jiffies(1))
+
 /**
  * dwc2_periodic_channel_available() - Checks that a channel is available for a
  * periodic transfer
@@ -1439,6 +1442,55 @@ static void dwc2_deschedule_periodic(struct dwc2_hsotg *hsotg,
 	list_del_init(&qh->qh_list_entry);
 }
 
+/**
+ * dwc2_wait_timer_fn() - Timer function to re-queue after waiting
+ *
+ * As per the spec, a NAK indicates that "a function is temporarily unable to
+ * transmit or receive data, but will eventually be able to do so without need
+ * of host intervention".
+ *
+ * That means that when we encounter a NAK we're supposed to retry.
+ *
+ * ...but if we retry right away (from the interrupt handler that saw the NAK)
+ * then we can end up with an interrupt storm (if the other side keeps NAKing
+ * us) because on slow enough CPUs it could take us longer to get out of the
+ * interrupt routine than it takes for the device to send another NAK.  That
+ * leads to a constant stream of NAK interrupts and the CPU locks.
+ *
+ * ...so instead of retrying right away in the case of a NAK we'll set a timer
+ * to retry some time later.  This function handles that timer and moves the
+ * qh back to the "inactive" list, then queues transactions.
+ *
+ * @data: Pointer to a qh to re-schedule.
+ */
+static void dwc2_wait_timer_fn(unsigned long data)
+{
+	struct dwc2_qh *qh = (struct dwc2_qh *)data;
+	struct dwc2_hsotg *hsotg = qh->hsotg;
+	unsigned long flags;
+
+	spin_lock_irqsave(&hsotg->lock, flags);
+
+	/*
+	 * We'll set wait_timer_cancel to true if we want to cancel this
+	 * operation in dwc2_hcd_qh_unlink().
+	 */
+	if (!qh->wait_timer_cancel) {
+		enum dwc2_transaction_type tr_type;
+
+		qh->want_wait = false;
+
+		list_move(&qh->qh_list_entry,
+			  &hsotg->non_periodic_sched_inactive);
+
+		tr_type = dwc2_hcd_select_transactions(hsotg);
+		if (tr_type != DWC2_TRANSACTION_NONE)
+			dwc2_hcd_queue_transactions(hsotg, tr_type);
+	}
+
+	spin_unlock_irqrestore(&hsotg->lock, flags);
+}
+
 /**
  * dwc2_qh_init() - Initializes a QH structure
  *
@@ -1468,6 +1520,7 @@ static void dwc2_qh_init(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh,
 	qh->hsotg = hsotg;
 	setup_timer(&qh->unreserve_timer, dwc2_unreserve_timer_fn,
 		    (unsigned long)qh);
+	setup_timer(&qh->wait_timer, dwc2_wait_timer_fn, (unsigned long)qh);
 	qh->ep_type = ep_type;
 	qh->ep_is_in = ep_is_in;
 
@@ -1628,6 +1681,16 @@ void dwc2_hcd_qh_free(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
 		dwc2_do_unreserve(hsotg, qh);
 		spin_unlock_irqrestore(&hsotg->lock, flags);
 	}
+
+	/*
+	 * We don't have the lock so we can safely wait until the wait timer
+	 * finishes.  Of course, at this point in time we'd better have set
+	 * wait_timer_active to false so if this timer was still pending it
+	 * won't do anything anyway, but we want it to finish before we free
+	 * memory.
+	 */
+	del_timer_sync(&qh->wait_timer);
+
 	dwc2_host_put_tt_info(hsotg, qh->dwc_tt);
 
 	if (qh->desc_list)
@@ -1663,9 +1726,16 @@ int dwc2_hcd_qh_add(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
 		qh->start_active_frame = hsotg->frame_number;
 		qh->next_active_frame = qh->start_active_frame;
 
-		/* Always start in inactive schedule */
-		list_add_tail(&qh->qh_list_entry,
-			      &hsotg->non_periodic_sched_inactive);
+		if (qh->want_wait) {
+			list_add_tail(&qh->qh_list_entry,
+				      &hsotg->non_periodic_sched_waiting);
+			qh->wait_timer_cancel = false;
+			mod_timer(&qh->wait_timer,
+				  jiffies + DWC2_RETRY_WAIT_DELAY + 1);
+		} else {
+			list_add_tail(&qh->qh_list_entry,
+				      &hsotg->non_periodic_sched_inactive);
+		}
 		return 0;
 	}
 
@@ -1695,6 +1765,9 @@ void dwc2_hcd_qh_unlink(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
 
 	dev_vdbg(hsotg->dev, "%s()\n", __func__);
 
+	/* If the wait_timer is pending, this will stop it from acting */
+	qh->wait_timer_cancel = true;
+
 	if (list_empty(&qh->qh_list_entry))
 		/* QH is not in a schedule */
 		return;
@@ -1903,7 +1976,7 @@ void dwc2_hcd_qh_deactivate(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh,
 	if (dwc2_qh_is_non_per(qh)) {
 		dwc2_hcd_qh_unlink(hsotg, qh);
 		if (!list_empty(&qh->qtd_list))
-			/* Add back to inactive non-periodic schedule */
+			/* Add back to inactive/waiting non-periodic schedule */
 			dwc2_hcd_qh_add(hsotg, qh);
 		return;
 	}
-- 
2.15.0.rc2.357.g7e34df9404-goog

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 2/2] usb: dwc2: host: Convert hcd_queue to timer_setup
  2017-10-30 17:08 [PATCH v3 1/2] usb: dwc2: host: Don't retry NAKed transactions right away Douglas Anderson
@ 2017-10-30 17:08 ` Douglas Anderson
  2017-10-30 21:25     ` Kees Cook
  2017-12-05 16:18 ` [PATCH v3 1/2] usb: dwc2: host: Don't retry NAKed transactions right away Stefan Wahren
  2017-12-12 11:06   ` Felipe Balbi
  2 siblings, 1 reply; 11+ messages in thread
From: Douglas Anderson @ 2017-10-30 17:08 UTC (permalink / raw)
  To: balbi, johnyoun
  Cc: stefan.wahren, amstan, linux-rockchip, gregkh, johan, eric, mka,
	john.stultz, linux-rpi-kernel, jwerner, Douglas Anderson,
	Kees Cook, linux-usb, linux-kernel

Convert the timers in hcd_queue to use the new timer_setup() call
introduced in commit 686fef928bba ("timer: Prepare to change timer
callback argument type").

Suggested-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: Kees Cook <keescook@chromium.org>
---

Changes in v3:
- Convert hcd_queue to timer_setup new for v3

 drivers/usb/dwc2/hcd_queue.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/usb/dwc2/hcd_queue.c b/drivers/usb/dwc2/hcd_queue.c
index bea0aadd756e..6f5b5c8b0467 100644
--- a/drivers/usb/dwc2/hcd_queue.c
+++ b/drivers/usb/dwc2/hcd_queue.c
@@ -1275,11 +1275,11 @@ static void dwc2_do_unreserve(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
  * release the reservation.  This worker is called after the appropriate
  * delay.
  *
- * @work: Pointer to a qh unreserve_work.
+ * @t: Pointer to unreserve_timer in a qh.
  */
-static void dwc2_unreserve_timer_fn(unsigned long data)
+static void dwc2_unreserve_timer_fn(struct timer_list *t)
 {
-	struct dwc2_qh *qh = (struct dwc2_qh *)data;
+	struct dwc2_qh *qh = from_timer(qh, t, unreserve_timer);
 	struct dwc2_hsotg *hsotg = qh->hsotg;
 	unsigned long flags;
 
@@ -1461,11 +1461,11 @@ static void dwc2_deschedule_periodic(struct dwc2_hsotg *hsotg,
  * to retry some time later.  This function handles that timer and moves the
  * qh back to the "inactive" list, then queues transactions.
  *
- * @data: Pointer to a qh to re-schedule.
+ * @t: Pointer to wait_timer in a qh.
  */
-static void dwc2_wait_timer_fn(unsigned long data)
+static void dwc2_wait_timer_fn(struct timer_list *t)
 {
-	struct dwc2_qh *qh = (struct dwc2_qh *)data;
+	struct dwc2_qh *qh = from_timer(qh, t, wait_timer);
 	struct dwc2_hsotg *hsotg = qh->hsotg;
 	unsigned long flags;
 
@@ -1518,9 +1518,8 @@ static void dwc2_qh_init(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh,
 
 	/* Initialize QH */
 	qh->hsotg = hsotg;
-	setup_timer(&qh->unreserve_timer, dwc2_unreserve_timer_fn,
-		    (unsigned long)qh);
-	setup_timer(&qh->wait_timer, dwc2_wait_timer_fn, (unsigned long)qh);
+	timer_setup(&qh->unreserve_timer, dwc2_unreserve_timer_fn, 0);
+	timer_setup(&qh->wait_timer, dwc2_wait_timer_fn, 0);
 	qh->ep_type = ep_type;
 	qh->ep_is_in = ep_is_in;
 
-- 
2.15.0.rc2.357.g7e34df9404-goog

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/2] usb: dwc2: host: Convert hcd_queue to timer_setup
@ 2017-10-30 21:25     ` Kees Cook
  0 siblings, 0 replies; 11+ messages in thread
From: Kees Cook @ 2017-10-30 21:25 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Felipe Balbi, johnyoun, Stefan Wahren, amstan, linux-rockchip,
	Johan Hovold, Greg KH, Eric Anholt, Matthias Kaehlcke,
	John Stultz, linux-rpi-kernel, jwerner, linux-usb, LKML

On Mon, Oct 30, 2017 at 10:08 AM, Douglas Anderson
<dianders@chromium.org> wrote:
> Convert the timers in hcd_queue to use the new timer_setup() call
> introduced in commit 686fef928bba ("timer: Prepare to change timer
> callback argument type").
>
> Suggested-by: Stefan Wahren <stefan.wahren@i2se.com>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> Cc: Kees Cook <keescook@chromium.org>

Acked-by: Kees Cook <keescook@chromium.org>

This matches my automated Coccinelle output for this code. (Actually,
it's better because it fixes the documentation too.) Please feel free
to take the patch into -next in advance of the global replacement.

Thanks!

-Kees

> ---
>
> Changes in v3:
> - Convert hcd_queue to timer_setup new for v3
>
>  drivers/usb/dwc2/hcd_queue.c | 17 ++++++++---------
>  1 file changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/usb/dwc2/hcd_queue.c b/drivers/usb/dwc2/hcd_queue.c
> index bea0aadd756e..6f5b5c8b0467 100644
> --- a/drivers/usb/dwc2/hcd_queue.c
> +++ b/drivers/usb/dwc2/hcd_queue.c
> @@ -1275,11 +1275,11 @@ static void dwc2_do_unreserve(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
>   * release the reservation.  This worker is called after the appropriate
>   * delay.
>   *
> - * @work: Pointer to a qh unreserve_work.
> + * @t: Pointer to unreserve_timer in a qh.
>   */
> -static void dwc2_unreserve_timer_fn(unsigned long data)
> +static void dwc2_unreserve_timer_fn(struct timer_list *t)
>  {
> -       struct dwc2_qh *qh = (struct dwc2_qh *)data;
> +       struct dwc2_qh *qh = from_timer(qh, t, unreserve_timer);
>         struct dwc2_hsotg *hsotg = qh->hsotg;
>         unsigned long flags;
>
> @@ -1461,11 +1461,11 @@ static void dwc2_deschedule_periodic(struct dwc2_hsotg *hsotg,
>   * to retry some time later.  This function handles that timer and moves the
>   * qh back to the "inactive" list, then queues transactions.
>   *
> - * @data: Pointer to a qh to re-schedule.
> + * @t: Pointer to wait_timer in a qh.
>   */
> -static void dwc2_wait_timer_fn(unsigned long data)
> +static void dwc2_wait_timer_fn(struct timer_list *t)
>  {
> -       struct dwc2_qh *qh = (struct dwc2_qh *)data;
> +       struct dwc2_qh *qh = from_timer(qh, t, wait_timer);
>         struct dwc2_hsotg *hsotg = qh->hsotg;
>         unsigned long flags;
>
> @@ -1518,9 +1518,8 @@ static void dwc2_qh_init(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh,
>
>         /* Initialize QH */
>         qh->hsotg = hsotg;
> -       setup_timer(&qh->unreserve_timer, dwc2_unreserve_timer_fn,
> -                   (unsigned long)qh);
> -       setup_timer(&qh->wait_timer, dwc2_wait_timer_fn, (unsigned long)qh);
> +       timer_setup(&qh->unreserve_timer, dwc2_unreserve_timer_fn, 0);
> +       timer_setup(&qh->wait_timer, dwc2_wait_timer_fn, 0);
>         qh->ep_type = ep_type;
>         qh->ep_is_in = ep_is_in;
>
> --
> 2.15.0.rc2.357.g7e34df9404-goog
>



-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/2] usb: dwc2: host: Convert hcd_queue to timer_setup
@ 2017-10-30 21:25     ` Kees Cook
  0 siblings, 0 replies; 11+ messages in thread
From: Kees Cook @ 2017-10-30 21:25 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: Felipe Balbi, johnyoun-HKixBCOQz3hWk0Htik3J/w, Stefan Wahren,
	amstan-F7+t8E8rja9g9hUCZPvPmw,
	linux-rockchip-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Johan Hovold,
	Greg KH, Eric Anholt, Matthias Kaehlcke, John Stultz,
	linux-rpi-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	jwerner-F7+t8E8rja9g9hUCZPvPmw, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	LKML

On Mon, Oct 30, 2017 at 10:08 AM, Douglas Anderson
<dianders-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org> wrote:
> Convert the timers in hcd_queue to use the new timer_setup() call
> introduced in commit 686fef928bba ("timer: Prepare to change timer
> callback argument type").
>
> Suggested-by: Stefan Wahren <stefan.wahren-eS4NqCHxEME@public.gmane.org>
> Signed-off-by: Douglas Anderson <dianders-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
> Cc: Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>

Acked-by: Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>

This matches my automated Coccinelle output for this code. (Actually,
it's better because it fixes the documentation too.) Please feel free
to take the patch into -next in advance of the global replacement.

Thanks!

-Kees

> ---
>
> Changes in v3:
> - Convert hcd_queue to timer_setup new for v3
>
>  drivers/usb/dwc2/hcd_queue.c | 17 ++++++++---------
>  1 file changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/usb/dwc2/hcd_queue.c b/drivers/usb/dwc2/hcd_queue.c
> index bea0aadd756e..6f5b5c8b0467 100644
> --- a/drivers/usb/dwc2/hcd_queue.c
> +++ b/drivers/usb/dwc2/hcd_queue.c
> @@ -1275,11 +1275,11 @@ static void dwc2_do_unreserve(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh)
>   * release the reservation.  This worker is called after the appropriate
>   * delay.
>   *
> - * @work: Pointer to a qh unreserve_work.
> + * @t: Pointer to unreserve_timer in a qh.
>   */
> -static void dwc2_unreserve_timer_fn(unsigned long data)
> +static void dwc2_unreserve_timer_fn(struct timer_list *t)
>  {
> -       struct dwc2_qh *qh = (struct dwc2_qh *)data;
> +       struct dwc2_qh *qh = from_timer(qh, t, unreserve_timer);
>         struct dwc2_hsotg *hsotg = qh->hsotg;
>         unsigned long flags;
>
> @@ -1461,11 +1461,11 @@ static void dwc2_deschedule_periodic(struct dwc2_hsotg *hsotg,
>   * to retry some time later.  This function handles that timer and moves the
>   * qh back to the "inactive" list, then queues transactions.
>   *
> - * @data: Pointer to a qh to re-schedule.
> + * @t: Pointer to wait_timer in a qh.
>   */
> -static void dwc2_wait_timer_fn(unsigned long data)
> +static void dwc2_wait_timer_fn(struct timer_list *t)
>  {
> -       struct dwc2_qh *qh = (struct dwc2_qh *)data;
> +       struct dwc2_qh *qh = from_timer(qh, t, wait_timer);
>         struct dwc2_hsotg *hsotg = qh->hsotg;
>         unsigned long flags;
>
> @@ -1518,9 +1518,8 @@ static void dwc2_qh_init(struct dwc2_hsotg *hsotg, struct dwc2_qh *qh,
>
>         /* Initialize QH */
>         qh->hsotg = hsotg;
> -       setup_timer(&qh->unreserve_timer, dwc2_unreserve_timer_fn,
> -                   (unsigned long)qh);
> -       setup_timer(&qh->wait_timer, dwc2_wait_timer_fn, (unsigned long)qh);
> +       timer_setup(&qh->unreserve_timer, dwc2_unreserve_timer_fn, 0);
> +       timer_setup(&qh->wait_timer, dwc2_wait_timer_fn, 0);
>         qh->ep_type = ep_type;
>         qh->ep_is_in = ep_is_in;
>
> --
> 2.15.0.rc2.357.g7e34df9404-goog
>



-- 
Kees Cook
Pixel Security
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2] usb: dwc2: host: Don't retry NAKed transactions right away
  2017-10-30 17:08 [PATCH v3 1/2] usb: dwc2: host: Don't retry NAKed transactions right away Douglas Anderson
  2017-10-30 17:08 ` [PATCH v3 2/2] usb: dwc2: host: Convert hcd_queue to timer_setup Douglas Anderson
@ 2017-12-05 16:18 ` Stefan Wahren
  2017-12-06  6:06     ` John Youn
  2017-12-12 11:06   ` Felipe Balbi
  2 siblings, 1 reply; 11+ messages in thread
From: Stefan Wahren @ 2017-12-05 16:18 UTC (permalink / raw)
  To: balbi, johnyoun
  Cc: Douglas Anderson, amstan, linux-rockchip, gregkh, johan, eric,
	mka, john.stultz, linux-rpi-kernel, jwerner, stable, linux-usb,
	linux-kernel

Hi Felipe,
Hi John,


Am 30.10.2017 um 18:08 schrieb Douglas Anderson:
> On rk3288-veyron devices on Chrome OS it was found that plugging in an
> Arduino-based USB device could cause the system to lockup, especially
> if the CPU Frequency was at one of the slower operating points (like
> 100 MHz / 200 MHz).
>
> Upon tracing, I found that the following was happening:
> * The USB device (full speed) was connected to a high speed hub and
>    then to the rk3288.  Thus, we were dealing with split transactions,
>    which is all handled in software on dwc2.
> * Userspace was initiating a BULK IN transfer
> * When we sent the SSPLIT (to start the split transaction), we got an
>    ACK.  Good.  Then we issued the CSPLIT.
> * When we sent the CSPLIT, we got back a NAK.  We immediately (from
>    the interrupt handler) started to retry and sent another SSPLIT.
> * The device kept NAKing our CSPLIT, so we kept ping-ponging between
>    sending a SSPLIT and a CSPLIT, each time sending from the interrupt
>    handler.
> * The handling of the interrupts was (because of the low CPU speed and
>    the inefficiency of the dwc2 interrupt handler) was actually taking
>    _longer_ than it took the other side to send the ACK/NAK.  Thus we
>    were _always_ in the USB interrupt routine.
> * The fact that USB interrupts were always going off was preventing
>    other things from happening in the system.  This included preventing
>    the system from being able to transition to a higher CPU frequency.
>
> As I understand it, there is no requirement to retry super quickly
> after a NAK, we just have to retry sometime in the future.  Thus one
> solution to the above is to just add a delay between getting a NAK and
> retrying the transmission.  If this delay is sufficiently long to get
> out of the interrupt routine then the rest of the system will be able
> to make forward progress.  Even a 25 us delay would probably be
> enough, but we'll be extra conservative and try to delay 1 ms (the
> exact amount depends on HZ and the accuracy of the jiffy and how close
> the current jiffy is to ticking, but could be as much as 20 ms or as
> little as 1 ms).
>
> Presumably adding a delay like this could impact the USB throughput,
> so we only add the delay with repeated NAKs.
>
> NOTE: Upon further testing of a pl2303 serial adapter, I found that
> this fix may help with problems there.  Specifically I found that the
> pl2303 serial adapters tend to respond with a NAK when they have
> nothing to say and thus we end with this same sequence.
>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> Cc: stable@vger.kernel.org
> Reviewed-by: Julius Werner <jwerner@chromium.org>
> Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
> ---
>
> Changes in v3:
> - Add tested-by for Stefan Wahren
> - Sent to Felipe Balbi as candiate to land this.
> - Add Cc for stable (it's always been broken so go as far is as easy)
>
> Changes in v2:
> - Address http://crosreview.com/737520 feedback
>

does it need a resend?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2] usb: dwc2: host: Don't retry NAKed transactions right away
  2017-12-05 16:18 ` [PATCH v3 1/2] usb: dwc2: host: Don't retry NAKed transactions right away Stefan Wahren
  2017-12-06  6:06     ` John Youn
@ 2017-12-06  6:06     ` John Youn
  0 siblings, 0 replies; 11+ messages in thread
From: John Youn @ 2017-12-06  6:06 UTC (permalink / raw)
  To: Stefan Wahren, balbi, John.Youn
  Cc: Douglas Anderson, amstan, linux-rockchip, gregkh, johan, eric,
	mka, john.stultz, linux-rpi-kernel, jwerner, stable, linux-usb,
	linux-kernel

On 12/05/2017 08:18 AM, Stefan Wahren wrote:
> Hi Felipe,
> Hi John,
>
>
> Am 30.10.2017 um 18:08 schrieb Douglas Anderson:
>> On rk3288-veyron devices on Chrome OS it was found that plugging in an
>> Arduino-based USB device could cause the system to lockup, especially
>> if the CPU Frequency was at one of the slower operating points (like
>> 100 MHz / 200 MHz).
>>
>> Upon tracing, I found that the following was happening:
>> * The USB device (full speed) was connected to a high speed hub and
>>    then to the rk3288.  Thus, we were dealing with split transactions,
>>    which is all handled in software on dwc2.
>> * Userspace was initiating a BULK IN transfer
>> * When we sent the SSPLIT (to start the split transaction), we got an
>>    ACK.  Good.  Then we issued the CSPLIT.
>> * When we sent the CSPLIT, we got back a NAK.  We immediately (from
>>    the interrupt handler) started to retry and sent another SSPLIT.
>> * The device kept NAKing our CSPLIT, so we kept ping-ponging between
>>    sending a SSPLIT and a CSPLIT, each time sending from the interrupt
>>    handler.
>> * The handling of the interrupts was (because of the low CPU speed and
>>    the inefficiency of the dwc2 interrupt handler) was actually taking
>>    _longer_ than it took the other side to send the ACK/NAK.  Thus we
>>    were _always_ in the USB interrupt routine.
>> * The fact that USB interrupts were always going off was preventing
>>    other things from happening in the system.  This included preventing
>>    the system from being able to transition to a higher CPU frequency.
>>
>> As I understand it, there is no requirement to retry super quickly
>> after a NAK, we just have to retry sometime in the future.  Thus one
>> solution to the above is to just add a delay between getting a NAK and
>> retrying the transmission.  If this delay is sufficiently long to get
>> out of the interrupt routine then the rest of the system will be able
>> to make forward progress.  Even a 25 us delay would probably be
>> enough, but we'll be extra conservative and try to delay 1 ms (the
>> exact amount depends on HZ and the accuracy of the jiffy and how close
>> the current jiffy is to ticking, but could be as much as 20 ms or as
>> little as 1 ms).
>>
>> Presumably adding a delay like this could impact the USB throughput,
>> so we only add the delay with repeated NAKs.
>>
>> NOTE: Upon further testing of a pl2303 serial adapter, I found that
>> this fix may help with problems there.  Specifically I found that the
>> pl2303 serial adapters tend to respond with a NAK when they have
>> nothing to say and thus we end with this same sequence.
>>
>> Signed-off-by: Douglas Anderson <dianders@chromium.org>
>> Cc: stable@vger.kernel.org
>> Reviewed-by: Julius Werner <jwerner@chromium.org>
>> Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
>> ---
>>
>> Changes in v3:
>> - Add tested-by for Stefan Wahren
>> - Sent to Felipe Balbi as candiate to land this.
>> - Add Cc for stable (it's always been broken so go as far is as easy)
>>
>> Changes in v2:
>> - Address https://urldefense.proofpoint.com/v2/url?u=http-3A__crosreview.com_737520&d=DwICaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=U3o8uKoKhWme5_V9D-eeCkB11BFwt4KvWztBgdE9ZpA&m=Y_xpJ6Ks0XAK5_bQgmeQEvgKThZtPBQJ3cejNCGfEvM&s=olyPwyYvn_072esVwYxrCduKOKKJPUgc1YHX-CNhM1s&e= feedback
>>
>
> does it need a resend?
>

You can add my acked-by:

Acked-by: John Youn <johnyoun@synopsys.com>

Regards,
John

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2] usb: dwc2: host: Don't retry NAKed transactions right away
@ 2017-12-06  6:06     ` John Youn
  0 siblings, 0 replies; 11+ messages in thread
From: John Youn @ 2017-12-06  6:06 UTC (permalink / raw)
  To: Stefan Wahren, balbi, John.Youn
  Cc: Douglas Anderson, amstan, linux-rockchip, gregkh, johan, eric,
	mka, john.stultz, linux-rpi-kernel, jwerner, stable, linux-usb,
	linux-kernel

On 12/05/2017 08:18 AM, Stefan Wahren wrote:
> Hi Felipe,
> Hi John,
>
>
> Am 30.10.2017 um 18:08 schrieb Douglas Anderson:
>> On rk3288-veyron devices on Chrome OS it was found that plugging in an
>> Arduino-based USB device could cause the system to lockup, especially
>> if the CPU Frequency was at one of the slower operating points (like
>> 100 MHz / 200 MHz).
>>
>> Upon tracing, I found that the following was happening:
>> * The USB device (full speed) was connected to a high speed hub and
>>    then to the rk3288.  Thus, we were dealing with split transactions,
>>    which is all handled in software on dwc2.
>> * Userspace was initiating a BULK IN transfer
>> * When we sent the SSPLIT (to start the split transaction), we got an
>>    ACK.  Good.  Then we issued the CSPLIT.
>> * When we sent the CSPLIT, we got back a NAK.  We immediately (from
>>    the interrupt handler) started to retry and sent another SSPLIT.
>> * The device kept NAKing our CSPLIT, so we kept ping-ponging between
>>    sending a SSPLIT and a CSPLIT, each time sending from the interrupt
>>    handler.
>> * The handling of the interrupts was (because of the low CPU speed and
>>    the inefficiency of the dwc2 interrupt handler) was actually taking
>>    _longer_ than it took the other side to send the ACK/NAK.  Thus we
>>    were _always_ in the USB interrupt routine.
>> * The fact that USB interrupts were always going off was preventing
>>    other things from happening in the system.  This included preventing
>>    the system from being able to transition to a higher CPU frequency.
>>
>> As I understand it, there is no requirement to retry super quickly
>> after a NAK, we just have to retry sometime in the future.  Thus one
>> solution to the above is to just add a delay between getting a NAK and
>> retrying the transmission.  If this delay is sufficiently long to get
>> out of the interrupt routine then the rest of the system will be able
>> to make forward progress.  Even a 25 us delay would probably be
>> enough, but we'll be extra conservative and try to delay 1 ms (the
>> exact amount depends on HZ and the accuracy of the jiffy and how close
>> the current jiffy is to ticking, but could be as much as 20 ms or as
>> little as 1 ms).
>>
>> Presumably adding a delay like this could impact the USB throughput,
>> so we only add the delay with repeated NAKs.
>>
>> NOTE: Upon further testing of a pl2303 serial adapter, I found that
>> this fix may help with problems there.  Specifically I found that the
>> pl2303 serial adapters tend to respond with a NAK when they have
>> nothing to say and thus we end with this same sequence.
>>
>> Signed-off-by: Douglas Anderson <dianders@chromium.org>
>> Cc: stable@vger.kernel.org
>> Reviewed-by: Julius Werner <jwerner@chromium.org>
>> Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
>> ---
>>
>> Changes in v3:
>> - Add tested-by for Stefan Wahren
>> - Sent to Felipe Balbi as candiate to land this.
>> - Add Cc for stable (it's always been broken so go as far is as easy)
>>
>> Changes in v2:
>> - Address https://urldefense.proofpoint.com/v2/url?u=http-3A__crosreview.com_737520&d=DwICaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=U3o8uKoKhWme5_V9D-eeCkB11BFwt4KvWztBgdE9ZpA&m=Y_xpJ6Ks0XAK5_bQgmeQEvgKThZtPBQJ3cejNCGfEvM&s=olyPwyYvn_072esVwYxrCduKOKKJPUgc1YHX-CNhM1s&e= feedback
>>
>
> does it need a resend?
>

You can add my acked-by:

Acked-by: John Youn <johnyoun@synopsys.com>

Regards,
John

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2] usb: dwc2: host: Don't retry NAKed transactions right away
@ 2017-12-06  6:06     ` John Youn
  0 siblings, 0 replies; 11+ messages in thread
From: John Youn @ 2017-12-06  6:06 UTC (permalink / raw)
  To: Stefan Wahren, balbi-DgEjT+Ai2ygdnm+yROfE0A,
	John.Youn-HKixBCOQz3hWk0Htik3J/w
  Cc: Douglas Anderson, amstan-F7+t8E8rja9g9hUCZPvPmw,
	linux-rockchip-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r,
	johan-DgEjT+Ai2ygdnm+yROfE0A, eric-WhKQ6XTQaPysTnJN9+BGXg,
	mka-F7+t8E8rja9g9hUCZPvPmw, john.stultz-QSEj5FYQhm4dnm+yROfE0A,
	linux-rpi-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	jwerner-F7+t8E8rja9g9hUCZPvPmw, stable-u79uwXL29TY76Z2rM5mHXA,
	linux-usb-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 12/05/2017 08:18 AM, Stefan Wahren wrote:
> Hi Felipe,
> Hi John,
>
>
> Am 30.10.2017 um 18:08 schrieb Douglas Anderson:
>> On rk3288-veyron devices on Chrome OS it was found that plugging in an
>> Arduino-based USB device could cause the system to lockup, especially
>> if the CPU Frequency was at one of the slower operating points (like
>> 100 MHz / 200 MHz).
>>
>> Upon tracing, I found that the following was happening:
>> * The USB device (full speed) was connected to a high speed hub and
>>    then to the rk3288.  Thus, we were dealing with split transactions,
>>    which is all handled in software on dwc2.
>> * Userspace was initiating a BULK IN transfer
>> * When we sent the SSPLIT (to start the split transaction), we got an
>>    ACK.  Good.  Then we issued the CSPLIT.
>> * When we sent the CSPLIT, we got back a NAK.  We immediately (from
>>    the interrupt handler) started to retry and sent another SSPLIT.
>> * The device kept NAKing our CSPLIT, so we kept ping-ponging between
>>    sending a SSPLIT and a CSPLIT, each time sending from the interrupt
>>    handler.
>> * The handling of the interrupts was (because of the low CPU speed and
>>    the inefficiency of the dwc2 interrupt handler) was actually taking
>>    _longer_ than it took the other side to send the ACK/NAK.  Thus we
>>    were _always_ in the USB interrupt routine.
>> * The fact that USB interrupts were always going off was preventing
>>    other things from happening in the system.  This included preventing
>>    the system from being able to transition to a higher CPU frequency.
>>
>> As I understand it, there is no requirement to retry super quickly
>> after a NAK, we just have to retry sometime in the future.  Thus one
>> solution to the above is to just add a delay between getting a NAK and
>> retrying the transmission.  If this delay is sufficiently long to get
>> out of the interrupt routine then the rest of the system will be able
>> to make forward progress.  Even a 25 us delay would probably be
>> enough, but we'll be extra conservative and try to delay 1 ms (the
>> exact amount depends on HZ and the accuracy of the jiffy and how close
>> the current jiffy is to ticking, but could be as much as 20 ms or as
>> little as 1 ms).
>>
>> Presumably adding a delay like this could impact the USB throughput,
>> so we only add the delay with repeated NAKs.
>>
>> NOTE: Upon further testing of a pl2303 serial adapter, I found that
>> this fix may help with problems there.  Specifically I found that the
>> pl2303 serial adapters tend to respond with a NAK when they have
>> nothing to say and thus we end with this same sequence.
>>
>> Signed-off-by: Douglas Anderson <dianders-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
>> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> Reviewed-by: Julius Werner <jwerner-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
>> Tested-by: Stefan Wahren <stefan.wahren-eS4NqCHxEME@public.gmane.org>
>> ---
>>
>> Changes in v3:
>> - Add tested-by for Stefan Wahren
>> - Sent to Felipe Balbi as candiate to land this.
>> - Add Cc for stable (it's always been broken so go as far is as easy)
>>
>> Changes in v2:
>> - Address https://urldefense.proofpoint.com/v2/url?u=http-3A__crosreview.com_737520&d=DwICaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=U3o8uKoKhWme5_V9D-eeCkB11BFwt4KvWztBgdE9ZpA&m=Y_xpJ6Ks0XAK5_bQgmeQEvgKThZtPBQJ3cejNCGfEvM&s=olyPwyYvn_072esVwYxrCduKOKKJPUgc1YHX-CNhM1s&e= feedback
>>
>
> does it need a resend?
>

You can add my acked-by:

Acked-by: John Youn <johnyoun-HKixBCOQz3hWk0Htik3J/w@public.gmane.org>

Regards,
John

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2] usb: dwc2: host: Don't retry NAKed transactions right away
  2017-10-30 17:08 [PATCH v3 1/2] usb: dwc2: host: Don't retry NAKed transactions right away Douglas Anderson
@ 2017-12-12 11:06   ` Felipe Balbi
  2017-12-05 16:18 ` [PATCH v3 1/2] usb: dwc2: host: Don't retry NAKed transactions right away Stefan Wahren
  2017-12-12 11:06   ` Felipe Balbi
  2 siblings, 0 replies; 11+ messages in thread
From: Felipe Balbi @ 2017-12-12 11:06 UTC (permalink / raw)
  To: Douglas Anderson, johnyoun
  Cc: stefan.wahren, amstan, linux-rockchip, gregkh, johan, eric, mka,
	john.stultz, linux-rpi-kernel, jwerner, Douglas Anderson, stable,
	linux-usb, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2960 bytes --]


Hi,

Douglas Anderson <dianders@chromium.org> writes:
> On rk3288-veyron devices on Chrome OS it was found that plugging in an
> Arduino-based USB device could cause the system to lockup, especially
> if the CPU Frequency was at one of the slower operating points (like
> 100 MHz / 200 MHz).
>
> Upon tracing, I found that the following was happening:
> * The USB device (full speed) was connected to a high speed hub and
>   then to the rk3288.  Thus, we were dealing with split transactions,
>   which is all handled in software on dwc2.
> * Userspace was initiating a BULK IN transfer
> * When we sent the SSPLIT (to start the split transaction), we got an
>   ACK.  Good.  Then we issued the CSPLIT.
> * When we sent the CSPLIT, we got back a NAK.  We immediately (from
>   the interrupt handler) started to retry and sent another SSPLIT.
> * The device kept NAKing our CSPLIT, so we kept ping-ponging between
>   sending a SSPLIT and a CSPLIT, each time sending from the interrupt
>   handler.
> * The handling of the interrupts was (because of the low CPU speed and
>   the inefficiency of the dwc2 interrupt handler) was actually taking
>   _longer_ than it took the other side to send the ACK/NAK.  Thus we
>   were _always_ in the USB interrupt routine.
> * The fact that USB interrupts were always going off was preventing
>   other things from happening in the system.  This included preventing
>   the system from being able to transition to a higher CPU frequency.
>
> As I understand it, there is no requirement to retry super quickly
> after a NAK, we just have to retry sometime in the future.  Thus one
> solution to the above is to just add a delay between getting a NAK and
> retrying the transmission.  If this delay is sufficiently long to get
> out of the interrupt routine then the rest of the system will be able
> to make forward progress.  Even a 25 us delay would probably be
> enough, but we'll be extra conservative and try to delay 1 ms (the
> exact amount depends on HZ and the accuracy of the jiffy and how close
> the current jiffy is to ticking, but could be as much as 20 ms or as
> little as 1 ms).
>
> Presumably adding a delay like this could impact the USB throughput,
> so we only add the delay with repeated NAKs.
>
> NOTE: Upon further testing of a pl2303 serial adapter, I found that
> this fix may help with problems there.  Specifically I found that the
> pl2303 serial adapters tend to respond with a NAK when they have
> nothing to say and thus we end with this same sequence.
>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> Cc: stable@vger.kernel.org
> Reviewed-by: Julius Werner <jwerner@chromium.org>
> Tested-by: Stefan Wahren <stefan.wahren@i2se.com>

This seems too big for -rc or -stable inclusion. In any case, this
doesn't apply to my testing/next branch. Care to rebase and collect acks
you received while doing that?

-- 
balbi

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2] usb: dwc2: host: Don't retry NAKed transactions right away
@ 2017-12-12 11:06   ` Felipe Balbi
  0 siblings, 0 replies; 11+ messages in thread
From: Felipe Balbi @ 2017-12-12 11:06 UTC (permalink / raw)
  To: johnyoun
  Cc: stefan.wahren, amstan, linux-rockchip, gregkh, johan, eric, mka,
	john.stultz, linux-rpi-kernel, jwerner, Douglas Anderson, stable,
	linux-usb, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2960 bytes --]


Hi,

Douglas Anderson <dianders@chromium.org> writes:
> On rk3288-veyron devices on Chrome OS it was found that plugging in an
> Arduino-based USB device could cause the system to lockup, especially
> if the CPU Frequency was at one of the slower operating points (like
> 100 MHz / 200 MHz).
>
> Upon tracing, I found that the following was happening:
> * The USB device (full speed) was connected to a high speed hub and
>   then to the rk3288.  Thus, we were dealing with split transactions,
>   which is all handled in software on dwc2.
> * Userspace was initiating a BULK IN transfer
> * When we sent the SSPLIT (to start the split transaction), we got an
>   ACK.  Good.  Then we issued the CSPLIT.
> * When we sent the CSPLIT, we got back a NAK.  We immediately (from
>   the interrupt handler) started to retry and sent another SSPLIT.
> * The device kept NAKing our CSPLIT, so we kept ping-ponging between
>   sending a SSPLIT and a CSPLIT, each time sending from the interrupt
>   handler.
> * The handling of the interrupts was (because of the low CPU speed and
>   the inefficiency of the dwc2 interrupt handler) was actually taking
>   _longer_ than it took the other side to send the ACK/NAK.  Thus we
>   were _always_ in the USB interrupt routine.
> * The fact that USB interrupts were always going off was preventing
>   other things from happening in the system.  This included preventing
>   the system from being able to transition to a higher CPU frequency.
>
> As I understand it, there is no requirement to retry super quickly
> after a NAK, we just have to retry sometime in the future.  Thus one
> solution to the above is to just add a delay between getting a NAK and
> retrying the transmission.  If this delay is sufficiently long to get
> out of the interrupt routine then the rest of the system will be able
> to make forward progress.  Even a 25 us delay would probably be
> enough, but we'll be extra conservative and try to delay 1 ms (the
> exact amount depends on HZ and the accuracy of the jiffy and how close
> the current jiffy is to ticking, but could be as much as 20 ms or as
> little as 1 ms).
>
> Presumably adding a delay like this could impact the USB throughput,
> so we only add the delay with repeated NAKs.
>
> NOTE: Upon further testing of a pl2303 serial adapter, I found that
> this fix may help with problems there.  Specifically I found that the
> pl2303 serial adapters tend to respond with a NAK when they have
> nothing to say and thus we end with this same sequence.
>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> Cc: stable@vger.kernel.org
> Reviewed-by: Julius Werner <jwerner@chromium.org>
> Tested-by: Stefan Wahren <stefan.wahren@i2se.com>

This seems too big for -rc or -stable inclusion. In any case, this
doesn't apply to my testing/next branch. Care to rebase and collect acks
you received while doing that?

-- 
balbi

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 1/2] usb: dwc2: host: Don't retry NAKed transactions right away
  2017-12-12 11:06   ` Felipe Balbi
  (?)
@ 2017-12-12 18:31   ` Doug Anderson
  -1 siblings, 0 replies; 11+ messages in thread
From: Doug Anderson @ 2017-12-12 18:31 UTC (permalink / raw)
  To: Felipe Balbi
  Cc: John Youn, Stefan Wahren, Alexandru M Stan,
	open list:ARM/Rockchip SoC...,
	Greg Kroah-Hartman, Johan Hovold, Eric Anholt, Matthias Kaehlcke,
	John Stultz, linux-rpi-kernel, Julius Werner, stable, linux-usb,
	LKML

Hi,

On Tue, Dec 12, 2017 at 3:06 AM, Felipe Balbi <balbi@kernel.org> wrote:
>
> Hi,
>
> Douglas Anderson <dianders@chromium.org> writes:
>> On rk3288-veyron devices on Chrome OS it was found that plugging in an
>> Arduino-based USB device could cause the system to lockup, especially
>> if the CPU Frequency was at one of the slower operating points (like
>> 100 MHz / 200 MHz).
>>
>> Upon tracing, I found that the following was happening:
>> * The USB device (full speed) was connected to a high speed hub and
>>   then to the rk3288.  Thus, we were dealing with split transactions,
>>   which is all handled in software on dwc2.
>> * Userspace was initiating a BULK IN transfer
>> * When we sent the SSPLIT (to start the split transaction), we got an
>>   ACK.  Good.  Then we issued the CSPLIT.
>> * When we sent the CSPLIT, we got back a NAK.  We immediately (from
>>   the interrupt handler) started to retry and sent another SSPLIT.
>> * The device kept NAKing our CSPLIT, so we kept ping-ponging between
>>   sending a SSPLIT and a CSPLIT, each time sending from the interrupt
>>   handler.
>> * The handling of the interrupts was (because of the low CPU speed and
>>   the inefficiency of the dwc2 interrupt handler) was actually taking
>>   _longer_ than it took the other side to send the ACK/NAK.  Thus we
>>   were _always_ in the USB interrupt routine.
>> * The fact that USB interrupts were always going off was preventing
>>   other things from happening in the system.  This included preventing
>>   the system from being able to transition to a higher CPU frequency.
>>
>> As I understand it, there is no requirement to retry super quickly
>> after a NAK, we just have to retry sometime in the future.  Thus one
>> solution to the above is to just add a delay between getting a NAK and
>> retrying the transmission.  If this delay is sufficiently long to get
>> out of the interrupt routine then the rest of the system will be able
>> to make forward progress.  Even a 25 us delay would probably be
>> enough, but we'll be extra conservative and try to delay 1 ms (the
>> exact amount depends on HZ and the accuracy of the jiffy and how close
>> the current jiffy is to ticking, but could be as much as 20 ms or as
>> little as 1 ms).
>>
>> Presumably adding a delay like this could impact the USB throughput,
>> so we only add the delay with repeated NAKs.
>>
>> NOTE: Upon further testing of a pl2303 serial adapter, I found that
>> this fix may help with problems there.  Specifically I found that the
>> pl2303 serial adapters tend to respond with a NAK when they have
>> nothing to say and thus we end with this same sequence.
>>
>> Signed-off-by: Douglas Anderson <dianders@chromium.org>
>> Cc: stable@vger.kernel.org
>> Reviewed-by: Julius Werner <jwerner@chromium.org>
>> Tested-by: Stefan Wahren <stefan.wahren@i2se.com>
>
> This seems too big for -rc or -stable inclusion.

I've removed the stable tag at your request.  I originally added it at
your request in response to v2 of this patch.  I'd agree that it's a
pretty big patch and therefore "risky" to pick back to stable.  ...but
it does fix a bug reported by several people on the mailing lists, so
I'll leave it to your discretion.  Previously in relation to the
stable tag, I had mentioned:

   It's a little weird since it doesn't "fix" any specific
   commit, so I guess it will be up to stable folks to decide how far to
   go back.  The dwc2 devices I work with are actually on 3.14, but we
   have some pretty massive backports related to dwc2 there...

> In any case, this
> doesn't apply to my testing/next branch. Care to rebase and collect acks
> you received while doing that?

Sure, no problem.  I've posted v4 with John Youn's Ack.  The reason v3
didn't apply is that you've now got commit e99e88a9d2b0 ("treewide:
setup_timer() -> timer_setup()").  Originally my plan was to beat that
patch into the kernel and then I'd do the timer conversion myself.
That was patch #2 in the v3 series, AKA
<https://patchwork.kernel.org/patch/10032935/>.  ...but since I failed
to beat Kees' patch in, I've now squashed patches #1 and #2 together
and resolved the trivial conflict.

If anyone were thinking of trying to backport this patch to older
kernels (where they presumably don't have Kees's timer patch) they can
always use the v3 patch posted here as a reference for how to make
things work.  ;)


-Doug

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-12-12 18:31 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-30 17:08 [PATCH v3 1/2] usb: dwc2: host: Don't retry NAKed transactions right away Douglas Anderson
2017-10-30 17:08 ` [PATCH v3 2/2] usb: dwc2: host: Convert hcd_queue to timer_setup Douglas Anderson
2017-10-30 21:25   ` Kees Cook
2017-10-30 21:25     ` Kees Cook
2017-12-05 16:18 ` [PATCH v3 1/2] usb: dwc2: host: Don't retry NAKed transactions right away Stefan Wahren
2017-12-06  6:06   ` John Youn
2017-12-06  6:06     ` John Youn
2017-12-06  6:06     ` John Youn
2017-12-12 11:06 ` Felipe Balbi
2017-12-12 11:06   ` Felipe Balbi
2017-12-12 18:31   ` Doug Anderson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.