All of lore.kernel.org
 help / color / mirror / Atom feed
From: Felipe Balbi <balbi@kernel.org>
To: Douglas Anderson <dianders@chromium.org>, johnyoun@synopsys.com
Cc: stefan.wahren@i2se.com, amstan@chromium.org,
	linux-rockchip@lists.infradead.org, gregkh@linuxfoundation.org,
	johan@kernel.org, eric@anholt.net, mka@chromium.org,
	john.stultz@linaro.org, linux-rpi-kernel@lists.infradead.org,
	jwerner@chromium.org, Douglas Anderson <dianders@chromium.org>,
	stable@vger.kernel.org, linux-usb@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 1/2] usb: dwc2: host: Don't retry NAKed transactions right away
Date: Tue, 12 Dec 2017 13:06:01 +0200	[thread overview]
Message-ID: <87mv2onr3a.fsf@linux.intel.com> (raw)
In-Reply-To: <20171030170802.14489-1-dianders@chromium.org>

[-- Attachment #1: Type: text/plain, Size: 2960 bytes --]


Hi,

Douglas Anderson <dianders@chromium.org> writes:
> On rk3288-veyron devices on Chrome OS it was found that plugging in an
> Arduino-based USB device could cause the system to lockup, especially
> if the CPU Frequency was at one of the slower operating points (like
> 100 MHz / 200 MHz).
>
> Upon tracing, I found that the following was happening:
> * The USB device (full speed) was connected to a high speed hub and
>   then to the rk3288.  Thus, we were dealing with split transactions,
>   which is all handled in software on dwc2.
> * Userspace was initiating a BULK IN transfer
> * When we sent the SSPLIT (to start the split transaction), we got an
>   ACK.  Good.  Then we issued the CSPLIT.
> * When we sent the CSPLIT, we got back a NAK.  We immediately (from
>   the interrupt handler) started to retry and sent another SSPLIT.
> * The device kept NAKing our CSPLIT, so we kept ping-ponging between
>   sending a SSPLIT and a CSPLIT, each time sending from the interrupt
>   handler.
> * The handling of the interrupts was (because of the low CPU speed and
>   the inefficiency of the dwc2 interrupt handler) was actually taking
>   _longer_ than it took the other side to send the ACK/NAK.  Thus we
>   were _always_ in the USB interrupt routine.
> * The fact that USB interrupts were always going off was preventing
>   other things from happening in the system.  This included preventing
>   the system from being able to transition to a higher CPU frequency.
>
> As I understand it, there is no requirement to retry super quickly
> after a NAK, we just have to retry sometime in the future.  Thus one
> solution to the above is to just add a delay between getting a NAK and
> retrying the transmission.  If this delay is sufficiently long to get
> out of the interrupt routine then the rest of the system will be able
> to make forward progress.  Even a 25 us delay would probably be
> enough, but we'll be extra conservative and try to delay 1 ms (the
> exact amount depends on HZ and the accuracy of the jiffy and how close
> the current jiffy is to ticking, but could be as much as 20 ms or as
> little as 1 ms).
>
> Presumably adding a delay like this could impact the USB throughput,
> so we only add the delay with repeated NAKs.
>
> NOTE: Upon further testing of a pl2303 serial adapter, I found that
> this fix may help with problems there.  Specifically I found that the
> pl2303 serial adapters tend to respond with a NAK when they have
> nothing to say and thus we end with this same sequence.
>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> Cc: stable@vger.kernel.org
> Reviewed-by: Julius Werner <jwerner@chromium.org>
> Tested-by: Stefan Wahren <stefan.wahren@i2se.com>

This seems too big for -rc or -stable inclusion. In any case, this
doesn't apply to my testing/next branch. Care to rebase and collect acks
you received while doing that?

-- 
balbi

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

WARNING: multiple messages have this Message-ID (diff)
From: Felipe Balbi <balbi@kernel.org>
To: johnyoun@synopsys.com
Cc: stefan.wahren@i2se.com, amstan@chromium.org,
	linux-rockchip@lists.infradead.org, gregkh@linuxfoundation.org,
	johan@kernel.org, eric@anholt.net, mka@chromium.org,
	john.stultz@linaro.org, linux-rpi-kernel@lists.infradead.org,
	jwerner@chromium.org, Douglas Anderson <dianders@chromium.org>,
	stable@vger.kernel.org, linux-usb@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 1/2] usb: dwc2: host: Don't retry NAKed transactions right away
Date: Tue, 12 Dec 2017 13:06:01 +0200	[thread overview]
Message-ID: <87mv2onr3a.fsf@linux.intel.com> (raw)
In-Reply-To: <20171030170802.14489-1-dianders@chromium.org>

[-- Attachment #1: Type: text/plain, Size: 2960 bytes --]


Hi,

Douglas Anderson <dianders@chromium.org> writes:
> On rk3288-veyron devices on Chrome OS it was found that plugging in an
> Arduino-based USB device could cause the system to lockup, especially
> if the CPU Frequency was at one of the slower operating points (like
> 100 MHz / 200 MHz).
>
> Upon tracing, I found that the following was happening:
> * The USB device (full speed) was connected to a high speed hub and
>   then to the rk3288.  Thus, we were dealing with split transactions,
>   which is all handled in software on dwc2.
> * Userspace was initiating a BULK IN transfer
> * When we sent the SSPLIT (to start the split transaction), we got an
>   ACK.  Good.  Then we issued the CSPLIT.
> * When we sent the CSPLIT, we got back a NAK.  We immediately (from
>   the interrupt handler) started to retry and sent another SSPLIT.
> * The device kept NAKing our CSPLIT, so we kept ping-ponging between
>   sending a SSPLIT and a CSPLIT, each time sending from the interrupt
>   handler.
> * The handling of the interrupts was (because of the low CPU speed and
>   the inefficiency of the dwc2 interrupt handler) was actually taking
>   _longer_ than it took the other side to send the ACK/NAK.  Thus we
>   were _always_ in the USB interrupt routine.
> * The fact that USB interrupts were always going off was preventing
>   other things from happening in the system.  This included preventing
>   the system from being able to transition to a higher CPU frequency.
>
> As I understand it, there is no requirement to retry super quickly
> after a NAK, we just have to retry sometime in the future.  Thus one
> solution to the above is to just add a delay between getting a NAK and
> retrying the transmission.  If this delay is sufficiently long to get
> out of the interrupt routine then the rest of the system will be able
> to make forward progress.  Even a 25 us delay would probably be
> enough, but we'll be extra conservative and try to delay 1 ms (the
> exact amount depends on HZ and the accuracy of the jiffy and how close
> the current jiffy is to ticking, but could be as much as 20 ms or as
> little as 1 ms).
>
> Presumably adding a delay like this could impact the USB throughput,
> so we only add the delay with repeated NAKs.
>
> NOTE: Upon further testing of a pl2303 serial adapter, I found that
> this fix may help with problems there.  Specifically I found that the
> pl2303 serial adapters tend to respond with a NAK when they have
> nothing to say and thus we end with this same sequence.
>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> Cc: stable@vger.kernel.org
> Reviewed-by: Julius Werner <jwerner@chromium.org>
> Tested-by: Stefan Wahren <stefan.wahren@i2se.com>

This seems too big for -rc or -stable inclusion. In any case, this
doesn't apply to my testing/next branch. Care to rebase and collect acks
you received while doing that?

-- 
balbi

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

  parent reply	other threads:[~2017-12-12 11:06 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-30 17:08 [PATCH v3 1/2] usb: dwc2: host: Don't retry NAKed transactions right away Douglas Anderson
2017-10-30 17:08 ` [PATCH v3 2/2] usb: dwc2: host: Convert hcd_queue to timer_setup Douglas Anderson
2017-10-30 21:25   ` Kees Cook
2017-10-30 21:25     ` Kees Cook
2017-12-05 16:18 ` [PATCH v3 1/2] usb: dwc2: host: Don't retry NAKed transactions right away Stefan Wahren
2017-12-06  6:06   ` John Youn
2017-12-06  6:06     ` John Youn
2017-12-06  6:06     ` John Youn
2017-12-12 11:06 ` Felipe Balbi [this message]
2017-12-12 11:06   ` Felipe Balbi
2017-12-12 18:31   ` Doug Anderson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87mv2onr3a.fsf@linux.intel.com \
    --to=balbi@kernel.org \
    --cc=amstan@chromium.org \
    --cc=dianders@chromium.org \
    --cc=eric@anholt.net \
    --cc=gregkh@linuxfoundation.org \
    --cc=johan@kernel.org \
    --cc=john.stultz@linaro.org \
    --cc=johnyoun@synopsys.com \
    --cc=jwerner@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rockchip@lists.infradead.org \
    --cc=linux-rpi-kernel@lists.infradead.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=mka@chromium.org \
    --cc=stable@vger.kernel.org \
    --cc=stefan.wahren@i2se.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.