All of lore.kernel.org
 help / color / mirror / Atom feed
From: dianders@chromium.org (Doug Anderson)
To: linux-arm-kernel@lists.infradead.org
Subject: usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s
Date: Tue, 18 Apr 2017 13:41:30 -0700	[thread overview]
Message-ID: <CAD=FV=VHkJFrsJDmMytbUw=WjKQaGZ-8Ewfy5bPCA82AWS4O2Q@mail.gmail.com> (raw)
In-Reply-To: <186569458.91967.1492547106553@email.1und1.de>

Stefan,

On Tue, Apr 18, 2017 at 1:25 PM, Stefan Wahren <stefan.wahren@i2se.com> wrote:
> Hi,
>
> [add Johan]
>
>> Stefan Wahren <stefan.wahren@i2se.com> hat am 18. April 2017 um 10:07 geschrieben:
>>
>>
>> Am 18.04.2017 um 00:37 schrieb Doug Anderson:
>> > Hi,
>> >
>> > On Mon, Apr 17, 2017 at 4:05 AM, Stefan Wahren <stefan.wahren@i2se.com> wrote:
>> >> Hi,
>> >>
>> >>> Stefan Wahren <stefan.wahren@i2se.com> hat am 31. Oktober 2016 um 21:34 geschrieben:
>> >>>
>> >>>
>> >>> I inspired by this issue [1] i build up a slightly modified setup with a
>> >>> Raspberry Pi B (mainline kernel 4.9rc3), a powered 7 port USB hub and 5 Prolific
>> >>> PL2303 USB to serial convertors. I modified the usb_test for dwc2 [2], which
>> >>> only tries to open all ttyUSB devices one after the other.
>> >>>
>> >>> Unfortunately the complete system stuck after opening the first ttyUSB device (
>> >>> heartbeat LED stop blinking, no reaction to debug UART). The only way to
>> >>> reanimate the system is to powerdown the USB hub with the USB to serial
>> >>> convertors.
>> >>>
>> >>> [1] - https://github.com/raspberrypi/linux/issues/1692
>> >>> [2] - https://gist.github.com/lategoodbye/dd0d30af27b6f101b03d5923b279dbaa
>> >> since this issue still exists with 4.11 (even without or with microframe scheduler enabled), i want to ask some additional questions:
>> >>
>> >> Is this issue reproducible with other dwc2 platforms than bcm2835?
>> > +Edmund Szeto, who I seem to remember emailing me about similar
>> > questions in the past.
>> >
>> >
>> >> Does the soft lockup also occurs after opening the second serial convertor or later?
>> > I don't have serial converters easily available to me, but back in the
>> > day when I was stressing things out on rk3288 I never saw anything
>> > this bad.  ...of course, on rk3288 we've got 4 A17 cores running
>> > really fast, so possibly just being slower is what causes your
>> > problems here?
>>
>> The downstream kernel of the Raspberry Pi foundation with it's
>> out-of-tree dwc_otg driver is able to handle 8 serial converter on a RPI
>> B. I would be happy to get at least 2 or 3 working on mainline.
>>
>> >
>> > I will make the following observations:
>> >
>> > 1. With dwc2 you often end up in the situation where you need to
>> > service an interrupt every 125 uS.  If servicing that interrupt takes
>> > anywhere near 125 uS in the common case then you'll be in trouble.
>>
>> I will try to measure this with a logic analyzer.
>>
>
> i took GPIO17 to measure _dwc2_hcd_irq and GPIO18 to measure _dwc2_hcd_urb_enqueue (patch against 4.11rc1 below).
>
> So i made my observations for 3 test cases:
>
> 1) no serial converter connected (idle)
> 2) 1 FTDI serial converter connected
> 3) 1 PL2303 serial converter connected
>
> case   | ksoftirq cpu     | mean duration | max duration  | max duration | urb_enqueue  |
>        |                  | hcd_irq       | hcd_irq       | urb_enqueue  | within 10 sec|
> -------+------------------+---------------+---------------+--------------+--------------+
> idle   | 0.0%             | 2 us          | 16.5 us       |     12 us    | 5            |
> FTDI   | 25.0%            | 8.5 us        | 18.0 us       |  31000 us    | ~ 400        |
> PL2303 | top doesn't work | 8.5 us        | 22.5 us       | 900000 us    | 4            |

It's hard to know for sure that all of this time is really in
urb_enqueue().  Possible we could have task switched out and been
blocked elsewhere.  Using ftrace to get more fine-grained timings
would be useful.  ktime_get(), ktime_sub(), and ktime_to_us() are your
friends here if you want to use trace_printk.


> So it seems the serial USB driver has also an impact. In the analyzer trace the FTDI triggers many smaller urb_enqueue calls in the opposite to the PL2303 which only has few but huge calls.
>
> Additional notes:
> After closing the serial connection to the FTDI the system is usable as before. In case of PL2303 i need to disconnect the converter in order to get a usable system.
>
> Why do they behave so differently?
> Are these results of a overload?
> Doug, can you point me to your timing patch?

I did in the previous message, but basically take a look at:

https://chromium-review.googlesource.com/c/321980/4/drivers/usb/core/hcd.c#1668

I never posted this upstream since it was just a test patch.

>
> Thanks
> Stefan
>
> PS: My hacky patches for measuring
>
> diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h
> index 1a7e830..aa490c3 100644
> --- a/drivers/usb/dwc2/core.h
> +++ b/drivers/usb/dwc2/core.h
> @@ -930,6 +930,9 @@ struct dwc2_hsotg {
>         struct dentry *debug_root;
>         struct debugfs_regset32 *regset;
>
> +       int gpio_17;
> +       int gpio_18;
> +
>         /* DWC OTG HW Release versions */
>  #define DWC2_CORE_REV_2_71a    0x4f54271a
>  #define DWC2_CORE_REV_2_90a    0x4f54290a
> diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
> index a73722e..97271c8 100644
> --- a/drivers/usb/dwc2/hcd.c
> +++ b/drivers/usb/dwc2/hcd.c
> @@ -45,6 +45,7 @@
>  #include <linux/platform_device.h>
>  #include <linux/dma-mapping.h>
>  #include <linux/delay.h>
> +#include <linux/gpio.h>
>  #include <linux/io.h>
>  #include <linux/slab.h>
>  #include <linux/usb.h>
> @@ -4595,13 +4596,17 @@ static int _dwc2_hcd_urb_enqueue(struct usb_hcd *hcd, struct urb *urb,
>         bool qh_allocated = false;
>         struct dwc2_qtd *qtd;
>
> +       gpio_set_value(hsotg->gpio_18, 1);
> +
>         if (dbg_urb(urb)) {
>                 dev_vdbg(hsotg->dev, "DWC OTG HCD URB Enqueue\n");
>                 dwc2_dump_urb_info(hcd, urb, "urb_enqueue");
>         }
>
> -       if (!ep)
> +       if (!ep) {
> +               gpio_set_value(hsotg->gpio_18, 0);
>                 return -EINVAL;
> +       }
>
>         if (usb_pipetype(urb->pipe) == PIPE_ISOCHRONOUS ||
>             usb_pipetype(urb->pipe) == PIPE_INTERRUPT) {
> @@ -4628,8 +4633,10 @@ static int _dwc2_hcd_urb_enqueue(struct usb_hcd *hcd, struct urb *urb,
>
>         dwc2_urb = dwc2_hcd_urb_alloc(hsotg, urb->number_of_packets,
>                                       mem_flags);
> -       if (!dwc2_urb)
> +       if (!dwc2_urb) {
> +               gpio_set_value(hsotg->gpio_18, 0);
>                 return -ENOMEM;
> +       }
>
>         dwc2_hcd_urb_set_pipeinfo(hsotg, dwc2_urb, usb_pipedevice(urb->pipe),
>                                   usb_pipeendpoint(urb->pipe), ep_type,
> @@ -4705,6 +4712,8 @@ static int _dwc2_hcd_urb_enqueue(struct usb_hcd *hcd, struct urb *urb,
>
>         spin_unlock_irqrestore(&hsotg->lock, flags);
>
> +       gpio_set_value(hsotg->gpio_18, 0);
> +
>         return 0;
>
>  fail3:
> @@ -4732,6 +4741,8 @@ static int _dwc2_hcd_urb_enqueue(struct usb_hcd *hcd, struct urb *urb,
>  fail0:
>         kfree(dwc2_urb);
>
> +       gpio_set_value(hsotg->gpio_18, 0);
> +
>         return retval;
>  }
>
> @@ -4825,8 +4836,15 @@ static void _dwc2_hcd_endpoint_reset(struct usb_hcd *hcd,
>  static irqreturn_t _dwc2_hcd_irq(struct usb_hcd *hcd)
>  {
>         struct dwc2_hsotg *hsotg = dwc2_hcd_to_hsotg(hcd);
> +       irqreturn_t ret;
>
> -       return dwc2_handle_hcd_intr(hsotg);
> +       gpio_set_value(hsotg->gpio_17, 1);
> +
> +       ret = dwc2_handle_hcd_intr(hsotg);
> +
> +       gpio_set_value(hsotg->gpio_17, 0);
> +
> +       return ret;
>  }
>
>  /*
> diff --git a/drivers/usb/dwc2/platform.c b/drivers/usb/dwc2/platform.c
> index 9564bc7..e3f8a8a 100644
> --- a/drivers/usb/dwc2/platform.c
> +++ b/drivers/usb/dwc2/platform.c
> @@ -35,6 +35,7 @@
>   */
>
>  #include <linux/kernel.h>
> +#include <linux/gpio.h>
>  #include <linux/module.h>
>  #include <linux/slab.h>
>  #include <linux/clk.h>
> @@ -404,6 +405,16 @@ static int dwc2_driver_probe(struct platform_device *dev)
>                 return hsotg->irq;
>         }
>
> +       hsotg->gpio_17 = 475;
> +       retval = devm_gpio_request_one(hsotg->dev, hsotg->gpio_17,  GPIOF_OUT_INIT_LOW, "gpio_17");
> +       if (retval)
> +               pr_warn("Unable to request GPIO17: %d\n", retval);
> +
> +       hsotg->gpio_18 = 476;
> +       retval = devm_gpio_request_one(hsotg->dev, hsotg->gpio_18,  GPIOF_OUT_INIT_LOW, "gpio_18");
> +       if (retval)
> +               pr_warn("Unable to request GPIO18: %d\n", retval);
> +
>         dev_dbg(hsotg->dev, "registering common handler for irq%d\n",
>                 hsotg->irq);
>         retval = devm_request_irq(hsotg->dev, hsotg->irq,

  reply	other threads:[~2017-04-18 20:41 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-31 20:34 usb: dwc2: NMI watchdog: BUG: soft lockup - CPU#0 stuck for 146s Stefan Wahren
2016-11-01  8:26 ` Michael Zoran
2017-04-17 11:05 ` Stefan Wahren
2017-04-17 22:37   ` Doug Anderson
2017-04-18  8:07     ` Stefan Wahren
2017-04-18 20:08       ` Doug Anderson
2017-04-18 20:25       ` Stefan Wahren
2017-04-18 20:41         ` Doug Anderson [this message]
2017-04-18 20:53           ` Stefan Wahren
2017-04-19 20:25           ` Stefan Wahren
2017-04-19 21:47             ` Doug Anderson
2017-04-20  7:46               ` Stefan Wahren
2017-04-20 16:19                 ` Doug Anderson
2017-04-20 18:54             ` Eric Anholt
2017-04-20 19:45               ` Doug Anderson
2017-04-20 19:57                 ` Eric Anholt
2017-04-20 20:37                   ` Doug Anderson
2017-04-22 20:50               ` Stefan Wahren
2017-04-25 18:11                 ` Stefan Wahren
2017-05-08 20:22                   ` Stefan Wahren
2017-05-10 16:31                     ` Johan Hovold
2017-05-10 23:50                       ` Doug Anderson
2017-05-13 12:28                         ` Stefan Wahren
2017-10-16 20:49                           ` Julius Werner
2017-10-17  8:52                             ` Johan Hovold
2017-10-25 21:22                             ` Doug Anderson
2017-04-19  8:56         ` Johan Hovold
2017-04-19  9:12           ` Stefan Wahren
2017-04-19  9:55             ` Johan Hovold
2017-04-17 23:45   ` Heiko Stuebner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAD=FV=VHkJFrsJDmMytbUw=WjKQaGZ-8Ewfy5bPCA82AWS4O2Q@mail.gmail.com' \
    --to=dianders@chromium.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.