linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Frequent dwc3 crashes on suspend or reboot since 5.0-rc1
@ 2019-02-02  0:18 John Stultz
  2019-02-02  0:29 ` Thinh Nguyen
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: John Stultz @ 2019-02-02  0:18 UTC (permalink / raw)
  To: Felipe Balbi, Zeng Tao, Jack Pham, Thinh Nguyen, Chen Yu
  Cc: lkml, Linux USB List, Greg Kroah-Hartman

Hey all,
  Since the 5.0 merge window opened, I've been tripping on frequent
dwc3 crashes on reboot and suspend, which I've added an example to the
bottom of this mail.

I've dug in a little bit and sort of have a sense of whats going on.

In ffs_epfile_io():
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/gadget/function/f_fs.c#n1065

The completion done is setup on the stack:
  DECLARE_COMPLETION_ONSTACK(done);

Then later we setup a request and queue it:
  req->context  = &done;
  ...
  ret = usb_ep_queue(ep->ep, req, GFP_ATOMIC);

Then wait for it:
  if (unlikely(wait_for_completion_interruptible(&done))) {
    /*
    * To avoid race condition with ffs_epfile_io_complete,
    * dequeue the request first then check
    * status. usb_ep_dequeue API should guarantee no race
    * condition with req->complete callback.
    */
    usb_ep_dequeue(ep->ep, req);
    interrupted = ep->status < 0;
  }

The problem is, that we end up being interrupted, supposedly dequeue
the request, and exit.

But then (or in parallel) the irq triggers and we try calling
complete() on the context pointer which points to now random stack
space, which results in the panic.

It seems like something is wrong with usb_ep_dequeue not really
stopping the irq from happening?

If I revert all the changes to dwc3 back to 4.20, I don't see the issue.

I'll do some bisection to try to narrow things down, but I wanted to
see if this was a known issue or if anyone had immediate ideas as to
what might be wrong.

thanks
-john

[   36.911170] Unable to handle kernel paging request at virtual
address ffffff801153d660
[   36.912769] Unable to handle kernel paging request at virtual
address ffffff800004b564
[   36.919881] Mem abort info:
[   36.919884]   ESR = 0x96000047
[   36.919888]   Exception class = DABT (current EL), IL = 32 bits
[   36.919890]   SET = 0, FnV = 0
[   36.919895]   EA = 0, S1PTW = 0
[   36.927875] Mem abort info:
[   36.935718] Data abort info:
[   36.935721]   ISV = 0, ISS = 0x00000047
[   36.935723]   CM = 0, WnR = 1
[   36.935730] swapper pgtable: 4k pages, 39-bit VAs, pgdp = 00000000f1b819ef
[   36.935733] [ffffff801153d660] pgd=000000021ffff803,
pud=000000021ffff803, pmd=000000021fffb803, pte=0000000000000000
[   36.935744] Internal error: Oops: 96000047 [#1] PREEMPT SMP
[   36.935748] Modules linked in:
[   36.938552]   ESR = 0x86000006
[   36.941601] CPU: 0 PID: 2656 Comm: irq/69-dwc3 Tainted: G S
       4.20.0-10778-gadc8369 #210
[   36.941603] Hardware name: HiKey960 (DT)
[   36.941610] pstate: 00400085 (nzcv daIf +PAN -UAO)
[   36.947554]   Exception class = IABT (current EL), IL = 32 bits
[   36.950594] pc : queued_spin_lock_slowpath+0x1cc/0x2c8
[   36.950601] lr : queued_spin_lock_slowpath+0xd0/0x2c8
[   36.950603] sp : ffffff8011e13be0
[   36.950607] x29: ffffff8011e13be0 x28: 0000000000000000
[   36.950611] x27: ffffff801186d000 x26: ffffff8010159000
[   36.950615] x25: ffffff801186d000 x24: ffffffc218be36e8
[   36.950619] x23: ffffff801186e910 x22: 0000000000040000
[   36.950622] x21: ffffffc21f71b640 x20: ffffff801153d000
[   36.950626] x19: ffffff8011e1bbe8 x18: 0000000000000000
[   36.950629] x17: 00000000100eb564 x16: 00000000100eb564
[   36.950633] x15: 0000000000000000 x14: ffffff801187cf80
[   36.950636] x13: 000000420e1de000 x12: 0000000034d4d91d
[   36.950640] x11: 0000000000000000 x10: 0000000000000a20
[   36.950643] x9 : ffffff8011e13d10 x8 : ffffffc218874c00
[   36.950646] x7 : 0000000000000000 x6 : ffffff801186db08
[   36.950650] x5 : 0000000000000000 x4 : 0000000000000000
[   36.950653] x3 : ffffffc21f71b640 x2 : 0000000000000000
[   36.950656] x1 : ffffff801153d660 x0 : ffffffc21f71b648
[   36.950663] Process irq/69-dwc3 (pid: 2656, stack limit = 0x00000000b627af93)
[   36.950666] Call trace:
[   36.950670]  queued_spin_lock_slowpath+0x1cc/0x2c8
[   36.950681]  _raw_spin_lock_irqsave+0x64/0x78
[   36.950692]  complete+0x28/0x70
[   36.950703]  ffs_epfile_io_complete+0x3c/0x50
[   36.950713]  usb_gadget_giveback_request+0x34/0x108
[   36.950721]  dwc3_gadget_giveback+0x50/0x68
[   36.950723]  dwc3_thread_interrupt+0x358/0x1488
[   36.950731]  irq_thread_fn+0x30/0x88
[   36.950734]  irq_thread+0x114/0x1b0
[   36.950739]  kthread+0x104/0x130
[   36.950747]  ret_from_fork+0x10/0x1c
[   36.950755] Code: 91190281 8b021021 f860dae2 91002060 (f8226823)
[   36.953901]   SET = 0, FnV = 0
[   36.956685] ---[ end trace 3d13dc405c1e8aa7 ]---
[   36.965704] Kernel panic - not syncing: Fatal exception
[   36.966372]   EA = 0, S1PTW = 0
[   36.973246] SMP: stopping secondary CPUs
[   36.983855] Kernel Offset: disabled
[   36.983860] CPU features: 0x002,21882004
[   36.983861] Memory Limit: none
[   37.210976] Rebooting in 5 seconds..

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Frequent dwc3 crashes on suspend or reboot since 5.0-rc1
  2019-02-02  0:18 Frequent dwc3 crashes on suspend or reboot since 5.0-rc1 John Stultz
@ 2019-02-02  0:29 ` Thinh Nguyen
  2019-02-02  0:36   ` John Stultz
  2019-02-02  0:41 ` John Stultz
  2019-02-02 17:00 ` Alan Stern
  2 siblings, 1 reply; 10+ messages in thread
From: Thinh Nguyen @ 2019-02-02  0:29 UTC (permalink / raw)
  To: John Stultz, Felipe Balbi, Zeng Tao, Jack Pham, Thinh Nguyen, Chen Yu
  Cc: lkml, Linux USB List, Greg Kroah-Hartman

Hi John,

John Stultz wrote:
> Hey all,
>   Since the 5.0 merge window opened, I've been tripping on frequent
> dwc3 crashes on reboot and suspend, which I've added an example to the
> bottom of this mail.
>
> I've dug in a little bit and sort of have a sense of whats going on.
>
> In ffs_epfile_io():
> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_torvalds_linux.git_tree_drivers_usb_gadget_function_f-5Ffs.c-23n1065&d=DwIBaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=u9FYoxKtyhjrGFcyixFYqTjw1ZX0VsG2d8FCmzkTY-w&m=a8TU-itM8GBG_EARYf2yM-kVfCzmaPkKDNAUFQHTe3Q&s=BQiVAFiViSlxVg5_LemED0x_47FLVUD43M7R6h6T8qk&e=
>
> The completion done is setup on the stack:
>   DECLARE_COMPLETION_ONSTACK(done);
>
> Then later we setup a request and queue it:
>   req->context  = &done;
>   ...
>   ret = usb_ep_queue(ep->ep, req, GFP_ATOMIC);
>
> Then wait for it:
>   if (unlikely(wait_for_completion_interruptible(&done))) {
>     /*
>     * To avoid race condition with ffs_epfile_io_complete,
>     * dequeue the request first then check
>     * status. usb_ep_dequeue API should guarantee no race
>     * condition with req->complete callback.
>     */
>     usb_ep_dequeue(ep->ep, req);
>     interrupted = ep->status < 0;
>   }
>
> The problem is, that we end up being interrupted, supposedly dequeue
> the request, and exit.
>
> But then (or in parallel) the irq triggers and we try calling
> complete() on the context pointer which points to now random stack
> space, which results in the panic.
>
> It seems like something is wrong with usb_ep_dequeue not really
> stopping the irq from happening?
>
> If I revert all the changes to dwc3 back to 4.20, I don't see the issue.
>
> I'll do some bisection to try to narrow things down, but I wanted to
> see if this was a known issue or if anyone had immediate ideas as to
> what might be wrong.
>

I'm not sure if this is related, but can you try to test using Felipe's
testing/next branch? There is a fix to a race condition when the gadget
driver tries to dequeue requests.

See if you run into this issue again.

Thanks,
Thinh

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Frequent dwc3 crashes on suspend or reboot since 5.0-rc1
  2019-02-02  0:29 ` Thinh Nguyen
@ 2019-02-02  0:36   ` John Stultz
  0 siblings, 0 replies; 10+ messages in thread
From: John Stultz @ 2019-02-02  0:36 UTC (permalink / raw)
  To: Thinh Nguyen
  Cc: Felipe Balbi, Zeng Tao, Jack Pham, Chen Yu, lkml, Linux USB List,
	Greg Kroah-Hartman

On Fri, Feb 1, 2019 at 4:31 PM Thinh Nguyen <thinh.nguyen@synopsys.com> wrote:
>
> Hi John,
>
> John Stultz wrote:
> > Hey all,
> >   Since the 5.0 merge window opened, I've been tripping on frequent
> > dwc3 crashes on reboot and suspend, which I've added an example to the
> > bottom of this mail.
> >
> > I've dug in a little bit and sort of have a sense of whats going on.
> >
> > In ffs_epfile_io():
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_torvalds_linux.git_tree_drivers_usb_gadget_function_f-5Ffs.c-23n1065&d=DwIBaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=u9FYoxKtyhjrGFcyixFYqTjw1ZX0VsG2d8FCmzkTY-w&m=a8TU-itM8GBG_EARYf2yM-kVfCzmaPkKDNAUFQHTe3Q&s=BQiVAFiViSlxVg5_LemED0x_47FLVUD43M7R6h6T8qk&e=
> >
> > The completion done is setup on the stack:
> >   DECLARE_COMPLETION_ONSTACK(done);
> >
> > Then later we setup a request and queue it:
> >   req->context  = &done;
> >   ...
> >   ret = usb_ep_queue(ep->ep, req, GFP_ATOMIC);
> >
> > Then wait for it:
> >   if (unlikely(wait_for_completion_interruptible(&done))) {
> >     /*
> >     * To avoid race condition with ffs_epfile_io_complete,
> >     * dequeue the request first then check
> >     * status. usb_ep_dequeue API should guarantee no race
> >     * condition with req->complete callback.
> >     */
> >     usb_ep_dequeue(ep->ep, req);
> >     interrupted = ep->status < 0;
> >   }
> >
> > The problem is, that we end up being interrupted, supposedly dequeue
> > the request, and exit.
> >
> > But then (or in parallel) the irq triggers and we try calling
> > complete() on the context pointer which points to now random stack
> > space, which results in the panic.
> >
> > It seems like something is wrong with usb_ep_dequeue not really
> > stopping the irq from happening?
> >
> > If I revert all the changes to dwc3 back to 4.20, I don't see the issue.
> >
> > I'll do some bisection to try to narrow things down, but I wanted to
> > see if this was a known issue or if anyone had immediate ideas as to
> > what might be wrong.
> >
>
> I'm not sure if this is related, but can you try to test using Felipe's
> testing/next branch? There is a fix to a race condition when the gadget
> driver tries to dequeue requests.
>
> See if you run into this issue again.

I'll check that out! Thanks so much for the pointer!
-john

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Frequent dwc3 crashes on suspend or reboot since 5.0-rc1
  2019-02-02  0:18 Frequent dwc3 crashes on suspend or reboot since 5.0-rc1 John Stultz
  2019-02-02  0:29 ` Thinh Nguyen
@ 2019-02-02  0:41 ` John Stultz
  2019-02-02  0:46   ` Thinh Nguyen
  2019-02-02 17:00 ` Alan Stern
  2 siblings, 1 reply; 10+ messages in thread
From: John Stultz @ 2019-02-02  0:41 UTC (permalink / raw)
  To: Felipe Balbi, Zeng Tao, Jack Pham, Thinh Nguyen, Chen Yu
  Cc: lkml, Linux USB List, Greg Kroah-Hartman

On Fri, Feb 1, 2019 at 4:18 PM John Stultz <john.stultz@linaro.org> wrote:
>
> Hey all,
>   Since the 5.0 merge window opened, I've been tripping on frequent
> dwc3 crashes on reboot and suspend, which I've added an example to the
> bottom of this mail.
>
> I've dug in a little bit and sort of have a sense of whats going on.
>
> In ffs_epfile_io():
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/gadget/function/f_fs.c#n1065
>
> The completion done is setup on the stack:
>   DECLARE_COMPLETION_ONSTACK(done);
>
> Then later we setup a request and queue it:
>   req->context  = &done;
>   ...
>   ret = usb_ep_queue(ep->ep, req, GFP_ATOMIC);
>
> Then wait for it:
>   if (unlikely(wait_for_completion_interruptible(&done))) {
>     /*
>     * To avoid race condition with ffs_epfile_io_complete,
>     * dequeue the request first then check
>     * status. usb_ep_dequeue API should guarantee no race
>     * condition with req->complete callback.
>     */
>     usb_ep_dequeue(ep->ep, req);
>     interrupted = ep->status < 0;
>   }
>
> The problem is, that we end up being interrupted, supposedly dequeue
> the request, and exit.
>
> But then (or in parallel) the irq triggers and we try calling
> complete() on the context pointer which points to now random stack
> space, which results in the panic.
>
> It seems like something is wrong with usb_ep_dequeue not really
> stopping the irq from happening?
>
> If I revert all the changes to dwc3 back to 4.20, I don't see the issue.
>
> I'll do some bisection to try to narrow things down, but I wanted to
> see if this was a known issue or if anyone had immediate ideas as to
> what might be wrong.

Bisecting the changes down, it seems like its due to commit
fec9095bdef4e ("usb: dwc3: gadget: remove wait_end_transfer").

It doesn't happen all the time, so I'll need to run some more testing,
but so far I've not been able to trigger it backing out the patches to
that point.

thanks
-john

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Frequent dwc3 crashes on suspend or reboot since 5.0-rc1
  2019-02-02  0:41 ` John Stultz
@ 2019-02-02  0:46   ` Thinh Nguyen
  2019-02-02  0:55     ` John Stultz
  0 siblings, 1 reply; 10+ messages in thread
From: Thinh Nguyen @ 2019-02-02  0:46 UTC (permalink / raw)
  To: John Stultz, Felipe Balbi, Zeng Tao, Jack Pham, Thinh Nguyen, Chen Yu
  Cc: lkml, Linux USB List, Greg Kroah-Hartman

Hi John,

John Stultz wrote:
> On Fri, Feb 1, 2019 at 4:18 PM John Stultz <john.stultz@linaro.org> wrote:
>> Hey all,
>>   Since the 5.0 merge window opened, I've been tripping on frequent
>> dwc3 crashes on reboot and suspend, which I've added an example to the
>> bottom of this mail.
>>
>> I've dug in a little bit and sort of have a sense of whats going on.
>>
>> In ffs_epfile_io():
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_torvalds_linux.git_tree_drivers_usb_gadget_function_f-5Ffs.c-23n1065&d=DwIBaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=u9FYoxKtyhjrGFcyixFYqTjw1ZX0VsG2d8FCmzkTY-w&m=Ikgcuoe1TJkip3EVA2Cce33perU7WerY9a24BCFW4DM&s=3gJjzpAGPdj79ROPvlM1ziRTY-4u6VRFRwKWbz5X_SA&e=
>>
>> The completion done is setup on the stack:
>>   DECLARE_COMPLETION_ONSTACK(done);
>>
>> Then later we setup a request and queue it:
>>   req->context  = &done;
>>   ...
>>   ret = usb_ep_queue(ep->ep, req, GFP_ATOMIC);
>>
>> Then wait for it:
>>   if (unlikely(wait_for_completion_interruptible(&done))) {
>>     /*
>>     * To avoid race condition with ffs_epfile_io_complete,
>>     * dequeue the request first then check
>>     * status. usb_ep_dequeue API should guarantee no race
>>     * condition with req->complete callback.
>>     */
>>     usb_ep_dequeue(ep->ep, req);
>>     interrupted = ep->status < 0;
>>   }
>>
>> The problem is, that we end up being interrupted, supposedly dequeue
>> the request, and exit.
>>
>> But then (or in parallel) the irq triggers and we try calling
>> complete() on the context pointer which points to now random stack
>> space, which results in the panic.
>>
>> It seems like something is wrong with usb_ep_dequeue not really
>> stopping the irq from happening?
>>
>> If I revert all the changes to dwc3 back to 4.20, I don't see the issue.
>>
>> I'll do some bisection to try to narrow things down, but I wanted to
>> see if this was a known issue or if anyone had immediate ideas as to
>> what might be wrong.
> Bisecting the changes down, it seems like its due to commit
> fec9095bdef4e ("usb: dwc3: gadget: remove wait_end_transfer").
>
> It doesn't happen all the time, so I'll need to run some more testing,
> but so far I've not been able to trigger it backing out the patches to
> that point.
>
> thanks
> -john
>

Yeah, it sounds like the same issue. You can review the discussion here:
https://www.spinics.net/lists/linux-usb/msg176110.html

Thinh

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Frequent dwc3 crashes on suspend or reboot since 5.0-rc1
  2019-02-02  0:46   ` Thinh Nguyen
@ 2019-02-02  0:55     ` John Stultz
  2019-02-02  1:31       ` Thinh Nguyen
  0 siblings, 1 reply; 10+ messages in thread
From: John Stultz @ 2019-02-02  0:55 UTC (permalink / raw)
  To: Thinh Nguyen
  Cc: Felipe Balbi, Zeng Tao, Jack Pham, Chen Yu, lkml, Linux USB List,
	Greg Kroah-Hartman

On Fri, Feb 1, 2019 at 4:46 PM Thinh Nguyen <thinh.nguyen@synopsys.com> wrote:
> John Stultz wrote:
> > On Fri, Feb 1, 2019 at 4:18 PM John Stultz <john.stultz@linaro.org> wrote:
> > Bisecting the changes down, it seems like its due to commit
> > fec9095bdef4e ("usb: dwc3: gadget: remove wait_end_transfer").
> >
> > It doesn't happen all the time, so I'll need to run some more testing,
> > but so far I've not been able to trigger it backing out the patches to
> > that point.
>
> Yeah, it sounds like the same issue. You can review the discussion here:
> https://www.spinics.net/lists/linux-usb/msg176110.html

Unfortunately, merging in
https://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb.git
testing/next seems to trigger a different issue:

[   38.585141] OOM killer enabled.
[   38.585143] Restarting tasks ...
[   38.585874] ------------[ cut here ]------------
[   38.585882] ep1out: request 0000000000000000 already in flight
[   38.585944] WARNING: CPU: 7 PID: 2545 at
drivers/usb/dwc3/gadget.c:1430 dwc3_gadget_ep_queue+0x1d4/0x200
[   38.585946] Modules linked in:
[   38.585960] CPU: 7 PID: 2545 Comm: adbd Tainted: G S
5.0.0-rc4-00110-gad0e691 #509
[   38.585963] Hardware name: HiKey960 (DT)
[   38.585968] pstate: 80400085 (Nzcv daIf +PAN -UAO)
[   38.585972] pc : dwc3_gadget_ep_queue+0x1d4/0x200
[   38.585976] lr : dwc3_gadget_ep_queue+0x1d4/0x200
[   38.585978] sp : ffffff80149fbb40
[   38.585981] x29: ffffff80149fbb40 x28: ffffffc20c72a200
[   38.585985] x27: ffffffc211376d00 x26: 0000000000000000
[   38.585990] x25: ffffff80149fbc30 x24: 0000000000000018
[   38.585994] x23: 0000000000000080 x22: ffffffc218c0a1b0
[   38.585997] x21: ffffffffffffff94 x20: ffffffc21930f600
[   38.586001] x19: ffffffc211376d00 x18: ffffff801161da88
[   38.586005] x17: 0000000000000000 x16: 0000000000000000
[   38.586008] x15: ffffff80117f8468 x14: 0000000000080000
[   38.586012] x13: ffffff80117f8088 x12: ffffff80116436b8
[   38.586016] x11: ffffff8011643000 x10: ffffff8011775000
[   38.586020] x9 : 0000000000000000 x8 : ffffff801178b33c
[   38.586023] x7 : 0000000000000000 x6 : 00000000007dd811
[   38.586027] x5 : 0000000000000000 x4 : 0000000000000000
[   38.586030] x3 : 0000000000000003 x2 : 0000000000000003
[   38.586034] x1 : e8959d459c94b400 x0 : 0000000000000000
[   38.586039] Call trace:
[   38.586043]  dwc3_gadget_ep_queue+0x1d4/0x200
[   38.586053]  usb_ep_queue+0x5c/0x120
[   38.586059]  ffs_epfile_io.isra.12+0x3dc/0x6c0
[   38.586063]  ffs_epfile_read_iter+0xbc/0x190
[   38.586073]  __vfs_read+0x10c/0x168
[   38.586077]  vfs_read+0x8c/0x148
[   38.586081]  ksys_read+0x5c/0xc8
[   38.586085]  __arm64_sys_read+0x14/0x20
[   38.586094]  el0_svc_common+0xb4/0x118
[   38.586099]  el0_svc_handler+0x2c/0x80
[   38.586105]  el0_svc+0x8/0xc
[   38.586107] ---[ end trace 65a9814dc2b21238 ]---
[   38.587826] done.
[   38.587922] PM: suspend exit
[   38.589426] ------------[ cut here ]------------
[   38.589433] ep1out: request 0000000000000000 already in flight
[   38.589489] WARNING: CPU: 7 PID: 4087 at
drivers/usb/dwc3/gadget.c:1430 dwc3_gadget_ep_queue+0x1d4/0x200
[   38.589493] Modules linked in:
[   38.589506] CPU: 7 PID: 4087 Comm: adbd Tainted: G S      W
5.0.0-rc4-00110-gad0e691 #509
[   38.589508] Hardware name: HiKey960 (DT)
[   38.589514] pstate: 80400085 (Nzcv daIf +PAN -UAO)
[   38.589518] pc : dwc3_gadget_ep_queue+0x1d4/0x200
[   38.589521] lr : dwc3_gadget_ep_queue+0x1d4/0x200
[   38.589523] sp : ffffff80149fbb40
[   38.589526] x29: ffffff80149fbb40 x28: ffffffc20c72a200
[   38.589531] x27: ffffffc211376d00 x26: 0000000000000000
[   38.589535] x25: ffffff80149fbc30 x24: 0000000000000018
[   38.589539] x23: 0000000000000080 x22: ffffffc218c0a1b0
[   38.589543] x21: ffffffffffffff94 x20: ffffffc21930f600
[   38.589546] x19: ffffffc211376d00 x18: ffffff801161da88
[   38.589550] x17: 0000000000000000 x16: 0000000000000000
[   38.589554] x15: ffffff80117f8468 x14: 0000000000080000
[   38.589557] x13: ffffff80117f8088 x12: ffffff80116436b8
[   38.589561] x11: ffffff8011643000 x10: ffffff8011775000
[   38.589565] x9 : 0000000000000000 x8 : ffffff801178bb88
[   38.589568] x7 : 0000000000000000 x6 : 00000000007dd811
[   38.589572] x5 : 0000000000000000 x4 : 0000000000000000
[   38.589576] x3 : 0000000000000003 x2 : 0000000000000003
[   38.589579] x1 : e8959d459c94b400 x0 : 0000000000000000
[   38.589584] Call trace:
[   38.589588]  dwc3_gadget_ep_queue+0x1d4/0x200
[   38.589597]  usb_ep_queue+0x5c/0x120
[   38.589603]  ffs_epfile_io.isra.12+0x3dc/0x6c0
[   38.589606]  ffs_epfile_read_iter+0xbc/0x190
[   38.589617]  __vfs_read+0x10c/0x168
[   38.589621]  vfs_read+0x8c/0x148
[   38.589625]  ksys_read+0x5c/0xc8
[   38.589630]  __arm64_sys_read+0x14/0x20
[   38.589638]  el0_svc_common+0xb4/0x118
[   38.589642]  el0_svc_handler+0x2c/0x80
[   38.589648]  el0_svc+0x8/0xc
[   38.589651] ---[ end trace 65a9814dc2b21239 ]---

Let me know if you have other ideas to try!

thanks
-john

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Frequent dwc3 crashes on suspend or reboot since 5.0-rc1
  2019-02-02  0:55     ` John Stultz
@ 2019-02-02  1:31       ` Thinh Nguyen
  0 siblings, 0 replies; 10+ messages in thread
From: Thinh Nguyen @ 2019-02-02  1:31 UTC (permalink / raw)
  To: John Stultz, Thinh Nguyen
  Cc: Felipe Balbi, Zeng Tao, Jack Pham, Chen Yu, lkml, Linux USB List,
	Greg Kroah-Hartman

John Stultz wrote:
> On Fri, Feb 1, 2019 at 4:46 PM Thinh Nguyen <thinh.nguyen@synopsys.com> wrote:
>> John Stultz wrote:
>>> On Fri, Feb 1, 2019 at 4:18 PM John Stultz <john.stultz@linaro.org> wrote:
>>> Bisecting the changes down, it seems like its due to commit
>>> fec9095bdef4e ("usb: dwc3: gadget: remove wait_end_transfer").
>>>
>>> It doesn't happen all the time, so I'll need to run some more testing,
>>> but so far I've not been able to trigger it backing out the patches to
>>> that point.
>> Yeah, it sounds like the same issue. You can review the discussion here:
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.spinics.net_lists_linux-2Dusb_msg176110.html&d=DwIBaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=u9FYoxKtyhjrGFcyixFYqTjw1ZX0VsG2d8FCmzkTY-w&m=NafAczsBrlszpsknN68eiuJOr8zYyB34O0R9NCF3-pc&s=ExaEPcrEUReFA79ZWRHG9MELb9eead_QxKiTu1ea8Eg&e=
> Unfortunately, merging in
> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.kernel.org_pub_scm_linux_kernel_git_balbi_usb.git&d=DwIBaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=u9FYoxKtyhjrGFcyixFYqTjw1ZX0VsG2d8FCmzkTY-w&m=NafAczsBrlszpsknN68eiuJOr8zYyB34O0R9NCF3-pc&s=VLaXOrfcXp-6oljE2fC9rBEsPfYTZ6CxKqiQOhW_QKE&e=
> testing/next seems to trigger a different issue:
>
> [   38.585141] OOM killer enabled.
> [   38.585143] Restarting tasks ...
> [   38.585874] ------------[ cut here ]------------
> [   38.585882] ep1out: request 0000000000000000 already in flight
> [   38.585944] WARNING: CPU: 7 PID: 2545 at
> drivers/usb/dwc3/gadget.c:1430 dwc3_gadget_ep_queue+0x1d4/0x200
> [   38.585946] Modules linked in:
> [   38.585960] CPU: 7 PID: 2545 Comm: adbd Tainted: G S
> 5.0.0-rc4-00110-gad0e691 #509
> [   38.585963] Hardware name: HiKey960 (DT)
> [   38.585968] pstate: 80400085 (Nzcv daIf +PAN -UAO)
> [   38.585972] pc : dwc3_gadget_ep_queue+0x1d4/0x200
> [   38.585976] lr : dwc3_gadget_ep_queue+0x1d4/0x200
> [   38.585978] sp : ffffff80149fbb40

It looks like the gadget driver tried to queue request twice.

> Let me know if you have other ideas to try!
>
>

Try to revert this patches series (3 patches) from Felipe testing/next
to see if that helps:
https://patchwork.kernel.org/patch/10764297/

Thinh


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Frequent dwc3 crashes on suspend or reboot since 5.0-rc1
  2019-02-02  0:18 Frequent dwc3 crashes on suspend or reboot since 5.0-rc1 John Stultz
  2019-02-02  0:29 ` Thinh Nguyen
  2019-02-02  0:41 ` John Stultz
@ 2019-02-02 17:00 ` Alan Stern
  2019-02-04 23:57   ` John Stultz
  2 siblings, 1 reply; 10+ messages in thread
From: Alan Stern @ 2019-02-02 17:00 UTC (permalink / raw)
  To: John Stultz
  Cc: Felipe Balbi, Zeng Tao, Jack Pham, Thinh Nguyen, Chen Yu, lkml,
	Linux USB List, Greg Kroah-Hartman

On Fri, 1 Feb 2019, John Stultz wrote:

> Hey all,
>   Since the 5.0 merge window opened, I've been tripping on frequent
> dwc3 crashes on reboot and suspend, which I've added an example to the
> bottom of this mail.
> 
> I've dug in a little bit and sort of have a sense of whats going on.
> 
> In ffs_epfile_io():
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/gadget/function/f_fs.c#n1065
> 
> The completion done is setup on the stack:
>   DECLARE_COMPLETION_ONSTACK(done);
> 
> Then later we setup a request and queue it:
>   req->context  = &done;
>   ...
>   ret = usb_ep_queue(ep->ep, req, GFP_ATOMIC);
> 
> Then wait for it:
>   if (unlikely(wait_for_completion_interruptible(&done))) {
>     /*
>     * To avoid race condition with ffs_epfile_io_complete,
>     * dequeue the request first then check
>     * status. usb_ep_dequeue API should guarantee no race
>     * condition with req->complete callback.
>     */
>     usb_ep_dequeue(ep->ep, req);

This code contains a bug: It assumes that usb_ep_dequeue() waits until 
the request has been completed.  You should insert

	wait_for_completion(&done);

right here.

>     interrupted = ep->status < 0;
>   }


> 
> The problem is, that we end up being interrupted, supposedly dequeue
> the request, and exit.
> 
> But then (or in parallel) the irq triggers and we try calling
> complete() on the context pointer which points to now random stack
> space, which results in the panic.

This is the natural result of not waiting for the request to complete.

> It seems like something is wrong with usb_ep_dequeue not really
> stopping the irq from happening?

Certainly.  usb_ep_dequeue() just speeds up the process of completing
the request; it doesn't wait for that process to finish.

Alan Stern

> If I revert all the changes to dwc3 back to 4.20, I don't see the issue.
> 
> I'll do some bisection to try to narrow things down, but I wanted to
> see if this was a known issue or if anyone had immediate ideas as to
> what might be wrong.
> 
> thanks
> -john


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Frequent dwc3 crashes on suspend or reboot since 5.0-rc1
  2019-02-02 17:00 ` Alan Stern
@ 2019-02-04 23:57   ` John Stultz
  2019-02-05 15:58     ` Alan Stern
  0 siblings, 1 reply; 10+ messages in thread
From: John Stultz @ 2019-02-04 23:57 UTC (permalink / raw)
  To: Alan Stern
  Cc: Felipe Balbi, Zeng Tao, Jack Pham, Thinh Nguyen, Chen Yu, lkml,
	Linux USB List, Greg Kroah-Hartman

On Sat, Feb 2, 2019 at 9:00 AM Alan Stern <stern@rowland.harvard.edu> wrote:
>
> On Fri, 1 Feb 2019, John Stultz wrote:
>
> > Hey all,
> >   Since the 5.0 merge window opened, I've been tripping on frequent
> > dwc3 crashes on reboot and suspend, which I've added an example to the
> > bottom of this mail.
> >
> > I've dug in a little bit and sort of have a sense of whats going on.
> >
> > In ffs_epfile_io():
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/gadget/function/f_fs.c#n1065
> >
> > The completion done is setup on the stack:
> >   DECLARE_COMPLETION_ONSTACK(done);
> >
> > Then later we setup a request and queue it:
> >   req->context  = &done;
> >   ...
> >   ret = usb_ep_queue(ep->ep, req, GFP_ATOMIC);
> >
> > Then wait for it:
> >   if (unlikely(wait_for_completion_interruptible(&done))) {
> >     /*
> >     * To avoid race condition with ffs_epfile_io_complete,
> >     * dequeue the request first then check
> >     * status. usb_ep_dequeue API should guarantee no race
> >     * condition with req->complete callback.
> >     */
> >     usb_ep_dequeue(ep->ep, req);
>
> This code contains a bug: It assumes that usb_ep_dequeue() waits until
> the request has been completed.  You should insert
>
>         wait_for_completion(&done);
>
> right here.
>
> >     interrupted = ep->status < 0;
> >   }
>

Thanks! This does seem to resolve the issue I'm seeing.

Are you planning to send in such a patch, or would it help if I sent it out?

thanks
-john

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Frequent dwc3 crashes on suspend or reboot since 5.0-rc1
  2019-02-04 23:57   ` John Stultz
@ 2019-02-05 15:58     ` Alan Stern
  0 siblings, 0 replies; 10+ messages in thread
From: Alan Stern @ 2019-02-05 15:58 UTC (permalink / raw)
  To: John Stultz
  Cc: Felipe Balbi, Zeng Tao, Jack Pham, Thinh Nguyen, Chen Yu, lkml,
	Linux USB List, Greg Kroah-Hartman

On Mon, 4 Feb 2019, John Stultz wrote:

> On Sat, Feb 2, 2019 at 9:00 AM Alan Stern <stern@rowland.harvard.edu> wrote:
> >
> > On Fri, 1 Feb 2019, John Stultz wrote:
> >
> > > Hey all,
> > >   Since the 5.0 merge window opened, I've been tripping on frequent
> > > dwc3 crashes on reboot and suspend, which I've added an example to the
> > > bottom of this mail.
> > >
> > > I've dug in a little bit and sort of have a sense of whats going on.
> > >
> > > In ffs_epfile_io():
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/gadget/function/f_fs.c#n1065
> > >
> > > The completion done is setup on the stack:
> > >   DECLARE_COMPLETION_ONSTACK(done);
> > >
> > > Then later we setup a request and queue it:
> > >   req->context  = &done;
> > >   ...
> > >   ret = usb_ep_queue(ep->ep, req, GFP_ATOMIC);
> > >
> > > Then wait for it:
> > >   if (unlikely(wait_for_completion_interruptible(&done))) {
> > >     /*
> > >     * To avoid race condition with ffs_epfile_io_complete,
> > >     * dequeue the request first then check
> > >     * status. usb_ep_dequeue API should guarantee no race
> > >     * condition with req->complete callback.
> > >     */
> > >     usb_ep_dequeue(ep->ep, req);
> >
> > This code contains a bug: It assumes that usb_ep_dequeue() waits until
> > the request has been completed.  You should insert
> >
> >         wait_for_completion(&done);
> >
> > right here.
> >
> > >     interrupted = ep->status < 0;
> > >   }
> >
> 
> Thanks! This does seem to resolve the issue I'm seeing.
> 
> Are you planning to send in such a patch, or would it help if I sent it out?

Either way, whatever you prefer.

If you decide to write the patch yourself, consider editing the
preceding comment (which is confusing if not outright wrong).  Also,
there is an earlier call of usb_ep_dequeue() in __ffs_ep0_queue_wait()
which might need to be fixed as well -- I'm not sure.

Alan Stern


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-02-05 15:58 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-02  0:18 Frequent dwc3 crashes on suspend or reboot since 5.0-rc1 John Stultz
2019-02-02  0:29 ` Thinh Nguyen
2019-02-02  0:36   ` John Stultz
2019-02-02  0:41 ` John Stultz
2019-02-02  0:46   ` Thinh Nguyen
2019-02-02  0:55     ` John Stultz
2019-02-02  1:31       ` Thinh Nguyen
2019-02-02 17:00 ` Alan Stern
2019-02-04 23:57   ` John Stultz
2019-02-05 15:58     ` Alan Stern

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).