All of lore.kernel.org
 help / color / mirror / Atom feed
From: Li Yang <leoyang.li@nxp.com>
To: Eugene Bordenkircher <Eugene_Bordenkircher@selinc.com>
Cc: "linux-usb@vger.kernel.org" <linux-usb@vger.kernel.org>,
	"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
	"balbi@kernel.org" <balbi@kernel.org>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>
Subject: Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
Date: Fri, 29 Oct 2021 18:14:01 -0500	[thread overview]
Message-ID: <CADRPPNSrhiwr8jmBb2h4cFYqHtuDKK8rL0i6Bkg7+xEyXJPATA@mail.gmail.com> (raw)
In-Reply-To: <MWHPR2201MB1520D45396628364E91A1FA691879@MWHPR2201MB1520.namprd22.prod.outlook.com>

On Fri, Oct 29, 2021 at 4:27 PM Eugene Bordenkircher
<Eugene_Bordenkircher@selinc.com> wrote:
>
> Typing Greg's email correct this time.  My apologies.
>
> Eugene
>
> -----Original Message-----
> From: Eugene Bordenkircher
> Sent: Friday, October 29, 2021 10:14 AM
> To: linux-usb@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
> Cc: leoyang.li@nxp.com; balbi@kernel.org; gregkh@linuxfoundataion.org
> Subject: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
>
> Hello all,
>
> We've discovered a situation where the FSL udc driver (drivers/usb/gadget/udc/fsl_udc_core.c) will enter a loop iterating over the request queue, but the queue has been corrupted at some point so it loops infinitely.  I believe we have narrowed into the offending code, but we are in need of assistance trying to find an appropriate fix for the problem.  The identified code appears to be in all versions of the Linux kernel the driver exists in.
>
> The problem appears to be when handling a USB_REQ_GET_STATUS request.  The driver gets this request and then calls the ch9getstatus() function.  In this function, it starts a request by "borrowing" the per device status_req, filling it in, and then queuing it with a call to list_add_tail() to add the request to the endpoint queue.  Right before it exits the function however, it's calling ep0_prime_status(), which is filling out that same status_req structure and then queuing it with another call to list_add_tail() to add the request to the endpoint queue.  This adds two instances of the exact same LIST_HEAD to the endpoint queue, which breaks the list since the prev and next pointers end up pointing to the wrong things.  This ends up causing a hard loop the next time nuke() gets called, which happens on the next setup IRQ.
>

I agree with you that this looks problematic.  This is probably
introduced by f79a60b8785 "usb: fsl_udc_core: prime status stage once
data stage has primed" that it didn't consider that the status_req has
been re-used for the DATA phase.

I think the proper fix should be having a separate request allocated
for the data phase after the above change.

> I'm not sure what the appropriate fix to this problem is, mostly due to my lack of expertise in USB and this driver stack.  The code has been this way in the kernel for a very long time, which suggests that it has been working, unless USB_REQ_GET_STATUS requests are never made.  This further suggests that there is something else going on that I don't understand.  Deleting the call to ep0_prime_status() and the following ep0stall() call appears, on the surface, to get the device working again, but may have side effects that I'm not seeing.
>
> I'm hopeful someone in the community can help provide some information on what I may be missing or help come up with a solution to the problem.  A big thank you to anyone who would like to help out.
>
> Eugene

WARNING: multiple messages have this Message-ID
From: Li Yang <leoyang.li@nxp.com>
To: Eugene Bordenkircher <Eugene_Bordenkircher@selinc.com>
Cc: "balbi@kernel.org" <balbi@kernel.org>,
	"linux-usb@vger.kernel.org" <linux-usb@vger.kernel.org>,
	"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>
Subject: Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
Date: Fri, 29 Oct 2021 18:14:01 -0500	[thread overview]
Message-ID: <CADRPPNSrhiwr8jmBb2h4cFYqHtuDKK8rL0i6Bkg7+xEyXJPATA@mail.gmail.com> (raw)
In-Reply-To: <MWHPR2201MB1520D45396628364E91A1FA691879@MWHPR2201MB1520.namprd22.prod.outlook.com>

On Fri, Oct 29, 2021 at 4:27 PM Eugene Bordenkircher
<Eugene_Bordenkircher@selinc.com> wrote:
>
> Typing Greg's email correct this time.  My apologies.
>
> Eugene
>
> -----Original Message-----
> From: Eugene Bordenkircher
> Sent: Friday, October 29, 2021 10:14 AM
> To: linux-usb@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
> Cc: leoyang.li@nxp.com; balbi@kernel.org; gregkh@linuxfoundataion.org
> Subject: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
>
> Hello all,
>
> We've discovered a situation where the FSL udc driver (drivers/usb/gadget/udc/fsl_udc_core.c) will enter a loop iterating over the request queue, but the queue has been corrupted at some point so it loops infinitely.  I believe we have narrowed into the offending code, but we are in need of assistance trying to find an appropriate fix for the problem.  The identified code appears to be in all versions of the Linux kernel the driver exists in.
>
> The problem appears to be when handling a USB_REQ_GET_STATUS request.  The driver gets this request and then calls the ch9getstatus() function.  In this function, it starts a request by "borrowing" the per device status_req, filling it in, and then queuing it with a call to list_add_tail() to add the request to the endpoint queue.  Right before it exits the function however, it's calling ep0_prime_status(), which is filling out that same status_req structure and then queuing it with another call to list_add_tail() to add the request to the endpoint queue.  This adds two instances of the exact same LIST_HEAD to the endpoint queue, which breaks the list since the prev and next pointers end up pointing to the wrong things.  This ends up causing a hard loop the next time nuke() gets called, which happens on the next setup IRQ.
>

I agree with you that this looks problematic.  This is probably
introduced by f79a60b8785 "usb: fsl_udc_core: prime status stage once
data stage has primed" that it didn't consider that the status_req has
been re-used for the DATA phase.

I think the proper fix should be having a separate request allocated
for the data phase after the above change.

> I'm not sure what the appropriate fix to this problem is, mostly due to my lack of expertise in USB and this driver stack.  The code has been this way in the kernel for a very long time, which suggests that it has been working, unless USB_REQ_GET_STATUS requests are never made.  This further suggests that there is something else going on that I don't understand.  Deleting the call to ep0_prime_status() and the following ep0stall() call appears, on the surface, to get the device working again, but may have side effects that I'm not seeing.
>
> I'm hopeful someone in the community can help provide some information on what I may be missing or help come up with a solution to the problem.  A big thank you to anyone who would like to help out.
>
> Eugene

  reply	other threads:[~2021-10-29 23:14 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-29 17:14 Eugene Bordenkircher
2021-10-29 17:14 ` Eugene Bordenkircher
2021-10-29 17:24 ` Eugene Bordenkircher
2021-10-29 17:24   ` Eugene Bordenkircher
2021-10-29 23:14   ` Li Yang [this message]
2021-10-29 23:14     ` Li Yang
2021-10-30 14:20 ` Joakim Tjernlund
2021-10-30 14:20   ` Joakim Tjernlund
2021-11-02 21:15   ` Joakim Tjernlund
2021-11-02 21:15     ` Joakim Tjernlund
2021-11-15  8:36     ` Thorsten Leemhuis
2021-11-15  8:36       ` Thorsten Leemhuis
2021-11-16 19:11       ` Eugene Bordenkircher
2021-11-16 19:11         ` Eugene Bordenkircher
2021-11-25 13:59         ` Thorsten Leemhuis
2021-11-25 13:59           ` Thorsten Leemhuis
2021-11-29 17:24           ` Eugene Bordenkircher
2021-11-29 17:24             ` Eugene Bordenkircher
2021-11-29 23:37             ` Leo Li
2021-11-29 23:48               ` Eugene Bordenkircher
2021-11-30 11:56                 ` Joakim Tjernlund
2021-12-01 14:19                   ` Joakim Tjernlund
2021-12-02 20:35                     ` Leo Li
2021-12-02 22:45                       ` Joakim Tjernlund
2021-12-04  0:40                         ` Leo Li
2022-01-20 12:54                           ` Thorsten Leemhuis
2022-02-18  7:11                             ` Thorsten Leemhuis
2022-02-18 10:21                               ` Joakim Tjernlund
2022-02-18 10:39                                 ` gregkh
2022-02-18 10:39                                   ` gregkh
2022-02-18 11:17                                   ` Joakim Tjernlund
2022-02-18 11:17                                     ` Joakim Tjernlund
2022-02-18 11:48                                     ` gregkh
2022-02-18 11:48                                       ` gregkh
2021-11-01 16:42 ` Thorsten Leemhuis

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CADRPPNSrhiwr8jmBb2h4cFYqHtuDKK8rL0i6Bkg7+xEyXJPATA@mail.gmail.com \
    --to=leoyang.li@nxp.com \
    --cc=Eugene_Bordenkircher@selinc.com \
    --cc=balbi@kernel.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --subject='Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.