linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Leo Li <leoyang.li@nxp.com>
To: Eugene Bordenkircher <Eugene_Bordenkircher@selinc.com>,
	Thorsten Leemhuis <regressions@leemhuis.info>,
	"jocke@infinera.com" <joakim.tjernlund@infinera.com>,
	"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
	"linux-usb@vger.kernel.org" <linux-usb@vger.kernel.org>
Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"balbi@kernel.org" <balbi@kernel.org>
Subject: RE: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
Date: Mon, 29 Nov 2021 23:37:17 +0000	[thread overview]
Message-ID: <AS8PR04MB89461BF7A3272E5A18ECD0948F669@AS8PR04MB8946.eurprd04.prod.outlook.com> (raw)
In-Reply-To: <MWHPR2201MB1520A85FE05B281DAA30F44A91669@MWHPR2201MB1520.namprd22.prod.outlook.com>



> -----Original Message-----
> From: Eugene Bordenkircher <Eugene_Bordenkircher@selinc.com>
> Sent: Monday, November 29, 2021 11:25 AM
> To: Thorsten Leemhuis <regressions@leemhuis.info>; jocke@infinera.com
> <joakim.tjernlund@infinera.com>; linuxppc-dev@lists.ozlabs.org; linux-
> usb@vger.kernel.org
> Cc: Leo Li <leoyang.li@nxp.com>; gregkh@linuxfoundation.org;
> balbi@kernel.org
> Subject: RE: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to
> unrecoverable loop.
> 
> The final result of our testing is that the patch set posted seems to address all
> known defects in the Linux kernel.  The mentioned additional problems are
> entirely caused by the antivirus solution on the windows box.  The antivirus
> solution blocks the disconnect messages from reaching the RNDIS driver so it
> has no idea the USB device went away.  There is nothing we can do to
> address this in the Linux kernel.

Thanks for the confirmation.

> 
> I propose we move forward with the patchset.

I think that we should proceed to merge the patchset but it seems to need some cleanup for coding style issues and better description before submitted formally.

> 
> Eugene T. Bordenkircher
> 
> -----Original Message-----
> From: Thorsten Leemhuis <regressions@leemhuis.info>
> Sent: Thursday, November 25, 2021 5:59 AM
> To: Eugene Bordenkircher <Eugene_Bordenkircher@selinc.com>; Thorsten
> Leemhuis <regressions@leemhuis.info>; Joakim Tjernlund
> <Joakim.Tjernlund@infinera.com>; linuxppc-dev@lists.ozlabs.org; linux-
> usb@vger.kernel.org
> Cc: leoyang.li@nxp.com; gregkh@linuxfoundation.org; balbi@kernel.org
> Subject: Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to
> unrecoverable loop.
> 
> Hi, this is your Linux kernel regression tracker speaking.
> 
> Top-posting for once, to make this easy to process for everyone:
> 
> Li Yang and Felipe Balbi: how to move on with this? It's quite an old
> regression, but nevertheless it is one and thus should be fixed. Part of my
> position is to make that happen and thus remind developers and maintainers
> about this until the regression is resolved.
> 
> Ciao, Thorsten
> 
> On 16.11.21 20:11, Eugene Bordenkircher wrote:
> > On 02.11.21 22:15, Joakim Tjernlund wrote:
> >> On Sat, 2021-10-30 at 14:20 +0000, Joakim Tjernlund wrote:
> >>> On Fri, 2021-10-29 at 17:14 +0000, Eugene Bordenkircher wrote:
> >>
> >>>> We've discovered a situation where the FSL udc driver
> (drivers/usb/gadget/udc/fsl_udc_core.c) will enter a loop iterating over the
> request queue, but the queue has been corrupted at some point so it loops
> infinitely.  I believe we have narrowed into the offending code, but we are in
> need of assistance trying to find an appropriate fix for the problem.  The
> identified code appears to be in all versions of the Linux kernel the driver
> exists in.
> >>>>
> >>>> The problem appears to be when handling a USB_REQ_GET_STATUS
> request.  The driver gets this request and then calls the ch9getstatus()
> function.  In this function, it starts a request by "borrowing" the per device
> status_req, filling it in, and then queuing it with a call to list_add_tail() to add
> the request to the endpoint queue.  Right before it exits the function
> however, it's calling ep0_prime_status(), which is filling out that same
> status_req structure and then queuing it with another call to list_add_tail() to
> add the request to the endpoint queue.  This adds two instances of the exact
> same LIST_HEAD to the endpoint queue, which breaks the list since the prev
> and next pointers end up pointing to the wrong things.  This ends up causing
> a hard loop the next time nuke() gets called, which happens on the next
> setup IRQ.
> >>>>
> >>>> I'm not sure what the appropriate fix to this problem is, mostly due to
> my lack of expertise in USB and this driver stack.  The code has been this way
> in the kernel for a very long time, which suggests that it has been working,
> unless USB_REQ_GET_STATUS requests are never made.  This further
> suggests that there is something else going on that I don't understand.
> Deleting the call to ep0_prime_status() and the following ep0stall() call
> appears, on the surface, to get the device working again, but may have side
> effects that I'm not seeing.
> >>>>
> >>>> I'm hopeful someone in the community can help provide some
> information on what I may be missing or help come up with a solution to the
> problem.  A big thank you to anyone who would like to help out.
> >>>
> >>> Run into this to a while ago. Found the bug and a few more fixes.
> >>> This is against 4.19 so you may have to tweak them a bit.
> >>> Feel free to upstream them.
> >>
> >> Curious, did my patches help? Good to known once we upgrade as well.
> >
> > There's good news and bad news.
> >
> > The good news is that this appears to stop the driver from entering an
> > infinite loop, which prevents the Linux system from locking up and
> > never recovering.  So I'm willing to say we've made the behavior
> > better.
> >
> > The bad news is that once we get past this point, there is new bad
> > behavior.  What is on top of this driver in our system is the RNDIS
> > gadget driver communicating to a Laptop running Win10 -1809.
> > Everything appears to work fine with the Linux system until there is a
> > USB disconnect.  After the disconnect, the Linux side appears to
> > continue on just fine, but the Windows side doesn't seem to recognize
> > the disconnect, which causes the USB driver on that side to hang
> > forever and eventually blue screen the box.  This doesn't happen on
> > all machines, just a select few.   I think we can isolate the
> > behavior to a specific antivirus/security software driver that is
> > inserting itself into the USB stack and filtering the disconnect
> > message, but we're still proving that.
> >
> > I'm about 90% certain this is a different problem and we can call this
> > patchset good, at least for our test setup.  My only hesitation is if
> > the Linux side is sending a set of responses that are confusing the
> > Windows side (specifically this antivirus) or not.  I'd be content
> > calling that a separate defect though and letting this one close up
> > with that patchset.
> 
> P.S.: As a Linux kernel regression tracker I'm getting a lot of reports on my
> table. I can only look briefly into most of them. Unfortunately therefore I
> sometimes will get things wrong or miss something important.
> I hope that's not the case here; if you think it is, don't hesitate to tell me
> about it in a public reply. That's in everyone's interest, as what I wrote above
> might be misleading to everyone reading this; any suggestion I gave they
> thus might sent someone reading this down the wrong rabbit hole, which
> none of us wants.
> 
> BTW, I have no personal interest in this issue, which is tracked using regzbot,
> my Linux kernel regression tracking bot
> (https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Furld
> efense.com%2Fv3%2F__https%3A%2F%2Flinux-
> regtracking.leemhuis.info%2Fregzbot%2F__%3B!!O7uE89YCNVw!aHa5_mLM
> nBeDjINlAtV19tBHm-
> He9jbusXucMA5h7oonHvNFwYpOHAaaqqewPOuGK9HAzJUz%24&amp;data
> =04%7C01%7Cleoyang.li%40nxp.com%7C859ce1560a7344729cea08d9b35d2e
> 67%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C6377380350721308
> 84%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luM
> zIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=ONQZyAKXNgok
> 6LgYvnaAL7LVY%2B5Wl7pXglZDqWUJZMc%3D&amp;reserved=0 ). I'm only
> posting this mail to get things rolling again and hence don't need to be CC on
> all further activities wrt to this regression.
> 
> #regzbot title: usb: fsl_udc_core: corrupted request list leads to
> unrecoverable loop

  reply	other threads:[~2021-11-29 23:38 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-29 17:14 bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop Eugene Bordenkircher
2021-10-29 17:24 ` Eugene Bordenkircher
2021-10-29 23:14   ` Li Yang
2021-10-30 14:20 ` Joakim Tjernlund
2021-11-02 21:15   ` Joakim Tjernlund
2021-11-15  8:36     ` Thorsten Leemhuis
2021-11-16 19:11       ` Eugene Bordenkircher
2021-11-25 13:59         ` Thorsten Leemhuis
2021-11-29 17:24           ` Eugene Bordenkircher
2021-11-29 23:37             ` Leo Li [this message]
2021-11-29 23:48               ` Eugene Bordenkircher
2021-11-30 11:56                 ` Joakim Tjernlund
2021-12-01 14:19                   ` Joakim Tjernlund
2021-12-02 20:35                     ` Leo Li
2021-12-02 22:45                       ` Joakim Tjernlund
2021-12-04  0:40                         ` Leo Li
2022-01-20 12:54                           ` Thorsten Leemhuis
2022-02-18  7:11                             ` Thorsten Leemhuis
2022-02-18 10:21                               ` Joakim Tjernlund
2022-02-18 10:39                                 ` gregkh
2022-02-18 11:17                                   ` Joakim Tjernlund
2022-02-18 11:48                                     ` gregkh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=AS8PR04MB89461BF7A3272E5A18ECD0948F669@AS8PR04MB8946.eurprd04.prod.outlook.com \
    --to=leoyang.li@nxp.com \
    --cc=Eugene_Bordenkircher@selinc.com \
    --cc=balbi@kernel.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=joakim.tjernlund@infinera.com \
    --cc=linux-usb@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=regressions@leemhuis.info \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).