linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Eugene Bordenkircher <Eugene_Bordenkircher@selinc.com>
To: Leo Li <leoyang.li@nxp.com>,
	Thorsten Leemhuis <regressions@leemhuis.info>,
	"jocke@infinera.com" <joakim.tjernlund@infinera.com>,
	"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
	"linux-usb@vger.kernel.org" <linux-usb@vger.kernel.org>
Cc: "gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"balbi@kernel.org" <balbi@kernel.org>
Subject: RE: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
Date: Mon, 29 Nov 2021 23:48:23 +0000	[thread overview]
Message-ID: <MWHPR2201MB15205A333F1F610D332038AC91669@MWHPR2201MB1520.namprd22.prod.outlook.com> (raw)
In-Reply-To: <AS8PR04MB89461BF7A3272E5A18ECD0948F669@AS8PR04MB8946.eurprd04.prod.outlook.com>

Agreed,

We are happy pick up the torch on this, but I'd like to try and hear from Joakim first before we do.  The patch set is his, so I'd like to give him the opportunity.  I think he's the only one that can add a truly proper description as well because he mentioned that this includes a "few more fixes" than just the one we ran into.  I'd rather hear from him than try to reverse engineer what was being addressed.  

Joakim, if you are still watching the thread, would you like to take a stab at it?  If I don't hear from you in a couple days, we'll pick up the torch and do what we can.

Eugene T. Bordenkircher

-----Original Message-----
From: Leo Li <leoyang.li@nxp.com> 
Sent: Monday, November 29, 2021 3:37 PM
To: Eugene Bordenkircher <Eugene_Bordenkircher@selinc.com>; Thorsten Leemhuis <regressions@leemhuis.info>; jocke@infinera.com <joakim.tjernlund@infinera.com>; linuxppc-dev@lists.ozlabs.org; linux-usb@vger.kernel.org
Cc: gregkh@linuxfoundation.org; balbi@kernel.org
Subject: RE: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.

[Caution - External]

> -----Original Message-----
> From: Eugene Bordenkircher <Eugene_Bordenkircher@selinc.com>
> Sent: Monday, November 29, 2021 11:25 AM
> To: Thorsten Leemhuis <regressions@leemhuis.info>; jocke@infinera.com 
> <joakim.tjernlund@infinera.com>; linuxppc-dev@lists.ozlabs.org; linux- 
> usb@vger.kernel.org
> Cc: Leo Li <leoyang.li@nxp.com>; gregkh@linuxfoundation.org; 
> balbi@kernel.org
> Subject: RE: bug: usb: gadget: FSL_UDC_CORE Corrupted request list 
> leads to unrecoverable loop.
>
> The final result of our testing is that the patch set posted seems to 
> address all known defects in the Linux kernel.  The mentioned 
> additional problems are entirely caused by the antivirus solution on 
> the windows box.  The antivirus solution blocks the disconnect 
> messages from reaching the RNDIS driver so it has no idea the USB 
> device went away.  There is nothing we can do to address this in the Linux kernel.

Thanks for the confirmation.

>
> I propose we move forward with the patchset.

I think that we should proceed to merge the patchset but it seems to need some cleanup for coding style issues and better description before submitted formally.

>
> Eugene T. Bordenkircher
>
> -----Original Message-----
> From: Thorsten Leemhuis <regressions@leemhuis.info>
> Sent: Thursday, November 25, 2021 5:59 AM
> To: Eugene Bordenkircher <Eugene_Bordenkircher@selinc.com>; Thorsten 
> Leemhuis <regressions@leemhuis.info>; Joakim Tjernlund 
> <Joakim.Tjernlund@infinera.com>; linuxppc-dev@lists.ozlabs.org; linux- 
> usb@vger.kernel.org
> Cc: leoyang.li@nxp.com; gregkh@linuxfoundation.org; balbi@kernel.org
> Subject: Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list 
> leads to unrecoverable loop.
>
> Hi, this is your Linux kernel regression tracker speaking.
>
> Top-posting for once, to make this easy to process for everyone:
>
> Li Yang and Felipe Balbi: how to move on with this? It's quite an old 
> regression, but nevertheless it is one and thus should be fixed. Part 
> of my position is to make that happen and thus remind developers and 
> maintainers about this until the regression is resolved.
>
> Ciao, Thorsten
>
> On 16.11.21 20:11, Eugene Bordenkircher wrote:
> > On 02.11.21 22:15, Joakim Tjernlund wrote:
> >> On Sat, 2021-10-30 at 14:20 +0000, Joakim Tjernlund wrote:
> >>> On Fri, 2021-10-29 at 17:14 +0000, Eugene Bordenkircher wrote:
> >>
> >>>> We've discovered a situation where the FSL udc driver
> (drivers/usb/gadget/udc/fsl_udc_core.c) will enter a loop iterating 
> over the request queue, but the queue has been corrupted at some point 
> so it loops infinitely.  I believe we have narrowed into the offending 
> code, but we are in need of assistance trying to find an appropriate 
> fix for the problem.  The identified code appears to be in all 
> versions of the Linux kernel the driver exists in.
> >>>>
> >>>> The problem appears to be when handling a USB_REQ_GET_STATUS
> request.  The driver gets this request and then calls the 
> ch9getstatus() function.  In this function, it starts a request by 
> "borrowing" the per device status_req, filling it in, and then queuing 
> it with a call to list_add_tail() to add the request to the endpoint 
> queue.  Right before it exits the function however, it's calling 
> ep0_prime_status(), which is filling out that same status_req 
> structure and then queuing it with another call to list_add_tail() to 
> add the request to the endpoint queue.  This adds two instances of the 
> exact same LIST_HEAD to the endpoint queue, which breaks the list 
> since the prev and next pointers end up pointing to the wrong things.  
> This ends up causing a hard loop the next time nuke() gets called, which happens on the next setup IRQ.
> >>>>
> >>>> I'm not sure what the appropriate fix to this problem is, mostly 
> >>>> due to
> my lack of expertise in USB and this driver stack.  The code has been 
> this way in the kernel for a very long time, which suggests that it 
> has been working, unless USB_REQ_GET_STATUS requests are never made.  
> This further suggests that there is something else going on that I don't understand.
> Deleting the call to ep0_prime_status() and the following ep0stall() 
> call appears, on the surface, to get the device working again, but may 
> have side effects that I'm not seeing.
> >>>>
> >>>> I'm hopeful someone in the community can help provide some
> information on what I may be missing or help come up with a solution 
> to the problem.  A big thank you to anyone who would like to help out.
> >>>
> >>> Run into this to a while ago. Found the bug and a few more fixes.
> >>> This is against 4.19 so you may have to tweak them a bit.
> >>> Feel free to upstream them.
> >>
> >> Curious, did my patches help? Good to known once we upgrade as well.
> >
> > There's good news and bad news.
> >
> > The good news is that this appears to stop the driver from entering 
> > an infinite loop, which prevents the Linux system from locking up 
> > and never recovering.  So I'm willing to say we've made the behavior 
> > better.
> >
> > The bad news is that once we get past this point, there is new bad 
> > behavior.  What is on top of this driver in our system is the RNDIS 
> > gadget driver communicating to a Laptop running Win10 -1809.
> > Everything appears to work fine with the Linux system until there is 
> > a USB disconnect.  After the disconnect, the Linux side appears to 
> > continue on just fine, but the Windows side doesn't seem to 
> > recognize the disconnect, which causes the USB driver on that side 
> > to hang forever and eventually blue screen the box.  This doesn't happen on
> > all machines, just a select few.   I think we can isolate the
> > behavior to a specific antivirus/security software driver that is 
> > inserting itself into the USB stack and filtering the disconnect 
> > message, but we're still proving that.
> >
> > I'm about 90% certain this is a different problem and we can call 
> > this patchset good, at least for our test setup.  My only hesitation 
> > is if the Linux side is sending a set of responses that are 
> > confusing the Windows side (specifically this antivirus) or not.  
> > I'd be content calling that a separate defect though and letting 
> > this one close up with that patchset.
>
> P.S.: As a Linux kernel regression tracker I'm getting a lot of 
> reports on my table. I can only look briefly into most of them. 
> Unfortunately therefore I sometimes will get things wrong or miss something important.
> I hope that's not the case here; if you think it is, don't hesitate to 
> tell me about it in a public reply. That's in everyone's interest, as 
> what I wrote above might be misleading to everyone reading this; any 
> suggestion I gave they thus might sent someone reading this down the 
> wrong rabbit hole, which none of us wants.
>
> BTW, I have no personal interest in this issue, which is tracked using 
> regzbot, my Linux kernel regression tracking bot 
> (https://urldefense.com/v3/__https://eur01.safelinks.protection.outloo
> k.com/?url=https*3A*2F*2Furld__;JSUl!!O7uE89YCNVw!a6nsIMfn544OIzmshw3H
> bMBVcbwor4cV2Q5OsST7-86jy_YZKvDsN-558Ris4wh8Zawz4puN$
> efense.com%2Fv3%2F__https%3A%2F%2Flinux-
> regtracking.leemhuis.info%2Fregzbot%2F__%3B!!O7uE89YCNVw!aHa5_mLM
> nBeDjINlAtV19tBHm-
> He9jbusXucMA5h7oonHvNFwYpOHAaaqqewPOuGK9HAzJUz%24&amp;data
> =04%7C01%7Cleoyang.li%40nxp.com%7C859ce1560a7344729cea08d9b35d2e
> 67%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C6377380350721308
> 84%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luM
> zIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=ONQZyAKXNgok
> 6LgYvnaAL7LVY%2B5Wl7pXglZDqWUJZMc%3D&amp;reserved=0 ). I'm only 
> posting this mail to get things rolling again and hence don't need to 
> be CC on all further activities wrt to this regression.
>
> #regzbot title: usb: fsl_udc_core: corrupted request list leads to 
> unrecoverable loop

  reply	other threads:[~2021-11-29 23:49 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-29 17:14 Eugene Bordenkircher
2021-10-29 17:24 ` Eugene Bordenkircher
2021-10-29 23:14   ` Li Yang
2021-10-30 14:20 ` Joakim Tjernlund
2021-11-02 21:15   ` Joakim Tjernlund
2021-11-15  8:36     ` Thorsten Leemhuis
2021-11-16 19:11       ` Eugene Bordenkircher
2021-11-25 13:59         ` Thorsten Leemhuis
2021-11-29 17:24           ` Eugene Bordenkircher
2021-11-29 23:37             ` Leo Li
2021-11-29 23:48               ` Eugene Bordenkircher [this message]
2021-11-30 11:56                 ` Joakim Tjernlund
2021-12-01 14:19                   ` Joakim Tjernlund
2021-12-02 20:35                     ` Leo Li
2021-12-02 22:45                       ` Joakim Tjernlund
2021-12-04  0:40                         ` Leo Li
2022-01-20 12:54                           ` Thorsten Leemhuis
2022-02-18  7:11                             ` Thorsten Leemhuis
2022-02-18 10:21                               ` Joakim Tjernlund
2022-02-18 10:39                                 ` gregkh
2022-02-18 11:17                                   ` Joakim Tjernlund
2022-02-18 11:48                                     ` gregkh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=MWHPR2201MB15205A333F1F610D332038AC91669@MWHPR2201MB1520.namprd22.prod.outlook.com \
    --to=eugene_bordenkircher@selinc.com \
    --cc=balbi@kernel.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=joakim.tjernlund@infinera.com \
    --cc=leoyang.li@nxp.com \
    --cc=linux-usb@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=regressions@leemhuis.info \
    --subject='RE: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).