linuxppc-dev.lists.ozlabs.org archive mirror
* bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
@ 2021-10-29 17:14 Eugene Bordenkircher
  2021-10-29 17:24 ` Eugene Bordenkircher
  2021-10-30 14:20 ` Joakim Tjernlund
  0 siblings, 2 replies; 22+ messages in thread
From: Eugene Bordenkircher @ 2021-10-29 17:14 UTC
  To: linux-usb, linuxppc-dev; +Cc: balbi, gregkh, leoyang.li

Hello all,

We've discovered a situation where the FSL udc driver (drivers/usb/gadget/udc/fsl_udc_core.c) enters a loop iterating over the request queue, but the queue has been corrupted at some point, so it loops infinitely.  I believe we have narrowed down the offending code, but we need help finding an appropriate fix for the problem.  The code in question appears to be present in every kernel version that includes this driver.

The problem appears when handling a USB_REQ_GET_STATUS request.  The driver receives this request and calls ch9getstatus().  That function "borrows" the per-device status_req, fills it in, and queues it by calling list_add_tail() to add the request to the endpoint queue.  Right before it returns, however, it calls ep0_prime_status(), which fills in that same status_req structure and queues it again with another list_add_tail() on the same endpoint queue.  This adds the same list node (req->queue) to the endpoint queue twice, which corrupts the list because its prev and next pointers end up pointing to the wrong entries.  The result is an infinite loop the next time nuke() is called, which happens on the next setup IRQ.

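A userspace sketch of why the double list_add_tail() hangs. It copies the pointer updates of the kernel's list_add_tail() and list_del_init() (simplified from include/linux/list.h, without poisoning or debug checks). drain_queue() is a hypothetical stand-in for the nuke() loop, assuming, as described above, that it keeps removing the first entry until the list is empty; it adds an iteration cap so the demo terminates. After the second add, the node's next and prev both point at the node itself, so the head can never become empty again. For contrast, two distinct nodes drain normally.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal userspace copy of the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Same pointer updates as the kernel's __list_add(new, head->prev, head). */
static void list_add_tail(struct list_head *new, struct list_head *head)
{
	struct list_head *prev = head->prev;

	head->prev = new;
	new->next = head;
	new->prev = prev;
	prev->next = new;
}

/* Unlink an entry and point it at itself, like list_del_init(). */
static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	init_list_head(e);
}

/* Hypothetical model of nuke(): remove the first entry until the list is
 * empty. Returns the number of removals, or -1 once max_iter removals have
 * happened without emptying the list (the driver loop has no such cap). */
static int drain_queue(struct list_head *queue, int max_iter)
{
	int n = 0;

	assert(max_iter >= 0);
	while (!list_empty(queue)) {
		if (n == max_iter)
			return -1;
		list_del_init(queue->next);
		n++;
	}
	return n;
}
```

drain_queue(&q, 100) returns 1 for a single add, -1 (cap hit) when the same node is added twice, and 2 for two distinct nodes. Unlinking the self-looped node leaves head->next still pointing at it, which is why the loop never ends.
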
I'm not sure what the appropriate fix to this problem is, mostly due to my lack of expertise in USB and this driver stack.  The code has been this way in the kernel for a very long time, which suggests that it has been working, unless USB_REQ_GET_STATUS requests are never made.  This further suggests that there is something else going on that I don't understand.  Deleting the call to ep0_prime_status() and the following ep0stall() call appears, on the surface, to get the device working again, but may have side effects that I'm not seeing.

I'm hopeful someone in the community can help provide some information on what I may be missing or help come up with a solution to the problem.  A big thank you to anyone who would like to help out.

Eugene

* RE: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
  2021-10-29 17:14 bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop Eugene Bordenkircher
@ 2021-10-29 17:24 ` Eugene Bordenkircher
  2021-10-29 23:14   ` Li Yang
  2021-10-30 14:20 ` Joakim Tjernlund
  1 sibling, 1 reply; 22+ messages in thread
From: Eugene Bordenkircher @ 2021-10-29 17:24 UTC
  To: linux-usb, linuxppc-dev; +Cc: balbi, gregkh, leoyang.li

Typing Greg's email correctly this time.  My apologies.

Eugene 

-----Original Message-----
From: Eugene Bordenkircher 
Sent: Friday, October 29, 2021 10:14 AM
To: linux-usb@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
Cc: leoyang.li@nxp.com; balbi@kernel.org; gregkh@linuxfoundataion.org
Subject: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.

[...]

Eugene

* Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
  2021-10-29 17:24 ` Eugene Bordenkircher
@ 2021-10-29 23:14   ` Li Yang
  0 siblings, 0 replies; 22+ messages in thread
From: Li Yang @ 2021-10-29 23:14 UTC
  To: Eugene Bordenkircher; +Cc: balbi, linux-usb, linuxppc-dev, gregkh

On Fri, Oct 29, 2021 at 4:27 PM Eugene Bordenkircher
<Eugene_Bordenkircher@selinc.com> wrote:
>
> [...]
>
> The problem appears to be when handling a USB_REQ_GET_STATUS request.  The driver gets this request and then calls the ch9getstatus() function.  In this function, it starts a request by "borrowing" the per device status_req, filling it in, and then queuing it with a call to list_add_tail() to add the request to the endpoint queue.  Right before it exits the function however, it's calling ep0_prime_status(), which is filling out that same status_req structure and then queuing it with another call to list_add_tail() to add the request to the endpoint queue.  This adds two instances of the exact same LIST_HEAD to the endpoint queue, which breaks the list since the prev and next pointers end up pointing to the wrong things.  This ends up causing a hard loop the next time nuke() gets called, which happens on the next setup IRQ.
>

I agree with you that this looks problematic.  This was probably
introduced by commit f79a60b8785 ("usb: fsl_udc_core: prime status stage
once data stage has primed"), which didn't consider that status_req is
re-used for the DATA phase.

I think the proper fix would be to allocate a separate request for the
data phase, now that the above change is in place.

> [...]

* Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
  2021-10-29 17:14 bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop Eugene Bordenkircher
  2021-10-29 17:24 ` Eugene Bordenkircher
@ 2021-10-30 14:20 ` Joakim Tjernlund
  2021-11-02 21:15   ` Joakim Tjernlund
  1 sibling, 1 reply; 22+ messages in thread
From: Joakim Tjernlund @ 2021-10-30 14:20 UTC
  To: linuxppc-dev, Eugene_Bordenkircher, linux-usb; +Cc: gregkh, balbi, leoyang.li

[-- Attachment #1: Type: text/plain, Size: 2314 bytes --]

On Fri, 2021-10-29 at 17:14 +0000, Eugene Bordenkircher wrote:
> [...]

Ran into this a while ago, too. Found the bug and a few more fixes.
These are against 4.19, so you may have to tweak them a bit.
Feel free to upstream them.

 Jocke 

[-- Attachment #2: 0005-fsl_udc_core-Init-max_pipes-for-reset_queues.patch --]
[-- Type: text/x-patch; name="0005-fsl_udc_core-Init-max_pipes-for-reset_queues.patch", Size: 989 bytes --]

From a7ed9cffbfc90371b570ebef698d96c39adbaf77 Mon Sep 17 00:00:00 2001
From: Joakim Tjernlund <joakim.tjernlund@infinera.com>
Date: Mon, 11 May 2020 11:18:14 +0200
Subject: [PATCH 5/5] fsl_udc_core: Init max_pipes for reset_queues()

Signed-off-by: Joakim Tjernlund <joakim.tjernlund@infinera.com>
---
 drivers/usb/gadget/udc/fsl_udc_core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/usb/gadget/udc/fsl_udc_core.c b/drivers/usb/gadget/udc/fsl_udc_core.c
index bd3825d9f1d2..92136dff8373 100644
--- a/drivers/usb/gadget/udc/fsl_udc_core.c
+++ b/drivers/usb/gadget/udc/fsl_udc_core.c
@@ -2441,6 +2441,7 @@ static int fsl_udc_probe(struct platform_device *pdev)
 	/* Get max device endpoints */
 	/* DEN is bidirectional ep number, max_ep doubles the number */
 	udc_controller->max_ep = (dccparams & DCCPARAMS_DEN_MASK) * 2;
+	udc_controller->max_pipes = udc_controller->max_ep;
 
 	udc_controller->irq = platform_get_irq(pdev, 0);
 	if (!udc_controller->irq) {
-- 
2.32.0

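For patch 5, our reading (not stated in the patch) is that reset_queues() loops over pipes up to udc->max_pipes, so if max_pipes is never set and stays zero, no endpoint queue is reset at all. The sketch below models only that arithmetic; struct udc_model, probe_model() and reset_queues_model() are hypothetical, and the DEN mask value 0x1f is an assumption to check against fsl_usb2_udc.h.

```c
#include <assert.h>

/* Assumed value of DCCPARAMS_DEN_MASK (DEN field, bits 4:0); verify in fsl_usb2_udc.h. */
#define DEN_MASK_ASSUMED 0x1fu

/* Hypothetical subset of struct fsl_udc. */
struct udc_model {
	unsigned int max_ep;
	unsigned int max_pipes;
};

/* DEN counts bidirectional endpoints, so max_ep is twice DEN.
 * A nonzero init_pipes models the line patch 5 adds. */
static void probe_model(struct udc_model *udc, unsigned int dccparams, int init_pipes)
{
	assert(udc);
	udc->max_ep = (dccparams & DEN_MASK_ASSUMED) * 2;
	if (init_pipes)
		udc->max_pipes = udc->max_ep;
}

/* Models a reset loop bounded by max_pipes; returns how many pipes it would reset. */
static unsigned int reset_queues_model(const struct udc_model *udc)
{
	unsigned int pipe, reset = 0;

	for (pipe = 0; pipe < udc->max_pipes; pipe++)
		reset++;	/* the driver calls udc_reset_ep_queue(udc, pipe) here */
	return reset;
}
```

With a DEN of 4 (e.g. a DCCPARAMS value of 0x184), max_ep is 8; the model resets 0 pipes when max_pipes is left at zero and 8 once it is initialized.
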

[-- Attachment #3: 0004-fsl_udc_stop-Use-list_for_each_entry_safe-when-delet.patch --]
[-- Type: text/x-patch; name="0004-fsl_udc_stop-Use-list_for_each_entry_safe-when-delet.patch", Size: 1422 bytes --]

From b98fa0dd384f17fee0c1283b91f855b97d1976f4 Mon Sep 17 00:00:00 2001
From: Joakim Tjernlund <joakim.tjernlund@infinera.com>
Date: Mon, 11 May 2020 10:38:07 +0200
Subject: [PATCH 4/5] fsl_udc_stop: Use list_for_each_entry_safe() when
 deleting

Signed-off-by: Joakim Tjernlund <joakim.tjernlund@infinera.com>
---
 drivers/usb/gadget/udc/fsl_udc_core.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/gadget/udc/fsl_udc_core.c b/drivers/usb/gadget/udc/fsl_udc_core.c
index 4f835332af45..bd3825d9f1d2 100644
--- a/drivers/usb/gadget/udc/fsl_udc_core.c
+++ b/drivers/usb/gadget/udc/fsl_udc_core.c
@@ -1984,7 +1984,7 @@ static int fsl_udc_start(struct usb_gadget *g,
 /* Disconnect from gadget driver */
 static int fsl_udc_stop(struct usb_gadget *g)
 {
-	struct fsl_ep *loop_ep;
+	struct fsl_ep *loop_ep, *tmp_loop;
 	unsigned long flags;
 
 	if (!IS_ERR_OR_NULL(udc_controller->transceiver))
@@ -2002,8 +2002,8 @@ static int fsl_udc_stop(struct usb_gadget *g)
 	spin_lock_irqsave(&udc_controller->lock, flags);
 	udc_controller->gadget.speed = USB_SPEED_UNKNOWN;
 	nuke(&udc_controller->eps[0], -ESHUTDOWN);
-	list_for_each_entry(loop_ep, &udc_controller->gadget.ep_list,
-			ep.ep_list)
+	list_for_each_entry_safe(loop_ep, tmp_loop, &udc_controller->gadget.ep_list,
+				 ep.ep_list)
 		nuke(loop_ep, -ESHUTDOWN);
 	spin_unlock_irqrestore(&udc_controller->lock, flags);
 
-- 
2.32.0

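Patch 4 switches fsl_udc_stop() to list_for_each_entry_safe(). The _safe variant matters when the loop body may unlink or free the current entry, because it fetches the next entry before the body runs. The thread doesn't establish whether nuke() ever removes an endpoint from gadget.ep_list (it drains the endpoint's request queue), so treat this as a general illustration of the idiom in plain C; struct node, push() and free_all() are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical circular doubly linked list node with a sentinel head. */
struct node {
	struct node *next, *prev;
	int id;
};

static void list_init(struct node *head)
{
	head->next = head;
	head->prev = head;
}

/* Allocate a node and append it before the sentinel (demo only: aborts on OOM). */
static void push(struct node *head, int id)
{
	struct node *n = malloc(sizeof(*n));

	assert(n != NULL);
	n->id = id;
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink and free a node; its next pointer must not be read afterwards. */
static void remove_and_free(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	free(n);
}

/* Same shape as list_for_each_entry_safe(): tmp holds the successor, so
 * freeing pos inside the body does not break the traversal. */
static int free_all(struct node *head)
{
	struct node *pos, *tmp;
	int freed = 0;

	for (pos = head->next, tmp = pos->next; pos != head;
	     pos = tmp, tmp = pos->next) {
		remove_and_free(pos);
		freed++;
	}
	return freed;
}
```

A plain (non-_safe) loop would advance with pos = pos->next after pos had already been freed, which is a use-after-free.
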

[-- Attachment #4: 0003-fsl_ep_dequeue.patch --]
[-- Type: text/x-patch; name="0003-fsl_ep_dequeue.patch", Size: 1007 bytes --]

From a90a89d06bd008f606404ec613b4f2343b9dda1a Mon Sep 17 00:00:00 2001
From: Joakim Tjernlund <joakim.tjernlund@infinera.com>
Date: Thu, 7 May 2020 22:35:14 +0200
Subject: [PATCH 3/5] fsl_ep_dequeue

Signed-off-by: Joakim Tjernlund <joakim.tjernlund@infinera.com>
---
 drivers/usb/gadget/udc/fsl_udc_core.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/gadget/udc/fsl_udc_core.c b/drivers/usb/gadget/udc/fsl_udc_core.c
index 4b1591fa2e1c..4f835332af45 100644
--- a/drivers/usb/gadget/udc/fsl_udc_core.c
+++ b/drivers/usb/gadget/udc/fsl_udc_core.c
@@ -977,7 +977,13 @@ static int fsl_ep_dequeue(struct usb_ep *_ep, struct usb_request *_req)
 
 			/* prime with dTD of next request */
 			fsl_prime_ep(ep, next_req->head);
-		}
+		} else {
+			struct ep_queue_head *qh;
+
+			qh = ep->qh;
+			qh->next_dtd_ptr = 1;
+			qh->size_ioc_int_sts = 0;
+		}
 	/* The request hasn't been processed, patch up the TD chain */
 	} else {
 		struct fsl_req *prev_req;
-- 
2.32.0


[-- Attachment #5: 0002-fsl_udc-import-build_dtd-fixes.patch --]
[-- Type: text/x-patch; name="0002-fsl_udc-import-build_dtd-fixes.patch", Size: 2239 bytes --]

From b3f09747be2007be3a372fe80635b51df6ba71bd Mon Sep 17 00:00:00 2001
From: Joakim Tjernlund <joakim.tjernlund@infinera.com>
Date: Thu, 7 May 2020 22:32:26 +0200
Subject: [PATCH 2/5] fsl_udc: import build_dtd fixes

Signed-off-by: Joakim Tjernlund <joakim.tjernlund@infinera.com>
---
 drivers/usb/gadget/udc/fsl_udc_core.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/gadget/udc/fsl_udc_core.c b/drivers/usb/gadget/udc/fsl_udc_core.c
index 2546bc28f42a..4b1591fa2e1c 100644
--- a/drivers/usb/gadget/udc/fsl_udc_core.c
+++ b/drivers/usb/gadget/udc/fsl_udc_core.c
@@ -774,12 +774,20 @@ static void fsl_queue_td(struct fsl_ep *ep, struct fsl_req *req)
 static struct ep_td_struct *fsl_build_dtd(struct fsl_req *req, unsigned *length,
 		dma_addr_t *dma, int *is_last, gfp_t gfp_flags)
 {
-	u32 swap_temp;
+	u32 swap_temp, mult = 0;
 	struct ep_td_struct *dtd;
+	struct ep_queue_head *dqh;
 
 	/* how big will this transfer be? */
-	*length = min(req->req.length - req->req.actual,
-			(unsigned)EP_MAX_LENGTH_TRANSFER);
+	if (usb_endpoint_xfer_isoc(req->ep->ep.desc)) {
+		dqh = req->ep->qh;
+		mult = (dqh->max_pkt_length >> EP_QUEUE_HEAD_MULT_POS)
+			& 0x3;
+		*length = min(req->req.length - req->req.actual,
+			      (unsigned)(mult * req->ep->ep.maxpacket));
+	} else
+		*length = min(req->req.length - req->req.actual,
+			      (unsigned)EP_MAX_LENGTH_TRANSFER);
 
 	dtd = dma_pool_alloc(udc_controller->td_pool, gfp_flags, dma);
 	if (dtd == NULL)
@@ -794,6 +802,7 @@ static struct ep_td_struct *fsl_build_dtd(struct fsl_req *req, unsigned *length,
 	/* Init all of buffer page pointers */
 	swap_temp = (u32) (req->req.dma + req->req.actual);
 	dtd->buff_ptr0 = cpu_to_hc32(swap_temp);
+	swap_temp &= ~0xFFF;
 	dtd->buff_ptr1 = cpu_to_hc32(swap_temp + 0x1000);
 	dtd->buff_ptr2 = cpu_to_hc32(swap_temp + 0x2000);
 	dtd->buff_ptr3 = cpu_to_hc32(swap_temp + 0x3000);
@@ -820,6 +829,7 @@ static struct ep_td_struct *fsl_build_dtd(struct fsl_req *req, unsigned *length,
 	/* Enable interrupt for the last dtd of a request */
 	if (*is_last && !req->req.no_interrupt)
 		swap_temp |= DTD_IOC;
+	swap_temp |= mult << 10;
 
 	dtd->size_ioc_sts = cpu_to_hc32(swap_temp);
 
-- 
2.32.0

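Patch 2 changes two things in fsl_build_dtd(), and the patch itself doesn't explain either. This sketch assumes, as the diff implies, that buffer page pointers 1-4 of a dTD must hold 4 KiB-aligned addresses while pointer 0 keeps the byte offset; without the "swap_temp &= ~0xFFF" mask, an unaligned start address would leak its offset into pointers 1-4. It also models the isochronous length cap, assuming EP_QUEUE_HEAD_MULT_POS is 30 (a 2-bit Mult field in bits 31:30 of the dQH capabilities word); check fsl_usb2_udc.h for the real values. dtd_page_pointers() and isoc_chunk_len() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_4K 0x1000u

/* Fill the five dTD buffer page pointers for a buffer starting at dma_addr.
 * Pointer 0 keeps the byte offset; pointers 1-4 are 4 KiB-aligned pages. */
static void dtd_page_pointers(uint32_t dma_addr, uint32_t ptr[5])
{
	uint32_t page = dma_addr & ~(PAGE_4K - 1);	/* same as swap_temp &= ~0xFFF */
	unsigned int i;

	assert(ptr);
	ptr[0] = dma_addr;
	for (i = 1; i < 5; i++)
		ptr[i] = page + i * PAGE_4K;
}

/* Isochronous dTD length as in the patch: Mult * wMaxPacketSize, capped by
 * the bytes still to transfer. Mult is read from bits 31:30 (assumed). */
static uint32_t isoc_chunk_len(uint32_t qh_caps, uint32_t maxpacket, uint32_t remaining)
{
	uint32_t mult = (qh_caps >> 30) & 0x3;
	uint32_t cap = mult * maxpacket;

	return remaining < cap ? remaining : cap;
}
```

For a buffer at 0x12345678 this yields 0x12345678, 0x12346000, ..., 0x12349000; unmasked, pointer 1 would have been 0x12346678. With Mult = 2 and 1024-byte packets, each isochronous dTD carries at most 2048 bytes. The patch also ORs mult into the dTD token at bit 10.
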

[-- Attachment #6: 0001-ch9getstatus-ep0_prime_status-fixes-RND-28770.patch --]
[-- Type: text/x-patch; name="0001-ch9getstatus-ep0_prime_status-fixes-RND-28770.patch", Size: 4367 bytes --]

From 17c684fdcd6152b7e504656b1711e24508c32f6e Mon Sep 17 00:00:00 2001
From: Joakim Tjernlund <joakim.tjernlund@infinera.com>
Date: Fri, 8 May 2020 17:12:53 +0200
Subject: [PATCH 1/5] ch9getstatus/ep0_prime_status, fixes RND-28770

The USB driver added the same req twice to the same list.
This causes an endless loop in IRQ context.
Fix by importing code from mv_udc_core.c, its sister driver.

Signed-off-by: Joakim Tjernlund <joakim.tjernlund@infinera.com>
---
 drivers/usb/gadget/udc/fsl_udc_core.c | 56 ++++++++++-----------------
 1 file changed, 21 insertions(+), 35 deletions(-)

diff --git a/drivers/usb/gadget/udc/fsl_udc_core.c b/drivers/usb/gadget/udc/fsl_udc_core.c
index 367697144cda..2546bc28f42a 100644
--- a/drivers/usb/gadget/udc/fsl_udc_core.c
+++ b/drivers/usb/gadget/udc/fsl_udc_core.c
@@ -1266,7 +1266,7 @@ static void ep0stall(struct fsl_udc *udc)
 }
 
 /* Prime a status phase for ep0 */
-static int ep0_prime_status(struct fsl_udc *udc, int direction)
+static int ep0_prime_status(struct fsl_udc *udc, int direction, u16 status, bool empty)
 {
 	struct fsl_req *req = udc->status_req;
 	struct fsl_ep *ep;
@@ -1281,8 +1281,14 @@ static int ep0_prime_status(struct fsl_udc *udc, int direction)
 	if (udc->ep0_state != DATA_STATE_XMIT)
 		udc->ep0_state = WAIT_FOR_OUT_STATUS;
 
+	/* fill in the request structure */
+	if (empty == false) {
+		*((u16 *) req->req.buf) = cpu_to_le16(status);
+		req->req.length = 2;
+	} else
+		req->req.length = 0;
+
 	req->ep = ep;
-	req->req.length = 0;
 	req->req.status = -EINPROGRESS;
 	req->req.actual = 0;
 	req->req.complete = fsl_noop_complete;
@@ -1292,14 +1298,19 @@ static int ep0_prime_status(struct fsl_udc *udc, int direction)
 	if (ret)
 		return ret;
 
+	ret = -ENOMEM;
 	if (fsl_req_to_dtd(req, GFP_ATOMIC) == 0)
 		fsl_queue_td(ep, req);
 	else
-		return -ENOMEM;
+		goto out;
 
 	list_add_tail(&req->queue, &ep->queue);
 
 	return 0;
+out:
+	usb_gadget_unmap_request(&udc->gadget, &req->req, ep_is_in(ep));
+
+	return ret;
 }
 
 static void udc_reset_ep_queue(struct fsl_udc *udc, u8 pipe)
@@ -1320,7 +1331,7 @@ static void ch9setaddress(struct fsl_udc *udc, u16 value, u16 index, u16 length)
 	/* Update usb state */
 	udc->usb_state = USB_STATE_ADDRESS;
 	/* Status phase */
-	if (ep0_prime_status(udc, EP_DIR_IN))
+	if (ep0_prime_status(udc, EP_DIR_IN, 0, true))
 		ep0stall(udc);
 }
 
@@ -1331,9 +1342,7 @@ static void ch9getstatus(struct fsl_udc *udc, u8 request_type, u16 value,
 		u16 index, u16 length)
 {
 	u16 tmp = 0;		/* Status, cpu endian */
-	struct fsl_req *req;
 	struct fsl_ep *ep;
-	int ret;
 
 	ep = &udc->eps[0];
 
@@ -1358,33 +1367,10 @@ static void ch9getstatus(struct fsl_udc *udc, u8 request_type, u16 value,
 				<< USB_ENDPOINT_HALT;
 	}
 
-	udc->ep0_dir = USB_DIR_IN;
-	/* Borrow the per device status_req */
-	req = udc->status_req;
-	/* Fill in the reqest structure */
-	*((u16 *) req->req.buf) = cpu_to_le16(tmp);
-
-	req->ep = ep;
-	req->req.length = 2;
-	req->req.status = -EINPROGRESS;
-	req->req.actual = 0;
-	req->req.complete = fsl_noop_complete;
-	req->dtd_count = 0;
-
-	ret = usb_gadget_map_request(&ep->udc->gadget, &req->req, ep_is_in(ep));
-	if (ret)
-		goto stall;
-
-	/* prime the data phase */
-	if ((fsl_req_to_dtd(req, GFP_ATOMIC) == 0))
-		fsl_queue_td(ep, req);
-	else			/* no mem */
-		goto stall;
-
-	list_add_tail(&req->queue, &ep->queue);
-	udc->ep0_state = DATA_STATE_XMIT;
-	if (ep0_prime_status(udc, EP_DIR_OUT))
+	if (ep0_prime_status(udc, EP_DIR_OUT, tmp, false))
 		ep0stall(udc);
+	else
+		udc->ep0_state = DATA_STATE_XMIT;
 
 	return;
 stall:
@@ -1465,7 +1451,7 @@ __acquires(udc->lock)
 			break;
 
 		if (rc == 0) {
-			if (ep0_prime_status(udc, EP_DIR_IN))
+			if (ep0_prime_status(udc, EP_DIR_IN, 0, true))
 				ep0stall(udc);
 		}
 		if (ptc) {
@@ -1501,7 +1487,7 @@ __acquires(udc->lock)
 		 * See 2.0 Spec chapter 8.5.3.3 for detail.
 		 */
 		if (udc->ep0_state == DATA_STATE_XMIT)
-			if (ep0_prime_status(udc, EP_DIR_OUT))
+			if (ep0_prime_status(udc, EP_DIR_OUT, 0, true))
 				ep0stall(udc);
 
 	} else {
@@ -1537,7 +1523,7 @@ static void ep0_req_complete(struct fsl_udc *udc, struct fsl_ep *ep0,
 		break;
 	case DATA_STATE_RECV:
 		/* send status phase */
-		if (ep0_prime_status(udc, EP_DIR_IN))
+		if (ep0_prime_status(udc, EP_DIR_IN, 0, true))
 			ep0stall(udc);
 		break;
 	case WAIT_FOR_OUT_STATUS:
-- 
2.32.0

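Patch 1 folds the GET_STATUS data stage into ep0_prime_status(), so in ch9getstatus() status_req is filled and queued by one call instead of two. Below is a sketch of just the new payload branch in plain C; fill_status_payload() is hypothetical, and it stores the bytes explicitly instead of casting the buffer, which is a portable equivalent of writing cpu_to_le16(status) into a u16. For a device GET_STATUS, bit 0 is Self Powered and bit 1 is Remote Wakeup (USB 2.0, section 9.4.5), so a status of 0x0003 goes out as the bytes 03 00.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Either a 2-byte GET_STATUS payload (little-endian on the wire) or a
 * zero-length status packet; returns the request length that would be set. */
static unsigned int fill_status_payload(uint8_t buf[2], uint16_t status, bool empty)
{
	assert(buf);
	if (empty)
		return 0;
	buf[0] = (uint8_t)(status & 0xff);	/* low byte first */
	buf[1] = (uint8_t)(status >> 8);
	return 2;
}
```

The empty case leaves the buffer untouched, matching the zero-length status stages that the other callers request with (0, true).
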

* Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
  2021-10-30 14:20 ` Joakim Tjernlund
@ 2021-11-02 21:15   ` Joakim Tjernlund
  2021-11-15  8:36     ` Thorsten Leemhuis
  0 siblings, 1 reply; 22+ messages in thread
From: Joakim Tjernlund @ 2021-11-02 21:15 UTC
  To: linuxppc-dev, Eugene_Bordenkircher, linux-usb; +Cc: gregkh, balbi, leoyang.li

On Sat, 2021-10-30 at 14:20 +0000, Joakim Tjernlund wrote:
> On Fri, 2021-10-29 at 17:14 +0000, Eugene Bordenkircher wrote:
> > [...]
> 
> Run into this to a while ago. Found the bug and a few more fixes.
> This is against 4.19 so you may have to tweak them a bit.
> Feel free to upstream them.
> 
>  Jocke 

Curious, did my patches help? Good to know for when we upgrade as well.

 Jocke

* Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
  2021-11-02 21:15   ` Joakim Tjernlund
@ 2021-11-15  8:36     ` Thorsten Leemhuis
  2021-11-16 19:11       ` Eugene Bordenkircher
  0 siblings, 1 reply; 22+ messages in thread
From: Thorsten Leemhuis @ 2021-11-15  8:36 UTC
  To: Joakim Tjernlund, linuxppc-dev, Eugene_Bordenkircher, linux-usb
  Cc: gregkh, balbi, leoyang.li

Hi, this is your Linux kernel regression tracker speaking.

This looks stalled, as AFAICS nothing has happened to get this regression
fixed since the mail below. How can we get things rolling again?

Eugene, were you able to look into the patch from Joakim?

Or did I miss anything and some progress to fix this was made elsewhere?
Please let me know if that's the case.

Ciao, Thorsten (carrying his Linux kernel regression tracker hat)

P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
on my table. I can only look briefly into most of them. Unfortunately
therefore I sometimes will get things wrong or miss something important.
I hope that's not the case here; if you think it is, don't hesitate to
tell me about it in a public reply. That's in everyone's interest, as
what I wrote above might be misleading to everyone reading this; any
suggestion I gave might thus send someone reading this down the
wrong rabbit hole, which none of us wants.

P.P.S.: Feel free to ignore the following lines, they are only meant for
regzbot, my Linux kernel regression tracking bot
(https://linux-regtracking.leemhuis.info/regzbot/):

#regzbot poke

On 02.11.21 22:15, Joakim Tjernlund wrote:
> On Sat, 2021-10-30 at 14:20 +0000, Joakim Tjernlund wrote:
>> On Fri, 2021-10-29 at 17:14 +0000, Eugene Bordenkircher wrote:
>
>>> [...]
>>
>> Run into this to a while ago. Found the bug and a few more fixes.
>> This is against 4.19 so you may have to tweak them a bit.
>> Feel free to upstream them.
>>
>>  Jocke 
> 
> Curious, did my patches help? Good to known once we upgrade as well.
> 
>  Jocke

* RE: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
  2021-11-15  8:36     ` Thorsten Leemhuis
@ 2021-11-16 19:11       ` Eugene Bordenkircher
  2021-11-25 13:59         ` Thorsten Leemhuis
  0 siblings, 1 reply; 22+ messages in thread
From: Eugene Bordenkircher @ 2021-11-16 19:11 UTC
  To: Thorsten Leemhuis, Joakim Tjernlund, linuxppc-dev, linux-usb
  Cc: gregkh, balbi, leoyang.li

On 02.11.21 22:15, Joakim Tjernlund wrote:
> On Sat, 2021-10-30 at 14:20 +0000, Joakim Tjernlund wrote:
>> On Fri, 2021-10-29 at 17:14 +0000, Eugene Bordenkircher wrote:
>
>>> [...]
>>
>> Ran into this a while ago. Found the bug and a few more fixes.
>> This is against 4.19 so you may have to tweak them a bit.
>> Feel free to upstream them.
>>
>>  Jocke
>
> Curious, did my patches help? Good to know once we upgrade as well.
>
>  Jocke

There's good news and bad news.

The good news is that this appears to stop the driver from entering an infinite loop, which prevents the Linux system from locking up and never recovering.  So I'm willing to say we've made the behavior better.

The bad news is that once we get past this point, there is new bad behavior.  What sits on top of this driver in our system is the RNDIS gadget driver, communicating with a laptop running Windows 10 version 1809.  Everything appears to work fine with the Linux system until there is a USB disconnect.  After the disconnect, the Linux side appears to continue on just fine, but the Windows side doesn't seem to recognize the disconnect, which causes the USB driver on that side to hang forever and eventually blue-screen the box.  This doesn't happen on all machines, just a select few.  I think we can isolate the behavior to a specific antivirus/security software driver that inserts itself into the USB stack and filters the disconnect message, but we're still proving that.

I'm about 90% certain this is a different problem and that we can call this patchset good, at least for our test setup.  My only hesitation is whether the Linux side is sending a set of responses that confuse the Windows side (specifically this antivirus software).  I'd be content to call that a separate defect, though, and let this one close with that patchset.

Eugene


* Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
  2021-11-16 19:11       ` Eugene Bordenkircher
@ 2021-11-25 13:59         ` Thorsten Leemhuis
  2021-11-29 17:24           ` Eugene Bordenkircher
  0 siblings, 1 reply; 22+ messages in thread
From: Thorsten Leemhuis @ 2021-11-25 13:59 UTC
  To: Eugene Bordenkircher, Thorsten Leemhuis, Joakim Tjernlund,
	linuxppc-dev, linux-usb
  Cc: gregkh, balbi, leoyang.li

Hi, this is your Linux kernel regression tracker speaking.

Top-posting for once, to make this easy to process for everyone:

Li Yang and Felipe Balbi: how to move on with this? It's quite an old
regression, but nevertheless it is one and thus should be fixed. Part of
my position is to make that happen and thus remind developers and
maintainers about this until the regression is resolved.

Ciao, Thorsten

On 16.11.21 20:11, Eugene Bordenkircher wrote:
> On 02.11.21 22:15, Joakim Tjernlund wrote:
>> On Sat, 2021-10-30 at 14:20 +0000, Joakim Tjernlund wrote:
>>> On Fri, 2021-10-29 at 17:14 +0000, Eugene Bordenkircher wrote:
>>
>>>> We've discovered a situation where the FSL udc driver (drivers/usb/gadget/udc/fsl_udc_core.c) will enter a loop iterating over the request queue, but the queue has been corrupted at some point so it loops infinitely.  I believe we have narrowed into the offending code, but we are in need of assistance trying to find an appropriate fix for the problem.  The identified code appears to be in all versions of the Linux kernel the driver exists in.
>>>>
>>>> The problem appears to be when handling a USB_REQ_GET_STATUS request.  The driver gets this request and then calls the ch9getstatus() function.  In this function, it starts a request by "borrowing" the per device status_req, filling it in, and then queuing it with a call to list_add_tail() to add the request to the endpoint queue.  Right before it exits the function however, it's calling ep0_prime_status(), which is filling out that same status_req structure and then queuing it with another call to list_add_tail() to add the request to the endpoint queue.  This adds two instances of the exact same LIST_HEAD to the endpoint queue, which breaks the list since the prev and next pointers end up pointing to the wrong things.  This ends up causing a hard loop the next time nuke() gets called, which happens on the next setup IRQ.
>>>>
>>>> I'm not sure what the appropriate fix to this problem is, mostly due to my lack of expertise in USB and this driver stack.  The code has been this way in the kernel for a very long time, which suggests that it has been working, unless USB_REQ_GET_STATUS requests are never made.  This further suggests that there is something else going on that I don't understand.  Deleting the call to ep0_prime_status() and the following ep0stall() call appears, on the surface, to get the device working again, but may have side effects that I'm not seeing.
>>>>
>>>> I'm hopeful someone in the community can help provide some information on what I may be missing or help come up with a solution to the problem.  A big thank you to anyone who would like to help out.
>>>
>>> Ran into this a while ago. Found the bug and a few more fixes.
>>> This is against 4.19 so you may have to tweak them a bit.
>>> Feel free to upstream them.
>>
>> Curious, did my patches help? Good to know once we upgrade as well.
> 
> There's good news and bad news.
> 
> The good news is that this appears to stop the driver from entering
> an infinite loop, which prevents the Linux system from locking up and
> never recovering.  So I'm willing to say we've made the behavior
> better.
> 
> The bad news is that once we get past this point, there is new bad
> behavior.  What sits on top of this driver in our system is the RNDIS
> gadget driver, communicating with a laptop running Windows 10 version
> 1809.  Everything appears to work fine with the Linux system until
> there is a USB disconnect.  After the disconnect, the Linux side
> appears to continue on just fine, but the Windows side doesn't seem to
> recognize the disconnect, which causes the USB driver on that side to
> hang forever and eventually blue-screen the box.  This doesn't happen
> on all machines, just a select few.  I think we can isolate the
> behavior to a specific antivirus/security software driver that
> inserts itself into the USB stack and filters the disconnect message,
> but we're still proving that.
> 
> I'm about 90% certain this is a different problem and that we can call
> this patchset good, at least for our test setup.  My only hesitation
> is whether the Linux side is sending a set of responses that confuse
> the Windows side (specifically this antivirus software).  I'd be
> content to call that a separate defect, though, and let this one
> close with that patchset.

P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
on my table. I can only look briefly into most of them, so I will
sometimes get things wrong or miss something important. I hope that's
not the case here; if you think it is, don't hesitate to tell me about
it in a public reply. That's in everyone's interest, as what I wrote
above might be misleading to everyone reading this; any suggestion I
gave might thus send someone reading this down the wrong rabbit hole,
which none of us wants.

BTW, I have no personal interest in this issue, which is tracked using
regzbot, my Linux kernel regression tracking bot
(https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
this mail to get things rolling again and hence don't need to be CCed
on further activity regarding this regression.

#regzbot title: usb: fsl_udc_core: corrupted request list leads to unrecoverable loop


* RE: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
  2021-11-25 13:59         ` Thorsten Leemhuis
@ 2021-11-29 17:24           ` Eugene Bordenkircher
  2021-11-29 23:37             ` Leo Li
  0 siblings, 1 reply; 22+ messages in thread
From: Eugene Bordenkircher @ 2021-11-29 17:24 UTC
  To: Thorsten Leemhuis, Joakim Tjernlund, linuxppc-dev, linux-usb
  Cc: gregkh, balbi, leoyang.li

The final result of our testing is that the posted patch set seems to address all known defects in the Linux kernel.  The additional problems mentioned earlier are caused entirely by the antivirus solution on the Windows box.  The antivirus solution blocks the disconnect messages from reaching the RNDIS driver, so it has no idea the USB device went away.  There is nothing we can do to address this in the Linux kernel.

I propose we move forward with the patchset.

Eugene T. Bordenkircher



* RE: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
  2021-11-29 17:24           ` Eugene Bordenkircher
@ 2021-11-29 23:37             ` Leo Li
  2021-11-29 23:48               ` Eugene Bordenkircher
  0 siblings, 1 reply; 22+ messages in thread
From: Leo Li @ 2021-11-29 23:37 UTC
  To: Eugene Bordenkircher, Thorsten Leemhuis, jocke@infinera.com,
	linuxppc-dev, linux-usb
  Cc: gregkh, balbi



> -----Original Message-----
> From: Eugene Bordenkircher <Eugene_Bordenkircher@selinc.com>
> Sent: Monday, November 29, 2021 11:25 AM
> To: Thorsten Leemhuis <regressions@leemhuis.info>; jocke@infinera.com
> <joakim.tjernlund@infinera.com>; linuxppc-dev@lists.ozlabs.org;
> linux-usb@vger.kernel.org
> Cc: Leo Li <leoyang.li@nxp.com>; gregkh@linuxfoundation.org;
> balbi@kernel.org
> Subject: RE: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to
> unrecoverable loop.
> 
> The final result of our testing is that the patch set posted seems to address all
> known defects in the Linux kernel.  The mentioned additional problems are
> entirely caused by the antivirus solution on the Windows box.  The antivirus
> solution blocks the disconnect messages from reaching the RNDIS driver so it
> has no idea the USB device went away.  There is nothing we can do to
> address this in the Linux kernel.

Thanks for the confirmation.

> 
> I propose we move forward with the patchset.

I think we should proceed to merge the patchset, but it seems to need some cleanup of coding style issues and a better description before being formally submitted.



* RE: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
  2021-11-29 23:37             ` Leo Li
@ 2021-11-29 23:48               ` Eugene Bordenkircher
  2021-11-30 11:56                 ` Joakim Tjernlund
  0 siblings, 1 reply; 22+ messages in thread
From: Eugene Bordenkircher @ 2021-11-29 23:48 UTC
  To: Leo Li, Thorsten Leemhuis, jocke@infinera.com, linuxppc-dev, linux-usb
  Cc: gregkh, balbi

Agreed,

We are happy to pick up the torch on this, but I'd like to hear from Joakim first before we do.  The patch set is his, so I'd like to give him the opportunity.  I also think he's the only one who can write a truly proper description, because he mentioned that it includes "a few more fixes" beyond the one we ran into.  I'd rather hear from him than try to reverse-engineer what was being addressed.

Joakim, if you are still watching the thread, would you like to take a stab at it?  If I don't hear from you in a couple of days, we'll pick up the torch and do what we can.

Eugene T. Bordenkircher

-----Original Message-----
From: Leo Li <leoyang.li@nxp.com> 
Sent: Monday, November 29, 2021 3:37 PM
To: Eugene Bordenkircher <Eugene_Bordenkircher@selinc.com>; Thorsten Leemhuis <regressions@leemhuis.info>; jocke@infinera.com <joakim.tjernlund@infinera.com>; linuxppc-dev@lists.ozlabs.org; linux-usb@vger.kernel.org
Cc: gregkh@linuxfoundation.org; balbi@kernel.org
Subject: RE: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.

[Caution - External]

> -----Original Message-----
> From: Eugene Bordenkircher <Eugene_Bordenkircher@selinc.com>
> Sent: Monday, November 29, 2021 11:25 AM
> To: Thorsten Leemhuis <regressions@leemhuis.info>; jocke@infinera.com 
> <joakim.tjernlund@infinera.com>; linuxppc-dev@lists.ozlabs.org; linux- 
> usb@vger.kernel.org
> Cc: Leo Li <leoyang.li@nxp.com>; gregkh@linuxfoundation.org; 
> balbi@kernel.org
> Subject: RE: bug: usb: gadget: FSL_UDC_CORE Corrupted request list 
> leads to unrecoverable loop.
>
> The final result of our testing is that the patch set posted seems to 
> address all known defects in the Linux kernel.  The mentioned 
> additional problems are entirely caused by the antivirus solution on 
> the windows box.  The antivirus solution blocks the disconnect 
> messages from reaching the RNDIS driver so it has no idea the USB 
> device went away.  There is nothing we can do to address this in the Linux kernel.

Thanks for the confirmation.

>
> I propose we move forward with the patchset.

I think that we should proceed to merge the patchset but it seems to need some cleanup for coding style issues and better description before submitted formally.

>
> Eugene T. Bordenkircher
>
> -----Original Message-----
> From: Thorsten Leemhuis <regressions@leemhuis.info>
> Sent: Thursday, November 25, 2021 5:59 AM
> To: Eugene Bordenkircher <Eugene_Bordenkircher@selinc.com>; Thorsten 
> Leemhuis <regressions@leemhuis.info>; Joakim Tjernlund 
> <Joakim.Tjernlund@infinera.com>; linuxppc-dev@lists.ozlabs.org; linux- 
> usb@vger.kernel.org
> Cc: leoyang.li@nxp.com; gregkh@linuxfoundation.org; balbi@kernel.org
> Subject: Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list 
> leads to unrecoverable loop.
>
> Hi, this is your Linux kernel regression tracker speaking.
>
> Top-posting for once, to make this easy to process for everyone:
>
> Li Yang and Felipe Balbi: how to move on with this? It's quite an old 
> regression, but nevertheless it is one and thus should be fixed. Part 
> of my position is to make that happen and thus remind developers and 
> maintainers about this until the regression is resolved.
>
> Ciao, Thorsten
>
> On 16.11.21 20:11, Eugene Bordenkircher wrote:
> > On 02.11.21 22:15, Joakim Tjernlund wrote:
> >> On Sat, 2021-10-30 at 14:20 +0000, Joakim Tjernlund wrote:
> >>> On Fri, 2021-10-29 at 17:14 +0000, Eugene Bordenkircher wrote:
> >>
> >>>> We've discovered a situation where the FSL udc driver
> (drivers/usb/gadget/udc/fsl_udc_core.c) will enter a loop iterating 
> over the request queue, but the queue has been corrupted at some point 
> so it loops infinitely.  I believe we have narrowed into the offending 
> code, but we are in need of assistance trying to find an appropriate 
> fix for the problem.  The identified code appears to be in all 
> versions of the Linux kernel the driver exists in.
> >>>>
> >>>> The problem appears to be when handling a USB_REQ_GET_STATUS
> request.  The driver gets this request and then calls the 
> ch9getstatus() function.  In this function, it starts a request by 
> "borrowing" the per device status_req, filling it in, and then queuing 
> it with a call to list_add_tail() to add the request to the endpoint 
> queue.  Right before it exits the function however, it's calling 
> ep0_prime_status(), which is filling out that same status_req 
> structure and then queuing it with another call to list_add_tail() to 
> add the request to the endpoint queue.  This adds two instances of the 
> exact same LIST_HEAD to the endpoint queue, which breaks the list 
> since the prev and next pointers end up pointing to the wrong things.  
> This ends up causing a hard loop the next time nuke() gets called, which happens on the next setup IRQ.
> >>>>
> >>>> I'm not sure what the appropriate fix to this problem is, mostly 
> >>>> due to
> my lack of expertise in USB and this driver stack.  The code has been 
> this way in the kernel for a very long time, which suggests that it 
> has been working, unless USB_REQ_GET_STATUS requests are never made.  
> This further suggests that there is something else going on that I don't understand.
> Deleting the call to ep0_prime_status() and the following ep0stall() 
> call appears, on the surface, to get the device working again, but may 
> have side effects that I'm not seeing.
> >>>>
> >>>> I'm hopeful someone in the community can help provide some
> information on what I may be missing or help come up with a solution 
> to the problem.  A big thank you to anyone who would like to help out.
> >>>
> >>> Run into this to a while ago. Found the bug and a few more fixes.
> >>> This is against 4.19 so you may have to tweak them a bit.
> >>> Feel free to upstream them.
> >>
> >> Curious, did my patches help? Good to known once we upgrade as well.
> >
> > There's good news and bad news.
> >
> > The good news is that this appears to stop the driver from entering 
> > an infinite loop, which prevents the Linux system from locking up 
> > and never recovering.  So I'm willing to say we've made the behavior 
> > better.
> >
> > The bad news is that once we get past this point, there is new bad 
> > behavior.  What is on top of this driver in our system is the RNDIS 
> > gadget driver communicating to a Laptop running Win10 -1809.
> > Everything appears to work fine with the Linux system until there is 
> > a USB disconnect.  After the disconnect, the Linux side appears to 
> > continue on just fine, but the Windows side doesn't seem to 
> > recognize the disconnect, which causes the USB driver on that side 
> > to hang forever and eventually blue screen the box.  This doesn't happen on
> > all machines, just a select few.   I think we can isolate the
> > behavior to a specific antivirus/security software driver that is 
> > inserting itself into the USB stack and filtering the disconnect 
> > message, but we're still proving that.
> >
> > I'm about 90% certain this is a different problem and we can call 
> > this patchset good, at least for our test setup.  My only hesitation 
> > is if the Linux side is sending a set of responses that are 
> > confusing the Windows side (specifically this antivirus) or not.  
> > I'd be content calling that a separate defect though and letting 
> > this one close up with that patchset.
>
> P.S.: As a Linux kernel regression tracker I'm getting a lot of 
> reports on my table. I can only look briefly into most of them. 
> Unfortunately therefore I sometimes will get things wrong or miss something important.
> I hope that's not the case here; if you think it is, don't hesitate to 
> tell me about it in a public reply. That's in everyone's interest, as 
> what I wrote above might be misleading to everyone reading this; any 
> suggestion I gave they thus might sent someone reading this down the 
> wrong rabbit hole, which none of us wants.
>
> BTW, I have no personal interest in this issue, which is tracked using 
> regzbot, my Linux kernel regression tracking bot 
> (https://linux-regtracking.leemhuis.info/regzbot/). I'm only
> posting this mail to get things rolling again and hence don't need to 
> be CC'd on all further activities wrt this regression.
>
> #regzbot title: usb: fsl_udc_core: corrupted request list leads to 
> unrecoverable loop

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
  2021-11-29 23:48               ` Eugene Bordenkircher
@ 2021-11-30 11:56                 ` Joakim Tjernlund
  2021-12-01 14:19                   ` Joakim Tjernlund
  0 siblings, 1 reply; 22+ messages in thread
From: Joakim Tjernlund @ 2021-11-30 11:56 UTC (permalink / raw)
  To: regressions, leoyang.li, Eugene_Bordenkircher, linux-usb, linuxppc-dev
  Cc: gregkh, balbi

On Mon, 2021-11-29 at 23:48 +0000, Eugene Bordenkircher wrote:
> Agreed,
> 
> We are happy to pick up the torch on this, but I'd like to try and hear from Joakim first before we do.  The patch set is his, so I'd like to give him the opportunity.  I think he's the only one who can add a truly proper description as well, because he mentioned that this includes a "few more fixes" than just the one we ran into.  I'd rather hear from him than try to reverse engineer what was being addressed.
> 
> Joakim, if you are still watching the thread, would you like to take a stab at it?  If I don't hear from you in a couple days, we'll pick up the torch and do what we can.
> 

I am far away from this now and still on 4.19. I don't mind if you tweak the patches for better "upstreamability".

  Regards
           Joakim

> Eugene T. Bordenkircher
> 
> -----Original Message-----
> From: Leo Li <leoyang.li@nxp.com> 
> Sent: Monday, November 29, 2021 3:37 PM
> To: Eugene Bordenkircher <Eugene_Bordenkircher@selinc.com>; Thorsten Leemhuis <regressions@leemhuis.info>; jocke@infinera.com <joakim.tjernlund@infinera.com>; linuxppc-dev@lists.ozlabs.org; linux-usb@vger.kernel.org
> Cc: gregkh@linuxfoundation.org; balbi@kernel.org
> Subject: RE: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
> 
> [Caution - External]
> 
> > -----Original Message-----
> > From: Eugene Bordenkircher <Eugene_Bordenkircher@selinc.com>
> > Sent: Monday, November 29, 2021 11:25 AM
> > To: Thorsten Leemhuis <regressions@leemhuis.info>; jocke@infinera.com 
> > <joakim.tjernlund@infinera.com>; linuxppc-dev@lists.ozlabs.org; linux- 
> > usb@vger.kernel.org
> > Cc: Leo Li <leoyang.li@nxp.com>; gregkh@linuxfoundation.org; 
> > balbi@kernel.org
> > Subject: RE: bug: usb: gadget: FSL_UDC_CORE Corrupted request list 
> > leads to unrecoverable loop.
> > 
> > The final result of our testing is that the patch set posted seems to 
> > address all known defects in the Linux kernel.  The mentioned 
> > additional problems are entirely caused by the antivirus solution on 
> > the windows box.  The antivirus solution blocks the disconnect 
> > messages from reaching the RNDIS driver so it has no idea the USB 
> > device went away.  There is nothing we can do to address this in the Linux kernel.
> 
> Thanks for the confirmation.
> 
> > 
> > I propose we move forward with the patchset.
> 
> I think that we should proceed to merge the patchset, but it seems to need some coding style cleanup and a better description before it is formally submitted.
> 
> > 
> > Eugene T. Bordenkircher
> > 
> > -----Original Message-----
> > From: Thorsten Leemhuis <regressions@leemhuis.info>
> > Sent: Thursday, November 25, 2021 5:59 AM
> > To: Eugene Bordenkircher <Eugene_Bordenkircher@selinc.com>; Thorsten 
> > Leemhuis <regressions@leemhuis.info>; Joakim Tjernlund 
> > <Joakim.Tjernlund@infinera.com>; linuxppc-dev@lists.ozlabs.org; linux- 
> > usb@vger.kernel.org
> > Cc: leoyang.li@nxp.com; gregkh@linuxfoundation.org; balbi@kernel.org
> > Subject: Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list 
> > leads to unrecoverable loop.
> > 
> > Hi, this is your Linux kernel regression tracker speaking.
> > 
> > Top-posting for once, to make this easy to process for everyone:
> > 
> > Li Yang and Felipe Balbi: how do we move on with this? It's quite an old
> > regression, but nevertheless it is one and thus should be fixed. Part
> > of my role is to make that happen and thus to remind developers and
> > maintainers about this until the regression is resolved.
> > 
> > Ciao, Thorsten
> > 
> > On 16.11.21 20:11, Eugene Bordenkircher wrote:
> > > On 02.11.21 22:15, Joakim Tjernlund wrote:
> > > > On Sat, 2021-10-30 at 14:20 +0000, Joakim Tjernlund wrote:
> > > > > On Fri, 2021-10-29 at 17:14 +0000, Eugene Bordenkircher wrote:
> > > > 
> > > > > > We've discovered a situation where the FSL udc driver
> > (drivers/usb/gadget/udc/fsl_udc_core.c) will enter a loop iterating 
> > over the request queue, but the queue has been corrupted at some point 
> > so it loops infinitely.  I believe we have narrowed into the offending 
> > code, but we are in need of assistance trying to find an appropriate 
> > fix for the problem.  The identified code appears to be in all 
> > versions of the Linux kernel the driver exists in.
> > > > > > 
> > > > > > The problem appears to be when handling a USB_REQ_GET_STATUS
> > request.  The driver gets this request and then calls the 
> > ch9getstatus() function.  In this function, it starts a request by 
> > "borrowing" the per device status_req, filling it in, and then queuing 
> > it with a call to list_add_tail() to add the request to the endpoint 
> > queue.  Right before it exits the function however, it's calling 
> > ep0_prime_status(), which is filling out that same status_req 
> > structure and then queuing it with another call to list_add_tail() to 
> > add the request to the endpoint queue.  This adds two instances of the 
> > exact same LIST_HEAD to the endpoint queue, which breaks the list 
> > since the prev and next pointers end up pointing to the wrong things.  
> > This ends up causing a hard loop the next time nuke() gets called, which happens on the next setup IRQ.
> > > > > > 
> > > > > > I'm not sure what the appropriate fix to this problem is, mostly 
> > > > > > due to
> > my lack of expertise in USB and this driver stack.  The code has been 
> > this way in the kernel for a very long time, which suggests that it 
> > has been working, unless USB_REQ_GET_STATUS requests are never made.  
> > This further suggests that there is something else going on that I don't understand.
> > Deleting the call to ep0_prime_status() and the following ep0stall() 
> > call appears, on the surface, to get the device working again, but may 
> > have side effects that I'm not seeing.
> > > > > > 
> > > > > > I'm hopeful someone in the community can help provide some
> > information on what I may be missing or help come up with a solution 
> > to the problem.  A big thank you to anyone who would like to help out.
> > > > > 
> > > > > Ran into this a while ago. Found the bug and a few more fixes.
> > > > > This is against 4.19 so you may have to tweak them a bit.
> > > > > Feel free to upstream them.
> > > > 
> > > > Curious, did my patches help? Good to know once we upgrade as well.
> > > 
> > > There's good news and bad news.
> > > 
> > > The good news is that this appears to stop the driver from entering 
> > > an infinite loop, which prevents the Linux system from locking up 
> > > and never recovering.  So I'm willing to say we've made the behavior 
> > > better.
> > > 
> > > The bad news is that once we get past this point, there is new bad 
> > > behavior.  What is on top of this driver in our system is the RNDIS 
> > > gadget driver communicating with a laptop running Win10 1809.
> > > Everything appears to work fine with the Linux system until there is 
> > > a USB disconnect.  After the disconnect, the Linux side appears to 
> > > continue on just fine, but the Windows side doesn't seem to 
> > > recognize the disconnect, which causes the USB driver on that side 
> > > to hang forever and eventually blue screen the box.  This doesn't happen on
> > > all machines, just a select few.   I think we can isolate the
> > > behavior to a specific antivirus/security software driver that is 
> > > inserting itself into the USB stack and filtering the disconnect 
> > > message, but we're still proving that.
> > > 
> > > I'm about 90% certain this is a different problem and we can call 
> > > this patchset good, at least for our test setup.  My only hesitation 
> > > is whether the Linux side is sending a set of responses that are
> > > confusing the Windows side (specifically this antivirus).
> > > I'd be content calling that a separate defect though and letting 
> > > this one close up with that patchset.
> > 
> > P.S.: As a Linux kernel regression tracker I'm getting a lot of 
> > reports on my table. I can only look briefly into most of them. 
> > Unfortunately therefore I sometimes will get things wrong or miss something important.
> > I hope that's not the case here; if you think it is, don't hesitate to 
> > tell me about it in a public reply. That's in everyone's interest, as 
> > what I wrote above might be misleading to everyone reading this; any 
> > suggestion I gave might thus send someone reading this down the
> > wrong rabbit hole, which none of us wants.
> > 
> > BTW, I have no personal interest in this issue, which is tracked using 
> > regzbot, my Linux kernel regression tracking bot 
> > (https://linux-regtracking.leemhuis.info/regzbot/). I'm only
> > posting this mail to get things rolling again and hence don't need to 
> > be CC'd on all further activities wrt this regression.
> > 
> > #regzbot title: usb: fsl_udc_core: corrupted request list leads to 
> > unrecoverable loop


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
  2021-11-30 11:56                 ` Joakim Tjernlund
@ 2021-12-01 14:19                   ` Joakim Tjernlund
  2021-12-02 20:35                     ` Leo Li
  0 siblings, 1 reply; 22+ messages in thread
From: Joakim Tjernlund @ 2021-12-01 14:19 UTC (permalink / raw)
  To: regressions, leoyang.li, Eugene_Bordenkircher, linux-usb, linuxppc-dev
  Cc: gregkh, balbi

On Tue, 2021-11-30 at 12:56 +0100, Joakim Tjernlund wrote:
> On Mon, 2021-11-29 at 23:48 +0000, Eugene Bordenkircher wrote:
> > Agreed,
> > 
> > We are happy to pick up the torch on this, but I'd like to try and hear from Joakim first before we do.  The patch set is his, so I'd like to give him the opportunity.  I think he's the only one who can add a truly proper description as well, because he mentioned that this includes a "few more fixes" than just the one we ran into.  I'd rather hear from him than try to reverse engineer what was being addressed.
> > 
> > Joakim, if you are still watching the thread, would you like to take a stab at it?  If I don't hear from you in a couple days, we'll pick up the torch and do what we can.
> > 
> 
> I am far away from this now and still on 4.19. I don't mind if you tweak the patches for better "upstreamability".

Even better would be to migrate to the chipidea driver; I am told just a few tweaks are needed, but this is probably
something NXP should do, as they have access to other SoCs using chipidea.

Leo ?

 Joakim


> 
>   Regards
>            Joakim
> 
> > Eugene T. Bordenkircher


^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
  2021-12-01 14:19                   ` Joakim Tjernlund
@ 2021-12-02 20:35                     ` Leo Li
  2021-12-02 22:45                       ` Joakim Tjernlund
  0 siblings, 1 reply; 22+ messages in thread
From: Leo Li @ 2021-12-02 20:35 UTC (permalink / raw)
  To: jocke@infinera.com, regressions, Eugene_Bordenkircher, linux-usb,
	linuxppc-dev
  Cc: gregkh, balbi



> -----Original Message-----
> From: Joakim Tjernlund <Joakim.Tjernlund@infinera.com>
> Sent: Wednesday, December 1, 2021 8:19 AM
> To: regressions@leemhuis.info; Leo Li <leoyang.li@nxp.com>;
> Eugene_Bordenkircher@selinc.com; linux-usb@vger.kernel.org; linuxppc-
> dev@lists.ozlabs.org
> Cc: gregkh@linuxfoundation.org; balbi@kernel.org
> Subject: Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to
> unrecoverable loop.
> 
> On Tue, 2021-11-30 at 12:56 +0100, Joakim Tjernlund wrote:
> > On Mon, 2021-11-29 at 23:48 +0000, Eugene Bordenkircher wrote:
> > > Agreed,
> > >
> > > We are happy to pick up the torch on this, but I'd like to try and hear from
> Joakim first before we do.  The patch set is his, so I'd like to give him the
> opportunity.  I think he's the only one that can add a truly proper description
> as well because he mentioned that this includes a "few more fixes" than just
> the one we ran into.  I'd rather hear from him than try to reverse engineer
> what was being addressed.
> > >
> > > Joakim, if you are still watching the thread, would you like to take a stab
> at it?  If I don't hear from you in a couple days, we'll pick up the torch and do
> what we can.
> > >
> >
> > I am far away from this now and still on 4.19. I don't mind if you tweak
> the patches for better "upstreamability"
> 
> Even better would be to migrate to the chipidea driver, I am told just a few
> tweaks are needed but this is probably something NXP should do as they
> have access to other SoCs using chipidea.

I agree with this direction but the problem was with bandwidth.  As this controller was only used on legacy platforms, it is harder to justify new effort on it now.

Regards,
Leo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
  2021-12-02 20:35                     ` Leo Li
@ 2021-12-02 22:45                       ` Joakim Tjernlund
  2021-12-04  0:40                         ` Leo Li
  0 siblings, 1 reply; 22+ messages in thread
From: Joakim Tjernlund @ 2021-12-02 22:45 UTC (permalink / raw)
  To: regressions, leoyang.li, Eugene_Bordenkircher, linux-usb, linuxppc-dev
  Cc: gregkh, balbi

On Thu, 2021-12-02 at 20:35 +0000, Leo Li wrote:
> 
> > -----Original Message-----
> > From: Joakim Tjernlund <Joakim.Tjernlund@infinera.com>
> > Sent: Wednesday, December 1, 2021 8:19 AM
> > To: regressions@leemhuis.info; Leo Li <leoyang.li@nxp.com>;
> > Eugene_Bordenkircher@selinc.com; linux-usb@vger.kernel.org; linuxppc-
> > dev@lists.ozlabs.org
> > Cc: gregkh@linuxfoundation.org; balbi@kernel.org
> > Subject: Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to
> > unrecoverable loop.
> > 
> > On Tue, 2021-11-30 at 12:56 +0100, Joakim Tjernlund wrote:
> > > On Mon, 2021-11-29 at 23:48 +0000, Eugene Bordenkircher wrote:
> > > > Agreed,
> > > > 
> > > > We are happy to pick up the torch on this, but I'd like to try and hear from
> > Joakim first before we do.  The patch set is his, so I'd like to give him the
> > opportunity.  I think he's the only one that can add a truly proper description
> > as well because he mentioned that this includes a "few more fixes" than just
> > the one we ran into.  I'd rather hear from him than try to reverse engineer
> > what was being addressed.
> > > > 
> > > > Joakim, if you are still watching the thread, would you like to take a stab
> > at it?  If I don't hear from you in a couple days, we'll pick up the torch and do
> > what we can.
> > > > 
> > > 
> > > I am far away from this now and still on 4.19. I don't mind if you tweak
> > the patches for better "upstreamability"
> > 
> > Even better would be to migrate to the chipidea driver, I am told just a few
> > tweaks are needed but this is probably something NXP should do as they
> > have access to other SoCs using chipidea.
> 
> I agree with this direction but the problem was with bandwidth.  As this controller was only used on legacy platforms, it is harder to justify new effort on it now.
> 

Legacy? All PPC is legacy and not supported now? 

  Jocke

^ permalink raw reply	[flat|nested] 22+ messages in thread

* RE: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
  2021-12-02 22:45                       ` Joakim Tjernlund
@ 2021-12-04  0:40                         ` Leo Li
  2022-01-20 12:54                           ` Thorsten Leemhuis
  0 siblings, 1 reply; 22+ messages in thread
From: Leo Li @ 2021-12-04  0:40 UTC (permalink / raw)
  To: jocke@infinera.com, regressions, Eugene_Bordenkircher, linux-usb,
	linuxppc-dev
  Cc: gregkh, balbi



> -----Original Message-----
> From: Joakim Tjernlund <Joakim.Tjernlund@infinera.com>
> Sent: Thursday, December 2, 2021 4:45 PM
> To: regressions@leemhuis.info; Leo Li <leoyang.li@nxp.com>;
> Eugene_Bordenkircher@selinc.com; linux-usb@vger.kernel.org; linuxppc-
> dev@lists.ozlabs.org
> Cc: gregkh@linuxfoundation.org; balbi@kernel.org
> Subject: Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to
> unrecoverable loop.
> 
> On Thu, 2021-12-02 at 20:35 +0000, Leo Li wrote:
> >
> > > -----Original Message-----
> > > From: Joakim Tjernlund <Joakim.Tjernlund@infinera.com>
> > > Sent: Wednesday, December 1, 2021 8:19 AM
> > > To: regressions@leemhuis.info; Leo Li <leoyang.li@nxp.com>;
> > > Eugene_Bordenkircher@selinc.com; linux-usb@vger.kernel.org;
> > > linuxppc- dev@lists.ozlabs.org
> > > Cc: gregkh@linuxfoundation.org; balbi@kernel.org
> > > Subject: Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list
> > > leads to unrecoverable loop.
> > >
> > > On Tue, 2021-11-30 at 12:56 +0100, Joakim Tjernlund wrote:
> > > > On Mon, 2021-11-29 at 23:48 +0000, Eugene Bordenkircher wrote:
> > > > > Agreed,
> > > > >
> > > > > We are happy to pick up the torch on this, but I'd like to try and
> > > > > hear from
> > > Joakim first before we do.  The patch set is his, so I'd like to
> > > give him the opportunity.  I think he's the only one that can add a
> > > truly proper description as well because he mentioned that this
> > > includes a "few more fixes" than just the one we ran into.  I'd
> > > rather hear from him than try to reverse engineer what was being
> addressed.
> > > > >
> > > > > Joakim, if you are still watching the thread, would you like to
> > > > > take a stab
> > > at it?  If I don't hear from you in a couple days, we'll pick up the
> > > torch and do what we can.
> > > > >
> > > >
> > > > I am far away from this now and still on 4.19. I don't mind if you
> > > > tweak
> > > the patches for better "upstreamability"
> > >
> > > Even better would be to migrate to the chipidea driver, I am told
> > > just a few tweaks are needed but this is probably something NXP
> > > should do as they have access to other SoCs using chipidea.
> >
> > I agree with this direction but the problem was with bandwidth.  As this
> controller was only used on legacy platforms, it is harder to justify new effort
> on it now.
> >
> 
> Legacy? All PPC is legacy and not supported now?

I'm not saying that they are not supported, but they are in maintenance only mode.

Regards,
Leo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
  2021-12-04  0:40                         ` Leo Li
@ 2022-01-20 12:54                           ` Thorsten Leemhuis
  2022-02-18  7:11                             ` Thorsten Leemhuis
  0 siblings, 1 reply; 22+ messages in thread
From: Thorsten Leemhuis @ 2022-01-20 12:54 UTC (permalink / raw)
  To: Leo Li, jocke@infinera.com, regressions, Eugene_Bordenkircher,
	linux-usb, linuxppc-dev
  Cc: gregkh, balbi

Hi, this is your Linux kernel regression tracker speaking.

On 04.12.21 01:40, Leo Li wrote:
>> -----Original Message-----
>> From: Joakim Tjernlund <Joakim.Tjernlund@infinera.com>
>> Sent: Thursday, December 2, 2021 4:45 PM
>> To: regressions@leemhuis.info; Leo Li <leoyang.li@nxp.com>;
>> Eugene_Bordenkircher@selinc.com; linux-usb@vger.kernel.org; linuxppc-
>> dev@lists.ozlabs.org
>> Cc: gregkh@linuxfoundation.org; balbi@kernel.org
>> Subject: Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to
>> unrecoverable loop.
>>
>> On Thu, 2021-12-02 at 20:35 +0000, Leo Li wrote:
>>>
>>>> -----Original Message-----
>>>> From: Joakim Tjernlund <Joakim.Tjernlund@infinera.com>
>>>> Sent: Wednesday, December 1, 2021 8:19 AM
>>>> To: regressions@leemhuis.info; Leo Li <leoyang.li@nxp.com>;
>>>> Eugene_Bordenkircher@selinc.com; linux-usb@vger.kernel.org;
>>>> linuxppc- dev@lists.ozlabs.org
>>>> Cc: gregkh@linuxfoundation.org; balbi@kernel.org
>>>> Subject: Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list
>>>> leads to unrecoverable loop.
>>>>
>>>> On Tue, 2021-11-30 at 12:56 +0100, Joakim Tjernlund wrote:
>>>>> On Mon, 2021-11-29 at 23:48 +0000, Eugene Bordenkircher wrote:
>>>>>> Agreed,
>>>>>>
>>>>>> We are happy to pick up the torch on this, but I'd like to try and
>>>>>> hear from
>>>> Joakim first before we do.  The patch set is his, so I'd like to
>>>> give him the opportunity.  I think he's the only one that can add a
>>>> truly proper description as well because he mentioned that this
>>>> includes a "few more fixes" than just the one we ran into.  I'd
>>>> rather hear from him than try to reverse engineer what was being
>> addressed.
>>>>>>
>>>>>> Joakim, if you are still watching the thread, would you like to
>>>>>> take a stab
>>>> at it?  If I don't hear from you in a couple days, we'll pick up the
>>>> torch and do what we can.

Did anything happen? Sure, it's an old regression from the v3.4-rc4 days,
but there iirc was already a tested proto-patch in that thread that
fixes the issue. Or was progress made and I just missed it?

Ciao, Thorsten

P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
on my table. I can only look briefly into most of them. Unfortunately
therefore I sometimes will get things wrong or miss something important.
I hope that's not the case here; if you think it is, don't hesitate to
tell me about it in a public reply, that's in everyone's interest.

BTW, I have no personal interest in this issue, which is tracked using
regzbot, my Linux kernel regression tracking bot
(https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
this mail to get things rolling again and hence don't need to be CC on
all further activities wrt this regression.

#regzbot ignore-activity

>>>>> I am far away from this now and still on 4.19. I don't mind if you
>>>>> tweak
>>>> the patches for better "upstreamability"
>>>>
>>>> Even better would be to migrate to the chipidea driver, I am told
>>>> just a few tweaks are needed but this is probably something NXP
>>>> should do as they have access to other SoCs using chipidea.
>>>
>>> I agree with this direction but the problem was with bandwidth.  As this
>> controller was only used on legacy platforms, it is harder to justify new effort
>> on it now.
>>
>> Legacy? All PPC is legacy and not supported now?
> 
> I'm not saying that they are not supported, but they are in maintenance only mode.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
  2022-01-20 12:54                           ` Thorsten Leemhuis
@ 2022-02-18  7:11                             ` Thorsten Leemhuis
  2022-02-18 10:21                               ` Joakim Tjernlund
  0 siblings, 1 reply; 22+ messages in thread
From: Thorsten Leemhuis @ 2022-02-18  7:11 UTC (permalink / raw)
  To: Leo Li, jocke@infinera.com, Eugene_Bordenkircher, linux-usb,
	linuxppc-dev
  Cc: gregkh, balbi

Hi, this is your Linux kernel regression tracker speaking. Top-posting
for once, to make this easily accessible to everyone.

Sadly it looks to me like nobody is going to address this (quite old)
regression (that afaics only very few people will hit), despite the rough
patch to fix it that was already posted and tested in this thread.

Well, guess that's how it is sometimes. Marking it as "on back burner"
in regzbot to reduce the noise there:

#regzbot backburner: Tested patch available, but things nevertheless got
stuck

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I'm getting a lot of
reports on my table. I can only look briefly into most of them and lack
knowledge about most of the areas they concern. I thus unfortunately
will sometimes get things wrong or miss something important. I hope
that's not the case here; if you think it is, don't hesitate to tell me
in a public reply, it's in everyone's interest to set the public record
straight.

#regzbot poke



On 20.01.22 13:54, Thorsten Leemhuis wrote:
> Hi, this is your Linux kernel regression tracker speaking.
> 
> On 04.12.21 01:40, Leo Li wrote:
>>> -----Original Message-----
>>> From: Joakim Tjernlund <Joakim.Tjernlund@infinera.com>
>>> Sent: Thursday, December 2, 2021 4:45 PM
>>> To: regressions@leemhuis.info; Leo Li <leoyang.li@nxp.com>;
>>> Eugene_Bordenkircher@selinc.com; linux-usb@vger.kernel.org; linuxppc-
>>> dev@lists.ozlabs.org
>>> Cc: gregkh@linuxfoundation.org; balbi@kernel.org
>>> Subject: Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to
>>> unrecoverable loop.
>>>
>>> On Thu, 2021-12-02 at 20:35 +0000, Leo Li wrote:
>>>>
>>>>> -----Original Message-----
>>>>> From: Joakim Tjernlund <Joakim.Tjernlund@infinera.com>
>>>>> Sent: Wednesday, December 1, 2021 8:19 AM
>>>>> To: regressions@leemhuis.info; Leo Li <leoyang.li@nxp.com>;
>>>>> Eugene_Bordenkircher@selinc.com; linux-usb@vger.kernel.org;
>>>>> linuxppc- dev@lists.ozlabs.org
>>>>> Cc: gregkh@linuxfoundation.org; balbi@kernel.org
>>>>> Subject: Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list
>>>>> leads to unrecoverable loop.
>>>>>
>>>>> On Tue, 2021-11-30 at 12:56 +0100, Joakim Tjernlund wrote:
>>>>>> On Mon, 2021-11-29 at 23:48 +0000, Eugene Bordenkircher wrote:
>>>>>>> Agreed,
>>>>>>>
>>>>>>> We are happy to pick up the torch on this, but I'd like to try and
>>>>>>> hear from
>>>>> Joakim first before we do.  The patch set is his, so I'd like to
>>>>> give him the opportunity.  I think he's the only one that can add a
>>>>> truly proper description as well because he mentioned that this
>>>>> includes a "few more fixes" than just the one we ran into.  I'd
>>>>> rather hear from him than try to reverse engineer what was being
>>> addressed.
>>>>>>>
>>>>>>> Joakim, if you are still watching the thread, would you like to
>>>>>>> take a stab
>>>>> at it?  If I don't hear from you in a couple days, we'll pick up the
>>>>> torch and do what we can.
> 
> Did anything happen? Sure, it's an old regression from the v3.4-rc4 days,
> but there iirc was already a tested proto-patch in that thread that
> fixes the issue. Or was progress made and I just missed it?
> 
> Ciao, Thorsten
> 
> P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
> on my table. I can only look briefly into most of them. Unfortunately
> therefore I sometimes will get things wrong or miss something important.
> I hope that's not the case here; if you think it is, don't hesitate to
> tell me about it in a public reply, that's in everyone's interest.
> 
> BTW, I have no personal interest in this issue, which is tracked using
> regzbot, my Linux kernel regression tracking bot
> (https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
> this mail to get things rolling again and hence don't need to be CC on
> all further activities wrt this regression.
> 
> #regzbot ignore-activity
> 
>>>>>> I am far away from this now and still on 4.19. I don't mind if you
>>>>>> tweak
>>>>> the patches for better "upstreamability"
>>>>>
>>>>> Even better would be to migrate to the chipidea driver, I am told
>>>>> just a few tweaks are needed but this is probably something NXP
>>>>> should do as they have access to other SoCs using chipidea.
>>>>
>>>> I agree with this direction but the problem was with bandwidth.  As this
>>> controller was only used on legacy platforms, it is harder to justify new effort
>>> on it now.
>>>
>>> Legacy? All PPC is legacy and not supported now?
>>
>> I'm not saying that they are not supported, but they are in maintenance only mode.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
  2022-02-18  7:11                             ` Thorsten Leemhuis
@ 2022-02-18 10:21                               ` Joakim Tjernlund
  2022-02-18 10:39                                 ` gregkh
  0 siblings, 1 reply; 22+ messages in thread
From: Joakim Tjernlund @ 2022-02-18 10:21 UTC (permalink / raw)
  To: Thorsten Leemhuis, Leo Li, Eugene_Bordenkircher, linux-usb, linuxppc-dev
  Cc: gregkh, balbi

I think you could apply them as is; the only criticism was the commit messages.

 Jocke

________________________________________
From: Thorsten Leemhuis <regressions@leemhuis.info>
Sent: 18 February 2022 08:11
To: Leo Li; Joakim Tjernlund; Eugene_Bordenkircher@selinc.com; linux-usb@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
Cc: gregkh@linuxfoundation.org; balbi@kernel.org
Subject: Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.

Hi, this is your Linux kernel regression tracker speaking. Top-posting
for once, to make this easily accessible to everyone.

Sadly it looks to me like nobody is going to address this (quite old)
regression (that afaics only very few people will hit), despite the rough
patch to fix it that was already posted and tested in this thread.

Well, guess that's how it is sometimes. Marking it as "on back burner"
in regzbot to reduce the noise there:

#regzbot backburner: Tested patch available, but things nevertheless got
stuck

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I'm getting a lot of
reports on my table. I can only look briefly into most of them and lack
knowledge about most of the areas they concern. I thus unfortunately
will sometimes get things wrong or miss something important. I hope
that's not the case here; if you think it is, don't hesitate to tell me
in a public reply, it's in everyone's interest to set the public record
straight.

#regzbot poke



On 20.01.22 13:54, Thorsten Leemhuis wrote:
> Hi, this is your Linux kernel regression tracker speaking.
>
> On 04.12.21 01:40, Leo Li wrote:
>>> -----Original Message-----
>>> From: Joakim Tjernlund <Joakim.Tjernlund@infinera.com>
>>> Sent: Thursday, December 2, 2021 4:45 PM
>>> To: regressions@leemhuis.info; Leo Li <leoyang.li@nxp.com>;
>>> Eugene_Bordenkircher@selinc.com; linux-usb@vger.kernel.org;
>>> linuxppc-dev@lists.ozlabs.org
>>> Cc: gregkh@linuxfoundation.org; balbi@kernel.org
>>> Subject: Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to
>>> unrecoverable loop.
>>>
>>> On Thu, 2021-12-02 at 20:35 +0000, Leo Li wrote:
>>>>
>>>>> -----Original Message-----
>>>>> From: Joakim Tjernlund <Joakim.Tjernlund@infinera.com>
>>>>> Sent: Wednesday, December 1, 2021 8:19 AM
>>>>> To: regressions@leemhuis.info; Leo Li <leoyang.li@nxp.com>;
>>>>> Eugene_Bordenkircher@selinc.com; linux-usb@vger.kernel.org;
>>>>> linuxppc-dev@lists.ozlabs.org
>>>>> Cc: gregkh@linuxfoundation.org; balbi@kernel.org
>>>>> Subject: Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list
>>>>> leads to unrecoverable loop.
>>>>>
>>>>> On Tue, 2021-11-30 at 12:56 +0100, Joakim Tjernlund wrote:
>>>>>> On Mon, 2021-11-29 at 23:48 +0000, Eugene Bordenkircher wrote:
>>>>>>> Agreed,
>>>>>>>
>>>>>>> We are happy to pick up the torch on this, but I'd like to try and
>>>>>>> hear from Joakim first before we do.  The patch set is his, so I'd
>>>>>>> like to give him the opportunity.  I think he's the only one that can
>>>>>>> add a truly proper description as well, because he mentioned that this
>>>>>>> includes a "few more fixes" than just the one we ran into.  I'd rather
>>>>>>> hear from him than try to reverse engineer what was being addressed.
>>>>>>>
>>>>>>> Joakim, if you are still watching the thread, would you like to take a
>>>>>>> stab at it?  If I don't hear from you in a couple of days, we'll pick up
>>>>>>> the torch and do what we can.
>
> Did anything happen? Sure, it's an old regression from the v3.4-rc4 days,
> but there iirc was already a tested proto-patch in that thread that
> fixes the issue. Or was progress made and I just missed it?
>
> Ciao, Thorsten
>
> P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
> on my table. I can only look briefly into most of them. Unfortunately
> therefore I sometimes will get things wrong or miss something important.
> I hope that's not the case here; if you think it is, don't hesitate to
> tell me about it in a public reply, that's in everyone's interest.
>
> BTW, I have no personal interest in this issue, which is tracked using
> regzbot, my Linux kernel regression tracking bot
> (https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
> this mail to get things rolling again and hence don't need to be CC on
> all further activities wrt this regression.
>
> #regzbot ignore-activity
>
>>>>>> I am far away from this now and still on 4.19. I don't mind if you
>>>>>> tweak the patches for better "upstreamability".
>>>>>
>>>>> Even better would be to migrate to the chipidea driver; I am told
>>>>> just a few tweaks are needed, but this is probably something NXP
>>>>> should do, as they have access to other SoCs using chipidea.
>>>>
>>>> I agree with this direction, but the problem was with bandwidth.  As
>>>> this controller was only used on legacy platforms, it is harder to
>>>> justify new effort on it now.
>>>
>>> Legacy? All PPC is legacy and not supported now?
>>
>> I'm not saying that they are not supported, but they are in maintenance only mode.

* Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
  2022-02-18 10:21                               ` Joakim Tjernlund
@ 2022-02-18 10:39                                 ` gregkh
  2022-02-18 11:17                                   ` Joakim Tjernlund
  0 siblings, 1 reply; 22+ messages in thread
From: gregkh @ 2022-02-18 10:39 UTC (permalink / raw)
  To: Joakim Tjernlund
  Cc: balbi, Eugene_Bordenkircher, linux-usb, Leo Li,
	Thorsten Leemhuis, linuxppc-dev

On Fri, Feb 18, 2022 at 10:21:12AM +0000, Joakim Tjernlund wrote:
> I think you could apply them as is; the only criticism was the commit msgs.

That is always a good reason to reject a change.  Please resubmit them
with the commit message cleaned up and I will be glad to review it
again.

thanks,

greg k-h

* Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
  2022-02-18 10:39                                 ` gregkh
@ 2022-02-18 11:17                                   ` Joakim Tjernlund
  2022-02-18 11:48                                     ` gregkh
  0 siblings, 1 reply; 22+ messages in thread
From: Joakim Tjernlund @ 2022-02-18 11:17 UTC (permalink / raw)
  To: gregkh
  Cc: balbi, Eugene_Bordenkircher, linux-usb, Leo Li,
	Thorsten Leemhuis, linuxppc-dev

I was happy with commit msgs and I don't know what the criticism was.

* Re: bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop.
  2022-02-18 11:17                                   ` Joakim Tjernlund
@ 2022-02-18 11:48                                     ` gregkh
  0 siblings, 0 replies; 22+ messages in thread
From: gregkh @ 2022-02-18 11:48 UTC (permalink / raw)
  To: Joakim Tjernlund
  Cc: balbi, Eugene_Bordenkircher, linux-usb, Leo Li,
	Thorsten Leemhuis, linuxppc-dev

On Fri, Feb 18, 2022 at 11:17:59AM +0000, Joakim Tjernlund wrote:
> I was happy with commit msgs and I don't know what the criticism was.

I have no context anymore, sorry.

Can someone resubmit the change again and we can take it from there?

thanks,

greg k-h

Thread overview: 22+ messages
2021-10-29 17:14 bug: usb: gadget: FSL_UDC_CORE Corrupted request list leads to unrecoverable loop Eugene Bordenkircher
2021-10-29 17:24 ` Eugene Bordenkircher
2021-10-29 23:14   ` Li Yang
2021-10-30 14:20 ` Joakim Tjernlund
2021-11-02 21:15   ` Joakim Tjernlund
2021-11-15  8:36     ` Thorsten Leemhuis
2021-11-16 19:11       ` Eugene Bordenkircher
2021-11-25 13:59         ` Thorsten Leemhuis
2021-11-29 17:24           ` Eugene Bordenkircher
2021-11-29 23:37             ` Leo Li
2021-11-29 23:48               ` Eugene Bordenkircher
2021-11-30 11:56                 ` Joakim Tjernlund
2021-12-01 14:19                   ` Joakim Tjernlund
2021-12-02 20:35                     ` Leo Li
2021-12-02 22:45                       ` Joakim Tjernlund
2021-12-04  0:40                         ` Leo Li
2022-01-20 12:54                           ` Thorsten Leemhuis
2022-02-18  7:11                             ` Thorsten Leemhuis
2022-02-18 10:21                               ` Joakim Tjernlund
2022-02-18 10:39                                 ` gregkh
2022-02-18 11:17                                   ` Joakim Tjernlund
2022-02-18 11:48                                     ` gregkh