* [PATCH] r8152: stop submitting rx for -EPROTO
@ 2021-09-29  5:18 ` Jason-ch Chen
  0 siblings, 0 replies; 44+ messages in thread
From: Jason-ch Chen @ 2021-09-29  5:18 UTC (permalink / raw)
  To: matthias.bgg, hayeswang
  Cc: linux-usb, netdev, linux-kernel, linux-arm-kernel,
	linux-mediatek, Project_Global_Chrome_Upstream_Group, hsinyi,
	Jason-ch Chen

When an RTL8152 Fast Ethernet Adapter that is plugged into a USB hub
is unplugged, the driver gets -EPROTO for bulk transfers. There is a
high probability of a soft/hard lockup if the driver continues to
submit Rx before the hub completes detection on all of its ports and
issues the disconnect event.

[  644.786219] net_ratelimit: 113887 callbacks suppressed
[  644.786239] r8152 1-1.2.4:1.0 eth0: Rx status -71
[  644.786335] r8152 1-1.2.4:1.0 eth0: Rx status -71
[  644.786369] r8152 1-1.2.4:1.0 eth0: Rx status -71
[  644.786431] r8152 1-1.2.4:1.0 eth0: Rx status -71
[  644.786493] r8152 1-1.2.4:1.0 eth0: Rx status -71
[  644.786555] r8152 1-1.2.4:1.0 eth0: Rx status -71
[  644.786617] r8152 1-1.2.4:1.0 eth0: Rx status -71
[  644.786678] r8152 1-1.2.4:1.0 eth0: Rx status -71
[  644.786740] r8152 1-1.2.4:1.0 eth0: Rx status -71
[  644.786802] r8152 1-1.2.4:1.0 eth0: Rx status -71
[  645.041159] mtk-scp 10500000.scp: scp_ipi_send: IPI timeout!
[  645.041211] cros-ec-rpmsg 10500000.scp.cros-ec-rpmsg.13.-1: rpmsg send failed
[  649.183350] watchdog: BUG: soft lockup - CPU#0 stuck for 12s! [migration/0:14]

Signed-off-by: Jason-ch Chen <jason-ch.chen@mediatek.com>
---
 drivers/net/usb/r8152.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 60ba9b734055..250718f0dcb7 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -1771,6 +1771,7 @@ static void read_bulk_callback(struct urb *urb)
 		netif_device_detach(tp->netdev);
 		return;
 	case -ENOENT:
+	case -EPROTO:
 		return;	/* the urb is in unlink state */
 	case -ETIME:
 		if (net_ratelimit())
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 44+ messages in thread
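
As a note on the patch above, the decision made by the patched switch can be sketched in user space. This is a minimal model, not the driver code: the names `rx_action` and `rx_status_action` are hypothetical and exist only for this illustration. Only the -ENOENT and -EPROTO cases are taken from the diff. Every other error status is assumed to end in a resubmit, as the repeated "Rx status -71" lines in the log suggest.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical names for this sketch; they do not exist in r8152.c. */
enum rx_action {
	RX_STOP,	/* return from the callback without resubmitting */
	RX_RESUBMIT,	/* hand the URB straight back to the USB core */
};

/*
 * Model of the patched switch in read_bulk_callback(): with the extra
 * case label, -EPROTO takes the same early return as -ENOENT.
 */
static enum rx_action rx_status_action(int status)
{
	/* A URB status is zero or a negative errno value. */
	assert(status <= 0);

	switch (status) {
	case -ENOENT:
	case -EPROTO:
		return RX_STOP;
	default:
		/* Assumption: all other statuses end in a resubmit. */
		return RX_RESUBMIT;
	}
}
```

Under this model, `rx_status_action(-EPROTO)` yields `RX_STOP` for every cause of -EPROTO, not only for an unplug.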

* RE: [PATCH] r8152: stop submitting rx for -EPROTO
  2021-09-29  5:18 ` Jason-ch Chen
  (?)
@ 2021-09-29  8:14   ` Hayes Wang
  -1 siblings, 0 replies; 44+ messages in thread
From: Hayes Wang @ 2021-09-29  8:14 UTC (permalink / raw)
  To: Jason-ch Chen, matthias.bgg
  Cc: linux-usb, netdev, linux-kernel, linux-arm-kernel,
	linux-mediatek, Project_Global_Chrome_Upstream_Group, hsinyi,
	nic_swsd

Jason-ch Chen <jason-ch.chen@mediatek.com>
> Sent: Wednesday, September 29, 2021 1:18 PM
[...]
> When an RTL8152 Fast Ethernet Adapter that is plugged into a USB hub
> is unplugged, the driver gets -EPROTO for bulk transfers. There is a
> high probability of a soft/hard lockup if the driver continues to
> submit Rx before the hub completes detection on all of its ports and
> issues the disconnect event.

I don't think this is a good idea.
It would also stop the rx in the other situations that return the same error code,
even though the rx may work again after being resubmitted in those cases.

Best Regards,
Hayes


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] r8152: stop submitting rx for -EPROTO
  2021-09-29  8:14   ` Hayes Wang
  (?)
@ 2021-09-29  9:52     ` Jason-ch Chen
  -1 siblings, 0 replies; 44+ messages in thread
From: Jason-ch Chen @ 2021-09-29  9:52 UTC (permalink / raw)
  To: Hayes Wang, matthias.bgg
  Cc: linux-usb, netdev, linux-kernel, linux-arm-kernel,
	linux-mediatek, Project_Global_Chrome_Upstream_Group, hsinyi,
	nic_swsd

On Wed, 2021-09-29 at 08:14 +0000, Hayes Wang wrote:
> Jason-ch Chen <jason-ch.chen@mediatek.com>
> > Sent: Wednesday, September 29, 2021 1:18 PM
> 
> [...]
> > When an RTL8152 Fast Ethernet Adapter that is plugged into a USB hub
> > is unplugged, the driver gets -EPROTO for bulk transfers. There is a
> > high probability of a soft/hard lockup if the driver continues to
> > submit Rx before the hub completes detection on all of its ports and
> > issues the disconnect event.
> 
> I don't think this is a good idea.
> It would also stop the rx in the other situations that return the
> same error code, even though the rx may work again after being
> resubmitted in those cases.
> 
> Best Regards,
> Hayes
> 
Hi Hayes,

Sometimes Rx is resubmitted so rapidly that the open-source USB
kernel driver cannot receive any disconnect event because of the
heavy CPU load, which eventually causes a system crash.
Do you have any suggestions for modifying the r8152 driver to
prevent this situation?

Regards,
Jason


^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH] r8152: stop submitting rx for -EPROTO
  2021-09-29  9:52     ` Jason-ch Chen
  (?)
@ 2021-09-30  2:41       ` Hayes Wang
  -1 siblings, 0 replies; 44+ messages in thread
From: Hayes Wang @ 2021-09-30  2:41 UTC (permalink / raw)
  To: Jason-ch Chen, matthias.bgg
  Cc: linux-usb, netdev, linux-kernel, linux-arm-kernel,
	linux-mediatek, Project_Global_Chrome_Upstream_Group, hsinyi,
	nic_swsd

Jason-ch Chen <jason-ch.chen@mediatek.com>
> Sent: Wednesday, September 29, 2021 5:53 PM
[...]
> Hi Hayes,
> 
> Sometimes Rx is resubmitted so rapidly that the open-source USB
> kernel driver cannot receive any disconnect event because of the
> heavy CPU load, which eventually causes a system crash.
> Do you have any suggestions for modifying the r8152 driver to
> prevent this situation?

Would you mind trying the following patch?
It avoids resubmitting RX immediately.

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 60ba9b734055..bfe00af8283f 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -767,6 +767,7 @@ enum rtl8152_flags {
 	PHY_RESET,
 	SCHEDULE_TASKLET,
 	GREEN_ETHERNET,
+	SCHEDULE_NAPI,
 };
 
 #define DEVICE_ID_THINKPAD_THUNDERBOLT3_DOCK_GEN2	0x3082
@@ -1770,6 +1771,14 @@ static void read_bulk_callback(struct urb *urb)
 		rtl_set_unplug(tp);
 		netif_device_detach(tp->netdev);
 		return;
+	case -EPROTO:
+		urb->actual_length = 0;
+		spin_lock_irqsave(&tp->rx_lock, flags);
+		list_add_tail(&agg->list, &tp->rx_done);
+		spin_unlock_irqrestore(&tp->rx_lock, flags);
+		set_bit(SCHEDULE_NAPI, &tp->flags);
+		schedule_delayed_work(&tp->schedule, 1);
+		return;
 	case -ENOENT:
 		return;	/* the urb is in unlink state */
 	case -ETIME:
@@ -2425,6 +2434,7 @@ static int rx_bottom(struct r8152 *tp, int budget)
 	if (list_empty(&tp->rx_done))
 		goto out1;
 
+	clear_bit(SCHEDULE_NAPI, &tp->flags);
 	INIT_LIST_HEAD(&rx_queue);
 	spin_lock_irqsave(&tp->rx_lock, flags);
 	list_splice_init(&tp->rx_done, &rx_queue);
@@ -2441,7 +2451,7 @@ static int rx_bottom(struct r8152 *tp, int budget)
 
 		agg = list_entry(cursor, struct rx_agg, list);
 		urb = agg->urb;
-		if (urb->actual_length < ETH_ZLEN)
+		if (urb->status != 0 || urb->actual_length < ETH_ZLEN)
 			goto submit;
 
 		agg_free = rtl_get_free_rx(tp, GFP_ATOMIC);
@@ -6643,6 +6653,10 @@ static void rtl_work_func_t(struct work_struct *work)
 	    netif_carrier_ok(tp->netdev))
 		tasklet_schedule(&tp->tx_tl);
 
+	if (test_and_clear_bit(SCHEDULE_NAPI, &tp->flags) &&
+	    !list_empty(&tp->rx_done))
+		napi_schedule(&tp->napi);
+
 	mutex_unlock(&tp->control);
 
 out1:


Best Regards,
Hayes


^ permalink raw reply related	[flat|nested] 44+ messages in thread
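
The idea in the patch above can be modelled in user space. This is a simplified sketch, not kernel code: `struct rx_model` and the functions `on_rx_eproto`, `work_tick` and `rx_poll` are hypothetical stand-ins for the completion callback, the delayed work and the NAPI poll, and all locking is omitted. It only shows that an -EPROTO completion parks the buffer and sets a flag, and that resubmission then happens later from the poll path rather than from the completion callback itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are from r8152.c. */
struct rx_model {
	int rx_done;		/* buffers parked on the done list */
	int submitted;		/* buffers currently owned by the USB core */
	bool schedule_napi;	/* models the SCHEDULE_NAPI flag bit */
	bool napi_pending;	/* models a scheduled NAPI poll */
};

/* Completion with -EPROTO: park the buffer and defer everything else. */
static void on_rx_eproto(struct rx_model *m)
{
	assert(m->submitted > 0);
	m->submitted--;
	m->rx_done++;
	m->schedule_napi = true;	/* the delayed work is queued here */
}

/* Delayed work: runs later and kicks the poll only if still needed. */
static void work_tick(struct rx_model *m)
{
	bool was_set = m->schedule_napi;

	m->schedule_napi = false;	/* models test_and_clear_bit() */
	if (was_set && m->rx_done > 0)
		m->napi_pending = true;
}

/* Poll: drain the done list and resubmit each parked buffer. */
static void rx_poll(struct rx_model *m)
{
	if (!m->napi_pending)
		return;
	m->napi_pending = false;
	m->schedule_napi = false;
	m->submitted += m->rx_done;
	m->rx_done = 0;
}
```

In this model a failed buffer is resubmitted at most once per `work_tick`, which is the rate limiting the patch aims for. The real delay in the patch is the one passed to schedule_delayed_work().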

* Re: [PATCH] r8152: stop submitting rx for -EPROTO
  2021-09-29  9:52     ` Jason-ch Chen
  (?)
@ 2021-09-30  9:30       ` Oliver Neukum
  -1 siblings, 0 replies; 44+ messages in thread
From: Oliver Neukum @ 2021-09-30  9:30 UTC (permalink / raw)
  To: Jason-ch Chen, Hayes Wang, matthias.bgg
  Cc: linux-usb, netdev, linux-kernel, linux-arm-kernel,
	linux-mediatek, Project_Global_Chrome_Upstream_Group, hsinyi,
	nic_swsd


On 29.09.21 11:52, Jason-ch Chen wrote:
> On Wed, 2021-09-29 at 08:14 +0000, Hayes Wang wrote:
>>
> Hi Hayes,
>
> Sometimes Rx is resubmitted so rapidly that the open-source USB
> kernel driver cannot receive any disconnect event because of the
> heavy CPU load, which eventually causes a system crash.
> Do you have any suggestions for modifying the r8152 driver to
> prevent this situation?
>
> Regards,
> Jason
>
Hi,

Hayes proposed a solution. Basically you solve this the way HID or WDM do it,
by delaying resubmission. This makes me wonder whether this problem is specific
to any driver. If it is not, as I would argue, do we have a deficiency
in our API?

Should we have something like: usb_submit_delayed_urb() ?

    Regards
        Oliver



^ permalink raw reply	[flat|nested] 44+ messages in thread
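
To make the question above concrete: usb_submit_delayed_urb() is only a suggested name in this thread, and the following is a guess at the shape such a helper could take, written as a user-space model. `struct delayed_urb`, `submit_delayed` and `run_timers` are hypothetical names, time is a plain tick counter whose wraparound is ignored, and "submission" is just a counter increment.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a delayed submission; not an existing USB API. */
struct delayed_urb {
	bool armed;		/* a delayed submission is outstanding */
	unsigned long due;	/* tick at which it should be submitted */
	int submit_count;	/* how often the "URB" was really submitted */
};

/* Ask for submission `delay` ticks after `now` instead of immediately. */
static int submit_delayed(struct delayed_urb *d, unsigned long now,
			  unsigned long delay)
{
	assert(d);
	if (d->armed)
		return -EBUSY;	/* already queued; refuse a double submit */
	d->armed = true;
	d->due = now + delay;
	return 0;
}

/* Called on every tick; submits once the delay has elapsed. */
static void run_timers(struct delayed_urb *d, unsigned long now)
{
	assert(d);
	if (d->armed && now >= d->due) {
		d->armed = false;
		d->submit_count++;
	}
}
```

A driver's error path would then call something like `submit_delayed()` where it resubmits directly today, so the delay logic would live in one place instead of in each driver.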

* Re: [PATCH] r8152: stop submitting rx for -EPROTO
  2021-09-30  9:30       ` Oliver Neukum
  (?)
@ 2021-09-30 15:18         ` Alan Stern
  -1 siblings, 0 replies; 44+ messages in thread
From: Alan Stern @ 2021-09-30 15:18 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Jason-ch Chen, Hayes Wang, matthias.bgg, linux-usb, netdev,
	linux-kernel, linux-arm-kernel, linux-mediatek,
	Project_Global_Chrome_Upstream_Group, hsinyi, nic_swsd

On Thu, Sep 30, 2021 at 11:30:17AM +0200, Oliver Neukum wrote:
> 
> On 29.09.21 11:52, Jason-ch Chen wrote:
> > On Wed, 2021-09-29 at 08:14 +0000, Hayes Wang wrote:
> >>
> > Hi Hayes,
> >
> > Sometimes Rx is resubmitted so rapidly that the open-source USB
> > kernel driver cannot receive any disconnect event because of the
> > heavy CPU load, which eventually causes a system crash.
> > Do you have any suggestions for modifying the r8152 driver to
> > prevent this situation?
> >
> > Regards,
> > Jason
> >
> Hi,
> 
> Hayes proposed a solution. Basically you solve this the way HID or WDM do it,
> by delaying resubmission. This makes me wonder whether this problem is specific
> to any driver. If it is not, as I would argue, do we have a deficiency
> in our API?
> 
> Should we have something like: usb_submit_delayed_urb() ?

There has been some discussion about this in the past.

In general, -EPROTO is almost always a non-recoverable error.  It
usually occurs when a USB cable has been unplugged, before the
upstream hub has notified the kernel about the unplug event.  It also 
can occur when the device's firmware has crashed.

I do tend to think there is a deficiency in our API, and that it 
should be fixed by making the core logically disable an endpoint 
(clear the ep->enabled flag) whenever an URB for that endpoint 
completes with -EPROTO, -EILSEQ, or -ETIME status.  (In retrospect, 
using three distinct status codes for these errors was a mistake.)  
Then we wouldn't have to go through this piecemeal approach, 
modifying individual drivers to make them give up whenever they get 
one of these errors.

But then we'd also have to make sure drivers have a way to
logically re-enable endpoints, for the unlikely case that the error 
can be recovered from.  Certainly set-config, set-interface, and 
clear-halt should do this.  Anything else?

Alan Stern

^ permalink raw reply	[flat|nested] 44+ messages in thread
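
The proposal above can be sketched as a small user-space model. This is not the USB core: `struct ep_model`, `ep_submit`, `ep_complete` and `ep_reenable` are hypothetical names, and the error returned for a disabled endpoint (-ENOENT here) is an assumption made for the example. The model only captures the two halves of the idea: a completion with one of the three named statuses disables the endpoint, and an explicit action re-enables it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one endpoint; not the USB core's data structure. */
struct ep_model {
	bool enabled;
};

/* Submission is refused while the endpoint is logically disabled. */
static int ep_submit(const struct ep_model *ep)
{
	assert(ep);
	return ep->enabled ? 0 : -ENOENT;	/* error code is an assumption */
}

/* Completion path: the three statuses named in the proposal disable the ep. */
static void ep_complete(struct ep_model *ep, int status)
{
	assert(ep);
	switch (status) {
	case -EPROTO:
	case -EILSEQ:
	case -ETIME:
		ep->enabled = false;
		break;
	default:
		break;	/* other statuses leave the endpoint usable */
	}
}

/* Models set-config, set-interface or clear-halt re-enabling the ep. */
static void ep_reenable(struct ep_model *ep)
{
	assert(ep);
	ep->enabled = true;
}
```

With this shape, a driver that blindly resubmits after -EPROTO gets an immediate submission error instead of another failed transfer, so the resubmit loop ends without per-driver changes.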

* RE: [PATCH] r8152: stop submitting rx for -EPROTO
  2021-09-30  9:30       ` Oliver Neukum
  (?)
@ 2021-09-30 16:13         ` Hayes Wang
  -1 siblings, 0 replies; 44+ messages in thread
From: Hayes Wang @ 2021-09-30 16:13 UTC (permalink / raw)
  To: Oliver Neukum, Jason-ch Chen, matthias.bgg
  Cc: linux-usb, netdev, linux-kernel, linux-arm-kernel,
	linux-mediatek, Project_Global_Chrome_Upstream_Group, hsinyi,
	nic_swsd

Oliver Neukum <oneukum@suse.com>
> Sent: Thursday, September 30, 2021 5:30 PM
[...]
> Hi,
> 
> Hayes proposed a solution. Basically you solve this the way HID or WDM do it
> delaying resubmission. This makes me wonder whether this problem is specific
> to any driver. If it is not, as I would argue, do we have a deficiency
> in our API?

I think the main problem is that the driver doesn't know whether
it needs to stop submitting bulk transfers or not. There are
two situations with the same error code. One needs to resubmit
the bulk transfer. The other needs to stop the transfer. The original
idea was that the disconnect event would stop the submission in
the second situation. However, in this case, the disconnect event
comes very late, so the submission couldn't be stopped in time.
The best solution would be for the driver to get another error code that
indicates the device has disappeared in the second situation.  Then
I wouldn't need to do delayed resubmission.

Best Regards,
Hayes

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] r8152: stop submitting rx for -EPROTO
  2021-09-30  2:41       ` Hayes Wang
  (?)
@ 2021-10-01  1:36         ` Jason-ch Chen
  -1 siblings, 0 replies; 44+ messages in thread
From: Jason-ch Chen @ 2021-10-01  1:36 UTC (permalink / raw)
  To: Hayes Wang, matthias.bgg
  Cc: linux-usb, netdev, linux-kernel, linux-arm-kernel,
	linux-mediatek, Project_Global_Chrome_Upstream_Group, hsinyi,
	nic_swsd

On Thu, 2021-09-30 at 02:41 +0000, Hayes Wang wrote:
> Jason-ch Chen <jason-ch.chen@mediatek.com>
> > Sent: Wednesday, September 29, 2021 5:53 PM
> 
> [...]
> > Hi Hayes,
> > 
> > Sometimes Rx submits rapidly and the USB kernel driver of opensource
> > cannot receive any disconnect event due to CPU heavy loading, which
> > finally causes a system crash.
> > Do you have any suggestions to modify the r8152 driver to prevent this
> > situation happened?
> 
> Would you mind trying the following patch?
> It avoids resubmitting Rx immediately.
> 
> diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
> index 60ba9b734055..bfe00af8283f 100644
> --- a/drivers/net/usb/r8152.c
> +++ b/drivers/net/usb/r8152.c
> @@ -767,6 +767,7 @@ enum rtl8152_flags {
>  	PHY_RESET,
>  	SCHEDULE_TASKLET,
>  	GREEN_ETHERNET,
> +	SCHEDULE_NAPI,
>  };
>  
>  #define DEVICE_ID_THINKPAD_THUNDERBOLT3_DOCK_GEN2	0x3082
> @@ -1770,6 +1771,14 @@ static void read_bulk_callback(struct urb *urb)
>  		rtl_set_unplug(tp);
>  		netif_device_detach(tp->netdev);
>  		return;
> +	case -EPROTO:
> +		urb->actual_length = 0;
> +		spin_lock_irqsave(&tp->rx_lock, flags);
> +		list_add_tail(&agg->list, &tp->rx_done);
> +		spin_unlock_irqrestore(&tp->rx_lock, flags);
> +		set_bit(SCHEDULE_NAPI, &tp->flags);
> +		schedule_delayed_work(&tp->schedule, 1);
> +		return;
>  	case -ENOENT:
>  		return;	/* the urb is in unlink state */
>  	case -ETIME:
> @@ -2425,6 +2434,7 @@ static int rx_bottom(struct r8152 *tp, int budget)
>  	if (list_empty(&tp->rx_done))
>  		goto out1;
>  
> +	clear_bit(SCHEDULE_NAPI, &tp->flags);
>  	INIT_LIST_HEAD(&rx_queue);
>  	spin_lock_irqsave(&tp->rx_lock, flags);
>  	list_splice_init(&tp->rx_done, &rx_queue);
> @@ -2441,7 +2451,7 @@ static int rx_bottom(struct r8152 *tp, int budget)
>  
>  		agg = list_entry(cursor, struct rx_agg, list);
>  		urb = agg->urb;
> -		if (urb->actual_length < ETH_ZLEN)
> +		if (urb->status != 0 || urb->actual_length < ETH_ZLEN)
>  			goto submit;
>  
>  		agg_free = rtl_get_free_rx(tp, GFP_ATOMIC);
> @@ -6643,6 +6653,10 @@ static void rtl_work_func_t(struct work_struct *work)
>  	    netif_carrier_ok(tp->netdev))
>  		tasklet_schedule(&tp->tx_tl);
>  
> +	if (test_and_clear_bit(SCHEDULE_NAPI, &tp->flags) &&
> +	    !list_empty(&tp->rx_done))
> +		napi_schedule(&tp->napi);
> +
>  	mutex_unlock(&tp->control);
>  
>  out1:
> 
> 
> Best Regards,
> Hayes

Hi,

I have verified this patch.
It does avoid resubmitting Rx immediately.
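
For readers following the control flow of the quoted patch, here is a
small user-space C model of it. It is only a sketch: the struct and
function names (model_dev, model_rx_eproto, model_work, model_napi_poll)
are hypothetical, locking is omitted, and the booleans merely stand in
for the SCHEDULE_NAPI bit, the delayed work item, and the NAPI poll.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the driver state; field names are illustrative only. */
struct model_dev {
	int rx_done;		/* URBs parked on the rx_done list */
	bool schedule_napi;	/* models the SCHEDULE_NAPI flag bit */
	bool work_pending;	/* models the delayed work item */
	bool napi_pending;	/* models a scheduled NAPI poll */
	int resubmitted;	/* URBs handed back to the host controller */
};

/* Completion path for -EPROTO: park the URB and defer the resubmission. */
static void model_rx_eproto(struct model_dev *d)
{
	d->rx_done++;
	d->schedule_napi = true;
	d->work_pending = true;	/* schedule_delayed_work(..., 1) in the patch */
}

/* Delayed work: runs later and only kicks NAPI if something is parked. */
static void model_work(struct model_dev *d)
{
	bool was_set = d->schedule_napi;

	d->work_pending = false;
	d->schedule_napi = false;	/* test_and_clear_bit() always clears */
	if (was_set && d->rx_done > 0)
		d->napi_pending = true;
}

/* NAPI poll: the rx_bottom() step that finally resubmits the parked URBs. */
static void model_napi_poll(struct model_dev *d)
{
	d->napi_pending = false;
	if (d->rx_done == 0)
		return;
	d->schedule_napi = false;
	d->resubmitted += d->rx_done;
	d->rx_done = 0;
}
```

The point of the detour is that the resubmission no longer happens inside
the completion handler, so the work item's delay bounds how often a failing
URB can be resubmitted and leaves the CPU free to process the hub's
disconnect event.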

Thanks,
Jason


^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH] r8152: stop submitting rx for -EPROTO
  2021-09-30 15:18         ` Alan Stern
  (?)
@ 2021-10-01  2:40           ` Hayes Wang
  -1 siblings, 0 replies; 44+ messages in thread
From: Hayes Wang @ 2021-10-01  2:40 UTC (permalink / raw)
  To: Alan Stern, Oliver Neukum
  Cc: Jason-ch Chen, matthias.bgg, linux-usb, netdev, linux-kernel,
	linux-arm-kernel, linux-mediatek,
	Project_Global_Chrome_Upstream_Group, hsinyi, nic_swsd

Alan Stern <stern@rowland.harvard.edu>
[...]
> There has been some discussion about this in the past.
> 
> In general, -EPROTO is almost always a non-recoverable error.

Excuse me. I am confused about the above description.
I got -EPROTO before, when I debugged another issue.
However, the bulk transfer still worked after I resubmitted
the transfer. I didn't do anything to recover it. That is why
I do resubmission for -EPROTO.

Best Regards,
Hayes


^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH] r8152: stop submitting rx for -EPROTO
  2021-10-01  2:40           ` Hayes Wang
  (?)
@ 2021-10-01  3:26             ` Hayes Wang
  -1 siblings, 0 replies; 44+ messages in thread
From: Hayes Wang @ 2021-10-01  3:26 UTC (permalink / raw)
  To: Alan Stern, Oliver Neukum
  Cc: Jason-ch Chen, matthias.bgg, linux-usb, netdev, linux-kernel,
	linux-arm-kernel, linux-mediatek,
	Project_Global_Chrome_Upstream_Group, hsinyi, nic_swsd

> Alan Stern <stern@rowland.harvard.edu>
> [...]
> > There has been some discussion about this in the past.
> >
> > In general, -EPROTO is almost always a non-recoverable error.
> 
> Excuse me. I am confused about the above description.
> I got -EPROTO before, when I debugged another issue.
> However, the bulk transfer still worked after I resubmitted
> the transfer. I didn't do anything to recover it. That is why
> I do resubmission for -EPROTO.

I checked the Linux driver and the xHCI spec.
The driver gets -EPROTO for a bulk transfer when the host
returns COMP_USB_TRANSACTION_ERROR.
According to the xHCI spec, USB TRANSACTION ERROR
means the host did not receive a valid response from the
device (Timeout, CRC, Bad PID, unexpected NYET, etc.).
That seems to explain why resubmission sometimes works.
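
To make the point concrete, here is a minimal user-space C sketch of the
mapping. The enum and function names are hypothetical. The numeric
completion-code values and the stall-to--EPIPE case reflect my
understanding of the xHCI spec and the Linux driver and should be treated
as assumptions; only the transaction-error-to--EPROTO case comes from the
text above.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative subset of xHCI completion codes (hypothetical names). */
enum model_comp_code {
	MODEL_COMP_SUCCESS = 1,
	MODEL_COMP_USB_TRANSACTION_ERROR = 4,
	MODEL_COMP_STALL_ERROR = 6,
};

/* Wire-level causes the host may observe (hypothetical enum). */
enum model_cause {
	MODEL_CAUSE_TIMEOUT,
	MODEL_CAUSE_CRC,
	MODEL_CAUSE_BAD_PID,
	MODEL_CAUSE_UNEXPECTED_NYET,
};

/* Every listed cause is reported as the same completion code. */
static enum model_comp_code model_cause_to_comp(enum model_cause cause)
{
	(void)cause;
	return MODEL_COMP_USB_TRANSACTION_ERROR;
}

/* Completion code to URB status, as seen by the class driver. */
static int model_comp_to_status(enum model_comp_code code)
{
	switch (code) {
	case MODEL_COMP_SUCCESS:
		return 0;
	case MODEL_COMP_USB_TRANSACTION_ERROR:
		return -EPROTO;
	case MODEL_COMP_STALL_ERROR:
		return -EPIPE;	/* assumption, not stated in this thread */
	}
	return -EIO;		/* placeholder for codes this model omits */
}
```

Because a transient CRC error and a timeout from an unplugged device both
end up as -EPROTO, the driver cannot tell from the status alone whether a
resubmission can succeed.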

Best Regards,
Hayes



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] r8152: stop submitting rx for -EPROTO
  2021-10-01  3:26             ` Hayes Wang
  (?)
@ 2021-10-01 15:22               ` Alan Stern
  -1 siblings, 0 replies; 44+ messages in thread
From: Alan Stern @ 2021-10-01 15:22 UTC (permalink / raw)
  To: Hayes Wang
  Cc: Oliver Neukum, Jason-ch Chen, matthias.bgg, linux-usb, netdev,
	linux-kernel, linux-arm-kernel, linux-mediatek,
	Project_Global_Chrome_Upstream_Group, hsinyi, nic_swsd

On Fri, Oct 01, 2021 at 03:26:48AM +0000, Hayes Wang wrote:
> > Alan Stern <stern@rowland.harvard.edu>
> > [...]
> > > There has been some discussion about this in the past.
> > >
> > > In general, -EPROTO is almost always a non-recoverable error.
> > 
> > Excuse me. I am confused about the above description.
> > I got -EPROTO before, when I debugged another issue.
> > However, the bulk transfer still worked after I resubmitted
> > the transfer. I didn't do anything to recover it. That is why
> > I do resubmission for -EPROTO.
> 
> I checked the Linux driver and the xHCI spec.
> The driver gets -EPROTO for a bulk transfer when the host
> returns COMP_USB_TRANSACTION_ERROR.
> According to the xHCI spec, USB TRANSACTION ERROR
> means the host did not receive a valid response from the
> device (Timeout, CRC, Bad PID, unexpected NYET, etc.).

That's right.  If the device and cable are working properly, this 
should never happen.  Or only extremely rarely (for example, caused 
by external electromagnetic interference).

> That seems to explain why resubmission sometimes works.

Did you ever track down the reason why you got the -EPROTO error 
while debugging that other issue?  Can you reproduce it?

Alan Stern

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH] r8152: stop submitting rx for -EPROTO
  2021-10-01 15:22               ` Alan Stern
  (?)
@ 2021-10-04  2:15                 ` Hayes Wang
  -1 siblings, 0 replies; 44+ messages in thread
From: Hayes Wang @ 2021-10-04  2:15 UTC (permalink / raw)
  To: Alan Stern
  Cc: Oliver Neukum, Jason-ch Chen, matthias.bgg, linux-usb, netdev,
	linux-kernel, linux-arm-kernel, linux-mediatek,
	Project_Global_Chrome_Upstream_Group, hsinyi, nic_swsd

Alan Stern <stern@rowland.harvard.edu>
> Sent: Friday, October 1, 2021 11:22 PM
[...]
> That's right.  If the device and cable are working properly, this
> should never happen.  Or only extremely rarely (for example, caused
> by external electromagnetic interference).
> 
> > That seems to explain why resubmission sometimes works.
> 
> Did you ever track down the reason why you got the -EPROTO error
> while debugging that other issue?  Can you reproduce it?

I didn't follow it up, because it was not related to the driver. Besides, we
didn't focus on -EPROTO at that time, because it was not the major issue.
And the -EPROTO indeed occurred only rarely over a lot of transmissions.
The hw engineer confirmed that the device completed the transfer
normally, but the driver still got an error from the host. I am not sure whether
there was a USB hub between the device and the USB host controller.
That is all I know.

Best Regards,
Hayes


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH net] r8152: avoid to resubmit rx immediately
  2021-09-29  5:18 ` Jason-ch Chen
                   ` (2 preceding siblings ...)
  (?)
@ 2021-10-04  6:28 ` Hayes Wang
  2021-10-05 11:50   ` patchwork-bot+netdevbpf
  -1 siblings, 1 reply; 44+ messages in thread
From: Hayes Wang @ 2021-10-04  6:28 UTC (permalink / raw)
  To: jason-ch.chen, kuba, davem
  Cc: netdev, nic_swsd, linux-kernel, linux-usb, Hayes Wang

When the disconnect event arrives very late after the device is
unplugged, the driver keeps resubmitting the RX bulk transfer
immediately each time the callback returns -EPROTO. Eventually, a soft
lockup occurs.

This patch avoids resubmitting RX immediately. It uses a workqueue to
schedule the RX NAPI, and the NAPI then resubmits the RX. This gives the
disconnect event an opportunity to stop the submission before a soft
lockup occurs.

Reported-by: Jason-ch Chen <jason-ch.chen@mediatek.com>
Tested-by: Jason-ch Chen <jason-ch.chen@mediatek.com>
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/usb/r8152.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 60ba9b734055..f329e39100a7 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -767,6 +767,7 @@ enum rtl8152_flags {
 	PHY_RESET,
 	SCHEDULE_TASKLET,
 	GREEN_ETHERNET,
+	RX_EPROTO,
 };
 
 #define DEVICE_ID_THINKPAD_THUNDERBOLT3_DOCK_GEN2	0x3082
@@ -1770,6 +1771,14 @@ static void read_bulk_callback(struct urb *urb)
 		rtl_set_unplug(tp);
 		netif_device_detach(tp->netdev);
 		return;
+	case -EPROTO:
+		urb->actual_length = 0;
+		spin_lock_irqsave(&tp->rx_lock, flags);
+		list_add_tail(&agg->list, &tp->rx_done);
+		spin_unlock_irqrestore(&tp->rx_lock, flags);
+		set_bit(RX_EPROTO, &tp->flags);
+		schedule_delayed_work(&tp->schedule, 1);
+		return;
 	case -ENOENT:
 		return;	/* the urb is in unlink state */
 	case -ETIME:
@@ -2425,6 +2434,7 @@ static int rx_bottom(struct r8152 *tp, int budget)
 	if (list_empty(&tp->rx_done))
 		goto out1;
 
+	clear_bit(RX_EPROTO, &tp->flags);
 	INIT_LIST_HEAD(&rx_queue);
 	spin_lock_irqsave(&tp->rx_lock, flags);
 	list_splice_init(&tp->rx_done, &rx_queue);
@@ -2441,7 +2451,7 @@ static int rx_bottom(struct r8152 *tp, int budget)
 
 		agg = list_entry(cursor, struct rx_agg, list);
 		urb = agg->urb;
-		if (urb->actual_length < ETH_ZLEN)
+		if (urb->status != 0 || urb->actual_length < ETH_ZLEN)
 			goto submit;
 
 		agg_free = rtl_get_free_rx(tp, GFP_ATOMIC);
@@ -6643,6 +6653,10 @@ static void rtl_work_func_t(struct work_struct *work)
 	    netif_carrier_ok(tp->netdev))
 		tasklet_schedule(&tp->tx_tl);
 
+	if (test_and_clear_bit(RX_EPROTO, &tp->flags) &&
+	    !list_empty(&tp->rx_done))
+		napi_schedule(&tp->napi);
+
 	mutex_unlock(&tp->control);
 
 out1:
-- 
2.31.1
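
The control flow of the patch above can be modelled in user space roughly as
follows. This is a minimal sketch, not driver code: `struct model`,
`on_rx_eproto()`, `delayed_work()` and `napi_poll()` are hypothetical stand-ins
for `read_bulk_callback()`, `rtl_work_func_t()` and `rx_bottom()`, plain
integers replace the kernel list, flag bits and workqueue, and it is an
assumption of the sketch that the resubmission path refuses to submit once the
device is marked unplugged.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in r8152.c. */
struct model {
	int rx_done;		/* buffers parked on the "done" list */
	bool rx_eproto;		/* stands in for the RX_EPROTO flag bit */
	bool work_pending;	/* stands in for schedule_delayed_work() */
	bool napi_pending;	/* stands in for napi_schedule() */
	bool unplugged;		/* set once the disconnect has been handled */
	int submitted;		/* count of RX resubmissions */
};

/* URB completion with -EPROTO: park the buffer and defer, do not resubmit. */
static void on_rx_eproto(struct model *m)
{
	m->rx_done++;
	m->rx_eproto = true;
	m->work_pending = true;
}

/* Delayed work: hand over to NAPI only if the flag is still set. */
static void delayed_work(struct model *m)
{
	bool was_set = m->rx_eproto;

	m->work_pending = false;
	m->rx_eproto = false;	/* models test_and_clear_bit() */
	if (was_set && m->rx_done > 0)
		m->napi_pending = true;
}

/* NAPI poll: drain parked buffers, resubmitting unless the device is gone. */
static void napi_poll(struct model *m)
{
	m->napi_pending = false;
	if (m->rx_done == 0)
		return;

	m->rx_eproto = false;
	while (m->rx_done > 0) {
		m->rx_done--;
		if (!m->unplugged)
			m->submitted++;
	}
}
```

The point of the model is that the completion handler never resubmits
directly, so a disconnect that lands between `on_rx_eproto()` and
`napi_poll()` stops the cycle.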


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH] r8152: stop submitting rx for -EPROTO
  2021-10-01 15:22               ` Alan Stern
  (?)
@ 2021-10-04 11:44                 ` Oliver Neukum
  -1 siblings, 0 replies; 44+ messages in thread
From: Oliver Neukum @ 2021-10-04 11:44 UTC (permalink / raw)
  To: Alan Stern, Hayes Wang
  Cc: Oliver Neukum, Jason-ch Chen, matthias.bgg, linux-usb, netdev,
	linux-kernel, linux-arm-kernel, linux-mediatek,
	Project_Global_Chrome_Upstream_Group, hsinyi, nic_swsd


On 01.10.21 17:22, Alan Stern wrote:
> On Fri, Oct 01, 2021 at 03:26:48AM +0000, Hayes Wang wrote:
>>> Alan Stern <stern@rowland.harvard.edu>
>>> [...]
>>>> There has been some discussion about this in the past.
>>>>
>>>> In general, -EPROTO is almost always a non-recoverable error.
>>> Excuse me. I am confused about the above description.
>>> I got -EPROTO before, when I debugged another issue.
>>> However, the bulk transfer still worked after I resubmitted
>>> the transfer. I didn't do anything to recover it. That is why
>>> I do resubmission for -EPROTO.
>> I check the Linux driver and the xHCI spec.
>> The driver gets -EPROTO for bulk transfer, when the host
>> returns COMP_USB_TRANSACTION_ERROR.
>> According to the spec of xHCI, USB TRANSACTION ERROR
>> means the host did not receive a valid response from the
>> device (Timeout, CRC, Bad PID, unexpected NYET, etc.).
> That's right.  If the device and cable are working properly, this 
> should never happen.  Or only extremely rarely (for example, caused 
> by external electromagnetic interference).
And the device. I am afraid the condition in your conditional statement
is not as likely to be true as would be desirable for quite a lot of setups.
>
>> It seems to be reasonable why resubmission sometimes works.
> Did you ever track down the reason why you got the -EPROTO error 
> while debugging that other issue?  Can you reproduce it?

Is that really the issue though? We are seeing this issue with EPROTO.
But wouldn't we see it with any recoverable error?

AFAICT we are running into a situation without progress because drivers
retry

* forever
* immediately

If we broke either of these conditions the system would proceed and the
hotplug event would eventually be processed. We may ask whether drivers should
retry forever, but I don't see that you can blame it on error codes.
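
A minimal sketch of a retry policy that breaks both conditions, written as
plain user-space C. The names (`struct retry_state`, `retry_after_error()`,
`retry_reset()`) and the limits are hypothetical and chosen only for
illustration; no existing driver or USB core interface is implied.

```c
#include <stdbool.h>

/* Hypothetical retry policy; the names and limits are illustrative only. */
#define MAX_RETRIES	5
#define BASE_DELAY_MS	1
#define MAX_DELAY_MS	8

struct retry_state {
	unsigned int attempts;	/* consecutive failed transfers so far */
};

/*
 * Decide what to do after a failed transfer.  Returns true and stores the
 * wait in *delay_ms when another attempt is allowed, false when the caller
 * should give up and wait for the disconnect (or a reset).
 */
static bool retry_after_error(struct retry_state *rs, unsigned int *delay_ms)
{
	unsigned int delay;

	if (rs->attempts >= MAX_RETRIES)
		return false;			/* not "forever" */

	delay = BASE_DELAY_MS << rs->attempts;	/* not "immediately" */
	if (delay > MAX_DELAY_MS)
		delay = MAX_DELAY_MS;

	rs->attempts++;
	*delay_ms = delay;
	return true;
}

/* A successful transfer resets the retry budget. */
static void retry_reset(struct retry_state *rs)
{
	rs->attempts = 0;
}
```

Either half alone would be enough to let the hotplug event through: the cap
ends the loop, and the delay yields the CPU between attempts.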

    Regards
        Oliver


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] r8152: stop submitting rx for -EPROTO
  2021-10-04 11:44                 ` Oliver Neukum
  (?)
@ 2021-10-04 14:33                   ` Alan Stern
  -1 siblings, 0 replies; 44+ messages in thread
From: Alan Stern @ 2021-10-04 14:33 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Hayes Wang, Jason-ch Chen, matthias.bgg, linux-usb, netdev,
	linux-kernel, linux-arm-kernel, linux-mediatek,
	Project_Global_Chrome_Upstream_Group, hsinyi, nic_swsd

On Mon, Oct 04, 2021 at 01:44:54PM +0200, Oliver Neukum wrote:
> 
> On 01.10.21 17:22, Alan Stern wrote:
> > On Fri, Oct 01, 2021 at 03:26:48AM +0000, Hayes Wang wrote:
> >>> Alan Stern <stern@rowland.harvard.edu>
> >>> [...]
> >>>> There has been some discussion about this in the past.
> >>>>
> >>>> In general, -EPROTO is almost always a non-recoverable error.
> >>> Excuse me. I am confused about the above description.
> >>> I got -EPROTO before, when I debugged another issue.
> >>> However, the bulk transfer still worked after I resubmitted
> >>> the transfer. I didn't do anything to recover it. That is why
> >>> I do resubmission for -EPROTO.
> >> I check the Linux driver and the xHCI spec.
> >> The driver gets -EPROTO for bulk transfer, when the host
> >> returns COMP_USB_TRANSACTION_ERROR.
> >> According to the spec of xHCI, USB TRANSACTION ERROR
> >> means the host did not receive a valid response from the
> >> device (Timeout, CRC, Bad PID, unexpected NYET, etc.).
> > That's right.  If the device and cable are working properly, this 
> > should never happen.  Or only extremely rarely (for example, caused 
> > by external electromagnetic interference).
> And the device. I am afraid the condition in your conditional statement
> is not as likely to be true as would be desirable for quite a lot of setups.

But if the device isn't working, a simple retry is most unlikely to fix 
the problem.  Some form of active error recovery, such as a bus reset, 
will be necessary.  For a non-working cable, even a reset won't help -- 
the user would have to physically adjust or replace the cable.

> >> It seems to be reasonable why resubmission sometimes works.
> > Did you ever track down the reason why you got the -EPROTO error 
> > while debugging that other issue?  Can you reproduce it?
> 
> Is that really the issue though? We are seeing this issue with EPROTO.
> But wouldn't we see it with any recoverable error?

If you mean an error that can be fixed but only by doing something more 
than a simple retry, then yes.  However, the vast majority of USB 
drivers do not attempt anything more than a simple retry.  Relatively 
few of them (including usbhid and mass-storage) are more sophisticated 
in their error handling.

> AFAICT we are running into a situation without progress because drivers
> retry
> 
> * forever
> * immediately
> 
> If we broke either of these conditions the system would proceed and the
> hotplug event would eventually be processed. We may ask whether drivers should
> retry forever, but I don't see that you can blame it on error codes.

It's important to distinguish between:

    1.	errors that are transient and will disappear very quickly,
	meaning that a retry has a good chance of working, and

    2.	errors that are effectively permanent (or at least, long-lived)
	and therefore are highly unlikely to be fixed by retrying.

My point is that there is no reason to retry in case 2, and -EPROTO 
falls into this case (as do -EILSEQ and -ETIME).
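
The distinction can be sketched as a small status classifier. This is a
hypothetical helper, not an existing kernel API: `urb_status_worth_retry()` is
an invented name, and treating every unlisted status as transient is an
assumption made only to keep the example short.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not an existing kernel API: report whether a URB
 * completion status is worth a plain resubmission.
 */
static bool urb_status_worth_retry(int status)
{
	switch (status) {
	case 0:
		return true;	/* success: resubmit for the next transfer */
	case -EPROTO:
	case -EILSEQ:
	case -ETIME:
		return false;	/* case 2: low-level protocol errors */
	case -ENOENT:
		return false;	/* the URB is being unlinked */
	default:
		return true;	/* assumed transient (case 1) in this sketch */
	}
}
```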

Converting drivers to keep track of their retries, to avoid retrying 
forever, would be a fairly large change.  Even implementing delayed 
retries requires some significant work (as you can see in Hayes's recent 
patch -- and that was an easy case because the NAPI infrastructure was 
already present).  It's much simpler to avoid retrying entirely in 
situations where retries won't help.

And it's even simpler if the USB core would automatically prevent 
retries (by failing URB submissions after low-level protocol errors) in 
these situations.
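
As a rough user-space model of that idea: the endpoint latches a low-level
protocol error and refuses further submissions until something explicitly
recovers it. Everything here is hypothetical (`struct ep_model`,
`ep_complete()`, `ep_submit()`, `ep_recover()`), and the error code returned
by a refused submission is picked arbitrarily for the sketch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of an endpoint as the core might track it. */
struct ep_model {
	bool proto_error;	/* latched after a low-level protocol error */
};

/* Called by the (modelled) core when a transfer completes. */
static void ep_complete(struct ep_model *ep, int status)
{
	if (status == -EPROTO || status == -EILSEQ || status == -ETIME)
		ep->proto_error = true;
}

/* Submission fails while the error is latched, so a naive retry loop ends. */
static int ep_submit(struct ep_model *ep)
{
	if (ep->proto_error)
		return -EPROTO;	/* error code chosen arbitrarily here */
	return 0;
}

/* Explicit recovery, for example after a reset, re-enables submissions. */
static void ep_recover(struct ep_model *ep)
{
	ep->proto_error = false;
}
```

With this shape, a driver whose completion handler blindly resubmits gets an
immediate submission failure instead of another completion callback, so the
loop terminates without any per-driver changes.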

Alan Stern

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net] r8152: avoid to resubmit rx immediately
  2021-10-04  6:28 ` [PATCH net] r8152: avoid to resubmit rx immediately Hayes Wang
@ 2021-10-05 11:50   ` patchwork-bot+netdevbpf
  0 siblings, 0 replies; 44+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-10-05 11:50 UTC (permalink / raw)
  To: Hayes Wang
  Cc: jason-ch.chen, kuba, davem, netdev, nic_swsd, linux-kernel, linux-usb

Hello:

This patch was applied to netdev/net.git (refs/heads/master):

On Mon, 4 Oct 2021 14:28:58 +0800 you wrote:
> When the disconnect event arrives very late after the device is
> unplugged, the driver keeps resubmitting the RX bulk transfer
> immediately each time the callback returns -EPROTO. Eventually, a soft
> lockup occurs.
> 
> This patch avoids resubmitting RX immediately. It uses a workqueue to
> schedule the RX NAPI, and the NAPI then resubmits the RX. This gives the
> disconnect event an opportunity to stop the submission before a soft
> lockup occurs.
> 
> [...]

Here is the summary with links:
  - [net] r8152: avoid to resubmit rx immediately
    https://git.kernel.org/netdev/net/c/baf33d7a7564

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2021-10-05 11:50 UTC | newest]

Thread overview: 44+ messages
2021-09-29  5:18 [PATCH] r8152: stop submitting rx for -EPROTO Jason-ch Chen
2021-09-29  8:14 ` Hayes Wang
2021-09-29  9:52   ` Jason-ch Chen
2021-09-30  2:41     ` Hayes Wang
2021-10-01  1:36       ` Jason-ch Chen
2021-09-30  9:30     ` Oliver Neukum
2021-09-30 15:18       ` Alan Stern
2021-10-01  2:40         ` Hayes Wang
2021-10-01  3:26           ` Hayes Wang
2021-10-01 15:22             ` Alan Stern
2021-10-04  2:15               ` Hayes Wang
2021-10-04 11:44               ` Oliver Neukum
2021-10-04 14:33                 ` Alan Stern
2021-09-30 16:13       ` Hayes Wang
2021-10-04  6:28 ` [PATCH net] r8152: avoid to resubmit rx immediately Hayes Wang
2021-10-05 11:50   ` patchwork-bot+netdevbpf
