All of lore.kernel.org
 help / color / mirror / Atom feed
* Hangs in r8152 connected to power management in kernels at least up v4.17-rc4
@ 2018-05-15 12:43 Oliver Neukum
  2018-05-16  3:37 ` Hayes Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Oliver Neukum @ 2018-05-15 12:43 UTC (permalink / raw)
  To: hayeswang; +Cc: netdev

Hi,

I got reports about hangs with this trace:

May 13 01:36:55 neroon kernel: INFO: task kworker/0:0:4 blocked for more than 60 seconds.
May 13 01:36:55 neroon kernel:       Tainted: G     U            4.17.0-rc4-1.g8257a00-vanilla #1
May 13 01:36:55 neroon kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 13 01:36:55 neroon kernel: kworker/0:0     D    0     4      2 0x80000000
May 13 01:36:55 neroon kernel: Workqueue: events rtl_work_func_t [r8152]
May 13 01:36:55 neroon kernel: Call Trace:
May 13 01:36:55 neroon kernel:  ? __schedule+0x289/0x880
May 13 01:36:55 neroon kernel:  schedule+0x2f/0x90
May 13 01:36:55 neroon kernel:  rpm_resume+0xf9/0x7a0
May 13 01:36:55 neroon kernel:  ? wait_woken+0x80/0x80
May 13 01:36:55 neroon kernel:  rpm_resume+0x547/0x7a0
May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x40/0x70
May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x34/0x70
May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x40/0x70
May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x34/0x70
May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x40/0x70
May 13 01:36:55 neroon kernel:  __pm_runtime_resume+0x3a/0x50
May 13 01:36:55 neroon kernel:  usb_autopm_get_interface+0x1d/0x50 [usbcore]
May 13 01:36:55 neroon kernel:  rtl_work_func_t+0x3c/0x27c [r8152]
May 13 01:36:55 neroon kernel:  process_one_work+0x1d4/0x3f0
May 13 01:36:55 neroon kernel:  worker_thread+0x2b/0x3d0
May 13 01:36:55 neroon kernel:  ? process_one_work+0x3f0/0x3f0
May 13 01:36:55 neroon kernel:  kthread+0x113/0x130
May 13 01:36:55 neroon kernel:  ? kthread_create_worker_on_cpu+0x50/0x50
May 13 01:36:55 neroon kernel:  ret_from_fork+0x3a/0x50

Any idea?

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: Hangs in r8152 connected to power management in kernels at least up v4.17-rc4
  2018-05-15 12:43 Hangs in r8152 connected to power management in kernels at least up v4.17-rc4 Oliver Neukum
@ 2018-05-16  3:37 ` Hayes Wang
  2018-05-16  8:26   ` Oliver Neukum
  0 siblings, 1 reply; 9+ messages in thread
From: Hayes Wang @ 2018-05-16  3:37 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: netdev

Oliver Neukum [mailto:oneukum@suse.com]
> 
> Hi,
> 
> I got reports about hangs with this trace:
> 
> May 13 01:36:55 neroon kernel: INFO: task kworker/0:0:4 blocked for more
> than 60 seconds.
> May 13 01:36:55 neroon kernel:       Tainted: G     U
> 4.17.0-rc4-1.g8257a00-vanilla #1
> May 13 01:36:55 neroon kernel: "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> May 13 01:36:55 neroon kernel: kworker/0:0     D    0     4      2
> 0x80000000
> May 13 01:36:55 neroon kernel: Workqueue: events rtl_work_func_t [r8152]
> May 13 01:36:55 neroon kernel: Call Trace:
> May 13 01:36:55 neroon kernel:  ? __schedule+0x289/0x880
> May 13 01:36:55 neroon kernel:  schedule+0x2f/0x90
> May 13 01:36:55 neroon kernel:  rpm_resume+0xf9/0x7a0
> May 13 01:36:55 neroon kernel:  ? wait_woken+0x80/0x80
> May 13 01:36:55 neroon kernel:  rpm_resume+0x547/0x7a0
> May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x40/0x70
> May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x34/0x70
> May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x40/0x70
> May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x34/0x70
> May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x40/0x70
> May 13 01:36:55 neroon kernel:  __pm_runtime_resume+0x3a/0x50
> May 13 01:36:55 neroon kernel:  usb_autopm_get_interface+0x1d/0x50 [usbcore]

Would usb_autopm_get_interface() take a long time?
The driver would wake the device if it has suspended.
I have no idea about how usb_autopm_get_interface() works, so I don't know how to help.

> May 13 01:36:55 neroon kernel:  rtl_work_func_t+0x3c/0x27c [r8152]
> May 13 01:36:55 neroon kernel:  process_one_work+0x1d4/0x3f0
> May 13 01:36:55 neroon kernel:  worker_thread+0x2b/0x3d0
> May 13 01:36:55 neroon kernel:  ? process_one_work+0x3f0/0x3f0
> May 13 01:36:55 neroon kernel:  kthread+0x113/0x130
> May 13 01:36:55 neroon kernel:  ?
> kthread_create_worker_on_cpu+0x50/0x50
> May 13 01:36:55 neroon kernel:  ret_from_fork+0x3a/0x50

Best Regards,
Hayes



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Hangs in r8152 connected to power management in kernels at least up v4.17-rc4
  2018-05-16  3:37 ` Hayes Wang
@ 2018-05-16  8:26   ` Oliver Neukum
  2018-05-16 10:00     ` Hayes Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Oliver Neukum @ 2018-05-16  8:26 UTC (permalink / raw)
  To: Hayes Wang; +Cc: netdev

Am Mittwoch, den 16.05.2018, 03:37 +0000 schrieb Hayes Wang:
> Oliver Neukum [mailto:oneukum@suse.com]
> > 
> > Hi,
> > 
> > I got reports about hangs with this trace:
> > 
> > May 13 01:36:55 neroon kernel: INFO: task kworker/0:0:4 blocked for more
> > than 60 seconds.
> > May 13 01:36:55 neroon kernel:       Tainted: G     U
> > 4.17.0-rc4-1.g8257a00-vanilla #1
> > May 13 01:36:55 neroon kernel: "echo 0 >
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > May 13 01:36:55 neroon kernel: kworker/0:0     D    0     4      2
> > 0x80000000
> > May 13 01:36:55 neroon kernel: Workqueue: events rtl_work_func_t [r8152]
> > May 13 01:36:55 neroon kernel: Call Trace:
> > May 13 01:36:55 neroon kernel:  ? __schedule+0x289/0x880
> > May 13 01:36:55 neroon kernel:  schedule+0x2f/0x90
> > May 13 01:36:55 neroon kernel:  rpm_resume+0xf9/0x7a0
> > May 13 01:36:55 neroon kernel:  ? wait_woken+0x80/0x80
> > May 13 01:36:55 neroon kernel:  rpm_resume+0x547/0x7a0
> > May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x40/0x70
> > May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x34/0x70
> > May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x40/0x70
> > May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x34/0x70
> > May 13 01:36:55 neroon kernel:  ? __switch_to_asm+0x40/0x70
> > May 13 01:36:55 neroon kernel:  __pm_runtime_resume+0x3a/0x50
> > May 13 01:36:55 neroon kernel:  usb_autopm_get_interface+0x1d/0x50 [usbcore]
> 
> Would usb_autopm_get_interface() take a long time?
> The driver would wake the device if it has suspended.
> I have no idea about how usb_autopm_get_interface() works, so I don't know how to help.

Hi,

it basically calls r8152_resume() and makes a control request to the
hub. I think we are spinning in rtl8152_runtime_resume(), but where?
It has a lot of NAPI stuff. Any suggestions on how to instrument or
trace this?

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: Hangs in r8152 connected to power management in kernels at least up v4.17-rc4
  2018-05-16  8:26   ` Oliver Neukum
@ 2018-05-16 10:00     ` Hayes Wang
  2018-05-16 10:09       ` Oliver Neukum
  0 siblings, 1 reply; 9+ messages in thread
From: Hayes Wang @ 2018-05-16 10:00 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: netdev

Oliver Neukum [mailto:oneukum@suse.com]
> Sent: Wednesday, May 16, 2018 4:27 PM
[...]
> >
> > Would usb_autopm_get_interface() take a long time?
> > The driver would wake the device if it has suspended.
> > I have no idea about how usb_autopm_get_interface() works, so I don't know
> how to help.
> 
> Hi,
> 
> it basically calls r8152_resume() and makes a control request to the
> hub. I think we are spinning in rtl8152_runtime_resume(), but where?
> It has a lot of NAPI stuff. Any suggestions on how to instrument or
> trace this?

Is rtl8152_runtime_resume() called? I don't see the name in the trace.

I guess the relative API in rtl8152_runtime_resume() are
		ops->disable		= rtl8153_disable;
		ops->autosuspend_en	= rtl8153_runtime_enable;

And I don't find any possible dead lock in rtl8152_runtime_resume().

Besides, I find a similar issue as following.
https://www.spinics.net/lists/netdev/msg493512.html


Best Regards,
Hayes




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Hangs in r8152 connected to power management in kernels at least up v4.17-rc4
  2018-05-16 10:00     ` Hayes Wang
@ 2018-05-16 10:09       ` Oliver Neukum
  2018-05-16 12:07         ` Hayes Wang
  0 siblings, 1 reply; 9+ messages in thread
From: Oliver Neukum @ 2018-05-16 10:09 UTC (permalink / raw)
  To: Hayes Wang; +Cc: netdev, jslaby

Am Mittwoch, den 16.05.2018, 10:00 +0000 schrieb Hayes Wang:
> Oliver Neukum [mailto:oneukum@suse.com]
> > Sent: Wednesday, May 16, 2018 4:27 PM
> 
> [...]
> > > 
> > > Would usb_autopm_get_interface() take a long time?
> > > The driver would wake the device if it has suspended.
> > > I have no idea about how usb_autopm_get_interface() works, so I don't know
> > 
> > how to help.
> > 
> > Hi,
> > 
> > it basically calls r8152_resume() and makes a control request to the
> > hub. I think we are spinning in rtl8152_runtime_resume(), but where?
> > It has a lot of NAPI stuff. Any suggestions on how to instrument or
> > trace this?
> 
> Is rtl8152_runtime_resume() called? I don't see the name in the trace.

Good question. I see nothing else that could produce a live lock.
> 
> I guess the relative API in rtl8152_runtime_resume() are
> 		ops->disable		= rtl8153_disable;
> 		ops->autosuspend_en	= rtl8153_runtime_enable;
> 
> And I don't find any possible dead lock in rtl8152_runtime_resume().
> 
> Besides, I find a similar issue as following.
> https://www.spinics.net/lists/netdev/msg493512.html

Well, if we have an imbalance in NAPI it should strike whereever
it is used, not just in suspend(). Is there debugging for NAPI
we could activate?

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: Hangs in r8152 connected to power management in kernels at least up v4.17-rc4
  2018-05-16 10:09       ` Oliver Neukum
@ 2018-05-16 12:07         ` Hayes Wang
  2018-05-16 13:29           ` Jiri Slaby
  0 siblings, 1 reply; 9+ messages in thread
From: Hayes Wang @ 2018-05-16 12:07 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: netdev, jslaby

Oliver Neukum [mailto:oneukum@suse.com]
> Sent: Wednesday, May 16, 2018 6:10 PM
[...]
> > Besides, I find a similar issue as following.
> > https://www.spinics.net/lists/netdev/msg493512.html
> 
> Well, if we have an imbalance in NAPI it should strike whereever
> it is used, not just in suspend(). Is there debugging for NAPI
> we could activate?

No. The driver doesn't embed such debugging about it.
And I don't find an imbalance between napi_disable() and napi_enable().


Best Regards,
Hayes




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Hangs in r8152 connected to power management in kernels at least up v4.17-rc4
  2018-05-16 12:07         ` Hayes Wang
@ 2018-05-16 13:29           ` Jiri Slaby
  2018-05-16 13:36             ` Jiri Slaby
  0 siblings, 1 reply; 9+ messages in thread
From: Jiri Slaby @ 2018-05-16 13:29 UTC (permalink / raw)
  To: Hayes Wang, Oliver Neukum; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 1143 bytes --]

On 05/16/2018, 02:07 PM, Hayes Wang wrote:
> Oliver Neukum [mailto:oneukum@suse.com]
>> Sent: Wednesday, May 16, 2018 6:10 PM
> [...]
>>> Besides, I find a similar issue as following.
>>> https://www.spinics.net/lists/netdev/msg493512.html
>>
>> Well, if we have an imbalance in NAPI it should strike whereever
>> it is used, not just in suspend(). Is there debugging for NAPI
>> we could activate?
> 
> No. The driver doesn't embed such debugging about it.

Despite of that, Oliver, I have a kernel with a debug patch (attached)
at (suse-only link):
https://build.suse.de/project/show/home:jirislaby:stable-drm

> And I don't find an imbalance between napi_disable() and napi_enable().

There is none, apparently (the warns never triggered). BUt still the
driver sucks wrt both power mgmt and dock plug/unplug. Since I am using
the patch (it upper-bounds the napi_disable loop count) and the udev
rule below, I can really use the nic.

$ cat /etc/udev/rules.d/10-disable-r8152-pm.rules
ACTION=="add", SUBSYSTEM=="usb", ATTR{idProduct}=="8153",
ATTR{idVendor}=="0bda", TEST=="power/control", ATTR{power/control}="on"

thanks,
-- 
js
suse labs

[-- Attachment #2: r8152.patch --]
[-- Type: text/x-patch, Size: 5467 bytes --]

---
 drivers/net/usb/r8152.c |   62 +++++++++++++++++++++++++++++++-----------------
 net/core/dev.c          |   14 +++++++++-
 2 files changed, 53 insertions(+), 23 deletions(-)

--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -704,6 +704,8 @@ struct r8152 {
 	unsigned long flags;
 	struct usb_device *udev;
 	struct napi_struct napi;
+	int napi_stat;
+	void *napi_last_en, *napi_last_dis;
 	struct usb_interface *intf;
 	struct net_device *netdev;
 	struct urb *intr_urb;
@@ -775,6 +777,31 @@ static unsigned int agg_buf_sz = 16384;
 #define RTL_LIMITED_TSO_SIZE	(agg_buf_sz - sizeof(struct tx_desc) - \
 				 VLAN_ETH_HLEN - ETH_FCS_LEN)
 
+static void my_napi_enable(struct r8152 *tp)
+{
+	if (tp->napi_stat == 0) {
+		napi_enable(&tp->napi);
+		tp->napi_stat++;
+		tp->napi_last_en = (void *)_RET_IP_;
+		return;
+	}
+
+	WARN(1, "napi_stat=%d\n", tp->napi_stat);
+}
+
+static void my_napi_disable(struct r8152 *tp)
+{
+	if (tp->napi_stat == 1) {
+		napi_disable(&tp->napi);
+		tp->napi_stat--;
+		tp->napi_last_dis = (void *)_RET_IP_;
+		return;
+	}
+
+	WARN(1, "napi_stat=%d last_dis=%pF last_en=%pF\n",
+			tp->napi_stat, tp->napi_last_en, tp->napi_last_dis);
+}
+
 static
 int get_registers(struct r8152 *tp, u16 value, u16 index, u16 size, void *data)
 {
@@ -3787,7 +3814,6 @@ static bool rtl8153_in_nway(struct r8152
 static void set_carrier(struct r8152 *tp)
 {
 	struct net_device *netdev = tp->netdev;
-	struct napi_struct *napi = &tp->napi;
 	u8 speed;
 
 	speed = rtl8152_get_speed(tp);
@@ -3796,12 +3822,12 @@ static void set_carrier(struct r8152 *tp
 		if (!netif_carrier_ok(netdev)) {
 			tp->rtl_ops.enable(tp);
 			netif_stop_queue(netdev);
-			napi_disable(napi);
+			my_napi_disable(tp);
 			netif_carrier_on(netdev);
 			rtl_start_rx(tp);
 			clear_bit(RTL8152_SET_RX_MODE, &tp->flags);
 			_rtl8152_set_rx_mode(netdev);
-			napi_enable(&tp->napi);
+			my_napi_enable(tp);
 			netif_wake_queue(netdev);
 			netif_info(tp, link, netdev, "carrier on\n");
 		} else if (netif_queue_stopped(netdev) &&
@@ -3811,9 +3837,9 @@ static void set_carrier(struct r8152 *tp
 	} else {
 		if (netif_carrier_ok(netdev)) {
 			netif_carrier_off(netdev);
-			napi_disable(napi);
+			my_napi_disable(tp);
 			tp->rtl_ops.disable(tp);
-			napi_enable(napi);
+			my_napi_enable(tp);
 			netif_info(tp, link, netdev, "carrier off\n");
 		}
 	}
@@ -3934,7 +3960,7 @@ static int rtl8152_open(struct net_devic
 			   res);
 		goto out_unlock;
 	}
-	napi_enable(&tp->napi);
+	my_napi_enable(tp);
 
 	mutex_unlock(&tp->control);
 
@@ -3962,7 +3988,7 @@ static int rtl8152_close(struct net_devi
 #ifdef CONFIG_PM_SLEEP
 	unregister_pm_notifier(&tp->pm_notifier);
 #endif
-	napi_disable(&tp->napi);
+	my_napi_disable(tp);
 	clear_bit(WORK_ENABLE, &tp->flags);
 	usb_kill_urb(tp->intr_urb);
 	cancel_delayed_work_sync(&tp->schedule);
@@ -4230,7 +4256,7 @@ static int rtl8152_pre_reset(struct usb_
 		return 0;
 
 	netif_stop_queue(netdev);
-	napi_disable(&tp->napi);
+	my_napi_disable(tp);
 	clear_bit(WORK_ENABLE, &tp->flags);
 	usb_kill_urb(tp->intr_urb);
 	cancel_delayed_work_sync(&tp->schedule);
@@ -4264,7 +4290,7 @@ static int rtl8152_post_reset(struct usb
 		mutex_unlock(&tp->control);
 	}
 
-	napi_enable(&tp->napi);
+	my_napi_enable(tp);
 	netif_wake_queue(netdev);
 	usb_submit_urb(tp->intr_urb, GFP_KERNEL);
 
@@ -4302,10 +4328,8 @@ static int rtl8152_runtime_resume(struct
 	struct net_device *netdev = tp->netdev;
 
 	if (netif_running(netdev) && netdev->flags & IFF_UP) {
-		struct napi_struct *napi = &tp->napi;
-
 		tp->rtl_ops.autosuspend_en(tp, false);
-		napi_disable(napi);
+		my_napi_disable(tp);
 		set_bit(WORK_ENABLE, &tp->flags);
 
 		if (netif_carrier_ok(netdev)) {
@@ -4318,7 +4342,7 @@ static int rtl8152_runtime_resume(struct
 			}
 		}
 
-		napi_enable(napi);
+		my_napi_enable(tp);
 		clear_bit(SELECTIVE_SUSPEND, &tp->flags);
 		smp_mb__after_atomic();
 
@@ -4388,13 +4412,11 @@ static int rtl8152_runtime_suspend(struc
 		tp->rtl_ops.autosuspend_en(tp, true);
 
 		if (netif_carrier_ok(netdev)) {
-			struct napi_struct *napi = &tp->napi;
-
-			napi_disable(napi);
+			my_napi_disable(tp);
 			rtl_stop_rx(tp);
 			rxdy_gated_en(tp, false);
 			ocp_write_dword(tp, MCU_TYPE_PLA, PLA_RCR, rcr);
-			napi_enable(napi);
+			my_napi_enable(tp);
 		}
 
 		if (delay_autosuspend(tp)) {
@@ -4415,14 +4437,12 @@ static int rtl8152_system_suspend(struct
 	netif_device_detach(netdev);
 
 	if (netif_running(netdev) && test_bit(WORK_ENABLE, &tp->flags)) {
-		struct napi_struct *napi = &tp->napi;
-
 		clear_bit(WORK_ENABLE, &tp->flags);
 		usb_kill_urb(tp->intr_urb);
-		napi_disable(napi);
+		my_napi_disable(tp);
 		cancel_delayed_work_sync(&tp->schedule);
 		tp->rtl_ops.down(tp);
-		napi_enable(napi);
+		my_napi_enable(tp);
 	}
 
 	return ret;
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5654,13 +5654,23 @@ EXPORT_SYMBOL(netif_napi_add);
 
 void napi_disable(struct napi_struct *n)
 {
+	int a;
+
 	might_sleep();
 	set_bit(NAPI_STATE_DISABLE, &n->state);
 
-	while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
+	a = 0;
+	while (test_and_set_bit(NAPI_STATE_SCHED, &n->state)) {
 		msleep(1);
-	while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state))
+		if (WARN(a++ > 20 * 1000, "NAPI_STATE_SCHED never cleared"))
+			break;
+	}
+	a = 0;
+	while (test_and_set_bit(NAPI_STATE_NPSVC, &n->state)) {
 		msleep(1);
+		if (WARN(a++ > 20 * 1000, "NAPI_STATE_NPSVC never cleared"))
+			break;
+	}
 
 	hrtimer_cancel(&n->timer);
 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Hangs in r8152 connected to power management in kernels at least up v4.17-rc4
  2018-05-16 13:29           ` Jiri Slaby
@ 2018-05-16 13:36             ` Jiri Slaby
  2018-05-25  8:41               ` Jiri Slaby
  0 siblings, 1 reply; 9+ messages in thread
From: Jiri Slaby @ 2018-05-16 13:36 UTC (permalink / raw)
  To: Hayes Wang, Oliver Neukum; +Cc: netdev

On 05/16/2018, 03:29 PM, Jiri Slaby wrote:
> On 05/16/2018, 02:07 PM, Hayes Wang wrote:
>> Oliver Neukum [mailto:oneukum@suse.com]
>>> Sent: Wednesday, May 16, 2018 6:10 PM
>> [...]
>>>> Besides, I find a similar issue as following.
>>>> https://www.spinics.net/lists/netdev/msg493512.html
>>>
>>> Well, if we have an imbalance in NAPI it should strike whereever
>>> it is used, not just in suspend(). Is there debugging for NAPI
>>> we could activate?
>>
>> No. The driver doesn't embed such debugging about it.
> 
> Despite of that, Oliver, I have a kernel with a debug patch (attached)
> at (suse-only link):
> https://build.suse.de/project/show/home:jirislaby:stable-drm
> 
>> And I don't find an imbalance between napi_disable() and napi_enable().
> 
> There is none, apparently (the warns never triggered). BUt still the
> driver sucks wrt both power mgmt and dock plug/unplug. Since I am using
> the patch (it upper-bounds the napi_disable loop count) and the udev
> rule below, I can really use the nic.

BTW the added warning to napi_disable indeed triggers:

> xzgrep -a -B 2 kernel:.*WARNING.*napi messages-20180*
> messages-20180503.xz:2018-04-27T09:57:00.048922+02:00 anemoi2 kernel: [158616.363052] ------------[ cut here ]------------
> messages-20180503.xz:2018-04-27T09:57:00.048979+02:00 anemoi2 kernel: [158616.363070] NAPI_STATE_SCHED never cleared
> messages-20180503.xz:2018-04-27T09:57:00.048988+02:00 anemoi2 kernel: [158616.363120] WARNING: CPU: 1 PID: 14365 at ../net/core/dev.c:5665 napi_disable+0x3d/0x80

And since I do 'ip l set dev ethX down' before unplugging the dock with
the NIC, I have not seen a single occurrence.

So I assume it must be a problem of making usb->disconnect without prior
ndo->close (or alike).

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Hangs in r8152 connected to power management in kernels at least up v4.17-rc4
  2018-05-16 13:36             ` Jiri Slaby
@ 2018-05-25  8:41               ` Jiri Slaby
  0 siblings, 0 replies; 9+ messages in thread
From: Jiri Slaby @ 2018-05-25  8:41 UTC (permalink / raw)
  To: Hayes Wang, Oliver Neukum; +Cc: netdev

On 05/16/2018, 03:36 PM, Jiri Slaby wrote:
> So I assume it must be a problem of making usb->disconnect without prior
> ndo->close (or alike).

So according to my debug messages, I think this should workaround the
problem:
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -3962,7 +3962,8 @@ static int rtl8152_close(struct net_device *netdev)
 #ifdef CONFIG_PM_SLEEP
        unregister_pm_notifier(&tp->pm_notifier);
 #endif
-       napi_disable(&tp->napi);
+       if (!test_bit(RTL8152_UNPLUG, &tp->flags))
+               napi_disable(&tp->napi);
        clear_bit(WORK_ENABLE, &tp->flags);
        usb_kill_urb(tp->intr_urb);
        cancel_delayed_work_sync(&tp->schedule);


napi is deleted in usb->disconnect, then unregister_netdev is called
which invokes netdev->ndo_stop, i.e. rtl8152_close above. And
rtl8152_close tries to napi_disable, but that is already deleted.

The patch does not solve the race between disconnect and ndo_stop
AFAICS. It needs locking, IMO. I am not familiar enough with the code,
but it looks like ->disconnect can happen any time while ->stop is in
progress.

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2018-05-25  8:41 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-15 12:43 Hangs in r8152 connected to power management in kernels at least up v4.17-rc4 Oliver Neukum
2018-05-16  3:37 ` Hayes Wang
2018-05-16  8:26   ` Oliver Neukum
2018-05-16 10:00     ` Hayes Wang
2018-05-16 10:09       ` Oliver Neukum
2018-05-16 12:07         ` Hayes Wang
2018-05-16 13:29           ` Jiri Slaby
2018-05-16 13:36             ` Jiri Slaby
2018-05-25  8:41               ` Jiri Slaby

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.