All of lore.kernel.org
 help / color / mirror / Atom feed
* Kernel-Warning when stopping the CAN interface
@ 2020-03-11 15:47 Wolfgang Grandegger
       [not found] ` <4c5cd8bd-3d4d-4826-9333-0d0f23b30f5a@schneidersoft.net>
  0 siblings, 1 reply; 3+ messages in thread
From: Wolfgang Grandegger @ 2020-03-11 15:47 UTC (permalink / raw)
  To: linux-can

Hello,

I can easily trigger the kernel warning below with a "gs_usb" device when
I send messages with "cangen -g1 can0" and then stop the interface with
"ifconfig can0 down". The warning comes from here:

  https://elixir.bootlin.com/linux/latest/source/net/core/skbuff.c#L651

What happens is that the "ifconfig down" command first sets

  clear_bit(__LINK_STATE_START, &dev->state);

before it calls the device specific stop/close routine. In the meantime,
TX done events may still occur, calling can_get_echo_skb(), which calls
netif_rx() then netif_rx_internal() and finally enqueue_to_backlog().
Because the interface is not running any longer, the skb is dropped in
there:

  https://elixir.bootlin.com/linux/latest/source/net/core/dev.c#L4394

causing the warning below, because the kfree_skb(skb) is called from
the interrupt context.

How can I avoid that warning? Actually, I do not see a simple method.
Maybe I have missed something.

Wolfgang

[ 1694.131475] ------------[ cut here ]------------
[ 1694.139911] WARNING: CPU: 0 PID: 0 at skb_release_head_state+0x9e/0xb0
[ 1694.147206] Modules linked in: intel_telemetry_pltdrv intel_pmc_ipc intel_punit_ipc intel_telemetry_core x86_pkg_temp_thermal igb tpm_tis tpm_tis_core btusb btrtl dca tpm lpc_ich btbcm spi_pxa2xx_platform mei_me btintel mei thermal rfcomm bluetooth ecdh_generic ntngdm_i2c ntngdm_core ntngdm_bl ntngdm_tsi ntngdm_irq ntngdm_hwmon i915 video
[ 1694.180782] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G     U  W       4.14.51apollolake #1
[ 1694.189828] Hardware name: Default string Default string/AGCO_NT-NG, BIOS X3.00c#AGCO BETA 09/19/2019
[ 1694.200136] task: ffffffff840104c0 task.stack: ffffffff84000000
[ 1694.206753] RIP: 0010:skb_release_head_state+0x9e/0xb0
[ 1694.212494] RSP: 0018:ffff8b2b3fc03cb0 EFLAGS: 00010006
[ 1694.218333] RAX: ffffffff835a4fb0 RBX: ffff8b2b3a20b100 RCX: 0000000000000018
[ 1694.226310] RDX: 0000000000010000 RSI: 0000000000000000 RDI: 0000000000000000
[ 1694.234284] RBP: 0000000000020ac0 R08: 0000000000000014 R09: 0000000000000014
[ 1694.242259] R10: 0000000000000000 R11: ffff8b2b3a7a2180 R12: ffff8b2b3fc03d40
[ 1694.250235] R13: ffff8b2b3a20b100 R14: ffff8b2b3fc20bcc R15: 0000000000000086
[ 1694.258211] FS:  0000000000000000(0000) GS:ffff8b2b3fc00000(0000) knlGS:0000000000000000
[ 1694.267256] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1694.273678] CR2: 00007f83a2db5130 CR3: 0000000179c6a000 CR4: 00000000003406f0
[ 1694.281653] Call Trace:
[ 1694.284383]  <IRQ>
[ 1694.286629]  skb_release_all+0x9/0x20
[ 1694.290720]  kfree_skb+0x19/0x40
[ 1694.294327]  enqueue_to_backlog+0x7e/0x230
[ 1694.298906]  ? recalibrate_cpu_khz+0x10/0x10
[ 1694.303677]  ? ktime_get_with_offset+0x4f/0xb0
[ 1694.308644]  netif_rx_internal+0x56/0x150
[ 1694.313126]  can_get_echo_skb+0x37/0x60
[ 1694.317415]  gs_usb_receive_bulk_callback+0x111/0x320
[ 1694.323063]  __usb_hcd_giveback_urb+0x7b/0xf0
[ 1694.327924]  xhci_giveback_urb_in_irq.isra.55+0x6d/0x90
[ 1694.333765]  xhci_td_cleanup+0xcb/0x150
[ 1694.338051]  xhci_irq+0x66a/0x20b0
[ 1694.341854]  ? rcu_accelerate_cbs+0x3a/0x190
[ 1694.346628]  __handle_irq_event_percpu+0x32/0xd0
[ 1694.351791]  handle_irq_event_percpu+0x2b/0x70
[ 1694.356757]  handle_irq_event+0x2f/0x50
[ 1694.361044]  handle_edge_irq+0x72/0x190
[ 1694.365331]  handle_irq+0x17/0x20
[ 1694.369038]  do_IRQ+0x3c/0xc0
[ 1694.372355]  common_interrupt+0x7a/0x7a
[ 1694.376639]  </IRQ>
[ 1694.378981] RIP: 0010:cpuidle_enter_state+0x122/0x200
[ 1694.384622] RSP: 0018:ffffffff84003ea8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffffbd
[ 1694.393087] RAX: ffff8b2b3fc1fa00 RBX: 0000000000000002 RCX: 0000018a721a5ae8
[ 1694.401063] RDX: 0000018a721a5ae8 RSI: 0000000050523fbe RDI: 0000000000000000
[ 1694.409040] RBP: ffff8b2b3aa13600 R08: 0000000000000193 R09: 000000000000012a
[ 1694.417015] R10: ffffffff84003e90 R11: 00000000000000ef R12: 0000018a721a5ae8
[ 1694.424990] R13: 0000018a72199308 R14: 0000000000000002 R15: 0000000072efca78
[ 1694.432970]  ? cpuidle_enter_state+0x11e/0x200
[ 1694.437937]  do_idle+0x161/0x1a0
[ 1694.441544]  cpu_startup_entry+0x6a/0x70
[ 1694.445927]  start_kernel+0x414/0x434
[ 1694.450023]  secondary_startup_64+0xa5/0xb0
[ 1694.454697] Code: 88 96 4f 17 00 74 02 5b c3 5b e9 4e 15 bc ff e8 d9 33 04 00 eb dd e8 c2 5a 0d 00 eb 9f 48 83 e7 fe e8 07 d7 01 00 e9 75 ff ff ff <0f> 0b eb ad 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 53 48 89 
[ 1694.475869] ---[ end trace 063196a1141471b3 ]---

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Kernel-Warning when stopping the CAN interface
       [not found] ` <4c5cd8bd-3d4d-4826-9333-0d0f23b30f5a@schneidersoft.net>
@ 2020-03-12  7:05   ` Wolfgang Grandegger
  2020-03-12  7:36     ` Wolfgang Grandegger
  0 siblings, 1 reply; 3+ messages in thread
From: Wolfgang Grandegger @ 2020-03-12  7:05 UTC (permalink / raw)
  To: Maximilian Schneider; +Cc: linux-can

Hello Maximilian,

I think, the problem is not gs_usb specific. The following patch seems
to help:

--- a/drivers/net/can/usb/gs_usb.c
+++ b/drivers/net/can/usb/gs_usb.c
@@ -321,7 +321,7 @@ static void gs_usb_receive_bulk_callback(struct urb *urb)
 	netdev = dev->netdev;
 	stats = &netdev->stats;
 
-	if (!netif_device_present(netdev))
+	if (!netif_device_present(netdev) || !netif_running(netdev))
 		return;
 
 	if (hf->echo_id == -1) { /* normal rx */

Wolfgang

Am 12.03.20 um 07:06 schrieb Maximilian Schneider:
> I guess the naive solution would be to test dev->state in the RX
> callback. I doubt that's the best solution thought. Is it known if this
> warning affects only the gs_usb driver?





> On 11 Mar 2020, at 16:53, Wolfgang Grandegger <wg@grandegger.com
> <mailto:wg@grandegger.com>> wrote:
> 
>     Hello,
> 
>     I can easily trigger the kernel warning below with a "gs_usb" device when
>     I send messages with "cangen -g1 can0" and then stop the interface with
>     "ifconfig can0 down". The warning comes from here:
> 
>       https://elixir.bootlin.com/linux/latest/source/net/core/skbuff.c#L651
> 
>     What happens is that the "ifconfig down" command first sets
> 
>       clear_bit(__LINK_STATE_START, &dev->state);
> 
>     before it calls the device specific stop/close routine. In the meantime,
>     TX done events may still occur, calling can_get_echo_skb(), which calls
>     netif_rx() then netif_rx_internal() and finally enqueue_to_backlog().
>     Because the interface is not running any longer, the skb is dropped in
>     there:
> 
>       https://elixir.bootlin.com/linux/latest/source/net/core/dev.c#L4394
> 
>     causing the warning below, because the kfree_skb(skb) is called from
>     the interrupt context.
> 
>     How can I avoid that warning? Actually, I do not see a simple method.
>     Maybe I have missed something.
> 
>     Wolfgang
> 
>     [ 1694.131475] ------------[ cut here ]------------
>     [ 1694.139911] WARNING: CPU: 0 PID: 0 at skb_release_head_state+0x9e/0xb0
>     [ 1694.147206] Modules linked in: intel_telemetry_pltdrv intel_pmc_ipc intel_punit_ipc intel_telemetry_core x86_pkg_temp_thermal igb tpm_tis tpm_tis_core btusb btrtl dca tpm lpc_ich btbcm spi_pxa2xx_platform mei_me btintel mei thermal rfcomm bluetooth ecdh_generic ntngdm_i2c ntngdm_core ntngdm_bl ntngdm_tsi ntngdm_irq ntngdm_hwmon i915 video
>     [ 1694.180782] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G     U  W       4.14.51apollolake #1
>     [ 1694.189828] Hardware name: Default string Default string/AGCO_NT-NG, BIOS X3.00c#AGCO BETA 09/19/2019
>     [ 1694.200136] task: ffffffff840104c0 task.stack: ffffffff84000000
>     [ 1694.206753] RIP: 0010:skb_release_head_state+0x9e/0xb0
>     [ 1694.212494] RSP: 0018:ffff8b2b3fc03cb0 EFLAGS: 00010006
>     [ 1694.218333] RAX: ffffffff835a4fb0 RBX: ffff8b2b3a20b100 RCX: 0000000000000018
>     [ 1694.226310] RDX: 0000000000010000 RSI: 0000000000000000 RDI: 0000000000000000
>     [ 1694.234284] RBP: 0000000000020ac0 R08: 0000000000000014 R09: 0000000000000014
>     [ 1694.242259] R10: 0000000000000000 R11: ffff8b2b3a7a2180 R12: ffff8b2b3fc03d40
>     [ 1694.250235] R13: ffff8b2b3a20b100 R14: ffff8b2b3fc20bcc R15: 0000000000000086
>     [ 1694.258211] FS:  0000000000000000(0000) GS:ffff8b2b3fc00000(0000) knlGS:0000000000000000
>     [ 1694.267256] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>     [ 1694.273678] CR2: 00007f83a2db5130 CR3: 0000000179c6a000 CR4: 00000000003406f0
>     [ 1694.281653] Call Trace:
>     [ 1694.284383]  <IRQ>
>     [ 1694.286629]  skb_release_all+0x9/0x20
>     [ 1694.290720]  kfree_skb+0x19/0x40
>     [ 1694.294327]  enqueue_to_backlog+0x7e/0x230
>     [ 1694.298906]  ? recalibrate_cpu_khz+0x10/0x10
>     [ 1694.303677]  ? ktime_get_with_offset+0x4f/0xb0
>     [ 1694.308644]  netif_rx_internal+0x56/0x150
>     [ 1694.313126]  can_get_echo_skb+0x37/0x60
>     [ 1694.317415]  gs_usb_receive_bulk_callback+0x111/0x320
>     [ 1694.323063]  __usb_hcd_giveback_urb+0x7b/0xf0
>     [ 1694.327924]  xhci_giveback_urb_in_irq.isra.55+0x6d/0x90
>     [ 1694.333765]  xhci_td_cleanup+0xcb/0x150
>     [ 1694.338051]  xhci_irq+0x66a/0x20b0
>     [ 1694.341854]  ? rcu_accelerate_cbs+0x3a/0x190
>     [ 1694.346628]  __handle_irq_event_percpu+0x32/0xd0
>     [ 1694.351791]  handle_irq_event_percpu+0x2b/0x70
>     [ 1694.356757]  handle_irq_event+0x2f/0x50
>     [ 1694.361044]  handle_edge_irq+0x72/0x190
>     [ 1694.365331]  handle_irq+0x17/0x20
>     [ 1694.369038]  do_IRQ+0x3c/0xc0
>     [ 1694.372355]  common_interrupt+0x7a/0x7a
>     [ 1694.376639]  </IRQ>
>     [ 1694.378981] RIP: 0010:cpuidle_enter_state+0x122/0x200
>     [ 1694.384622] RSP: 0018:ffffffff84003ea8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffffbd
>     [ 1694.393087] RAX: ffff8b2b3fc1fa00 RBX: 0000000000000002 RCX: 0000018a721a5ae8
>     [ 1694.401063] RDX: 0000018a721a5ae8 RSI: 0000000050523fbe RDI: 0000000000000000
>     [ 1694.409040] RBP: ffff8b2b3aa13600 R08: 0000000000000193 R09: 000000000000012a
>     [ 1694.417015] R10: ffffffff84003e90 R11: 00000000000000ef R12: 0000018a721a5ae8
>     [ 1694.424990] R13: 0000018a72199308 R14: 0000000000000002 R15: 0000000072efca78
>     [ 1694.432970]  ? cpuidle_enter_state+0x11e/0x200
>     [ 1694.437937]  do_idle+0x161/0x1a0
>     [ 1694.441544]  cpu_startup_entry+0x6a/0x70
>     [ 1694.445927]  start_kernel+0x414/0x434
>     [ 1694.450023]  secondary_startup_64+0xa5/0xb0
>     [ 1694.454697] Code: 88 96 4f 17 00 74 02 5b c3 5b e9 4e 15 bc ff e8 d9 33 04 00 eb dd e8 c2 5a 0d 00 eb 9f 48 83 e7 fe e8 07 d7 01 00 e9 75 ff ff ff <0f> 0b eb ad 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 53 48 89 
>     [ 1694.475869] ---[ end trace 063196a1141471b3 ]---
> 

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Kernel-Warning when stopping the CAN interface
  2020-03-12  7:05   ` Wolfgang Grandegger
@ 2020-03-12  7:36     ` Wolfgang Grandegger
  0 siblings, 0 replies; 3+ messages in thread
From: Wolfgang Grandegger @ 2020-03-12  7:36 UTC (permalink / raw)
  To: Maximilian Schneider; +Cc: linux-can


Am 12.03.20 um 08:05 schrieb Wolfgang Grandegger:
> Hello Maximilian,
> 
> I think, the problem is not gs_usb specific. The following patch seems
> to help:
> 
> --- a/drivers/net/can/usb/gs_usb.c
> +++ b/drivers/net/can/usb/gs_usb.c
> @@ -321,7 +321,7 @@ static void gs_usb_receive_bulk_callback(struct urb *urb)
>  	netdev = dev->netdev;
>  	stats = &netdev->stats;
>  
> -	if (!netif_device_present(netdev))
> +	if (!netif_device_present(netdev) || !netif_running(netdev))
>  		return;
>  
>  	if (hf->echo_id == -1) { /* normal rx */
> 

Well, the warning still occurs, but at a much much lower rate... as
expected.

Wolfgang

> Am 12.03.20 um 07:06 schrieb Maximilian Schneider:
>> I guess the naive solution would be to test dev->state in the RX
>> callback. I doubt that's the best solution thought. Is it known if this
>> warning affects only the gs_usb driver?
> 
> 
> 
> 
> 
>> On 11 Mar 2020, at 16:53, Wolfgang Grandegger <wg@grandegger.com
>> <mailto:wg@grandegger.com>> wrote:
>>
>>     Hello,
>>
>>     I can easily trigger the kernel warning below with a "gs_usb" device when
>>     I send messages with "cangen -g1 can0" and then stop the interface with
>>     "ifconfig can0 down". The warning comes from here:
>>
>>       https://elixir.bootlin.com/linux/latest/source/net/core/skbuff.c#L651
>>
>>     What happens is that the "ifconfig down" command first sets
>>
>>       clear_bit(__LINK_STATE_START, &dev->state);
>>
>>     before it calls the device specific stop/close routine. In the meantime,
>>     TX done events may still occur, calling can_get_echo_skb(), which calls
>>     netif_rx() then netif_rx_internal() and finally enqueue_to_backlog().
>>     Because the interface is not running any longer, the skb is dropped in
>>     there:
>>
>>       https://elixir.bootlin.com/linux/latest/source/net/core/dev.c#L4394
>>
>>     causing the warning below, because the kfree_skb(skb) is called from
>>     the interrupt context.
>>
>>     How can I avoid that warning? Actually, I do not see a simple method.
>>     Maybe I have missed something.
>>
>>     Wolfgang
>>
>>     [ 1694.131475] ------------[ cut here ]------------
>>     [ 1694.139911] WARNING: CPU: 0 PID: 0 at skb_release_head_state+0x9e/0xb0
>>     [ 1694.147206] Modules linked in: intel_telemetry_pltdrv intel_pmc_ipc intel_punit_ipc intel_telemetry_core x86_pkg_temp_thermal igb tpm_tis tpm_tis_core btusb btrtl dca tpm lpc_ich btbcm spi_pxa2xx_platform mei_me btintel mei thermal rfcomm bluetooth ecdh_generic ntngdm_i2c ntngdm_core ntngdm_bl ntngdm_tsi ntngdm_irq ntngdm_hwmon i915 video
>>     [ 1694.180782] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G     U  W       4.14.51apollolake #1
>>     [ 1694.189828] Hardware name: Default string Default string/AGCO_NT-NG, BIOS X3.00c#AGCO BETA 09/19/2019
>>     [ 1694.200136] task: ffffffff840104c0 task.stack: ffffffff84000000
>>     [ 1694.206753] RIP: 0010:skb_release_head_state+0x9e/0xb0
>>     [ 1694.212494] RSP: 0018:ffff8b2b3fc03cb0 EFLAGS: 00010006
>>     [ 1694.218333] RAX: ffffffff835a4fb0 RBX: ffff8b2b3a20b100 RCX: 0000000000000018
>>     [ 1694.226310] RDX: 0000000000010000 RSI: 0000000000000000 RDI: 0000000000000000
>>     [ 1694.234284] RBP: 0000000000020ac0 R08: 0000000000000014 R09: 0000000000000014
>>     [ 1694.242259] R10: 0000000000000000 R11: ffff8b2b3a7a2180 R12: ffff8b2b3fc03d40
>>     [ 1694.250235] R13: ffff8b2b3a20b100 R14: ffff8b2b3fc20bcc R15: 0000000000000086
>>     [ 1694.258211] FS:  0000000000000000(0000) GS:ffff8b2b3fc00000(0000) knlGS:0000000000000000
>>     [ 1694.267256] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>     [ 1694.273678] CR2: 00007f83a2db5130 CR3: 0000000179c6a000 CR4: 00000000003406f0
>>     [ 1694.281653] Call Trace:
>>     [ 1694.284383]  <IRQ>
>>     [ 1694.286629]  skb_release_all+0x9/0x20
>>     [ 1694.290720]  kfree_skb+0x19/0x40
>>     [ 1694.294327]  enqueue_to_backlog+0x7e/0x230
>>     [ 1694.298906]  ? recalibrate_cpu_khz+0x10/0x10
>>     [ 1694.303677]  ? ktime_get_with_offset+0x4f/0xb0
>>     [ 1694.308644]  netif_rx_internal+0x56/0x150
>>     [ 1694.313126]  can_get_echo_skb+0x37/0x60
>>     [ 1694.317415]  gs_usb_receive_bulk_callback+0x111/0x320
>>     [ 1694.323063]  __usb_hcd_giveback_urb+0x7b/0xf0
>>     [ 1694.327924]  xhci_giveback_urb_in_irq.isra.55+0x6d/0x90
>>     [ 1694.333765]  xhci_td_cleanup+0xcb/0x150
>>     [ 1694.338051]  xhci_irq+0x66a/0x20b0
>>     [ 1694.341854]  ? rcu_accelerate_cbs+0x3a/0x190
>>     [ 1694.346628]  __handle_irq_event_percpu+0x32/0xd0
>>     [ 1694.351791]  handle_irq_event_percpu+0x2b/0x70
>>     [ 1694.356757]  handle_irq_event+0x2f/0x50
>>     [ 1694.361044]  handle_edge_irq+0x72/0x190
>>     [ 1694.365331]  handle_irq+0x17/0x20
>>     [ 1694.369038]  do_IRQ+0x3c/0xc0
>>     [ 1694.372355]  common_interrupt+0x7a/0x7a
>>     [ 1694.376639]  </IRQ>
>>     [ 1694.378981] RIP: 0010:cpuidle_enter_state+0x122/0x200
>>     [ 1694.384622] RSP: 0018:ffffffff84003ea8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffffbd
>>     [ 1694.393087] RAX: ffff8b2b3fc1fa00 RBX: 0000000000000002 RCX: 0000018a721a5ae8
>>     [ 1694.401063] RDX: 0000018a721a5ae8 RSI: 0000000050523fbe RDI: 0000000000000000
>>     [ 1694.409040] RBP: ffff8b2b3aa13600 R08: 0000000000000193 R09: 000000000000012a
>>     [ 1694.417015] R10: ffffffff84003e90 R11: 00000000000000ef R12: 0000018a721a5ae8
>>     [ 1694.424990] R13: 0000018a72199308 R14: 0000000000000002 R15: 0000000072efca78
>>     [ 1694.432970]  ? cpuidle_enter_state+0x11e/0x200
>>     [ 1694.437937]  do_idle+0x161/0x1a0
>>     [ 1694.441544]  cpu_startup_entry+0x6a/0x70
>>     [ 1694.445927]  start_kernel+0x414/0x434
>>     [ 1694.450023]  secondary_startup_64+0xa5/0xb0
>>     [ 1694.454697] Code: 88 96 4f 17 00 74 02 5b c3 5b e9 4e 15 bc ff e8 d9 33 04 00 eb dd e8 c2 5a 0d 00 eb 9f 48 83 e7 fe e8 07 d7 01 00 e9 75 ff ff ff <0f> 0b eb ad 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 53 48 89 
>>     [ 1694.475869] ---[ end trace 063196a1141471b3 ]---
>>

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-03-12  7:36 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-11 15:47 Kernel-Warning when stopping the CAN interface Wolfgang Grandegger
     [not found] ` <4c5cd8bd-3d4d-4826-9333-0d0f23b30f5a@schneidersoft.net>
2020-03-12  7:05   ` Wolfgang Grandegger
2020-03-12  7:36     ` Wolfgang Grandegger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.