netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net] be2net: Fix NULL pointer dereference in be_tx_timeout()
@ 2018-11-22 14:24 Petr Oros
  2018-11-27 22:51 ` David Miller
  0 siblings, 1 reply; 6+ messages in thread
From: Petr Oros @ 2018-11-22 14:24 UTC (permalink / raw)
  To: netdev; +Cc: ivecera, davem

The driver enumerates Tx queues in ndo_tx_timeout() handler, here is
possible race with be_update_queues. For this case we set carrier_off.
It prevents netdev watchdog to be fired after be_clear_queues().
The watchdog timeout doesn't make any sense here as we re-creating queues.

Reproducer:
We can reproduce bug with ethtool when changing queue count
  ethtool -L $netif combined 1
  ethtool -L $netif combined 32
If oops is not triggered imediately, just run it again or in loop.

Oops:
[  865.768648] NETDEV WATCHDOG: enp4s0f0 (be2net): transmit queue 0 timed out
[  865.775539] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:461 dev_watchdog+0x20d/0x220
[  865.783796] Modules linked in: be2net intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul iTCO_wdt iTCO_vendor_support ghash_clmulni_intel mei_me intel_cstate intel_uncore ipmi_ssif mei ipmi_si pcspkr sg i2c_i801 joydev lpc_ich intel_rapl_perf ipmi_devintf ioatdma ipmi_msghandler xfs libcrc32c sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci libahci crc32c_intel drm serio_raw libata igb dca i2c_algo_bit wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: be2net]
[  865.834289] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.20.0-rc3+ #2
[  865.840640] Hardware name: Supermicro X9DBU/X9DBU, BIOS 3.2 01/15/2015
[  865.847168] RIP: 0010:dev_watchdog+0x20d/0x220
[  865.851612] Code: 00 49 63 4e e0 eb 92 4c 89 e7 c6 05 a5 de c9 00 01 e8 f7 b2 fc ff 89 d9 4c 89 e6 48 c7 c7 a0 d1 b2 99 48 89 c2 e8 7d b0 98 ff <0f> 0b eb c0 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66
[  865.870358] RSP: 0018:ffff9bee73ac3e88 EFLAGS: 00010282
[  865.875583] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000083f
[  865.882707] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000003f
[  865.889832] RBP: ffff9bee5fa0045c R08: 0000000000000824 R09: 0000000000000007
[  865.896956] R10: 0000000000000000 R11: ffffffff9a3f162d R12: ffff9bee5fa00000
[  865.904088] R13: 0000000000000003 R14: ffff9bee5fa00480 R15: 0000000000000020
[  865.911214] FS:  0000000000000000(0000) GS:ffff9bee73ac0000(0000) knlGS:0000000000000000
[  865.919298] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  865.925037] CR2: 00005580497ce040 CR3: 00000002cf60a004 CR4: 00000000000606e0
[  865.932170] Call Trace:
[  865.934626]  <IRQ>
[  865.936645]  ? pfifo_fast_dequeue+0x160/0x160
[  865.941005]  call_timer_fn+0x2b/0x130
[  865.944670]  run_timer_softirq+0x3b9/0x3f0
[  865.948768]  ? tick_sched_timer+0x37/0x70
[  865.952779]  ? __hrtimer_run_queues+0x110/0x280
[  865.957314]  __do_softirq+0xdd/0x2fe
[  865.960896]  irq_exit+0xfa/0x100
[  865.964125]  smp_apic_timer_interrupt+0x74/0x140
[  865.968745]  apic_timer_interrupt+0xf/0x20
[  865.972844]  </IRQ>
[  865.974953] RIP: 0010:cpuidle_enter_state+0xb0/0x320
[  865.979915] Code: 89 c3 66 66 66 66 90 31 ff e8 0c 07 a6 ff 80 7c 24 0b 00 74 12 9c 58 f6 c4 02 0f 85 46 02 00 00 31 ff e8 33 e0 ab ff fb 85 ed <0f> 88 1a 02 00 00 48 b8 ff ff ff ff f3 01 00 00 48 2b 1c 24 48 39
[  865.998661] RSP: 0018:ffffbc9ac19e7ea0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
[  866.006225] RAX: ffff9bee73ae1dc0 RBX: 000000c9938e11ae RCX: 000000000000001f
[  866.013350] RDX: 000000c9938e11ae RSI: 00000000435e532a RDI: 0000000000000000
[  866.020474] RBP: 0000000000000005 R08: 0000000000000002 R09: 0000000000021640
[  866.027598] R10: 00009c434b946fde R11: ffff9bee73ae0e44 R12: ffffffff99d27538
[  866.034723] R13: ffff9bee73aec628 R14: 0000000000000005 R15: 0000000000000000
[  866.041860]  do_idle+0x1f1/0x230
[  866.045091]  cpu_startup_entry+0x19/0x20
[  866.049016]  start_secondary+0x195/0x1e0
[  866.052943]  secondary_startup_64+0xb6/0xc0
[  866.057129] ---[ end trace dead88c26bcd8261 ]---
[  866.061750] be2net 0000:04:00.0: TXQ Dump: 0 H: 0 T: 0 used: 0, qid: 0x2
[  866.068452] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[  866.076273] PGD 0 P4D 0
[  866.078810] Oops: 0000 [#1] SMP PTI
[  866.082305] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G        W         4.20.0-rc3+ #2
[  866.090041] Hardware name: Supermicro X9DBU/X9DBU, BIOS 3.2 01/15/2015
[  866.096566] RIP: 0010:be_tx_timeout+0x7c/0x300 [be2net]
[  866.101786] Code: 8b 45 1c 41 8b 4d 14 48 89 df 31 ed 45 8b 4d 18 48 c7 c6 80 51 2c c0 50 45 8b 45 10 8b 54 24 14 e8 09 a7 cb d8 4d 8b 7d 20 59 <41> 8b 0c af 45 8b 44 af 04 41 8b 74 af 0c 45 8b 4c af 08 89 ca 44
[  866.120532] RSP: 0018:ffff9bee73ac3e38 EFLAGS: 00010246
[  866.125758] RAX: 0000000000000000 RBX: ffff9bee72d6b0b0 RCX: 0000000000000002
[  866.132882] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000003f
[  866.140014] RBP: 0000000000000000 R08: 000000000000084d R09: 0000000000000007
[  866.147138] R10: 0000000000000000 R11: ffffffff9a3f162d R12: ffffffffc02c60ab
[  866.154263] R13: ffff9bee5fa04b40 R14: ffffffffc02c613a R15: 0000000000000000
[  866.161388] FS:  0000000000000000(0000) GS:ffff9bee73ac0000(0000) knlGS:0000000000000000
[  866.169472] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  866.175210] CR2: 0000000000000000 CR3: 00000002cf60a004 CR4: 00000000000606e0
[  866.182334] Call Trace:
[  866.184781]  <IRQ>
[  866.186799]  dev_watchdog+0x1e4/0x220
[  866.190466]  ? pfifo_fast_dequeue+0x160/0x160
[  866.194825]  call_timer_fn+0x2b/0x130
[  866.198491]  run_timer_softirq+0x3b9/0x3f0
[  866.202590]  ? tick_sched_timer+0x37/0x70
[  866.206604]  ? __hrtimer_run_queues+0x110/0x280
[  866.211134]  __do_softirq+0xdd/0x2fe
[  866.214715]  irq_exit+0xfa/0x100
[  866.217948]  smp_apic_timer_interrupt+0x74/0x140
[  866.222566]  apic_timer_interrupt+0xf/0x20
[  866.226664]  </IRQ>
[  866.228762] RIP: 0010:cpuidle_enter_state+0xb0/0x320
[  866.233727] Code: 89 c3 66 66 66 66 90 31 ff e8 0c 07 a6 ff 80 7c 24 0b 00 74 12 9c 58 f6 c4 02 0f 85 46 02 00 00 31 ff e8 33 e0 ab ff fb 85 ed <0f> 88 1a 02 00 00 48 b8 ff ff ff ff f3 01 00 00 48 2b 1c 24 48 39
[  866.252465] RSP: 0018:ffffbc9ac19e7ea0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
[  866.260031] RAX: ffff9bee73ae1dc0 RBX: 000000c9938e11ae RCX: 000000000000001f
[  866.267164] RDX: 000000c9938e11ae RSI: 00000000435e532a RDI: 0000000000000000
[  866.274288] RBP: 0000000000000005 R08: 0000000000000002 R09: 0000000000021640
[  866.281421] R10: 00009c434b946fde R11: ffff9bee73ae0e44 R12: ffffffff99d27538
[  866.288545] R13: ffff9bee73aec628 R14: 0000000000000005 R15: 0000000000000000
[  866.295680]  do_idle+0x1f1/0x230
[  866.298913]  cpu_startup_entry+0x19/0x20
[  866.302839]  start_secondary+0x195/0x1e0
[  866.306764]  secondary_startup_64+0xb6/0xc0
[  866.310948] Modules linked in: be2net intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul iTCO_wdt iTCO_vendor_support ghash_clmulni_intel mei_me intel_cstate intel_uncore ipmi_ssif mei ipmi_si pcspkr sg i2c_i801 joydev lpc_ich intel_rapl_perf ipmi_devintf ioatdma ipmi_msghandler xfs libcrc32c sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci libahci crc32c_intel drm serio_raw libata igb dca i2c_algo_bit wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: be2net]
[  866.361432] CR2: 0000000000000000
[  866.364748] ---[ end trace dead88c26bcd8262 ]---
[  866.507013] RIP: 0010:be_tx_timeout+0x7c/0x300 [be2net]
[  866.512234] Code: 8b 45 1c 41 8b 4d 14 48 89 df 31 ed 45 8b 4d 18 48 c7 c6 80 51 2c c0 50 45 8b 45 10 8b 54 24 14 e8 09 a7 cb d8 4d 8b 7d 20 59 <41> 8b 0c af 45 8b 44 af 04 41 8b 74 af 0c 45 8b 4c af 08 89 ca 44
[  866.530980] RSP: 0018:ffff9bee73ac3e38 EFLAGS: 00010246
[  866.536206] RAX: 0000000000000000 RBX: ffff9bee72d6b0b0 RCX: 0000000000000002
[  866.543330] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000003f
[  866.550454] RBP: 0000000000000000 R08: 000000000000084d R09: 0000000000000007
[  866.557578] R10: 0000000000000000 R11: ffffffff9a3f162d R12: ffffffffc02c60ab
[  866.564710] R13: ffff9bee5fa04b40 R14: ffffffffc02c613a R15: 0000000000000000
[  866.571835] FS:  0000000000000000(0000) GS:ffff9bee73ac0000(0000) knlGS:0000000000000000
[  866.579920] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  866.585658] CR2: 0000000000000000 CR3: 00000002cf60a004 CR4: 00000000000606e0
[  866.592784] Kernel panic - not syncing: Fatal exception in interrupt
[  866.599179] Kernel Offset: 0x17a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Fixes: c1b3bdb2ffa9 ("be2net: gather debug info and reset adapter (only for Lancer) on a tx-timeout")
Signed-off-by: Petr Oros <poros@redhat.com>
---
 drivers/net/ethernet/emulex/benet/be_main.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index c5ad7a4f4d83..02202c5e6794 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -4700,8 +4700,11 @@ int be_update_queues(struct be_adapter *adapter)
 	struct net_device *netdev = adapter->netdev;
 	int status;
 
-	if (netif_running(netdev))
+	if (netif_running(netdev)) {
+		/* prevent netdev watchdog during tx queue destroy */
+		netif_carrier_off(netdev);
 		be_close(netdev);
+	}
 
 	be_cancel_worker(adapter);
 
-- 
2.18.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net] be2net: Fix NULL pointer dereference in be_tx_timeout()
  2018-11-22 14:24 [PATCH net] be2net: Fix NULL pointer dereference in be_tx_timeout() Petr Oros
@ 2018-11-27 22:51 ` David Miller
  2018-11-28 10:12   ` Ivan Vecera
  0 siblings, 1 reply; 6+ messages in thread
From: David Miller @ 2018-11-27 22:51 UTC (permalink / raw)
  To: poros; +Cc: netdev, ivecera

From: Petr Oros <poros@redhat.com>
Date: Thu, 22 Nov 2018 15:24:07 +0100

> @@ -4700,8 +4700,11 @@ int be_update_queues(struct be_adapter *adapter)
>  	struct net_device *netdev = adapter->netdev;
>  	int status;
>  
> -	if (netif_running(netdev))
> +	if (netif_running(netdev)) {
> +		/* prevent netdev watchdog during tx queue destroy */
> +		netif_carrier_off(netdev);
>  		be_close(netdev);
> +	}

This will make userspace networking daemons will think that the link
went down.

This absolutely should not be a side effect of making a simple
TX queue configuration change via ethtool.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] be2net: Fix NULL pointer dereference in be_tx_timeout()
  2018-11-27 22:51 ` David Miller
@ 2018-11-28 10:12   ` Ivan Vecera
  2018-11-28 19:00     ` David Miller
  0 siblings, 1 reply; 6+ messages in thread
From: Ivan Vecera @ 2018-11-28 10:12 UTC (permalink / raw)
  To: David Miller, poros; +Cc: netdev

On 27. 11. 18 23:51, David Miller wrote:
> From: Petr Oros <poros@redhat.com>
> Date: Thu, 22 Nov 2018 15:24:07 +0100
> 
>> @@ -4700,8 +4700,11 @@ int be_update_queues(struct be_adapter *adapter)
>>   	struct net_device *netdev = adapter->netdev;
>>   	int status;
>>   
>> -	if (netif_running(netdev))
>> +	if (netif_running(netdev)) {
>> +		/* prevent netdev watchdog during tx queue destroy */
>> +		netif_carrier_off(netdev);
>>   		be_close(netdev);
>> +	}
> 
> This will make userspace networking daemons will think that the link
> went down.
> 
> This absolutely should not be a side effect of making a simple
> TX queue configuration change via ethtool.
> 

Yes, you're right Dave. But the same thing (netif_carrier_off()) is done by 
number of other drivers (igb, tg3, ixgbe...) during .set_channels() callback. 
The patch that Petr sent does the practically the same thing like this one:

commit c0dfb90e5b2d41c907de9b624657a6688541837e
Author: John Fastabend <john.r.fastabend@intel.com>
Date:   Tue Apr 27 02:13:39 2010 +0000

     ixgbe: ixgbe_down needs to stop dev_watchdog

     There is a small race between when the tx queues are stopped
     and when netif_carrier_off() is called in ixgbe_down.  If the
     dev_watchdog() timer fires during this time it is possible for
     a false tx timeout to occur.

     This patch moves the netif_carrier_off() so that it is called before
     the tx queues are stopped preventing the dev_watchdog timer from
     detecting false tx timeouts.  The race is seen occosionally when
     FCoE or DCB settings are being configured or changed.

     Testing note, running ifconfig up/down will not reproduce this
     issue because dev_open/dev_close call dev_deactivate() and then
     dev_activate().

where netif_carrier_off() is called prior netif_tx_disable() from ixgbe_down() 
to avoid false Tx timeout. And ixgbe_down() is called from ixgbe_set_channels() 
this way:
ixgbe_set_channels()->ixgbe_setup_tc()->ixgbe_close()->ixgbe_close_suspend()->ixgbe_down()

As I said the similar approach is used by other drivers as well.

The mlx4 driver resolves this situation differently. It calls 
mlx4_en_stop_port() from mlx4_en_set_channels() with parameter 'detach'==1 that 
causes that netif_device_detach() is called prior stopping of Tx queues. This 
also effectively prevents dev_watchdog() to call .ndo_tx_timeout() callback. An 
advantage is netif_device_detach() does not fire linkwatch events.

So... what ways is the _right_ one ?

Thanks,
Ivan

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] be2net: Fix NULL pointer dereference in be_tx_timeout()
  2018-11-28 10:12   ` Ivan Vecera
@ 2018-11-28 19:00     ` David Miller
  2018-11-28 19:29       ` Ivan Vecera
  0 siblings, 1 reply; 6+ messages in thread
From: David Miller @ 2018-11-28 19:00 UTC (permalink / raw)
  To: ivecera; +Cc: poros, netdev

From: Ivan Vecera <ivecera@redhat.com>
Date: Wed, 28 Nov 2018 11:12:22 +0100

> On 27. 11. 18 23:51, David Miller wrote:
>> From: Petr Oros <poros@redhat.com>
>> Date: Thu, 22 Nov 2018 15:24:07 +0100
>> 
>>> @@ -4700,8 +4700,11 @@ int be_update_queues(struct be_adapter
>>> *adapter)
>>>   	struct net_device *netdev = adapter->netdev;
>>>   	int status;
>>>   -	if (netif_running(netdev))
>>> +	if (netif_running(netdev)) {
>>> +		/* prevent netdev watchdog during tx queue destroy */
>>> +		netif_carrier_off(netdev);
>>>   		be_close(netdev);
>>> +	}
>> This will make userspace networking daemons will think that the link
>> went down.
>> This absolutely should not be a side effect of making a simple
>> TX queue configuration change via ethtool.
>> 
> 
> Yes, you're right Dave. But the same thing (netif_carrier_off()) is
> done by number of other drivers (igb, tg3, ixgbe...) during
> .set_channels() callback. The patch that Petr sent does the
> practically the same thing like this one:

Bug exist in other drivers, thanks for reporting that.

It doesn't mean you should add the same bug here as well.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] be2net: Fix NULL pointer dereference in be_tx_timeout()
  2018-11-28 19:00     ` David Miller
@ 2018-11-28 19:29       ` Ivan Vecera
  2018-11-28 19:32         ` David Miller
  0 siblings, 1 reply; 6+ messages in thread
From: Ivan Vecera @ 2018-11-28 19:29 UTC (permalink / raw)
  To: David Miller; +Cc: poros, netdev

On 28. 11. 18 20:00, David Miller wrote:
> From: Ivan Vecera <ivecera@redhat.com>
> Date: Wed, 28 Nov 2018 11:12:22 +0100
> 
>> On 27. 11. 18 23:51, David Miller wrote:
>>> From: Petr Oros <poros@redhat.com>
>>> Date: Thu, 22 Nov 2018 15:24:07 +0100
>>>
>>>> @@ -4700,8 +4700,11 @@ int be_update_queues(struct be_adapter
>>>> *adapter)
>>>>    	struct net_device *netdev = adapter->netdev;
>>>>    	int status;
>>>>    -	if (netif_running(netdev))
>>>> +	if (netif_running(netdev)) {
>>>> +		/* prevent netdev watchdog during tx queue destroy */
>>>> +		netif_carrier_off(netdev);
>>>>    		be_close(netdev);
>>>> +	}
>>> This will make userspace networking daemons will think that the link
>>> went down.
>>> This absolutely should not be a side effect of making a simple
>>> TX queue configuration change via ethtool.
>>>
>>
>> Yes, you're right Dave. But the same thing (netif_carrier_off()) is
>> done by number of other drivers (igb, tg3, ixgbe...) during
>> .set_channels() callback. The patch that Petr sent does the
>> practically the same thing like this one:
> 
> Bug exist in other drivers, thanks for reporting that.
> 
> It doesn't mean you should add the same bug here as well.
> 
OK.
And the right way? netif_device_detach() that does not fire linkwatch events?

Thx,
Ivan

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] be2net: Fix NULL pointer dereference in be_tx_timeout()
  2018-11-28 19:29       ` Ivan Vecera
@ 2018-11-28 19:32         ` David Miller
  0 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2018-11-28 19:32 UTC (permalink / raw)
  To: ivecera; +Cc: poros, netdev

From: Ivan Vecera <ivecera@redhat.com>
Date: Wed, 28 Nov 2018 20:29:44 +0100

> And the right way? netif_device_detach() that does not fire
> linkwatch events?

Allocate all the TX queue resources first.

Do not modify the TX queue configuration in any way whatsoever
meanwhile.

Only after successfully allocating the resources, should you
commit them to the hardware configuration.

That way there is no problem and you don't have to worry about
the OOPS in question.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-11-29  6:34 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-22 14:24 [PATCH net] be2net: Fix NULL pointer dereference in be_tx_timeout() Petr Oros
2018-11-27 22:51 ` David Miller
2018-11-28 10:12   ` Ivan Vecera
2018-11-28 19:00     ` David Miller
2018-11-28 19:29       ` Ivan Vecera
2018-11-28 19:32         ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).