[PATCH net] be2net: Fix NULL pointer dereference in be_tx_timeout()

* [PATCH net] be2net: Fix NULL pointer dereference in be_tx_timeout()
@ 2018-11-22 14:24 Petr Oros
  2018-11-27 22:51 ` David Miller
  0 siblings, 1 reply; 6+ messages in thread
From: Petr Oros @ 2018-11-22 14:24 UTC (permalink / raw)
  To: netdev; +Cc: ivecera, davem

The driver enumerates Tx queues in ndo_tx_timeout() handler, here is
possible race with be_update_queues. For this case we set carrier_off.
It prevents netdev watchdog to be fired after be_clear_queues().
The watchdog timeout doesn't make any sense here as we re-creating queues.

Reproducer:
We can reproduce bug with ethtool when changing queue count
  ethtool -L $netif combined 1
  ethtool -L $netif combined 32
If oops is not triggered imediately, just run it again or in loop.

Oops:
[  865.768648] NETDEV WATCHDOG: enp4s0f0 (be2net): transmit queue 0 timed out
[  865.775539] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:461 dev_watchdog+0x20d/0x220
[  865.783796] Modules linked in: be2net intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul iTCO_wdt iTCO_vendor_support ghash_clmulni_intel mei_me intel_cstate intel_uncore ipmi_ssif mei ipmi_si pcspkr sg i2c_i801 joydev lpc_ich intel_rapl_perf ipmi_devintf ioatdma ipmi_msghandler xfs libcrc32c sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci libahci crc32c_intel drm serio_raw libata igb dca i2c_algo_bit wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: be2net]
[  865.834289] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.20.0-rc3+ #2
[  865.840640] Hardware name: Supermicro X9DBU/X9DBU, BIOS 3.2 01/15/2015
[  865.847168] RIP: 0010:dev_watchdog+0x20d/0x220
[  865.851612] Code: 00 49 63 4e e0 eb 92 4c 89 e7 c6 05 a5 de c9 00 01 e8 f7 b2 fc ff 89 d9 4c 89 e6 48 c7 c7 a0 d1 b2 99 48 89 c2 e8 7d b0 98 ff <0f> 0b eb c0 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66
[  865.870358] RSP: 0018:ffff9bee73ac3e88 EFLAGS: 00010282
[  865.875583] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000083f
[  865.882707] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000003f
[  865.889832] RBP: ffff9bee5fa0045c R08: 0000000000000824 R09: 0000000000000007
[  865.896956] R10: 0000000000000000 R11: ffffffff9a3f162d R12: ffff9bee5fa00000
[  865.904088] R13: 0000000000000003 R14: ffff9bee5fa00480 R15: 0000000000000020
[  865.911214] FS:  0000000000000000(0000) GS:ffff9bee73ac0000(0000) knlGS:0000000000000000
[  865.919298] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  865.925037] CR2: 00005580497ce040 CR3: 00000002cf60a004 CR4: 00000000000606e0
[  865.932170] Call Trace:
[  865.934626]  <IRQ>
[  865.936645]  ? pfifo_fast_dequeue+0x160/0x160
[  865.941005]  call_timer_fn+0x2b/0x130
[  865.944670]  run_timer_softirq+0x3b9/0x3f0
[  865.948768]  ? tick_sched_timer+0x37/0x70
[  865.952779]  ? __hrtimer_run_queues+0x110/0x280
[  865.957314]  __do_softirq+0xdd/0x2fe
[  865.960896]  irq_exit+0xfa/0x100
[  865.964125]  smp_apic_timer_interrupt+0x74/0x140
[  865.968745]  apic_timer_interrupt+0xf/0x20
[  865.972844]  </IRQ>
[  865.974953] RIP: 0010:cpuidle_enter_state+0xb0/0x320
[  865.979915] Code: 89 c3 66 66 66 66 90 31 ff e8 0c 07 a6 ff 80 7c 24 0b 00 74 12 9c 58 f6 c4 02 0f 85 46 02 00 00 31 ff e8 33 e0 ab ff fb 85 ed <0f> 88 1a 02 00 00 48 b8 ff ff ff ff f3 01 00 00 48 2b 1c 24 48 39
[  865.998661] RSP: 0018:ffffbc9ac19e7ea0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
[  866.006225] RAX: ffff9bee73ae1dc0 RBX: 000000c9938e11ae RCX: 000000000000001f
[  866.013350] RDX: 000000c9938e11ae RSI: 00000000435e532a RDI: 0000000000000000
[  866.020474] RBP: 0000000000000005 R08: 0000000000000002 R09: 0000000000021640
[  866.027598] R10: 00009c434b946fde R11: ffff9bee73ae0e44 R12: ffffffff99d27538
[  866.034723] R13: ffff9bee73aec628 R14: 0000000000000005 R15: 0000000000000000
[  866.041860]  do_idle+0x1f1/0x230
[  866.045091]  cpu_startup_entry+0x19/0x20
[  866.049016]  start_secondary+0x195/0x1e0
[  866.052943]  secondary_startup_64+0xb6/0xc0
[  866.057129] ---[ end trace dead88c26bcd8261 ]---
[  866.061750] be2net 0000:04:00.0: TXQ Dump: 0 H: 0 T: 0 used: 0, qid: 0x2
[  866.068452] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[  866.076273] PGD 0 P4D 0
[  866.078810] Oops: 0000 [#1] SMP PTI
[  866.082305] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G        W         4.20.0-rc3+ #2
[  866.090041] Hardware name: Supermicro X9DBU/X9DBU, BIOS 3.2 01/15/2015
[  866.096566] RIP: 0010:be_tx_timeout+0x7c/0x300 [be2net]
[  866.101786] Code: 8b 45 1c 41 8b 4d 14 48 89 df 31 ed 45 8b 4d 18 48 c7 c6 80 51 2c c0 50 45 8b 45 10 8b 54 24 14 e8 09 a7 cb d8 4d 8b 7d 20 59 <41> 8b 0c af 45 8b 44 af 04 41 8b 74 af 0c 45 8b 4c af 08 89 ca 44
[  866.120532] RSP: 0018:ffff9bee73ac3e38 EFLAGS: 00010246
[  866.125758] RAX: 0000000000000000 RBX: ffff9bee72d6b0b0 RCX: 0000000000000002
[  866.132882] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000003f
[  866.140014] RBP: 0000000000000000 R08: 000000000000084d R09: 0000000000000007
[  866.147138] R10: 0000000000000000 R11: ffffffff9a3f162d R12: ffffffffc02c60ab
[  866.154263] R13: ffff9bee5fa04b40 R14: ffffffffc02c613a R15: 0000000000000000
[  866.161388] FS:  0000000000000000(0000) GS:ffff9bee73ac0000(0000) knlGS:0000000000000000
[  866.169472] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  866.175210] CR2: 0000000000000000 CR3: 00000002cf60a004 CR4: 00000000000606e0
[  866.182334] Call Trace:
[  866.184781]  <IRQ>
[  866.186799]  dev_watchdog+0x1e4/0x220
[  866.190466]  ? pfifo_fast_dequeue+0x160/0x160
[  866.194825]  call_timer_fn+0x2b/0x130
[  866.198491]  run_timer_softirq+0x3b9/0x3f0
[  866.202590]  ? tick_sched_timer+0x37/0x70
[  866.206604]  ? __hrtimer_run_queues+0x110/0x280
[  866.211134]  __do_softirq+0xdd/0x2fe
[  866.214715]  irq_exit+0xfa/0x100
[  866.217948]  smp_apic_timer_interrupt+0x74/0x140
[  866.222566]  apic_timer_interrupt+0xf/0x20
[  866.226664]  </IRQ>
[  866.228762] RIP: 0010:cpuidle_enter_state+0xb0/0x320
[  866.233727] Code: 89 c3 66 66 66 66 90 31 ff e8 0c 07 a6 ff 80 7c 24 0b 00 74 12 9c 58 f6 c4 02 0f 85 46 02 00 00 31 ff e8 33 e0 ab ff fb 85 ed <0f> 88 1a 02 00 00 48 b8 ff ff ff ff f3 01 00 00 48 2b 1c 24 48 39
[  866.252465] RSP: 0018:ffffbc9ac19e7ea0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
[  866.260031] RAX: ffff9bee73ae1dc0 RBX: 000000c9938e11ae RCX: 000000000000001f
[  866.267164] RDX: 000000c9938e11ae RSI: 00000000435e532a RDI: 0000000000000000
[  866.274288] RBP: 0000000000000005 R08: 0000000000000002 R09: 0000000000021640
[  866.281421] R10: 00009c434b946fde R11: ffff9bee73ae0e44 R12: ffffffff99d27538
[  866.288545] R13: ffff9bee73aec628 R14: 0000000000000005 R15: 0000000000000000
[  866.295680]  do_idle+0x1f1/0x230
[  866.298913]  cpu_startup_entry+0x19/0x20
[  866.302839]  start_secondary+0x195/0x1e0
[  866.306764]  secondary_startup_64+0xb6/0xc0
[  866.310948] Modules linked in: be2net intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul iTCO_wdt iTCO_vendor_support ghash_clmulni_intel mei_me intel_cstate intel_uncore ipmi_ssif mei ipmi_si pcspkr sg i2c_i801 joydev lpc_ich intel_rapl_perf ipmi_devintf ioatdma ipmi_msghandler xfs libcrc32c sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci libahci crc32c_intel drm serio_raw libata igb dca i2c_algo_bit wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: be2net]
[  866.361432] CR2: 0000000000000000
[  866.364748] ---[ end trace dead88c26bcd8262 ]---
[  866.507013] RIP: 0010:be_tx_timeout+0x7c/0x300 [be2net]
[  866.512234] Code: 8b 45 1c 41 8b 4d 14 48 89 df 31 ed 45 8b 4d 18 48 c7 c6 80 51 2c c0 50 45 8b 45 10 8b 54 24 14 e8 09 a7 cb d8 4d 8b 7d 20 59 <41> 8b 0c af 45 8b 44 af 04 41 8b 74 af 0c 45 8b 4c af 08 89 ca 44
[  866.530980] RSP: 0018:ffff9bee73ac3e38 EFLAGS: 00010246
[  866.536206] RAX: 0000000000000000 RBX: ffff9bee72d6b0b0 RCX: 0000000000000002
[  866.543330] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000003f
[  866.550454] RBP: 0000000000000000 R08: 000000000000084d R09: 0000000000000007
[  866.557578] R10: 0000000000000000 R11: ffffffff9a3f162d R12: ffffffffc02c60ab
[  866.564710] R13: ffff9bee5fa04b40 R14: ffffffffc02c613a R15: 0000000000000000
[  866.571835] FS:  0000000000000000(0000) GS:ffff9bee73ac0000(0000) knlGS:0000000000000000
[  866.579920] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  866.585658] CR2: 0000000000000000 CR3: 00000002cf60a004 CR4: 00000000000606e0
[  866.592784] Kernel panic - not syncing: Fatal exception in interrupt
[  866.599179] Kernel Offset: 0x17a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Fixes: c1b3bdb2ffa9 ("be2net: gather debug info and reset adapter (only for Lancer) on a tx-timeout")
Signed-off-by: Petr Oros <poros@redhat.com>
---
 drivers/net/ethernet/emulex/benet/be_main.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index c5ad7a4f4d83..02202c5e6794 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -4700,8 +4700,11 @@ int be_update_queues(struct be_adapter *adapter)
 	struct net_device *netdev = adapter->netdev;
 	int status;
 
-	if (netif_running(netdev))
+	if (netif_running(netdev)) {
+		/* prevent netdev watchdog during tx queue destroy */
+		netif_carrier_off(netdev);
 		be_close(netdev);
+	}
 
 	be_cancel_worker(adapter);
 
-- 
2.18.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread