netdev.vger.kernel.org archive mirror
* Kernel oops with mlx5 and dual XDP redirect programs
@ 2018-10-03  9:30 Toke Høiland-Jørgensen
  2018-10-03 23:44 ` Saeed Mahameed
  0 siblings, 1 reply; 8+ messages in thread
From: Toke Høiland-Jørgensen @ 2018-10-03  9:30 UTC (permalink / raw)
  To: Saeed Mahameed, netdev; +Cc: brouer, Tariq Toukan, Eran Ben Elisha

Hi Saeed

I can reliably oops the kernel with the mlx5 driver by installing
XDP_REDIRECT programs on two devices so they redirect to each other,
and then removing them while there is traffic on the interface.

Steps to reproduce:

# cd ~/build/linux/samples/bpf
# ./xdp_redirect_map $(</sys/class/net/ens1f1/ifindex) $(</sys/class/net/ens1f0/ifindex)
# ./xdp_redirect_map $(</sys/class/net/ens1f0/ifindex) $(</sys/class/net/ens1f1/ifindex)

Now, run some traffic (e.g., using pktgen) across the interfaces, and
while the traffic is running, interrupt one of the xdp_redirect_map
commands (thus unloading the eBPF program). This results in a kernel
oops with the backtrace below. I get no crash if there's only a single
XDP program.

Is this something you could look into, please? :)

-Toke


[ 1400.937870] BUG: unable to handle kernel paging request at 0000000000003fa8
[ 1400.944826] PGD 800000072cc7b067 P4D 800000072cc7b067 PUD 72cc7a067 PMD 0 
[ 1400.951693] Oops: 0000 [#1] SMP PTI
[ 1400.955184] CPU: 5 PID: 10392 Comm: xdp_redirect_ma Not tainted 4.19.0-rc5-xdptest-g5be3ebf+ #17
[ 1400.965344] Hardware name: LENOVO 30B3005DMT/102F, BIOS S00KT56A 01/15/2018
[ 1400.972318] RIP: 0010:mlx5e_xdp_xmit+0x7b/0x2a0 [mlx5_core]
[ 1400.977889] Code: 8b 0d 29 d9 4f 3f 39 8f 48 39 00 00 b8 fa ff ff ff 0f 86 45 01 00 00 48 8b 87 40 39 00 00 48 63 c9 4c 8b 24 c8 b8 9c ff ff ff <49> 8b 8c 24 a8 3f 00 00 4d 8d bc 24 c0 3c 00 00 83 e1 01 0f 84 19
[ 1400.996624] RSP: 0018:ffff90209fb43bb0 EFLAGS: 00010202
[ 1401.002001] RAX: 00000000ffffff9c RBX: 0000000000000000 RCX: 0000000000000005
[ 1401.009122] RDX: ffffc7627fd75190 RSI: 0000000000000010 RDI: ffff902084580000
[ 1401.016250] RBP: ffffc7627fd75190 R08: ffff901f9821c100 R09: ffffc7627fd75210
[ 1401.023379] R10: 00000000000005dc R11: 0000000000000000 R12: 0000000000000000
[ 1401.030500] R13: ffff902081580000 R14: 0000000000000001 R15: ffffc7627fd75190
[ 1401.037645] FS:  00007f460fa96700(0000) GS:ffff90209fb40000(0000) knlGS:0000000000000000
[ 1401.045718] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1401.051452] CR2: 0000000000003fa8 CR3: 000000076c3b6006 CR4: 00000000003606e0
[ 1401.058573] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1401.065823] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1401.072943] Call Trace:
[ 1401.075390]  <IRQ>
[ 1401.077409]  bq_xmit_all+0x5e/0x160
[ 1401.080897]  dev_map_enqueue+0x12e/0x140
[ 1401.084823]  xdp_do_redirect+0x1a9/0x2a0
[ 1401.088756]  mlx5e_xdp_handle+0x24f/0x2b0 [mlx5_core]
[ 1401.093821]  ? resched_cpu+0x5f/0x70
[ 1401.097399]  ? __xdp_return+0x189/0x400
[ 1401.101242]  mlx5e_skb_from_cqe_linear+0xdd/0x180 [mlx5_core]
[ 1401.106987]  mlx5e_handle_rx_cqe+0x43/0xe0 [mlx5_core]
[ 1401.112130]  mlx5e_poll_rx_cq+0xcb/0x940 [mlx5_core]
[ 1401.117094]  mlx5e_napi_poll+0xa6/0xc90 [mlx5_core]
[ 1401.121966]  ? smp_reschedule_interrupt+0x16/0xd0
[ 1401.126789]  ? reschedule_interrupt+0xf/0x20
[ 1401.131057]  ? reschedule_interrupt+0xa/0x20
[ 1401.135321]  net_rx_action+0x279/0x3d0
[ 1401.139071]  __do_softirq+0xf2/0x28e
[ 1401.142651]  irq_exit+0xb6/0xc0
[ 1401.145792]  do_IRQ+0x52/0xd0
[ 1401.148785]  common_interrupt+0xf/0xf
[ 1401.152445]  </IRQ>
[ 1401.154559] RIP: 0010:mlx5e_open_channels+0x65e/0x1390 [mlx5_core]
[ 1401.160734] Code: 8b 00 48 05 a8 00 00 00 48 89 85 78 3c 00 00 48 8b 83 f8 8d 01 00 48 89 85 80 3c 00 00 48 8b 83 f0 8d 01 00 8b 80 a8 fb 03 00 <0f> c8 89 85 88 3c 00 00 41 0f b6 45 16 88 85 8c 3c 00 00 49 83 bd
[ 1401.179463] RSP: 0018:ffffa7628dd43808 EFLAGS: 00000282 ORIG_RAX: ffffffffffffffd4
[ 1401.187024] RAX: 0000000000080000 RBX: ffff9020845808c0 RCX: 0000000000000000
[ 1401.194325] RDX: ffffa7628dd43894 RSI: 0000000000000000 RDI: ffff901f8a0e0000
[ 1401.201463] RBP: ffff901f8a0d8000 R08: ffffe1799d283800 R09: 0000000000000008
[ 1401.208582] R10: 0000000000000000 R11: 0000000000000002 R12: 0000000000000000
[ 1401.215702] R13: ffff902084583940 R14: 0000000000000000 R15: 0000000000000000
[ 1401.222834]  ? mlx5e_open_channels+0x5e1/0x1390 [mlx5_core]
[ 1401.228404]  ? rcu_exp_wait_wake+0x550/0x550
[ 1401.232674]  ? free_one_page+0x68/0x370
[ 1401.236519]  mlx5e_open_locked+0x28/0xa0 [mlx5_core]
[ 1401.241491]  mlx5e_xdp+0x2b2/0x300 [mlx5_core]
[ 1401.245936]  dev_xdp_install+0x4c/0x70
[ 1401.249686]  do_setlink+0xcdb/0xd10
[ 1401.253300]  ? flat_send_IPI_allbutself+0x6c/0xa0
[ 1401.258003]  ? __update_load_avg_se+0x20c/0x290
[ 1401.262530]  rtnl_setlink+0x104/0x140
[ 1401.266189]  rtnetlink_rcv_msg+0x269/0x310
[ 1401.270283]  ? _cond_resched+0x16/0x40
[ 1401.274029]  ? __kmalloc_node_track_caller+0x1dd/0x2a0
[ 1401.279162]  ? rtnl_calcit.isra.32+0x110/0x110
[ 1401.283601]  netlink_rcv_skb+0xdb/0x110
[ 1401.287437]  netlink_unicast+0x18b/0x250
[ 1401.291359]  netlink_sendmsg+0x2c7/0x3b0
[ 1401.295287]  sock_sendmsg+0x30/0x40
[ 1401.298776]  __sys_sendto+0xd8/0x150
[ 1401.302351]  ? __sys_getsockname+0xac/0xc0
[ 1401.306448]  ? netlink_setsockopt+0x2e/0x2b0
[ 1401.310718]  ? __sys_setsockopt+0x7c/0xe0
[ 1401.314867]  __x64_sys_sendto+0x24/0x30
[ 1401.318709]  do_syscall_64+0x4f/0x100
[ 1401.322372]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1401.327420] RIP: 0033:0x7f460f3a83dd
[ 1401.330997] Code: 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 8b 05 7a 13 2c 00 85 c0 75 3e 45 31 c9 45 31 c0 4c 63 d1 48 63 ff b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 0b c3 66 2e 0f 1f 84 00 00 00 00 00 48 8b 15
[ 1401.349733] RSP: 002b:00007ffd28d23138 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 1401.357293] RAX: ffffffffffffffda RBX: ffffffffffffff90 RCX: 00007f460f3a83dd
[ 1401.364413] RDX: 000000000000002c RSI: 00007ffd28d23170 RDI: 0000000000000003
[ 1401.371533] RBP: 00007ffd28d231e0 R08: 0000000000000000 R09: 0000000000000000
[ 1401.378767] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000006
[ 1401.385895] R13: 00007ffd28d237f0 R14: 00007ffd28d23830 R15: 00007ffd28d2388c
[ 1401.393016] Modules linked in: rpcrdma ib_umad sunrpc ib_ipoib rdma_ucm mlx5_ib binfmt_misc ib_uverbs snd_hda_codec_hdmi intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp mlx5_core kvm_intel snd_hda_codec_realtek snd_hda_codec_generic kvm snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm e1000e uas irqbypass snd_timer crct10dif_pclmul snd mei_me usb_storage crc32_pclmul ghash_clmulni_intel wmi_bmof mei lpc_ich soundcore mlxfw pata_acpi mac_hid ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear nouveau video i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops 
 drm mxm_wmi
[ 1401.463638]  aesni_intel aes_x86_64 crypto_simd cryptd glue_helper ahci libahci wmi
[ 1401.471289] CR2: 0000000000003fa8
[ 1401.474617] ---[ end trace 1a0d8962c7db30ed ]---
[ 1401.528487] RIP: 0010:mlx5e_xdp_xmit+0x7b/0x2a0 [mlx5_core]
[ 1401.534058] Code: 8b 0d 29 d9 4f 3f 39 8f 48 39 00 00 b8 fa ff ff ff 0f 86 45 01 00 00 48 8b 87 40 39 00 00 48 63 c9 4c 8b 24 c8 b8 9c ff ff ff <49> 8b 8c 24 a8 3f 00 00 4d 8d bc 24 c0 3c 00 00 83 e1 01 0f 84 19
[ 1401.552789] RSP: 0018:ffff90209fb43bb0 EFLAGS: 00010202
[ 1401.558012] RAX: 00000000ffffff9c RBX: 0000000000000000 RCX: 0000000000000005
[ 1401.565132] RDX: ffffc7627fd75190 RSI: 0000000000000010 RDI: ffff902084580000
[ 1401.572252] RBP: ffffc7627fd75190 R08: ffff901f9821c100 R09: ffffc7627fd75210
[ 1401.579371] R10: 00000000000005dc R11: 0000000000000000 R12: 0000000000000000
[ 1401.586493] R13: ffff902081580000 R14: 0000000000000001 R15: ffffc7627fd75190
[ 1401.593726] FS:  00007f460fa96700(0000) GS:ffff90209fb40000(0000) knlGS:0000000000000000
[ 1401.601797] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1401.607533] CR2: 0000000000003fa8 CR3: 000000076c3b6006 CR4: 00000000003606e0
[ 1401.614653] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1401.621772] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1401.628895] Kernel panic - not syncing: Fatal exception in interrupt
[ 1401.635280] Kernel Offset: 0x5000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1401.694263] Rebooting in 5 seconds..
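
A note on reading the oops above. Decoded by hand, the faulting bytes
"<49> 8b 8c 24 a8 3f 00 00" appear to be "mov 0x3fa8(%r12),%rcx", and
R12 is 0 in the register dump, so the load would touch address 0x3fa8,
which is exactly what CR2 reports. That would make this a field read at
offset 0x3fa8 from a NULL pointer rather than a wild pointer. The C
sketch below only redoes that arithmetic; the macro and function names
are made up for illustration, and the hand decode itself is an
assumption.

```c
#include <stdint.h>

/* Values copied from the register dump in the oops. */
#define OOPS_R12   UINT64_C(0x0000000000000000)
#define OOPS_CR2   UINT64_C(0x0000000000003fa8)

/*
 * Displacement from the faulting bytes "<49> 8b 8c 24 a8 3f 00 00":
 * the last four bytes are a little-endian 32-bit displacement.
 * (Hand-decoded, so treat this as an assumption.)
 */
#define FAULT_DISP UINT64_C(0x3fa8)

/* Effective address of a base + displacement memory operand. */
static uint64_t effective_address(uint64_t base, uint64_t disp)
{
	return base + disp;
}

/* Returns 1 when a NULL base plus the displacement reproduces CR2. */
static int fault_matches_null_base(void)
{
	return effective_address(OOPS_R12, FAULT_DISP) == OOPS_CR2;
}
```

If that reading is right, the pointer in R12 was NULL at the time of the
access, and 0x3fa8 is simply the offset of the field being read.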


* Re: Kernel oops with mlx5 and dual XDP redirect programs
  2018-10-03  9:30 Kernel oops with mlx5 and dual XDP redirect programs Toke Høiland-Jørgensen
@ 2018-10-03 23:44 ` Saeed Mahameed
  2018-10-04 12:03   ` Toke Høiland-Jørgensen
  2018-10-18 21:53   ` Toke Høiland-Jørgensen
  0 siblings, 2 replies; 8+ messages in thread
From: Saeed Mahameed @ 2018-10-03 23:44 UTC (permalink / raw)
  To: toke, netdev; +Cc: Eran Ben Elisha, Tariq Toukan, brouer

On Wed, 2018-10-03 at 11:30 +0200, Toke Høiland-Jørgensen wrote:
> Hi Saeed
> 
> I can reliably oops the kernel with the mlx5 driver, by installing
> XDP_REDIRECT programs on two devices so they redirect to each other,
> and then remove them while there is traffic on the interface.
> 
> Steps to reproduce:
> 
> # cd ~/build/linux/samples/bpf
> # ./xdp_redirect_map $(</sys/class/net/ens1f1/ifindex)
> $(</sys/class/net/ens1f0/ifindex)
> # ./xdp_redirect_map $(</sys/class/net/ens1f0/ifindex)
> $(</sys/class/net/ens1f1/ifindex)
> 
> Now, run some traffic (e.g., using pktgen) across the interfaces, and
> while the traffic is running, interrupt one of the xdp_redirect_map
> commands (thus unloading the eBPF program). This results in a kernel
> oops with the backtrace below. I get no crash if there's only a
> single
> XDP program.

Hi Toke,

What seems to be happening is that, while traffic is being redirected
to the other device, the driver is unloading the program and
restarting the rings. From the call trace below we can see:

[ 1400.972318] RIP: 0010:mlx5e_xdp_xmit+0x7b/0x2a0 [mlx5_core]
[ 1401.077409]  bq_xmit_all+0x5e/0x160
[ 1401.080897]  dev_map_enqueue+0x12e/0x140
[ 1401.084823]  xdp_do_redirect+0x1a9/0x2a0
[ 1401.088756]  mlx5e_xdp_handle+0x24f/0x2b0 [mlx5_core]

and
[ 1401.154559] RIP: 0010:mlx5e_open_channels+0x65e/0x1390 [mlx5_core]
[ 1401.222834]  ? mlx5e_open_channels+0x5e1/0x1390 [mlx5_core]
[ 1401.228404]  ? rcu_exp_wait_wake+0x550/0x550
[ 1401.232674]  ? free_one_page+0x68/0x370
[ 1401.236519]  mlx5e_open_locked+0x28/0xa0 [mlx5_core]
[ 1401.241491]  mlx5e_xdp+0x2b2/0x300 [mlx5_core]
[ 1401.245936]  dev_xdp_install+0x4c/0x70
[ 1401.249686]  do_setlink+0xcdb/0xd10

I think that the mlx5 driver doesn't know how to tell the other device
to stop transmitting to it while it is resetting. Maybe Tariq or
Jesper knows more about this?
I will look at this tomorrow afternoon and will try to reproduce it.

What is interesting is that at the mlx5e_open_channels stage all
previous TX queues should still be active and not destroyed; only later
on, when we switch to the new channels, do we stop and destroy the older
TX/RX queues. The question is how reliable this call trace is.

Thanks for the report.

> 
> Is this something you could look into, please? :)

> 
> -Toke
> 
> 


* Re: Kernel oops with mlx5 and dual XDP redirect programs
  2018-10-03 23:44 ` Saeed Mahameed
@ 2018-10-04 12:03   ` Toke Høiland-Jørgensen
  2018-10-18 21:53   ` Toke Høiland-Jørgensen
  1 sibling, 0 replies; 8+ messages in thread
From: Toke Høiland-Jørgensen @ 2018-10-04 12:03 UTC (permalink / raw)
  To: Saeed Mahameed, netdev; +Cc: Eran Ben Elisha, Tariq Toukan, brouer

Saeed Mahameed <saeedm@mellanox.com> writes:

> On Wed, 2018-10-03 at 11:30 +0200, Toke Høiland-Jørgensen wrote:
>> Hi Saeed
>> 
>> I can reliably oops the kernel with the mlx5 driver, by installing
>> XDP_REDIRECT programs on two devices so they redirect to each other,
>> and then remove them while there is traffic on the interface.
>> 
>> Steps to reproduce:
>> 
>> # cd ~/build/linux/samples/bpf
>> # ./xdp_redirect_map $(</sys/class/net/ens1f1/ifindex)
>> $(</sys/class/net/ens1f0/ifindex)
>> # ./xdp_redirect_map $(</sys/class/net/ens1f0/ifindex)
>> $(</sys/class/net/ens1f1/ifindex)
>> 
>> Now, run some traffic (e.g., using pktgen) across the interfaces, and
>> while the traffic is running, interrupt one of the xdp_redirect_map
>> commands (thus unloading the eBPF program). This results in a kernel
>> oops with the backtrace below. I get no crash if there's only a
>> single
>> XDP program.
>
> Hi Toke,
>
> What looks like happening is that while the traffic is being redirected
> to the other device, the driver is trying to unload the program and
> restarting the rings from below call trace we can see:

Yeah, thought it was something like that, since it only happens on the
bidirectional redirect...

> I think that the mlx5 driver doesn't know how to tell the other device
> to stop transmitting to it while it is resetting.. Maybe tariq or
> Jesper know more about this ?
> I will look at this tomorrow after noon and will try to repro...

Great, thanks! :)

-Toke


* Re: Kernel oops with mlx5 and dual XDP redirect programs
  2018-10-03 23:44 ` Saeed Mahameed
  2018-10-04 12:03   ` Toke Høiland-Jørgensen
@ 2018-10-18 21:53   ` Toke Høiland-Jørgensen
  2018-10-22 17:57     ` Saeed Mahameed
  1 sibling, 1 reply; 8+ messages in thread
From: Toke Høiland-Jørgensen @ 2018-10-18 21:53 UTC (permalink / raw)
  To: Saeed Mahameed, netdev; +Cc: Eran Ben Elisha, Tariq Toukan, brouer

Saeed Mahameed <saeedm@mellanox.com> writes:

> I think that the mlx5 driver doesn't know how to tell the other device
> to stop transmitting to it while it is resetting.. Maybe tariq or
> Jesper know more about this ?
> I will look at this tomorrow after noon and will try to repro...

Hi Saeed

Did you have a chance to poke at this? :)

-Toke


* Re: Kernel oops with mlx5 and dual XDP redirect programs
  2018-10-18 21:53   ` Toke Høiland-Jørgensen
@ 2018-10-22 17:57     ` Saeed Mahameed
  2018-10-23 10:10       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 8+ messages in thread
From: Saeed Mahameed @ 2018-10-22 17:57 UTC (permalink / raw)
  To: toke, netdev; +Cc: Eran Ben Elisha, Tariq Toukan, brouer


On Thu, 2018-10-18 at 23:53 +0200, Toke Høiland-Jørgensen wrote:
> Saeed Mahameed <saeedm@mellanox.com> writes:
> 
> > I think that the mlx5 driver doesn't know how to tell the other
> > device
> > to stop transmitting to it while it is resetting.. Maybe tariq or
> > Jesper know more about this ?
> > I will look at this tomorrow after noon and will try to repro...
> 
> Hi Saeed
> 
> Did you have a chance to poke at this? :)

Hi Toke, yes, I have been planning to respond, but I also wanted to dig
more.

The root cause is very clear:

1. core 1 is doing tx_dev->ndo_xdp_xmit()
2. core 2 is doing tx_dev->xdp_set() //remove xdp program.


In mlx5 you must have an XDP program on tx_dev in order to use
dev_map and ndo_xdp_xmit, for the simple reason that we create
unique TX resources (Send Queues/SQs) for the XDP redirect/TX use case.

So if you remove the XDP program on core 2, the driver will start
destroying the XDP redirect SQs. This is safe for XDP RX and forwarding,
since we use napi_synchronize. But for XDP redirect, we don't have the
means to synchronize with a different NAPI device, and one that is on a
different CPU at that!

So if core 1 got past the check below in mlx5e_xdp_xmit
(mlx5/core/en/xdp.c):

	if (unlikely(!test_bit(MLX5E_SQ_STATE_ENABLED, &sq->state)))
		return -ENETDOWN;

and at the same moment core 2 destroyed that SQ
(mlx5e_xdp_set -> mlx5e_close_locked), then this SQ is no longer
available and core 1 will be writing descriptors to a freed DMA buffer.

The problem goes beyond mlx5, since we don't have a way to tell a
different core/different netdev to stop transmitting, or at least to
synchronize with it.

Assuming NAPI is polled under the RCU read lock, synchronize_net() might
help, as suggested in the attached temporary fix.

The idea is to set a flag that the mlx5e_xdp_xmit ndo checks to see
whether the TX resources are valid, and to turn it on and off in the
xdp_set ndo, using synchronize_net() to synchronize with ongoing XDP
redirection.
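
As a rough userspace illustration of that flag-plus-wait idea (this is
not the driver code and not RCU): the sketch below uses C11 atomics,
with a counter of in-flight senders standing in for the read-side
section and a spin on that counter standing in for synchronize_net().
All names (model_dev, model_xmit, model_teardown) are hypothetical.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a device and its XDP TX resources. */
struct model_dev {
	atomic_bool xdp_disabled;  /* plays the role of the patch's flag */
	atomic_int  active_xmit;   /* senders currently inside model_xmit() */
	int        *sq;            /* stands in for the send queue memory */
};

static int model_dev_init(struct model_dev *d)
{
	atomic_init(&d->xdp_disabled, false);
	atomic_init(&d->active_xmit, 0);
	d->sq = calloc(1, sizeof(*d->sq));
	return d->sq ? 0 : -ENOMEM;
}

/* Sender side: roughly what core 1 does in the xmit path. */
static int model_xmit(struct model_dev *d)
{
	int ret = 0;

	atomic_fetch_add(&d->active_xmit, 1);  /* enter "read section" */
	if (atomic_load(&d->xdp_disabled))
		ret = -ENETDOWN;               /* teardown started: back off */
	else
		(*d->sq)++;                    /* teardown waits for us, so sq is live */
	atomic_fetch_sub(&d->active_xmit, 1);  /* leave "read section" */
	return ret;
}

/* Teardown side: roughly what core 2 does when the program is removed. */
static void model_teardown(struct model_dev *d)
{
	atomic_store(&d->xdp_disabled, true);     /* 1. turn away new senders */
	while (atomic_load(&d->active_xmit) > 0)  /* 2. wait for senders in flight */
		;
	free(d->sq);                              /* 3. only now free the queue */
	d->sq = NULL;
}
```

The ordering is the point: the flag is set first, then the teardown side
waits until nobody can still be past the check, and only then is the
queue released. A sender either sees the flag and backs off, or is
counted and waited for.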

I am still not sure whether NAPI is polled under the RCU read lock. If
it is not, and synchronize_net() doesn't help, then replace it with
msleep(200), which should be enough for now.

I've managed to reproduce and verify the fix with even one direction of
xdp_redirect and by just removing the xdp program on the tx side  while
xdp redirection was ongoing.

# run xdp redirect
RX_IF=p6p1
TX_IF=p5p1
./samples/bpf/xdp_redirect_map $(</sys/class/net/$RX_IF/ifindex) $(</sys/class/net/$TX_IF/ifindex)

sleep 2
# remove/reset the xdp program on the TX interface
./samples/bpf/xdp_fwd -d $TX_IF


I have created an internal bug record to track and pick the right fix
for this ASAP.

I will be waiting for your confirmation that the fix did work.

Thanks,
Saeed

For inline review the attached patch:

From 3cdf9b43ffc0ad63ba60e7f36d429377c8207e5a Mon Sep 17 00:00:00 2001
From: Saeed Mahameed <saeedm@mellanox.com>
Date: Fri, 19 Oct 2018 14:59:00 -0700
Subject: [PATCH] net/mlx5e: XDP redirect bug fix

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h      |  1 +
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c  |  3 +++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 15 +++++++++++++--
 3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index aea74856c702..4228aafc0165 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -620,6 +620,7 @@ struct mlx5e_channels {
 	struct mlx5e_channel **c;
 	unsigned int           num;
 	struct mlx5e_params    params;
+	bool xdp_disabled;
 };
 
 struct mlx5e_channel_stats {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index ad6d471d00dd..c338b7b8f838 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -268,6 +268,9 @@ int mlx5e_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
 	if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK))
 		return -EINVAL;
 
+	if (unlikely(READ_ONCE(priv->channels.xdp_disabled)))
+		return -ENETDOWN;
+
 	sq_num = smp_processor_id();
 
 	if (unlikely(sq_num >= priv->channels.num))
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 0d495a6b3949..a2d8a52ae469 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -4237,8 +4237,17 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
 	/* no need for full reset when exchanging programs */
 	reset = (!priv->channels.params.xdp_prog || !prog);
 
-	if (was_opened && reset)
+	if (was_opened && reset) {
+		for (i = 0; i < priv->channels.num; i++)
+			clear_bit(MLX5E_SQ_STATE_ENABLED, &priv->channels.c[i]->xdpsq.state);
+		priv->channels.xdp_disabled = true;
+
+		synchronize_net();
+		//msleep(200);
+
 		mlx5e_close_locked(netdev);
+	}
+
 	if (was_opened && !reset) {
 		/* num_channels is invariant here, so we can take the
 		 * batched reference right upfront.
@@ -4260,8 +4269,10 @@ static int mlx5e_xdp_set(struct net_device *netdev, struct bpf_prog *prog)
 	if (reset) /* change RQ type according to priv->xdp_prog */
 		mlx5e_set_rq_type(priv->mdev, &priv->channels.params);
 
-	if (was_opened && reset)
+	if (was_opened && reset) {
 		mlx5e_open_locked(netdev);
+		priv->channels.xdp_disabled = false;
+	}
 
 	if (!test_bit(MLX5E_STATE_OPENED, &priv->state) || reset)
 		goto unlock;
-- 
2.17.2


> 
> -Toke



* Re: Kernel oops with mlx5 and dual XDP redirect programs
  2018-10-22 17:57     ` Saeed Mahameed
@ 2018-10-23 10:10       ` Toke Høiland-Jørgensen
  2018-10-23 18:01         ` Saeed Mahameed
  0 siblings, 1 reply; 8+ messages in thread
From: Toke Høiland-Jørgensen @ 2018-10-23 10:10 UTC (permalink / raw)
  To: Saeed Mahameed, netdev; +Cc: Eran Ben Elisha, Tariq Toukan, brouer

Saeed Mahameed <saeedm@mellanox.com> writes:

> On Thu, 2018-10-18 at 23:53 +0200, Toke Høiland-Jørgensen wrote:
>> Saeed Mahameed <saeedm@mellanox.com> writes:
>> 
>> > I think that the mlx5 driver doesn't know how to tell the other
>> > device
>> > to stop transmitting to it while it is resetting.. Maybe tariq or
>> > Jesper know more about this ?
>> > I will look at this tomorrow after noon and will try to repro...
>> 
>> Hi Saeed
>> 
>> Did you have a chance to poke at this? :)
>
> Hi Toke, yes, I have been planning to respond, but I also wanted to dig
> more.
>
> so the root cause is very clear.
>
> 1. core 1 is doing tx_dev->ndo_xdp_xmit()
> 2. core 2 is doing tx_dev->xdp_set() //remove xdp program.

Right, it was also my guess that it was related to this interaction.
Thanks for looking into it!
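
The interaction above can be sketched in userspace C. This is only a toy
model under stated assumptions, not mlx5 code: `toy_sq`, `toy_init()`,
`toy_xmit()` and `toy_teardown()` are hypothetical names, the `enabled`
flag plays the role of MLX5E_SQ_STATE_ENABLED from the patch, and a plain
in-flight counter stands in for the grace-period wait that
synchronize_net() provides in the kernel. It also assumes a single
transmitter per queue.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical toy queue; not the mlx5 data structure. */
struct toy_sq {
	atomic_bool enabled;   /* plays the role of MLX5E_SQ_STATE_ENABLED */
	atomic_int in_flight;  /* transmitters currently inside toy_xmit() */
	int *ring;             /* resource that teardown frees */
	size_t size;
	size_t head;
};

int toy_init(struct toy_sq *sq, size_t size)
{
	assert(size > 0);
	sq->ring = calloc(size, sizeof(*sq->ring));
	if (!sq->ring)
		return -ENOMEM;
	sq->size = size;
	sq->head = 0;
	atomic_init(&sq->in_flight, 0);
	atomic_init(&sq->enabled, true);
	return 0;
}

/* "Core 1": transmit path. Assumes one transmitter per queue. */
int toy_xmit(struct toy_sq *sq, int pkt)
{
	int ret = -ENETDOWN;

	atomic_fetch_add(&sq->in_flight, 1);    /* enter read-side section */
	if (atomic_load(&sq->enabled)) {
		sq->ring[sq->head++ % sq->size] = pkt;
		ret = 0;
	}
	atomic_fetch_sub(&sq->in_flight, 1);    /* leave read-side section */
	return ret;
}

/* "Core 2": teardown path, in the same order as the patch. */
void toy_teardown(struct toy_sq *sq)
{
	atomic_store(&sq->enabled, false);      /* 1. refuse new transmits */
	while (atomic_load(&sq->in_flight) > 0) /* 2. wait for ones already inside */
		;
	free(sq->ring);                         /* 3. only now free the ring */
	sq->ring = NULL;
}
```

Without steps 1 and 2, a transmitter could still be writing to the ring
while teardown frees it, which is the userspace analogue of the oops in
mlx5e_xdp_xmit(). After toy_teardown() returns, toy_xmit() fails with
-ENETDOWN instead of touching freed memory.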

> and the problem is beyond mlx5, since we don't have a way to tell a
> different core/different netdev to stop xmitting, or at least
> synchronize with it.

Hmm, ideally there should be some way for the higher level XDP API to
notice this and abort the call before it even reaches the driver on the
TX side, shouldn't there? At LPC, Jesper and I will be talking about a
proposal for decoupling the ndo_xdp_xmit() resource allocation from
loading and unloading XDP programs, which I guess could be a way to deal
with this as well.

In the meantime...

> I will be waiting for your confirmation that the fix did work.

I tested your patch, and it does indeed fix the crash. However, it also
seems to have the effect that the XDP redirect continues to function
even after removing the XDP program on the target device.

I.e., after the call to ./xdp_fwd -d $TX_IF, I still see packets being
redirected out $TX_IF. Is this intentional?

-Toke


* Re: Kernel oops with mlx5 and dual XDP redirect programs
  2018-10-23 10:10       ` Toke Høiland-Jørgensen
@ 2018-10-23 18:01         ` Saeed Mahameed
  2018-10-23 20:29           ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 8+ messages in thread
From: Saeed Mahameed @ 2018-10-23 18:01 UTC (permalink / raw)
  To: toke, netdev; +Cc: Eran Ben Elisha, Tariq Toukan, brouer

On Tue, 2018-10-23 at 12:10 +0200, Toke Høiland-Jørgensen wrote:
> Saeed Mahameed <saeedm@mellanox.com> writes:
> 
> > On Thu, 2018-10-18 at 23:53 +0200, Toke Høiland-Jørgensen wrote:
> > > Saeed Mahameed <saeedm@mellanox.com> writes:
> > > 
> > > > I think that the mlx5 driver doesn't know how to tell the other
> > > > device
> > > > to stop transmitting to it while it is resetting.. Maybe tariq
> > > > or
> > > > Jesper know more about this ?
> > > > I will look at this tomorrow after noon and will try to
> > > > repro...
> > > 
> > > Hi Saeed
> > > 
> > > Did you have a chance to poke at this? :)
> > 
> > Hi Toke, yes, I have been planning to respond, but I also wanted to
> > dig more.
> > 
> > so the root cause is very clear.
> > 
> > 1. core 1 is doing tx_dev->ndo_xdp_xmit()
> > 2. core 2 is doing tx_dev->xdp_set() //remove xdp program.
> 
> Right, it was also my guess that it was related to this interaction.
> Thanks for looking into it!
> 
> > and the problem is beyond mlx5, since we don't have a way to tell a
> > different core/different netdev to stop xmitting, or at least
> > synchronize with it.
> 
> Hmm, ideally there should be some way for the higher level XDP API to
> notice this and abort the call before it even reaches the driver on
> the
> TX side, shouldn't there? At LPC, Jesper and I will be talking about
> a
> proposal for decoupling the ndo_xdp_xmit() resource allocation from
> loading and unloading XDP programs, which I guess could be a way to
> deal
> with this as well.
> 
> In the meantime...
> 

Yes, totally agree; this is why my fix is temporary.
Good idea about LPC, let's discuss this there.

> > I will be waiting for your confirmation that the fix did work.
> 
> I tested your patch, and it does indeed fix the crash. However, it
> also
> seems to have the effect that the XDP redirect continues to function
> even after removing the XDP program on the target device.
> 
> I.e., after the call to ./xdp_fwd -d $TX_IF, I still see packets
> being
> redirected out $TX_IF. Is this intentional?
> 

Interesting, that shouldn't happen, unless there is something weird going
on when running xdp_fwd -d together with xdp_redirect_map. I just checked
the code: if ndo_xdp_set is called with a null program, we remove the
xdp tx resources, so I see nothing suspicious in the driver.
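
For reference, the reset decision in mlx5e_xdp_set() (visible in the patch
context as `reset = (!priv->channels.params.xdp_prog || !prog)`) can be
written out as a small standalone C sketch. `struct prog` and
`xdp_needs_full_reset()` are hypothetical names used only for
illustration; only the boolean expression is taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bpf_prog; only NULL vs non-NULL matters. */
struct prog {
	int id;
};

/*
 * Mirrors the condition from the patch context:
 * a full reset (close + open) is needed when a program is installed for
 * the first time or removed, but not when one program replaces another.
 */
bool xdp_needs_full_reset(const struct prog *old_prog,
			  const struct prog *new_prog)
{
	return !old_prog || !new_prog;
}
```

So a call with a null program on a device that has a program attached
takes the reset path, which is where the XDP TX resources are expected to
go away.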

I will look at this later this week.

> -Toke


* Re: Kernel oops with mlx5 and dual XDP redirect programs
  2018-10-23 18:01         ` Saeed Mahameed
@ 2018-10-23 20:29           ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 8+ messages in thread
From: Toke Høiland-Jørgensen @ 2018-10-23 20:29 UTC (permalink / raw)
  To: Saeed Mahameed, netdev; +Cc: Eran Ben Elisha, Tariq Toukan, brouer

Saeed Mahameed <saeedm@mellanox.com> writes:

> On Tue, 2018-10-23 at 12:10 +0200, Toke Høiland-Jørgensen wrote:
>> Saeed Mahameed <saeedm@mellanox.com> writes:
>> 
>> > On Thu, 2018-10-18 at 23:53 +0200, Toke Høiland-Jørgensen wrote:
>> > > Saeed Mahameed <saeedm@mellanox.com> writes:
>> > > 
>> > > > I think that the mlx5 driver doesn't know how to tell the other
>> > > > device
>> > > > to stop transmitting to it while it is resetting.. Maybe tariq
>> > > > or
>> > > > Jesper know more about this ?
>> > > > I will look at this tomorrow after noon and will try to
>> > > > repro...
>> > > 
>> > > Hi Saeed
>> > > 
>> > > Did you have a chance to poke at this? :)
>> > 
>> > Hi Toke, yes, I have been planning to respond, but I also wanted to
>> > dig more.
>> > 
>> > so the root cause is very clear.
>> > 
>> > 1. core 1 is doing tx_dev->ndo_xdp_xmit()
>> > 2. core 2 is doing tx_dev->xdp_set() //remove xdp program.
>> 
>> Right, it was also my guess that it was related to this interaction.
>> Thanks for looking into it!
>> 
>> > and the problem is beyond mlx5, since we don't have a way to tell a
>> > different core/different netdev to stop xmitting, or at least
>> > synchronize with it.
>> 
>> Hmm, ideally there should be some way for the higher level XDP API to
>> notice this and abort the call before it even reaches the driver on
>> the
>> TX side, shouldn't there? At LPC, Jesper and I will be talking about
>> a
>> proposal for decoupling the ndo_xdp_xmit() resource allocation from
>> loading and unloading XDP programs, which I guess could be a way to
>> deal
>> with this as well.
>> 
>> In the meantime...
>> 
>
> Yes, totally agree; this is why my fix is temporary.
> Good idea about LPC, let's discuss this there.
>
>> > I will be waiting for your confirmation that the fix did work.
>> 
>> I tested your patch, and it does indeed fix the crash. However, it
>> also
>> seems to have the effect that the XDP redirect continues to function
>> even after removing the XDP program on the target device.
>> 
>> I.e., after the call to ./xdp_fwd -d $TX_IF, I still see packets
>> being
>> redirected out $TX_IF. Is this intentional?
>> 
>
> Interesting, that shouldn't happen, unless there is something weird going
> on when running xdp_fwd -d together with xdp_redirect_map. I just checked
> the code: if ndo_xdp_set is called with a null program, we remove the
> xdp tx resources, so I see nothing suspicious in the driver.
>
> I will look at this later this week.

Cool. Let me know if you need anything more from me :)

-Toke


end of thread, other threads:[~2018-10-24  4:54 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
2018-10-03  9:30 Kernel oops with mlx5 and dual XDP redirect programs Toke Høiland-Jørgensen
2018-10-03 23:44 ` Saeed Mahameed
2018-10-04 12:03   ` Toke Høiland-Jørgensen
2018-10-18 21:53   ` Toke Høiland-Jørgensen
2018-10-22 17:57     ` Saeed Mahameed
2018-10-23 10:10       ` Toke Høiland-Jørgensen
2018-10-23 18:01         ` Saeed Mahameed
2018-10-23 20:29           ` Toke Høiland-Jørgensen
