netdev.vger.kernel.org archive mirror
From: "Toke Høiland-Jørgensen" <toke@toke.dk>
To: Saeed Mahameed <saeedm@mellanox.com>,
	netdev@vger.kernel.org
Cc: brouer@redhat.com, Tariq Toukan <tariqt@mellanox.com>,
	Eran Ben Elisha <eranbe@mellanox.com>
Subject: Kernel oops with mlx5 and dual XDP redirect programs
Date: Wed, 03 Oct 2018 11:30:51 +0200
Message-ID: <877eize5ro.fsf@toke.dk>

Hi Saeed

I can reliably oops the kernel with the mlx5 driver by installing
XDP_REDIRECT programs on two devices so that they redirect to each
other, and then removing one of them while there is traffic on the
interfaces.

Steps to reproduce:

# cd ~/build/linux/samples/bpf
# ./xdp_redirect_map $(</sys/class/net/ens1f1/ifindex) $(</sys/class/net/ens1f0/ifindex)
# ./xdp_redirect_map $(</sys/class/net/ens1f0/ifindex) $(</sys/class/net/ens1f1/ifindex)

Now, run some traffic (e.g., using pktgen) across the interfaces, and
while the traffic is running, interrupt one of the xdp_redirect_map
commands (thus unloading the eBPF program). This results in a kernel
oops with the backtrace below. I get no crash if there's only a single
XDP program.

Is this something you could look into, please? :)

-Toke


[ 1400.937870] BUG: unable to handle kernel paging request at 0000000000003fa8
[ 1400.944826] PGD 800000072cc7b067 P4D 800000072cc7b067 PUD 72cc7a067 PMD 0 
[ 1400.951693] Oops: 0000 [#1] SMP PTI
[ 1400.955184] CPU: 5 PID: 10392 Comm: xdp_redirect_ma Not tainted 4.19.0-rc5-xdptest-g5be3ebf+ #17
[ 1400.965344] Hardware name: LENOVO 30B3005DMT/102F, BIOS S00KT56A 01/15/2018
[ 1400.972318] RIP: 0010:mlx5e_xdp_xmit+0x7b/0x2a0 [mlx5_core]
[ 1400.977889] Code: 8b 0d 29 d9 4f 3f 39 8f 48 39 00 00 b8 fa ff ff ff 0f 86 45 01 00 00 48 8b 87 40 39 00 00 48 63 c9 4c 8b 24 c8 b8 9c ff ff ff <49> 8b 8c 24 a8 3f 00 00 4d 8d bc 24 c0 3c 00 00 83 e1 01 0f 84 19
[ 1400.996624] RSP: 0018:ffff90209fb43bb0 EFLAGS: 00010202
[ 1401.002001] RAX: 00000000ffffff9c RBX: 0000000000000000 RCX: 0000000000000005
[ 1401.009122] RDX: ffffc7627fd75190 RSI: 0000000000000010 RDI: ffff902084580000
[ 1401.016250] RBP: ffffc7627fd75190 R08: ffff901f9821c100 R09: ffffc7627fd75210
[ 1401.023379] R10: 00000000000005dc R11: 0000000000000000 R12: 0000000000000000
[ 1401.030500] R13: ffff902081580000 R14: 0000000000000001 R15: ffffc7627fd75190
[ 1401.037645] FS:  00007f460fa96700(0000) GS:ffff90209fb40000(0000) knlGS:0000000000000000
[ 1401.045718] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1401.051452] CR2: 0000000000003fa8 CR3: 000000076c3b6006 CR4: 00000000003606e0
[ 1401.058573] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1401.065823] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1401.072943] Call Trace:
[ 1401.075390]  <IRQ>
[ 1401.077409]  bq_xmit_all+0x5e/0x160
[ 1401.080897]  dev_map_enqueue+0x12e/0x140
[ 1401.084823]  xdp_do_redirect+0x1a9/0x2a0
[ 1401.088756]  mlx5e_xdp_handle+0x24f/0x2b0 [mlx5_core]
[ 1401.093821]  ? resched_cpu+0x5f/0x70
[ 1401.097399]  ? __xdp_return+0x189/0x400
[ 1401.101242]  mlx5e_skb_from_cqe_linear+0xdd/0x180 [mlx5_core]
[ 1401.106987]  mlx5e_handle_rx_cqe+0x43/0xe0 [mlx5_core]
[ 1401.112130]  mlx5e_poll_rx_cq+0xcb/0x940 [mlx5_core]
[ 1401.117094]  mlx5e_napi_poll+0xa6/0xc90 [mlx5_core]
[ 1401.121966]  ? smp_reschedule_interrupt+0x16/0xd0
[ 1401.126789]  ? reschedule_interrupt+0xf/0x20
[ 1401.131057]  ? reschedule_interrupt+0xa/0x20
[ 1401.135321]  net_rx_action+0x279/0x3d0
[ 1401.139071]  __do_softirq+0xf2/0x28e
[ 1401.142651]  irq_exit+0xb6/0xc0
[ 1401.145792]  do_IRQ+0x52/0xd0
[ 1401.148785]  common_interrupt+0xf/0xf
[ 1401.152445]  </IRQ>
[ 1401.154559] RIP: 0010:mlx5e_open_channels+0x65e/0x1390 [mlx5_core]
[ 1401.160734] Code: 8b 00 48 05 a8 00 00 00 48 89 85 78 3c 00 00 48 8b 83 f8 8d 01 00 48 89 85 80 3c 00 00 48 8b 83 f0 8d 01 00 8b 80 a8 fb 03 00 <0f> c8 89 85 88 3c 00 00 41 0f b6 45 16 88 85 8c 3c 00 00 49 83 bd
[ 1401.179463] RSP: 0018:ffffa7628dd43808 EFLAGS: 00000282 ORIG_RAX: ffffffffffffffd4
[ 1401.187024] RAX: 0000000000080000 RBX: ffff9020845808c0 RCX: 0000000000000000
[ 1401.194325] RDX: ffffa7628dd43894 RSI: 0000000000000000 RDI: ffff901f8a0e0000
[ 1401.201463] RBP: ffff901f8a0d8000 R08: ffffe1799d283800 R09: 0000000000000008
[ 1401.208582] R10: 0000000000000000 R11: 0000000000000002 R12: 0000000000000000
[ 1401.215702] R13: ffff902084583940 R14: 0000000000000000 R15: 0000000000000000
[ 1401.222834]  ? mlx5e_open_channels+0x5e1/0x1390 [mlx5_core]
[ 1401.228404]  ? rcu_exp_wait_wake+0x550/0x550
[ 1401.232674]  ? free_one_page+0x68/0x370
[ 1401.236519]  mlx5e_open_locked+0x28/0xa0 [mlx5_core]
[ 1401.241491]  mlx5e_xdp+0x2b2/0x300 [mlx5_core]
[ 1401.245936]  dev_xdp_install+0x4c/0x70
[ 1401.249686]  do_setlink+0xcdb/0xd10
[ 1401.253300]  ? flat_send_IPI_allbutself+0x6c/0xa0
[ 1401.258003]  ? __update_load_avg_se+0x20c/0x290
[ 1401.262530]  rtnl_setlink+0x104/0x140
[ 1401.266189]  rtnetlink_rcv_msg+0x269/0x310
[ 1401.270283]  ? _cond_resched+0x16/0x40
[ 1401.274029]  ? __kmalloc_node_track_caller+0x1dd/0x2a0
[ 1401.279162]  ? rtnl_calcit.isra.32+0x110/0x110
[ 1401.283601]  netlink_rcv_skb+0xdb/0x110
[ 1401.287437]  netlink_unicast+0x18b/0x250
[ 1401.291359]  netlink_sendmsg+0x2c7/0x3b0
[ 1401.295287]  sock_sendmsg+0x30/0x40
[ 1401.298776]  __sys_sendto+0xd8/0x150
[ 1401.302351]  ? __sys_getsockname+0xac/0xc0
[ 1401.306448]  ? netlink_setsockopt+0x2e/0x2b0
[ 1401.310718]  ? __sys_setsockopt+0x7c/0xe0
[ 1401.314867]  __x64_sys_sendto+0x24/0x30
[ 1401.318709]  do_syscall_64+0x4f/0x100
[ 1401.322372]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1401.327420] RIP: 0033:0x7f460f3a83dd
[ 1401.330997] Code: 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 8b 05 7a 13 2c 00 85 c0 75 3e 45 31 c9 45 31 c0 4c 63 d1 48 63 ff b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 0b c3 66 2e 0f 1f 84 00 00 00 00 00 48 8b 15
[ 1401.349733] RSP: 002b:00007ffd28d23138 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 1401.357293] RAX: ffffffffffffffda RBX: ffffffffffffff90 RCX: 00007f460f3a83dd
[ 1401.364413] RDX: 000000000000002c RSI: 00007ffd28d23170 RDI: 0000000000000003
[ 1401.371533] RBP: 00007ffd28d231e0 R08: 0000000000000000 R09: 0000000000000000
[ 1401.378767] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000006
[ 1401.385895] R13: 00007ffd28d237f0 R14: 00007ffd28d23830 R15: 00007ffd28d2388c
[ 1401.393016] Modules linked in: rpcrdma ib_umad sunrpc ib_ipoib rdma_ucm mlx5_ib binfmt_misc ib_uverbs snd_hda_codec_hdmi intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp mlx5_core kvm_intel snd_hda_codec_realtek snd_hda_codec_generic kvm snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm e1000e uas irqbypass snd_timer crct10dif_pclmul snd mei_me usb_storage crc32_pclmul ghash_clmulni_intel wmi_bmof mei lpc_ich soundcore mlxfw pata_acpi mac_hid ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear nouveau video i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops 
 drm mxm_wmi
[ 1401.463638]  aesni_intel aes_x86_64 crypto_simd cryptd glue_helper ahci libahci wmi
[ 1401.471289] CR2: 0000000000003fa8
[ 1401.474617] ---[ end trace 1a0d8962c7db30ed ]---
[ 1401.528487] RIP: 0010:mlx5e_xdp_xmit+0x7b/0x2a0 [mlx5_core]
[ 1401.534058] Code: 8b 0d 29 d9 4f 3f 39 8f 48 39 00 00 b8 fa ff ff ff 0f 86 45 01 00 00 48 8b 87 40 39 00 00 48 63 c9 4c 8b 24 c8 b8 9c ff ff ff <49> 8b 8c 24 a8 3f 00 00 4d 8d bc 24 c0 3c 00 00 83 e1 01 0f 84 19
[ 1401.552789] RSP: 0018:ffff90209fb43bb0 EFLAGS: 00010202
[ 1401.558012] RAX: 00000000ffffff9c RBX: 0000000000000000 RCX: 0000000000000005
[ 1401.565132] RDX: ffffc7627fd75190 RSI: 0000000000000010 RDI: ffff902084580000
[ 1401.572252] RBP: ffffc7627fd75190 R08: ffff901f9821c100 R09: ffffc7627fd75210
[ 1401.579371] R10: 00000000000005dc R11: 0000000000000000 R12: 0000000000000000
[ 1401.586493] R13: ffff902081580000 R14: 0000000000000001 R15: ffffc7627fd75190
[ 1401.593726] FS:  00007f460fa96700(0000) GS:ffff90209fb40000(0000) knlGS:0000000000000000
[ 1401.601797] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1401.607533] CR2: 0000000000003fa8 CR3: 000000076c3b6006 CR4: 00000000003606e0
[ 1401.614653] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1401.621772] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1401.628895] Kernel panic - not syncing: Fatal exception in interrupt
[ 1401.635280] Kernel Offset: 0x5000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1401.694263] Rebooting in 5 seconds..
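
A note on reading the trace; this is an interpretation of the log above,
not something stated in the report. In the "Code:" line, the byte marked
<49> begins the faulting instruction. The bytes 49 8b 8c 24 a8 3f 00 00
decode as "mov 0x3fa8(%r12),%rcx", and the register dump shows R12 = 0,
so the load address should be 0 + 0x3fa8, which is the value in CR2. The
preceding bytes 4c 8b 24 c8 decode as "mov (%rax,%rcx,8),%r12" with
RCX = 5 (the CPU number in the oops header), which suggests R12 was read
from a per-CPU indexed pointer array and was NULL at that slot. The small
sketch below only redoes that address arithmetic from values copied out
of the log; the variable names are made up for illustration.

```sh
# Faulting instruction bytes, copied from the "Code:" line starting at <49>.
insn="49 8b 8c 24 a8 3f 00 00"
r12=0x0000000000000000   # R12 from the register dump
cr2=0x0000000000003fa8   # CR2 (faulting address) from the oops

# Split the byte string into positional parameters $1..$8.
set -- $insn

# Bytes 5..8 hold the little-endian 32-bit displacement of the mov.
disp=$(( 0x$5 | (0x$6 << 8) | (0x$7 << 16) | (0x$8 << 24) ))

# Effective address of the load: base register plus displacement.
addr=$(( r12 + disp ))

printf 'disp=0x%x addr=0x%x cr2=0x%x\n' "$disp" "$addr" "$cr2"
```

If the computed addr equals cr2, the fault is consistent with a NULL base
pointer being dereferenced at offset 0x3fa8 inside mlx5e_xdp_xmit().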
