[mlx5_core] kernel NULL pointer dereference when sending packets with AF_XDP using the hw checksum

From: Daniele Salvatore Albano @ 2024-03-16  0:39 UTC
  To: netdev

Hey there,

I hope this is the right mailing list; if not, sorry in advance.

I have been hitting a reproducible kernel panic with 6.8.0 and 6.8.1
when sending packets with AF_XDP and HW checksum calculation enabled
on my Mellanox ConnectX-5.

I am running xskgen ( https://github.com/fomichev/xskgen ), which I saw
mentioned in some patches related to AF_XDP and HW checksum support.
On top of the minimum parameters needed to make it work, adding the -m
option is enough to trigger the kernel panic.

This is a mainline kernel build from Ubuntu.

Below is the output from dmesg:
[  157.108211] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  157.108264] #PF: supervisor write access in kernel mode
[  157.108284] #PF: error_code(0x0002) - not-present page
[  157.108304] PGD 302a724067 P4D 302a724067 PUD 3027e99067 PMD 0
[  157.108332] Oops: 0002 [#1] PREEMPT SMP NOPTI
[  157.108352] CPU: 19 PID: 132 Comm: ksoftirqd/19 Not tainted 6.8.0-060800-generic #202403131158
[  157.108379] Hardware name: Supermicro Super Server/H11SSL-i, BIOS 2.1 02/21/2020
[  157.108402] RIP: 0010:mlx5e_free_xdpsq_desc+0x266/0x320 [mlx5_core]
[  157.108576] Code: 94 24 58 02 00 00 49 8b 8c 24 50 02 00 00 48 8d 7d c0 8b 02 8d 70 01 89 32 41 23 84 24 68 02 00 00 4c 8b 2c c1 e8 ca fc ff ff <49> 89 45 00 e9 ce fe ff ff 41 8b 47 20 41 0f b7 57 0a 48 2d 68 01
[  157.108626] RSP: 0018:ffffa8668cd13b90 EFLAGS: 00010246
[  157.108647] RAX: 17bd161cd26e8f20 RBX: 0000000000000000 RCX: 0000000000000000
[  157.108670] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  157.108693] RBP: ffffa8668cd13c08 R08: 0000000000000000 R09: 0000000000000000
[  157.108715] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8d5e420d3340
[  157.108737] R13: 0000000000000000 R14: ffffffffffffffff R15: 0000000000000000
[  157.108759] FS:  0000000000000000(0000) GS:ffff8d6ddf780000(0000) knlGS:0000000000000000
[  157.108784] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  157.108804] CR2: 0000000000000000 CR3: 0000003028e5c000 CR4: 00000000003506f0
[  157.108827] Call Trace:
[  157.108841]  <TASK>
[  157.108855]  ? show_regs+0x6d/0x80
[  157.108876]  ? __die+0x24/0x80
[  157.108893]  ? page_fault_oops+0x99/0x1b0
[  157.108916]  ? do_user_addr_fault+0x2ee/0x6b0
[  157.108937]  ? exc_page_fault+0x83/0x1b0
[  157.108958]  ? asm_exc_page_fault+0x27/0x30
[  157.108986]  ? mlx5e_free_xdpsq_desc+0x266/0x320 [mlx5_core]
[  157.109154]  mlx5e_poll_xdpsq_cq+0x17c/0x4f0 [mlx5_core]
[  157.109324]  mlx5e_napi_poll+0x45e/0x7b0 [mlx5_core]
[  157.109470]  __napi_poll+0x33/0x200
[  157.109488]  net_rx_action+0x181/0x2e0
[  157.109502]  ? sched_clock_cpu+0x12/0x1e0
[  157.109524]  __do_softirq+0xe1/0x363
[  157.109544]  ? __pfx_smpboot_thread_fn+0x10/0x10
[  157.109565]  run_ksoftirqd+0x37/0x60
[  157.109582]  smpboot_thread_fn+0xe3/0x1e0
[  157.109600]  kthread+0xf2/0x120
[  157.109616]  ? __pfx_kthread+0x10/0x10
[  157.109632]  ret_from_fork+0x47/0x70
[  157.109648]  ? __pfx_kthread+0x10/0x10
[  157.109663]  ret_from_fork_asm+0x1b/0x30
[  157.109686]  </TASK>
[  157.109696] Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack xt_comment ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink cfg80211 binfmt_misc nls_iso8859_1 intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd ipmi_ssif kvm irqbypass rapl acpi_ipmi ccp k10temp ipmi_si ipmi_devintf joydev input_leds ipmi_msghandler mac_hid br_netfilter dm_multipath bridge scsi_dh_rdac scsi_dh_emc stp llc scsi_dh_alua overlay msr efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 mlx5_ib ib_uverbs macsec ib_core hid_generic usbhid hid mlx5_core crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel mlxfw sha256_ssse3 psample nvme sha1_ssse3 igb tls ahci nvme_core ast pci_hyperv_intf libahci dca i2c_piix4 xhci_pci nvme_auth i2c_algo_bit xhci_pci_renesas aesni_intel crypto_simd cryptd
[  157.113195] CR2: 0000000000000000
[  157.113607] ---[ end trace 0000000000000000 ]---
[  157.877621] clocksource: Long readout interval, skipping watchdog check: cs_nsec: 1263523800 wd_nsec: 1263521131
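
As a reading aid for the oops above, here is a minimal sketch that decodes
two fields of the trace: the #PF error_code (0x0002) and the faulting
instruction bytes marked with <49> in the Code: line. It assumes the
standard x86-64 page-fault error-code bit layout and handles only the one
instruction form seen here (REX.W + opcode 0x89 with an 8-bit
displacement). The helper names decode_pf_error_code and decode_mov_store
are made up for illustration and are not part of any kernel tooling.

```python
# Hypothetical helpers for reading the oops; not part of any kernel tool.

REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
          "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_pf_error_code(code: int) -> dict:
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "page": "protection violation" if code & 0x1 else "not-present",
        # bit 1: 0 = read access, 1 = write access
        "access": "write" if code & 0x2 else "read",
        # bit 2: 0 = supervisor (kernel) mode, 1 = user mode
        "mode": "user" if code & 0x4 else "supervisor",
    }


def decode_mov_store(code: bytes) -> str:
    """Decode 'REX.W 89 /r' with mod=01 (disp8, no SIB) in AT&T syntax.

    Only this single encoding is supported; anything else raises ValueError.
    """
    rex, opcode, modrm = code[0], code[1], code[2]
    if rex & 0xF8 != 0x48 or opcode != 0x89:
        raise ValueError("expected REX.W prefix followed by opcode 0x89")
    mod = modrm >> 6
    if mod != 1 or (modrm & 7) == 4:
        raise ValueError("only mod=01 without a SIB byte is handled")
    rex_r = (rex >> 2) & 1          # extends the ModRM.reg field
    rex_b = rex & 1                 # extends the ModRM.rm field
    src = ((modrm >> 3) & 7) | (rex_r << 3)
    base = (modrm & 7) | (rex_b << 3)
    disp = int.from_bytes(code[3:4], "little", signed=True)
    return f"mov %{REGS64[src]},{disp:#x}(%{REGS64[base]})"


if __name__ == "__main__":
    print(decode_pf_error_code(0x0002))
    print(decode_mov_store(bytes.fromhex("49894500")))
```

The first result agrees with the "supervisor write access" and
"not-present page" lines in the trace. The second decodes the faulting
bytes as a store of RAX through R13; with R13 shown as 0 in the register
dump, that store targets address 0, which is consistent with CR2.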

Thanks,
Daniele
