* [bug report]concurrent blktests nvme-rdma execution lead kernel null pointer
@ 2021-12-03  2:20 Yi Zhang
  2021-12-03 11:27 ` Bernard Metzler
  0 siblings, 1 reply; 5+ messages in thread
From: Yi Zhang @ 2021-12-03  2:20 UTC (permalink / raw)
  To: RDMA mailing list

Hello,
Running blktests nvme-rdma concurrently with both rdma_rxe and siw
leads to a kernel BUG on 5.16.0-rc3. Please help check it, thanks.

Reproducer:
Run blktests nvme-rdma in two terminals at the same time:
terminal 1:
# use_siw=1 nvme_trtype=rdma ./check nvme/
terminal 2:
# nvme_trtype=rdma ./check nvme/
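
For reference, a rough sketch of the conflicting RDMA link setup the
two concurrent runs can end up with (this is only a guess at what the
blktests rdma setup amounts to; "eno2" is taken from the log below and
the link names are made up):
terminal 1 (use_siw=1) roughly does
# rdma link add siw0 type siw netdev eno2
while terminal 2 (default rdma_rxe) roughly does
# rdma link add rxe0 type rxe netdev eno2
so both software RDMA devices end up on the same netdev (visible via
"rdma link show"), and the concurrent teardown then races on
# rdma link delete siw0
# rdma link delete rxe0
which matches the crash below hitting nldev_dellink from the "rdma"
command.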

[ 1685.584327] run blktests nvme/013 at 2021-12-02 21:08:46
[ 1685.669804] eno2 speed is unknown, defaulting to 1000
[ 1685.674866] eno2 speed is unknown, defaulting to 1000
[ 1685.679941] eno2 speed is unknown, defaulting to 1000
[ 1685.686033] eno2 speed is unknown, defaulting to 1000
[ 1685.691087] eno2 speed is unknown, defaulting to 1000
[ 1685.697677] eno2 speed is unknown, defaulting to 1000
[ 1685.703727] eno3 speed is unknown, defaulting to 1000
[ 1685.708798] eno3 speed is unknown, defaulting to 1000
[ 1685.713863] eno3 speed is unknown, defaulting to 1000
[ 1685.719965] eno3 speed is unknown, defaulting to 1000
[ 1685.725043] eno3 speed is unknown, defaulting to 1000
[ 1685.731688] eno2 speed is unknown, defaulting to 1000
[ 1685.736763] eno3 speed is unknown, defaulting to 1000
[ 1685.742818] eno4 speed is unknown, defaulting to 1000
[ 1685.747881] eno4 speed is unknown, defaulting to 1000
[ 1685.752949] eno4 speed is unknown, defaulting to 1000
[ 1685.759134] eno4 speed is unknown, defaulting to 1000
[ 1685.764195] eno4 speed is unknown, defaulting to 1000
[ 1685.770914] eno2 speed is unknown, defaulting to 1000
[ 1685.775980] eno3 speed is unknown, defaulting to 1000
[ 1685.781047] eno4 speed is unknown, defaulting to 1000
[ 1686.002801] eno2 speed is unknown, defaulting to 1000
[ 1686.007867] eno3 speed is unknown, defaulting to 1000
[ 1686.012934] eno4 speed is unknown, defaulting to 1000
[ 1686.022521] rdma_rxe: rxe-ah pool destroyed with unfree'd elem
[ 1686.289384] run blktests nvme/013 at 2021-12-02 21:08:46
[ 1686.356666] eno2 speed is unknown, defaulting to 1000
[ 1686.361735] eno2 speed is unknown, defaulting to 1000
[ 1686.366807] eno2 speed is unknown, defaulting to 1000
[ 1686.371876] eno2 speed is unknown, defaulting to 1000
[ 1686.378400] eno2 speed is unknown, defaulting to 1000
[ 1686.384419] eno3 speed is unknown, defaulting to 1000
[ 1686.389494] eno3 speed is unknown, defaulting to 1000
[ 1686.394583] eno3 speed is unknown, defaulting to 1000
[ 1686.399660] eno3 speed is unknown, defaulting to 1000
[ 1686.406219] eno2 speed is unknown, defaulting to 1000
[ 1686.411291] eno3 speed is unknown, defaulting to 1000
[ 1686.417275] eno4 speed is unknown, defaulting to 1000
[ 1686.422338] eno4 speed is unknown, defaulting to 1000
[ 1686.427401] eno4 speed is unknown, defaulting to 1000
[ 1686.432475] eno4 speed is unknown, defaulting to 1000
[ 1686.439038] eno2 speed is unknown, defaulting to 1000
[ 1686.444109] eno3 speed is unknown, defaulting to 1000
[ 1686.449180] eno4 speed is unknown, defaulting to 1000
[ 1686.873596] xfs filesystem being mounted at /mnt/blktests supports
timestamps until 2038 (0x7fffffff)
[ 1687.540606] xfs filesystem being mounted at /mnt/blktests supports
timestamps until 2038 (0x7fffffff)
[ 1693.658327] block nvme0n1: no available path - failing I/O
[ 1693.663038] block nvme0n1: no available path - failing I/O
[ 1693.663828] XFS (nvme0n1): log I/O error -5
[ 1693.665024] block nvme0n1: no available path - failing I/O
[ 1693.665041] XFS (nvme0n1): log I/O error -5
[ 1693.665044] XFS (nvme0n1): Log I/O Error (0x2) detected at
xlog_ioend_work+0x71/0x80 [xfs] (fs/xfs/xfs_log.c:1377).  Shutting
down filesystem.
[ 1693.665142] XFS (nvme0n1): Please unmount the filesystem and
rectify the problem(s)
[ 1693.720462] block nvme0n1: no available path - failing I/O
[ 1693.728150] nvmet_rdma: post_recv cmd failed
[ 1693.732432] nvmet_rdma: sending cmd response failed
[ 1693.836083] eno2 speed is unknown, defaulting to 1000
[ 1693.841152] eno3 speed is unknown, defaulting to 1000
[ 1693.846217] eno4 speed is unknown, defaulting to 1000
[ 1693.852280] BUG: unable to handle page fault for address: ffffffffc09d2680
[ 1693.859156] #PF: supervisor instruction fetch in kernel mode
[ 1693.864815] #PF: error_code(0x0010) - not-present page
[ 1693.869953] PGD 2b5813067 P4D 2b5813067 PUD 2b5815067 PMD 13a157067 PTE 0
[ 1693.876740] Oops: 0010 [#1] PREEMPT SMP NOPTI
[ 1693.881098] CPU: 15 PID: 16091 Comm: rdma Tainted: G S        I
  5.16.0-rc3 #1
[ 1693.888751] Hardware name: Dell Inc. PowerEdge R640/06NR82, BIOS
2.11.2 004/21/2021
[ 1693.896403] RIP: 0010:0xffffffffc09d2680
[ 1693.900329] Code: Unable to access opcode bytes at RIP 0xffffffffc09d2656.
[ 1693.907202] RSP: 0018:ffffb3d5456237b0 EFLAGS: 00010286
[ 1693.912428] RAX: ffffffffc09d2680 RBX: ffff9d4adade2000 RCX: 0000000000000001
[ 1693.919559] RDX: 0000000080000001 RSI: ffffb3d5456237e8 RDI: ffff9d4adade2000
[ 1693.926693] RBP: ffffb3d5456237e8 R08: ffffb3d545623850 R09: 0000000000000230
[ 1693.933823] R10: 0000000000000002 R11: ffffb3d545623840 R12: ffff9d4adade2270
[ 1693.940957] R13: ffff9d4adade21e0 R14: 0000000000000005 R15: ffff9d4adade2220
[ 1693.948089] FS:  00007f2f0601c000(0000) GS:ffff9d59ffdc0000(0000)
knlGS:0000000000000000
[ 1693.956176] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1693.961921] CR2: ffffffffc09d2656 CR3: 0000000180578004 CR4: 00000000007706e0
[ 1693.969052] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1693.976177] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1693.983309] PKRU: 55555554
[ 1693.986023] Call Trace:
[ 1693.988474]  <TASK>
[ 1693.990582]  ? cma_cm_event_handler+0x1d/0xd0 [rdma_cm]
[ 1693.995817]  ? cma_process_remove+0x73/0x290 [rdma_cm]
[ 1694.000954]  ? cma_remove_one+0x5a/0xd0 [rdma_cm]
[ 1694.005661]  ? remove_client_context+0x88/0xd0 [ib_core]
[ 1694.010990]  ? disable_device+0x8c/0x130 [ib_core]
[ 1694.015790]  ? xa_load+0x73/0xa0
[ 1694.019024]  ? __ib_unregister_device+0x40/0xa0 [ib_core]
[ 1694.024431]  ? ib_unregister_device_and_put+0x33/0x50 [ib_core]
[ 1694.030360]  ? nldev_dellink+0x86/0xe0 [ib_core]
[ 1694.035000]  ? rdma_nl_rcv_msg+0x109/0x200 [ib_core]
[ 1694.039978]  ? __alloc_skb+0x8c/0x1b0
[ 1694.043645]  ? __kmalloc_node_track_caller+0x184/0x340
[ 1694.048785]  ? rdma_nl_rcv+0xc8/0x110 [ib_core]
[ 1694.053325]  ? netlink_unicast+0x1a2/0x280
[ 1694.057424]  ? netlink_sendmsg+0x244/0x480
[ 1694.061524]  ? sock_sendmsg+0x58/0x60
[ 1694.065188]  ? __sys_sendto+0xee/0x160
[ 1694.068944]  ? netlink_setsockopt+0x26e/0x3d0
[ 1694.073300]  ? __sys_setsockopt+0xdc/0x1d0
[ 1694.077400]  ? __x64_sys_sendto+0x24/0x30
[ 1694.081414]  ? do_syscall_64+0x37/0x80
[ 1694.085164]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1694.090391]  </TASK>
[ 1694.092584] Modules linked in: siw rpcrdma rdma_ucm ib_uverbs
ib_srpt ib_isert iscsi_target_mod target_core_mod loop ib_iser
libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm ib_core
rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache
netfs rfkill sunrpc vfat fat dm_multipath intel_rapl_msr
intel_rapl_common isst_if_common skx_edac x86_pkg_temp_thermal
intel_powerclamp coretemp kvm_intel ipmi_ssif kvm mgag200 i2c_algo_bit
drm_kms_helper iTCO_wdt iTCO_vendor_support syscopyarea irqbypass
sysfillrect crct10dif_pclmul sysimgblt crc32_pclmul fb_sys_fops
ghash_clmulni_intel acpi_ipmi drm rapl ipmi_si intel_cstate mei_me
intel_uncore i2c_i801 mei ipmi_devintf nd_pmem dax_pmem_compat
wmi_bmof pcspkr device_dax intel_pch_thermal i2c_smbus lpc_ich
ipmi_msghandler nd_btt dax_pmem_core acpi_power_meter xfs libcrc32c
sd_mod t10_pi sg ahci libahci libata megaraid_sas nfit tg3
crc32c_intel libnvdimm wmi dm_mirror dm_region_hash dm_log dm_mod
[last unloaded: nvmet]
[ 1694.178277] CR2: ffffffffc09d2680
[ 1694.181596] ---[ end trace 9c234cd612cbb92a ]---
[ 1694.217410] RIP: 0010:0xffffffffc09d2680
[ 1694.221343] Code: Unable to access opcode bytes at RIP 0xffffffffc09d2656.
[ 1694.228212] RSP: 0018:ffffb3d5456237b0 EFLAGS: 00010286
[ 1694.233437] RAX: ffffffffc09d2680 RBX: ffff9d4adade2000 RCX: 0000000000000001
[ 1694.240570] RDX: 0000000080000001 RSI: ffffb3d5456237e8 RDI: ffff9d4adade2000
[ 1694.247702] RBP: ffffb3d5456237e8 R08: ffffb3d545623850 R09: 0000000000000230
[ 1694.254828] R10: 0000000000000002 R11: ffffb3d545623840 R12: ffff9d4adade2270
[ 1694.261958] R13: ffff9d4adade21e0 R14: 0000000000000005 R15: ffff9d4adade2220
[ 1694.269091] FS:  00007f2f0601c000(0000) GS:ffff9d59ffdc0000(0000)
knlGS:0000000000000000
[ 1694.277178] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1694.282922] CR2: ffffffffc09d2656 CR3: 0000000180578004 CR4: 00000000007706e0
[ 1694.290054] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1694.297180] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1694.304312] PKRU: 55555554
[ 1694.307025] Kernel panic - not syncing: Fatal exception
[ 1694.772244] Kernel Offset: 0x35c00000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 1694.794394] ---[ end Kernel panic - not syncing: Fatal exception ]---


-- 
Best Regards,
  Yi Zhang



* Re: [bug report]concurrent blktests nvme-rdma execution lead kernel null pointer
  2021-12-03  2:20 [bug report]concurrent blktests nvme-rdma execution lead kernel null pointer Yi Zhang
@ 2021-12-03 11:27 ` Bernard Metzler
  2021-12-05 11:47   ` Leon Romanovsky
  0 siblings, 1 reply; 5+ messages in thread
From: Bernard Metzler @ 2021-12-03 11:27 UTC (permalink / raw)
  To: Yi Zhang; +Cc: RDMA mailing list

-----"Yi Zhang" <yi.zhang@redhat.com> wrote: -----

>To: "RDMA mailing list" <linux-rdma@vger.kernel.org>
>From: "Yi Zhang" <yi.zhang@redhat.com>
>Date: 12/03/2021 03:20AM
>Subject: [EXTERNAL] [bug report]concurrent blktests nvme-rdma
>execution lead kernel null pointer
>
>Hello
>With the concurrent blktests nvme-rdma execution with both rdma_rxe
>and siw lead kernel BUG on 5.16.0-rc3, pls help check it, thanks.
>

The RDMA core currently does not prevent us from assigning
both siw and rxe to the same netdev. I think this is what is
happening here. This setup makes no sense, but it is obviously
not prohibited by the RDMA infrastructure. The behavior is
undefined, and a kernel panic is not unexpected. Shall we
prevent the privileged user from doing this kind of
experiment?

A related question: should we also explicitly refuse to
add software RDMA drivers to netdevs with RDMA hardware active?
This, while stupid and with the resulting behavior undefined, is
currently possible as well.
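
To illustrate with current iproute2 tooling (the device and interface
names here are only hypothetical examples): on a host where a hardware
RDMA device, say mlx5_0 on ens1f0, is already active, nothing today
refuses
# rdma link add siw0 type siw netdev ens1f0
after which the hardware provider and siw both serve the same
interface and IP address.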

Thanks
Bernard.

>Reproducer:
>Run blktests nvme-rdma on two terminal at the same time
>terminal 1:
># use_siw=1 nvme_trtype=rdma ./check nvme/
>terminal 2:
># nvme_trtype=rdma ./check nvme/
>
>[ full dmesg log snipped ]


* Re: [bug report]concurrent blktests nvme-rdma execution lead kernel null pointer
  2021-12-03 11:27 ` Bernard Metzler
@ 2021-12-05 11:47   ` Leon Romanovsky
  2021-12-06 11:10     ` Bernard Metzler
  0 siblings, 1 reply; 5+ messages in thread
From: Leon Romanovsky @ 2021-12-05 11:47 UTC (permalink / raw)
  To: Bernard Metzler; +Cc: Yi Zhang, RDMA mailing list

On Fri, Dec 03, 2021 at 11:27:22AM +0000, Bernard Metzler wrote:
> -----"Yi Zhang" <yi.zhang@redhat.com> wrote: -----
> 
> >To: "RDMA mailing list" <linux-rdma@vger.kernel.org>
> >From: "Yi Zhang" <yi.zhang@redhat.com>
> >Date: 12/03/2021 03:20AM
> >Subject: [EXTERNAL] [bug report]concurrent blktests nvme-rdma
> >execution lead kernel null pointer
> >
> >Hello
> >With the concurrent blktests nvme-rdma execution with both rdma_rxe
> >and siw lead kernel BUG on 5.16.0-rc3, pls help check it, thanks.
> >
> 
> The RDMA core currently does not prevent us from
> assigning  both siw and rxe to the same netdev. I think this
> is what is happening here. This setting is of no sense, but
> obviously not prohibited by the RDMA infrastructure. Behavior
> is undefined and a kernel panic not unexpected. Shall we
> prevent the privileged user from doing this type of
> experiments?
> 
> A related question: should we also explicitly refuse to
> add software RDMA drivers to netdevs with RDMA hardware active?
> This is, while stupid and resulting behavior undefined, currently
> possible as well.

In old soft-RoCE manuals, I saw a request to unload the
mlx4_ib/mlx5_ib modules before configuring RXE. This effectively
"prevented" running with "RDMA hardware active".
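
In current iproute2 terms that workflow is roughly (mlx5_ib and eno2
are just example names):
# modprobe -r mlx5_ib
# rdma link add rxe0 type rxe netdev eno2
i.e. the hardware RDMA device is gone before the software one is
created.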

So I'm not surprised that it doesn't work, but why do you think that
this behavior is stupid? RXE/SIW can be seen as a ULP, and as such it
is ok to run many ULPs on the same netdev.

Thanks


* RE: [bug report]concurrent blktests nvme-rdma execution lead kernel null pointer
  2021-12-05 11:47   ` Leon Romanovsky
@ 2021-12-06 11:10     ` Bernard Metzler
  2021-12-06 13:13       ` Leon Romanovsky
  0 siblings, 1 reply; 5+ messages in thread
From: Bernard Metzler @ 2021-12-06 11:10 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Yi Zhang, RDMA mailing list

> -----Original Message-----
> From: Leon Romanovsky <leon@kernel.org>
> Sent: Sunday, 5 December 2021 12:47
> To: Bernard Metzler <BMT@zurich.ibm.com>
> Cc: Yi Zhang <yi.zhang@redhat.com>; RDMA mailing list <linux-
> rdma@vger.kernel.org>
> Subject: [EXTERNAL] Re: [bug report]concurrent blktests nvme-rdma
> execution lead kernel null pointer
> 
> On Fri, Dec 03, 2021 at 11:27:22AM +0000, Bernard Metzler wrote:
> > -----"Yi Zhang" <yi.zhang@redhat.com> wrote: -----
> >
> > >To: "RDMA mailing list" <linux-rdma@vger.kernel.org>
> > >From: "Yi Zhang" <yi.zhang@redhat.com>
> > >Date: 12/03/2021 03:20AM
> > >Subject: [EXTERNAL] [bug report]concurrent blktests nvme-rdma
> > >execution lead kernel null pointer
> > >
> > >Hello
> > >With the concurrent blktests nvme-rdma execution with both rdma_rxe
> > >and siw lead kernel BUG on 5.16.0-rc3, pls help check it, thanks.
> > >
> >
> > The RDMA core currently does not prevent us from assigning  both siw
> > and rxe to the same netdev. I think this is what is happening here.
> > This setting is of no sense, but obviously not prohibited by the RDMA
> > infrastructure. Behavior is undefined and a kernel panic not
> > unexpected. Shall we prevent the privileged user from doing this type
> > of experiments?
> >
> > A related question: should we also explicitly refuse to add software
> > RDMA drivers to netdevs with RDMA hardware active?
> > This is, while stupid and resulting behavior undefined, currently
> > possible as well.
> 
> In old soft-RoCE manuals, I saw a request to unload mlx4_ib/mlx5_ib
> modules before configuring RXE. This effectively "prevented" from running
> with "RDMA hardware active".
> 
Right. Same for 'siw over Chelsio T5/6' etc.: first unload the iw_cxgb4
driver, which implements the iWARP protocol, before attaching siw to
the network interface. But shouldn't the kernel just refuse to let two
instances of the _same_ ULP (e.g., one hardware iWARP, one software
iWARP) be attached to the same netdev, potentially sharing IP
address and port space?

> So I'm not surprised that it doesn't work, but why do you think that this
> behavior is stupid? RXE/SIW can be seen as ULP and as such it is ok to run
> many ULPs on same netdev.

Hmm, from an rdma_cm perspective, I am not sure it is supported
for two RDMA providers to share the same device and IP address.
Without recreating it or looking into the code, I expect Yi's
null pointer issue is caused by this unsupported setup. If it is
unsupported, it should be impossible to set up.

Thanks,
Bernard.


* Re: [bug report]concurrent blktests nvme-rdma execution lead kernel null pointer
  2021-12-06 11:10     ` Bernard Metzler
@ 2021-12-06 13:13       ` Leon Romanovsky
  0 siblings, 0 replies; 5+ messages in thread
From: Leon Romanovsky @ 2021-12-06 13:13 UTC (permalink / raw)
  To: Bernard Metzler; +Cc: Yi Zhang, RDMA mailing list

On Mon, Dec 06, 2021 at 11:10:52AM +0000, Bernard Metzler wrote:
> > -----Original Message-----
> > From: Leon Romanovsky <leon@kernel.org>
> > Sent: Sunday, 5 December 2021 12:47
> > To: Bernard Metzler <BMT@zurich.ibm.com>
> > Cc: Yi Zhang <yi.zhang@redhat.com>; RDMA mailing list <linux-
> > rdma@vger.kernel.org>
> > Subject: [EXTERNAL] Re: [bug report]concurrent blktests nvme-rdma
> > execution lead kernel null pointer
> > 
> > On Fri, Dec 03, 2021 at 11:27:22AM +0000, Bernard Metzler wrote:
> > > -----"Yi Zhang" <yi.zhang@redhat.com> wrote: -----
> > >
> > > >To: "RDMA mailing list" <linux-rdma@vger.kernel.org>
> > > >From: "Yi Zhang" <yi.zhang@redhat.com>
> > > >Date: 12/03/2021 03:20AM
> > > >Subject: [EXTERNAL] [bug report]concurrent blktests nvme-rdma
> > > >execution lead kernel null pointer
> > > >
> > > >Hello
> > > >With the concurrent blktests nvme-rdma execution with both rdma_rxe
> > > >and siw lead kernel BUG on 5.16.0-rc3, pls help check it, thanks.
> > > >
> > >
> > > The RDMA core currently does not prevent us from assigning  both siw
> > > and rxe to the same netdev. I think this is what is happening here.
> > > This setting is of no sense, but obviously not prohibited by the RDMA
> > > infrastructure. Behavior is undefined and a kernel panic not
> > > unexpected. Shall we prevent the privileged user from doing this type
> > > of experiments?
> > >
> > > A related question: should we also explicitly refuse to add software
> > > RDMA drivers to netdevs with RDMA hardware active?
> > > This is, while stupid and resulting behavior undefined, currently
> > > possible as well.
> > 
> > In old soft-RoCE manuals, I saw a request to unload mlx4_ib/mlx5_ib
> > modules before configuring RXE. This effectively "prevented" from running
> > with "RDMA hardware active".
> > 
> Right. Same for 'siw over Chelsio T5/6' etc: first unload the iw_cxgb4
> driver, which implements the iWarp protocol, before attaching siw to
> the network interface. But shouldn't the kernel just refuse that two
> instances of the _same_ ULP (e.g., one hardware iWarp, one software
> iWARP) can be attached to the same netdev, potentially sharing IP
> address and port space?

I think that users will get different rdma-cm IDs for the real HW and SW
devices. rdma_getaddrinfo() should help here.
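
For what it's worth, which provider a given connection ended up on is
visible from userspace, e.g.
# rdma resource show cm_id
lists the active cm_ids per RDMA device, so a ULP resolving by IP
address still lands on exactly one of the registered providers.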

> 
> > So I'm not surprised that it doesn't work, but why do you think that this
> > behavior is stupid? RXE/SIW can be seen as ULP and as such it is ok to run
> > many ULPs on same netdev.
> 
> Hmm, from an rdma_cm perspective, I am not sure it is supported
> that two RDMA providers can share the same device and IP address.
> Without recreating it or looking into the code, I expect Yi's
> null pointer issue is caused by this unsupported setup. If it is
> unsupported, it should be impossible to setup.

I agree with you that this is the best solution here, simply because
it is good enough for RXE/SIW.

Thanks

> 
> Thanks,
> Bernard.
