* Re: [SPDK] Kernel panic on redhat 7.5 host
@ 2019-01-07 15:11 Howell, Seth
  0 siblings, 0 replies; 7+ messages in thread
From: Howell, Seth @ 2019-01-07 15:11 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 4816 bytes --]

Hi Shahar,

Thank you for bringing this up. We currently don't have any CentOS 7.5 machines in our build pool. We do have CentOS 7.4 machines and haven't observed this behavior on them, although admittedly we aren't doing a lot of per-patch NVMe-oF testing on those machines. I will make time this week to reproduce this issue.

I will spin up CentOS 7.4 and 7.5 machines to repro this week.

Have you created a GitHub issue yet?

Thanks,

Seth

-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Shahar Salzman
Sent: Thursday, January 3, 2019 6:26 AM
To: Shahar Salzman <shahar.salzman(a)kaminario.com>; Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Kernel panic on redhat 7.5 host

BTW, this does not happen on CentOS 7.6.
________________________________
From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Shahar Salzman <shahar.salzman(a)kaminario.com>
Sent: Wednesday, January 2, 2019 9:40 PM
To: Storage Performance Development Kit
Subject: [SPDK] Kernel panic on redhat 7.5 host

Hi,

We have been playing around with SPDK access control, and it seems that when a RHEL 7.5 host attempts to connect to a subsystem and is denied access, it panics with the BT below.

This doesn't happen on my system with a newer kernel, but we were hoping to qualify RHEL 7.5.
I am now going down the rabbit hole of debugging this kernel panic; has anyone else encountered such a panic?

[714117.730194] BUG: unable to handle kernel NULL pointer dereference at 00000000000002b0
[714117.730556] IP: [<ffffffffc086a69d>] rdma_disconnect+0xd/0xa0 [rdma_cm]
[714117.730827] PGD 0
[714117.731067] Oops: 0000 [#1] SMP
[714117.731337] Modules linked in: dm_queue_length nvme_rdma nvme_fabrics nvme_core nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache ib_isert iscsi_target_m
ore_mod ib_srp scsi_transport_srp scsi_tgt ib_ucm rpcrdma rdma_ucm ib_uverbs ib_iser ib_umad rdma_cm iw_cm ib_ipoib libiscsi scsi_transport_iscsi ib_cm mlx5_ib ib_core vmw_vsock_vmci_transport
 sb_edac coretemp vmw_balloon iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd mlx5_core joydev pcspkr mlxfw devlink ptp pps_core sg nfit parpo
dimm shpchp vmw_vmci i2c_piix4 dm_multipath dm_mod ip_tables ext4 mbcache jbd2 sr_mod cdrom ata_generic pata_acpi vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
[714117.733246]  ttm ahci drm sd_mod crc_t10dif ata_piix libahci crct10dif_generic libata crct10dif_pclmul crct10dif_common serio_raw crc32c_intel vmw_pvscsi vmxnet3 i2c_core floppy
[714117.734185] CPU: 1 PID: 14001 Comm: kworker/u16:3 Kdump: loaded Not tainted 3.10.0-862.el7.x86_64 #1
[714117.734661] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
[714117.735165] Workqueue: nvme-wq nvme_rdma_reconnect_ctrl_work [nvme_rdma]
[714117.735692] task: ffff9d383b1d4f10 ti: ffff9d39b5d30000 task.ti: ffff9d39b5d30000
[714117.736273] RIP: 0010:[<ffffffffc086a69d>]  [<ffffffffc086a69d>] rdma_disconnect+0xd/0xa0 [rdma_cm]
[714117.736872] RSP: 0018:ffff9d39b5d33dd0  EFLAGS: 00010246
[714117.737450] RAX: 0000000000000000 RBX: 0000000000000000 RCX: dead000000000200
[714117.738071] RDX: 0000000000000010 RSI: 86c0000000000000 RDI: 0000000000000000
[714117.738677] RBP: ffff9d39b5d33dd8 R08: ffff9d39a924b0e0 R09: 8b18f8a6e5c4b0d8
[714117.739287] R10: 8b18f8a6e5c4b0d8 R11: 000289c6869521c0 R12: ffff9d39a924b0d8
[714117.739911] R13: ffff9d39a924b000 R14: ffff9d37b694df00 R15: 0000000000000200
[714117.740548] FS:  0000000000000000(0000) GS:ffff9d39bfc40000(0000) knlGS:0000000000000000
[714117.741196] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[714117.741850] CR2: 00000000000002b0 CR3: 00000001f7026000 CR4: 00000000001607e0
[714117.742576] Call Trace:
[714117.743253]  [<ffffffffc087bbe2>] nvme_rdma_stop_and_free_queue+0x22/0x40 [nvme_rdma]
[714117.743956]  [<ffffffffc087be84>] nvme_rdma_reconnect_ctrl_work+0x54/0x1b0 [nvme_rdma]
[714117.744676]  [<ffffffff900b2dff>] process_one_work+0x17f/0x440
[714117.745395]  [<ffffffff900b3ac6>] worker_thread+0x126/0x3c0
[714117.746120]  [<ffffffff900b39a0>] ? manage_workers.isra.24+0x2a0/0x2a0
[714117.746856]  [<ffffffff900bae31>] kthread+0xd1/0xe0
[714117.747600]  [<ffffffff900bad60>] ? insert_kthread_work+0x40/0x40
[714117.748352]  [<ffffffff9071f637>] ret_from_fork_nospec_begin+0x21/0x21
[714117.749112]  [<ffffffff900bad60>] ? insert_kthread_work+0x40/0x40

Shahar
_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk
_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [SPDK] Kernel panic on redhat 7.5 host
@ 2019-01-15  8:52 Shahar Salzman
  0 siblings, 0 replies; 7+ messages in thread
From: Shahar Salzman @ 2019-01-15  8:52 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 4956 bytes --]

Hi,

The system is NVMe-oF over RoCE; the drivers are OFED.
This has been tested on both bare metal and on a virtual server (with SR-IOV).
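
For anyone reproducing this, a couple of commands that can confirm which RDMA stack is actually loaded (a sketch; ofed_info exists only on MOFED installs, and tool/package names may differ on your system):

  ofed_info -s                                        # installed MOFED version, if any
  modinfo mlx5_core | grep -E '^(version|vermagic)'   # driver version and the kernel it was built against
  ibv_devinfo | head                                  # confirms the RoCE device is visible from userspace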

In order to recreate:

  1.  Target - Create the subsystem; do not use the --allow-any-host flag
  2.  Target - Create the access control list with nvmf_subsystem_add_host
  3.  Initiator - run "nvme discover" against the target IP (all the subsystems should be visible)
  4.  Initiator - run "nvme connect" to one of the subsystems to which this initiator does not have access (see the command-line sketch below)
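
For concreteness, a rough command-line sketch of the steps above (the SPDK RPC names and flags vary between releases, and the NQNs/addresses are made up; adjust to your setup):

  # Target (SPDK): create the subsystem without --allow-any-host and whitelist a single host NQN
  scripts/rpc.py nvmf_subsystem_create nqn.2019-01.io.example:sub1
  scripts/rpc.py nvmf_subsystem_add_listener nqn.2019-01.io.example:sub1 -t rdma -a 192.168.1.10 -s 4420
  scripts/rpc.py nvmf_subsystem_add_host nqn.2019-01.io.example:sub1 nqn.2019-01.io.example:allowed-host

  # Initiator (RHEL 7.5): discovery succeeds, but this host's NQN is not whitelisted,
  # so the connect below is rejected by the target - and that is where the panic shows up
  nvme discover -t rdma -a 192.168.1.10 -s 4420
  nvme connect -t rdma -a 192.168.1.10 -s 4420 -n nqn.2019-01.io.example:sub1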

I created a github issue: https://github.com/spdk/spdk/issues/575

Shahar

________________________________
From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Sasha Kotchubievsky <sashakot(a)dev.mellanox.co.il>
Sent: Thursday, January 10, 2019 4:36 PM
To: spdk(a)lists.01.org
Subject: Re: [SPDK] Kernel panic on redhat 7.5 host

Hi Shahar,

Can you share details about your environment?

- Network: IB or RoCE

- Software stack: in-box/upstream/MOFED

Is there a simple scenario for reproduction?


Best regards

Sasha

On 1/2/2019 9:40 PM, Shahar Salzman wrote:
> Hi,
>
> We have been playing around with SPDK access control, and it seems that when a RHEL 7.5 host attempts to connect to a subsystem and is denied access, it panics with the BT below.
>
> This doesn't happen on my system with a newer kernel, but we were hoping to qualify RHEL 7.5.
> I am now going down the rabbit hole of debugging this kernel panic; has anyone else encountered such a panic?
>
> [714117.730194] BUG: unable to handle kernel NULL pointer dereference at 00000000000002b0
> [714117.730556] IP: [<ffffffffc086a69d>] rdma_disconnect+0xd/0xa0 [rdma_cm]
> [714117.730827] PGD 0
> [714117.731067] Oops: 0000 [#1] SMP
> [714117.731337] Modules linked in: dm_queue_length nvme_rdma nvme_fabrics nvme_core nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache ib_isert iscsi_target_m
> ore_mod ib_srp scsi_transport_srp scsi_tgt ib_ucm rpcrdma rdma_ucm ib_uverbs ib_iser ib_umad rdma_cm iw_cm ib_ipoib libiscsi scsi_transport_iscsi ib_cm mlx5_ib ib_core vmw_vsock_vmci_transport
>   sb_edac coretemp vmw_balloon iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd mlx5_core joydev pcspkr mlxfw devlink ptp pps_core sg nfit parpo
> dimm shpchp vmw_vmci i2c_piix4 dm_multipath dm_mod ip_tables ext4 mbcache jbd2 sr_mod cdrom ata_generic pata_acpi vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
> [714117.733246]  ttm ahci drm sd_mod crc_t10dif ata_piix libahci crct10dif_generic libata crct10dif_pclmul crct10dif_common serio_raw crc32c_intel vmw_pvscsi vmxnet3 i2c_core floppy
> [714117.734185] CPU: 1 PID: 14001 Comm: kworker/u16:3 Kdump: loaded Not tainted 3.10.0-862.el7.x86_64 #1
> [714117.734661] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
> [714117.735165] Workqueue: nvme-wq nvme_rdma_reconnect_ctrl_work [nvme_rdma]
> [714117.735692] task: ffff9d383b1d4f10 ti: ffff9d39b5d30000 task.ti: ffff9d39b5d30000
> [714117.736273] RIP: 0010:[<ffffffffc086a69d>]  [<ffffffffc086a69d>] rdma_disconnect+0xd/0xa0 [rdma_cm]
> [714117.736872] RSP: 0018:ffff9d39b5d33dd0  EFLAGS: 00010246
> [714117.737450] RAX: 0000000000000000 RBX: 0000000000000000 RCX: dead000000000200
> [714117.738071] RDX: 0000000000000010 RSI: 86c0000000000000 RDI: 0000000000000000
> [714117.738677] RBP: ffff9d39b5d33dd8 R08: ffff9d39a924b0e0 R09: 8b18f8a6e5c4b0d8
> [714117.739287] R10: 8b18f8a6e5c4b0d8 R11: 000289c6869521c0 R12: ffff9d39a924b0d8
> [714117.739911] R13: ffff9d39a924b000 R14: ffff9d37b694df00 R15: 0000000000000200
> [714117.740548] FS:  0000000000000000(0000) GS:ffff9d39bfc40000(0000) knlGS:0000000000000000
> [714117.741196] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [714117.741850] CR2: 00000000000002b0 CR3: 00000001f7026000 CR4: 00000000001607e0
> [714117.742576] Call Trace:
> [714117.743253]  [<ffffffffc087bbe2>] nvme_rdma_stop_and_free_queue+0x22/0x40 [nvme_rdma]
> [714117.743956]  [<ffffffffc087be84>] nvme_rdma_reconnect_ctrl_work+0x54/0x1b0 [nvme_rdma]
> [714117.744676]  [<ffffffff900b2dff>] process_one_work+0x17f/0x440
> [714117.745395]  [<ffffffff900b3ac6>] worker_thread+0x126/0x3c0
> [714117.746120]  [<ffffffff900b39a0>] ? manage_workers.isra.24+0x2a0/0x2a0
> [714117.746856]  [<ffffffff900bae31>] kthread+0xd1/0xe0
> [714117.747600]  [<ffffffff900bad60>] ? insert_kthread_work+0x40/0x40
> [714117.748352]  [<ffffffff9071f637>] ret_from_fork_nospec_begin+0x21/0x21
> [714117.749112]  [<ffffffff900bad60>] ? insert_kthread_work+0x40/0x40
>
> Shahar
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [SPDK] Kernel panic on redhat 7.5 host
@ 2019-01-10 14:36 Sasha Kotchubievsky
  0 siblings, 0 replies; 7+ messages in thread
From: Sasha Kotchubievsky @ 2019-01-10 14:36 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 3980 bytes --]

Hi Shahar,

Can you share details about your environment?

- Network: IB or RoCE

- Software stack: in-box/upstream/MOFED

Is there a simple scenario for reproduction?


Best regards

Sasha

On 1/2/2019 9:40 PM, Shahar Salzman wrote:
> Hi,
>
> We have been playing around with SPDK access control, and it seems that when a RHEL 7.5 host attempts to connect to a subsystem and is denied access, it panics with the BT below.
>
> This doesn't happen on my system with a newer kernel, but we were hoping to qualify RHEL 7.5.
> I am now going down the rabbit hole of debugging this kernel panic; has anyone else encountered such a panic?
>
> [714117.730194] BUG: unable to handle kernel NULL pointer dereference at 00000000000002b0
> [714117.730556] IP: [<ffffffffc086a69d>] rdma_disconnect+0xd/0xa0 [rdma_cm]
> [714117.730827] PGD 0
> [714117.731067] Oops: 0000 [#1] SMP
> [714117.731337] Modules linked in: dm_queue_length nvme_rdma nvme_fabrics nvme_core nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache ib_isert iscsi_target_m
> ore_mod ib_srp scsi_transport_srp scsi_tgt ib_ucm rpcrdma rdma_ucm ib_uverbs ib_iser ib_umad rdma_cm iw_cm ib_ipoib libiscsi scsi_transport_iscsi ib_cm mlx5_ib ib_core vmw_vsock_vmci_transport
>   sb_edac coretemp vmw_balloon iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd mlx5_core joydev pcspkr mlxfw devlink ptp pps_core sg nfit parpo
> dimm shpchp vmw_vmci i2c_piix4 dm_multipath dm_mod ip_tables ext4 mbcache jbd2 sr_mod cdrom ata_generic pata_acpi vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
> [714117.733246]  ttm ahci drm sd_mod crc_t10dif ata_piix libahci crct10dif_generic libata crct10dif_pclmul crct10dif_common serio_raw crc32c_intel vmw_pvscsi vmxnet3 i2c_core floppy
> [714117.734185] CPU: 1 PID: 14001 Comm: kworker/u16:3 Kdump: loaded Not tainted 3.10.0-862.el7.x86_64 #1
> [714117.734661] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
> [714117.735165] Workqueue: nvme-wq nvme_rdma_reconnect_ctrl_work [nvme_rdma]
> [714117.735692] task: ffff9d383b1d4f10 ti: ffff9d39b5d30000 task.ti: ffff9d39b5d30000
> [714117.736273] RIP: 0010:[<ffffffffc086a69d>]  [<ffffffffc086a69d>] rdma_disconnect+0xd/0xa0 [rdma_cm]
> [714117.736872] RSP: 0018:ffff9d39b5d33dd0  EFLAGS: 00010246
> [714117.737450] RAX: 0000000000000000 RBX: 0000000000000000 RCX: dead000000000200
> [714117.738071] RDX: 0000000000000010 RSI: 86c0000000000000 RDI: 0000000000000000
> [714117.738677] RBP: ffff9d39b5d33dd8 R08: ffff9d39a924b0e0 R09: 8b18f8a6e5c4b0d8
> [714117.739287] R10: 8b18f8a6e5c4b0d8 R11: 000289c6869521c0 R12: ffff9d39a924b0d8
> [714117.739911] R13: ffff9d39a924b000 R14: ffff9d37b694df00 R15: 0000000000000200
> [714117.740548] FS:  0000000000000000(0000) GS:ffff9d39bfc40000(0000) knlGS:0000000000000000
> [714117.741196] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [714117.741850] CR2: 00000000000002b0 CR3: 00000001f7026000 CR4: 00000000001607e0
> [714117.742576] Call Trace:
> [714117.743253]  [<ffffffffc087bbe2>] nvme_rdma_stop_and_free_queue+0x22/0x40 [nvme_rdma]
> [714117.743956]  [<ffffffffc087be84>] nvme_rdma_reconnect_ctrl_work+0x54/0x1b0 [nvme_rdma]
> [714117.744676]  [<ffffffff900b2dff>] process_one_work+0x17f/0x440
> [714117.745395]  [<ffffffff900b3ac6>] worker_thread+0x126/0x3c0
> [714117.746120]  [<ffffffff900b39a0>] ? manage_workers.isra.24+0x2a0/0x2a0
> [714117.746856]  [<ffffffff900bae31>] kthread+0xd1/0xe0
> [714117.747600]  [<ffffffff900bad60>] ? insert_kthread_work+0x40/0x40
> [714117.748352]  [<ffffffff9071f637>] ret_from_fork_nospec_begin+0x21/0x21
> [714117.749112]  [<ffffffff900bad60>] ? insert_kthread_work+0x40/0x40
>
> Shahar
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [SPDK] Kernel panic on redhat 7.5 host
@ 2019-01-09 16:24 Harris, James R
  0 siblings, 0 replies; 7+ messages in thread
From: Harris, James R @ 2019-01-09 16:24 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 6509 bytes --]

I'm assuming that if you ran this same test using the kernel NVMe-oF target, you would also get a kernel panic?
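
For reference, a rough sketch of the equivalent setup with the in-kernel target (nvmet over configfs), which would exercise the same rejection path without SPDK in the picture; the NQNs, address and port number below are made up:

  modprobe nvmet nvmet-rdma
  cd /sys/kernel/config/nvmet
  mkdir subsystems/nqn.2019-01.io.example:sub1
  echo 0 > subsystems/nqn.2019-01.io.example:sub1/attr_allow_any_host    # deny hosts that are not whitelisted
  mkdir hosts/nqn.2019-01.io.example:allowed-host
  ln -s /sys/kernel/config/nvmet/hosts/nqn.2019-01.io.example:allowed-host \
        subsystems/nqn.2019-01.io.example:sub1/allowed_hosts/
  mkdir ports/1
  echo rdma         > ports/1/addr_trtype
  echo ipv4         > ports/1/addr_adrfam
  echo 192.168.1.10 > ports/1/addr_traddr
  echo 4420         > ports/1/addr_trsvcid
  ln -s /sys/kernel/config/nvmet/subsystems/nqn.2019-01.io.example:sub1 ports/1/subsystems/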

Filing a GitHub issue is an interesting idea.  This isn't something SPDK can fix, so we would probably immediately close the issue, but then it would be saved for posterity.  Maybe we could create a new label type to indicate these types of issues.

Alternatively, we could create some other kind of document in the SPDK repository to track and record these types of issues.

I'd be curious to hear others' opinions on this.

-Jim


On 1/9/19, 3:51 PM, "SPDK on behalf of Shahar Salzman" <spdk-bounces(a)lists.01.org on behalf of shahar.salzman(a)kaminario.com> wrote:

    Should I open a github issue so that people are aware of this?
    
    As it succeeds on CentOS 7.6, the issue itself is solved, so there is no real need to open a bug against the Red Hat/Linux kernel.
    ________________________________
    From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Howell, Seth <seth.howell(a)intel.com>
    Sent: Monday, January 7, 2019 5:11 PM
    To: Storage Performance Development Kit
    Subject: Re: [SPDK] Kernel panic on redhat 7.5 host
    
    Hi Shahar,
    
    Thank you for bringing this up. We currently don't have any CentOS 7.5 machines in our build pool. We do have CentOS 7.4 machines and haven't observed this behavior on them, although admittedly we aren't doing a lot of per-patch NVMe-oF testing on those machines. I will make time this week to reproduce this issue.
    
    I will spin up CentOS 7.4 and 7.5 machines to repro this week.
    
    Have you created a GitHub issue yet?
    
    Thanks,
    
    Seth
    
    -----Original Message-----
    From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Shahar Salzman
    Sent: Thursday, January 3, 2019 6:26 AM
    To: Shahar Salzman <shahar.salzman(a)kaminario.com>; Storage Performance Development Kit <spdk(a)lists.01.org>
    Subject: Re: [SPDK] Kernel panic on redhat 7.5 host
    
    BTW, this does not happen on CentOS 7.6.
    ________________________________
    From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Shahar Salzman <shahar.salzman(a)kaminario.com>
    Sent: Wednesday, January 2, 2019 9:40 PM
    To: Storage Performance Development Kit
    Subject: [SPDK] Kernel panic on redhat 7.5 host
    
    Hi,
    
    We have been playing around with SPDK access control, and it seems that when a RHEL 7.5 host attempts to connect to a subsystem and is denied access, it panics with the BT below.

    This doesn't happen on my system with a newer kernel, but we were hoping to qualify RHEL 7.5.
    I am now going down the rabbit hole of debugging this kernel panic; has anyone else encountered such a panic?
    
    [714117.730194] BUG: unable to handle kernel NULL pointer dereference at 00000000000002b0
    [714117.730556] IP: [<ffffffffc086a69d>] rdma_disconnect+0xd/0xa0 [rdma_cm]
    [714117.730827] PGD 0
    [714117.731067] Oops: 0000 [#1] SMP
    [714117.731337] Modules linked in: dm_queue_length nvme_rdma nvme_fabrics nvme_core nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache ib_isert iscsi_target_m
    ore_mod ib_srp scsi_transport_srp scsi_tgt ib_ucm rpcrdma rdma_ucm ib_uverbs ib_iser ib_umad rdma_cm iw_cm ib_ipoib libiscsi scsi_transport_iscsi ib_cm mlx5_ib ib_core vmw_vsock_vmci_transport
     sb_edac coretemp vmw_balloon iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd mlx5_core joydev pcspkr mlxfw devlink ptp pps_core sg nfit parpo
    dimm shpchp vmw_vmci i2c_piix4 dm_multipath dm_mod ip_tables ext4 mbcache jbd2 sr_mod cdrom ata_generic pata_acpi vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
    [714117.733246]  ttm ahci drm sd_mod crc_t10dif ata_piix libahci crct10dif_generic libata crct10dif_pclmul crct10dif_common serio_raw crc32c_intel vmw_pvscsi vmxnet3 i2c_core floppy
    [714117.734185] CPU: 1 PID: 14001 Comm: kworker/u16:3 Kdump: loaded Not tainted 3.10.0-862.el7.x86_64 #1
    [714117.734661] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
    [714117.735165] Workqueue: nvme-wq nvme_rdma_reconnect_ctrl_work [nvme_rdma]
    [714117.735692] task: ffff9d383b1d4f10 ti: ffff9d39b5d30000 task.ti: ffff9d39b5d30000
    [714117.736273] RIP: 0010:[<ffffffffc086a69d>]  [<ffffffffc086a69d>] rdma_disconnect+0xd/0xa0 [rdma_cm]
    [714117.736872] RSP: 0018:ffff9d39b5d33dd0  EFLAGS: 00010246
    [714117.737450] RAX: 0000000000000000 RBX: 0000000000000000 RCX: dead000000000200
    [714117.738071] RDX: 0000000000000010 RSI: 86c0000000000000 RDI: 0000000000000000
    [714117.738677] RBP: ffff9d39b5d33dd8 R08: ffff9d39a924b0e0 R09: 8b18f8a6e5c4b0d8
    [714117.739287] R10: 8b18f8a6e5c4b0d8 R11: 000289c6869521c0 R12: ffff9d39a924b0d8
    [714117.739911] R13: ffff9d39a924b000 R14: ffff9d37b694df00 R15: 0000000000000200
    [714117.740548] FS:  0000000000000000(0000) GS:ffff9d39bfc40000(0000) knlGS:0000000000000000
    [714117.741196] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [714117.741850] CR2: 00000000000002b0 CR3: 00000001f7026000 CR4: 00000000001607e0
    [714117.742576] Call Trace:
    [714117.743253]  [<ffffffffc087bbe2>] nvme_rdma_stop_and_free_queue+0x22/0x40 [nvme_rdma]
    [714117.743956]  [<ffffffffc087be84>] nvme_rdma_reconnect_ctrl_work+0x54/0x1b0 [nvme_rdma]
    [714117.744676]  [<ffffffff900b2dff>] process_one_work+0x17f/0x440
    [714117.745395]  [<ffffffff900b3ac6>] worker_thread+0x126/0x3c0
    [714117.746120]  [<ffffffff900b39a0>] ? manage_workers.isra.24+0x2a0/0x2a0
    [714117.746856]  [<ffffffff900bae31>] kthread+0xd1/0xe0
    [714117.747600]  [<ffffffff900bad60>] ? insert_kthread_work+0x40/0x40
    [714117.748352]  [<ffffffff9071f637>] ret_from_fork_nospec_begin+0x21/0x21
    [714117.749112]  [<ffffffff900bad60>] ? insert_kthread_work+0x40/0x40
    
    Shahar
    _______________________________________________
    SPDK mailing list
    SPDK(a)lists.01.org
    https://lists.01.org/mailman/listinfo/spdk
    _______________________________________________
    SPDK mailing list
    SPDK(a)lists.01.org
    https://lists.01.org/mailman/listinfo/spdk
    _______________________________________________
    SPDK mailing list
    SPDK(a)lists.01.org
    https://lists.01.org/mailman/listinfo/spdk
    _______________________________________________
    SPDK mailing list
    SPDK(a)lists.01.org
    https://lists.01.org/mailman/listinfo/spdk
    


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [SPDK] Kernel panic on redhat 7.5 host
@ 2019-01-09 14:48 Shahar Salzman
  0 siblings, 0 replies; 7+ messages in thread
From: Shahar Salzman @ 2019-01-09 14:48 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 5403 bytes --]

Should I open a github issue so that people are aware of this?

As it succeeds on CentOS 7.6, the issue itself is solved, so there is no real need to open a bug against the Red Hat/Linux kernel.
________________________________
From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Howell, Seth <seth.howell(a)intel.com>
Sent: Monday, January 7, 2019 5:11 PM
To: Storage Performance Development Kit
Subject: Re: [SPDK] Kernel panic on redhat 7.5 host

Hi Shahar,

Thank you for bringing this up. We currently don't have any CentOS 7.5 machines in our build pool. We do have CentOS 7.4 machines and haven't observed this behavior on them, although admittedly we aren't doing a lot of per-patch NVMe-oF testing on those machines. I will make time this week to reproduce this issue.

I will spin up CentOS 7.4 and 7.5 machines to repro this week.

Have you created a GitHub issue yet?

Thanks,

Seth

-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Shahar Salzman
Sent: Thursday, January 3, 2019 6:26 AM
To: Shahar Salzman <shahar.salzman(a)kaminario.com>; Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Kernel panic on redhat 7.5 host

BTW, this does not happen on CentOS 7.6.
________________________________
From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Shahar Salzman <shahar.salzman(a)kaminario.com>
Sent: Wednesday, January 2, 2019 9:40 PM
To: Storage Performance Development Kit
Subject: [SPDK] Kernel panic on redhat 7.5 host

Hi,

We have been playing around with SPDK access control, and it seems that when a RHEL 7.5 host attempts to connect to a subsystem and is denied access, it panics with the BT below.

This doesn't happen on my system with a newer kernel, but we were hoping to qualify RHEL 7.5.
I am now going down the rabbit hole of debugging this kernel panic; has anyone else encountered such a panic?

[714117.730194] BUG: unable to handle kernel NULL pointer dereference at 00000000000002b0
[714117.730556] IP: [<ffffffffc086a69d>] rdma_disconnect+0xd/0xa0 [rdma_cm]
[714117.730827] PGD 0
[714117.731067] Oops: 0000 [#1] SMP
[714117.731337] Modules linked in: dm_queue_length nvme_rdma nvme_fabrics nvme_core nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache ib_isert iscsi_target_m
ore_mod ib_srp scsi_transport_srp scsi_tgt ib_ucm rpcrdma rdma_ucm ib_uverbs ib_iser ib_umad rdma_cm iw_cm ib_ipoib libiscsi scsi_transport_iscsi ib_cm mlx5_ib ib_core vmw_vsock_vmci_transport
 sb_edac coretemp vmw_balloon iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd mlx5_core joydev pcspkr mlxfw devlink ptp pps_core sg nfit parpo
dimm shpchp vmw_vmci i2c_piix4 dm_multipath dm_mod ip_tables ext4 mbcache jbd2 sr_mod cdrom ata_generic pata_acpi vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
[714117.733246]  ttm ahci drm sd_mod crc_t10dif ata_piix libahci crct10dif_generic libata crct10dif_pclmul crct10dif_common serio_raw crc32c_intel vmw_pvscsi vmxnet3 i2c_core floppy
[714117.734185] CPU: 1 PID: 14001 Comm: kworker/u16:3 Kdump: loaded Not tainted 3.10.0-862.el7.x86_64 #1
[714117.734661] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
[714117.735165] Workqueue: nvme-wq nvme_rdma_reconnect_ctrl_work [nvme_rdma]
[714117.735692] task: ffff9d383b1d4f10 ti: ffff9d39b5d30000 task.ti: ffff9d39b5d30000
[714117.736273] RIP: 0010:[<ffffffffc086a69d>]  [<ffffffffc086a69d>] rdma_disconnect+0xd/0xa0 [rdma_cm]
[714117.736872] RSP: 0018:ffff9d39b5d33dd0  EFLAGS: 00010246
[714117.737450] RAX: 0000000000000000 RBX: 0000000000000000 RCX: dead000000000200
[714117.738071] RDX: 0000000000000010 RSI: 86c0000000000000 RDI: 0000000000000000
[714117.738677] RBP: ffff9d39b5d33dd8 R08: ffff9d39a924b0e0 R09: 8b18f8a6e5c4b0d8
[714117.739287] R10: 8b18f8a6e5c4b0d8 R11: 000289c6869521c0 R12: ffff9d39a924b0d8
[714117.739911] R13: ffff9d39a924b000 R14: ffff9d37b694df00 R15: 0000000000000200
[714117.740548] FS:  0000000000000000(0000) GS:ffff9d39bfc40000(0000) knlGS:0000000000000000
[714117.741196] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[714117.741850] CR2: 00000000000002b0 CR3: 00000001f7026000 CR4: 00000000001607e0
[714117.742576] Call Trace:
[714117.743253]  [<ffffffffc087bbe2>] nvme_rdma_stop_and_free_queue+0x22/0x40 [nvme_rdma]
[714117.743956]  [<ffffffffc087be84>] nvme_rdma_reconnect_ctrl_work+0x54/0x1b0 [nvme_rdma]
[714117.744676]  [<ffffffff900b2dff>] process_one_work+0x17f/0x440
[714117.745395]  [<ffffffff900b3ac6>] worker_thread+0x126/0x3c0
[714117.746120]  [<ffffffff900b39a0>] ? manage_workers.isra.24+0x2a0/0x2a0
[714117.746856]  [<ffffffff900bae31>] kthread+0xd1/0xe0
[714117.747600]  [<ffffffff900bad60>] ? insert_kthread_work+0x40/0x40
[714117.748352]  [<ffffffff9071f637>] ret_from_fork_nospec_begin+0x21/0x21
[714117.749112]  [<ffffffff900bad60>] ? insert_kthread_work+0x40/0x40

Shahar
_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk
_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk
_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [SPDK] Kernel panic on redhat 7.5 host
@ 2019-01-03 13:26 Shahar Salzman
  0 siblings, 0 replies; 7+ messages in thread
From: Shahar Salzman @ 2019-01-03 13:26 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 3952 bytes --]

BTW, this does not happen on CentOS 7.6.
________________________________
From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Shahar Salzman <shahar.salzman(a)kaminario.com>
Sent: Wednesday, January 2, 2019 9:40 PM
To: Storage Performance Development Kit
Subject: [SPDK] Kernel panic on redhat 7.5 host

Hi,

We have been playing around with SPDK access control, and it seems that when a RHEL 7.5 host attempts to connect to a subsystem and is denied access, it panics with the BT below.

This doesn't happen on my system with a newer kernel, but we were hoping to qualify RHEL 7.5.
I am now going down the rabbit hole of debugging this kernel panic; has anyone else encountered such a panic?

[714117.730194] BUG: unable to handle kernel NULL pointer dereference at 00000000000002b0
[714117.730556] IP: [<ffffffffc086a69d>] rdma_disconnect+0xd/0xa0 [rdma_cm]
[714117.730827] PGD 0
[714117.731067] Oops: 0000 [#1] SMP
[714117.731337] Modules linked in: dm_queue_length nvme_rdma nvme_fabrics nvme_core nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache ib_isert iscsi_target_m
ore_mod ib_srp scsi_transport_srp scsi_tgt ib_ucm rpcrdma rdma_ucm ib_uverbs ib_iser ib_umad rdma_cm iw_cm ib_ipoib libiscsi scsi_transport_iscsi ib_cm mlx5_ib ib_core vmw_vsock_vmci_transport
 sb_edac coretemp vmw_balloon iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd mlx5_core joydev pcspkr mlxfw devlink ptp pps_core sg nfit parpo
dimm shpchp vmw_vmci i2c_piix4 dm_multipath dm_mod ip_tables ext4 mbcache jbd2 sr_mod cdrom ata_generic pata_acpi vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
[714117.733246]  ttm ahci drm sd_mod crc_t10dif ata_piix libahci crct10dif_generic libata crct10dif_pclmul crct10dif_common serio_raw crc32c_intel vmw_pvscsi vmxnet3 i2c_core floppy
[714117.734185] CPU: 1 PID: 14001 Comm: kworker/u16:3 Kdump: loaded Not tainted 3.10.0-862.el7.x86_64 #1
[714117.734661] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
[714117.735165] Workqueue: nvme-wq nvme_rdma_reconnect_ctrl_work [nvme_rdma]
[714117.735692] task: ffff9d383b1d4f10 ti: ffff9d39b5d30000 task.ti: ffff9d39b5d30000
[714117.736273] RIP: 0010:[<ffffffffc086a69d>]  [<ffffffffc086a69d>] rdma_disconnect+0xd/0xa0 [rdma_cm]
[714117.736872] RSP: 0018:ffff9d39b5d33dd0  EFLAGS: 00010246
[714117.737450] RAX: 0000000000000000 RBX: 0000000000000000 RCX: dead000000000200
[714117.738071] RDX: 0000000000000010 RSI: 86c0000000000000 RDI: 0000000000000000
[714117.738677] RBP: ffff9d39b5d33dd8 R08: ffff9d39a924b0e0 R09: 8b18f8a6e5c4b0d8
[714117.739287] R10: 8b18f8a6e5c4b0d8 R11: 000289c6869521c0 R12: ffff9d39a924b0d8
[714117.739911] R13: ffff9d39a924b000 R14: ffff9d37b694df00 R15: 0000000000000200
[714117.740548] FS:  0000000000000000(0000) GS:ffff9d39bfc40000(0000) knlGS:0000000000000000
[714117.741196] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[714117.741850] CR2: 00000000000002b0 CR3: 00000001f7026000 CR4: 00000000001607e0
[714117.742576] Call Trace:
[714117.743253]  [<ffffffffc087bbe2>] nvme_rdma_stop_and_free_queue+0x22/0x40 [nvme_rdma]
[714117.743956]  [<ffffffffc087be84>] nvme_rdma_reconnect_ctrl_work+0x54/0x1b0 [nvme_rdma]
[714117.744676]  [<ffffffff900b2dff>] process_one_work+0x17f/0x440
[714117.745395]  [<ffffffff900b3ac6>] worker_thread+0x126/0x3c0
[714117.746120]  [<ffffffff900b39a0>] ? manage_workers.isra.24+0x2a0/0x2a0
[714117.746856]  [<ffffffff900bae31>] kthread+0xd1/0xe0
[714117.747600]  [<ffffffff900bad60>] ? insert_kthread_work+0x40/0x40
[714117.748352]  [<ffffffff9071f637>] ret_from_fork_nospec_begin+0x21/0x21
[714117.749112]  [<ffffffff900bad60>] ? insert_kthread_work+0x40/0x40

Shahar
_______________________________________________
SPDK mailing list
SPDK(a)lists.01.org
https://lists.01.org/mailman/listinfo/spdk

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [SPDK] Kernel panic on redhat 7.5 host
@ 2019-01-02 19:40 Shahar Salzman
  0 siblings, 0 replies; 7+ messages in thread
From: Shahar Salzman @ 2019-01-02 19:40 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 3508 bytes --]

Hi,

We have been playing around with SPDK access control, and it seems that when a RHEL 7.5 host attempts to connect to a subsystem and is denied access, it panics with the BT below.

This doesn't happen on my system with a newer kernel, but we were hoping to qualify RHEL 7.5.
I am now going down the rabbit hole of debugging this kernel panic; has anyone else encountered such a panic?

[714117.730194] BUG: unable to handle kernel NULL pointer dereference at 00000000000002b0
[714117.730556] IP: [<ffffffffc086a69d>] rdma_disconnect+0xd/0xa0 [rdma_cm]
[714117.730827] PGD 0
[714117.731067] Oops: 0000 [#1] SMP
[714117.731337] Modules linked in: dm_queue_length nvme_rdma nvme_fabrics nvme_core nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache ib_isert iscsi_target_m
ore_mod ib_srp scsi_transport_srp scsi_tgt ib_ucm rpcrdma rdma_ucm ib_uverbs ib_iser ib_umad rdma_cm iw_cm ib_ipoib libiscsi scsi_transport_iscsi ib_cm mlx5_ib ib_core vmw_vsock_vmci_transport
 sb_edac coretemp vmw_balloon iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd mlx5_core joydev pcspkr mlxfw devlink ptp pps_core sg nfit parpo
dimm shpchp vmw_vmci i2c_piix4 dm_multipath dm_mod ip_tables ext4 mbcache jbd2 sr_mod cdrom ata_generic pata_acpi vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
[714117.733246]  ttm ahci drm sd_mod crc_t10dif ata_piix libahci crct10dif_generic libata crct10dif_pclmul crct10dif_common serio_raw crc32c_intel vmw_pvscsi vmxnet3 i2c_core floppy
[714117.734185] CPU: 1 PID: 14001 Comm: kworker/u16:3 Kdump: loaded Not tainted 3.10.0-862.el7.x86_64 #1
[714117.734661] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
[714117.735165] Workqueue: nvme-wq nvme_rdma_reconnect_ctrl_work [nvme_rdma]
[714117.735692] task: ffff9d383b1d4f10 ti: ffff9d39b5d30000 task.ti: ffff9d39b5d30000
[714117.736273] RIP: 0010:[<ffffffffc086a69d>]  [<ffffffffc086a69d>] rdma_disconnect+0xd/0xa0 [rdma_cm]
[714117.736872] RSP: 0018:ffff9d39b5d33dd0  EFLAGS: 00010246
[714117.737450] RAX: 0000000000000000 RBX: 0000000000000000 RCX: dead000000000200
[714117.738071] RDX: 0000000000000010 RSI: 86c0000000000000 RDI: 0000000000000000
[714117.738677] RBP: ffff9d39b5d33dd8 R08: ffff9d39a924b0e0 R09: 8b18f8a6e5c4b0d8
[714117.739287] R10: 8b18f8a6e5c4b0d8 R11: 000289c6869521c0 R12: ffff9d39a924b0d8
[714117.739911] R13: ffff9d39a924b000 R14: ffff9d37b694df00 R15: 0000000000000200
[714117.740548] FS:  0000000000000000(0000) GS:ffff9d39bfc40000(0000) knlGS:0000000000000000
[714117.741196] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[714117.741850] CR2: 00000000000002b0 CR3: 00000001f7026000 CR4: 00000000001607e0
[714117.742576] Call Trace:
[714117.743253]  [<ffffffffc087bbe2>] nvme_rdma_stop_and_free_queue+0x22/0x40 [nvme_rdma]
[714117.743956]  [<ffffffffc087be84>] nvme_rdma_reconnect_ctrl_work+0x54/0x1b0 [nvme_rdma]
[714117.744676]  [<ffffffff900b2dff>] process_one_work+0x17f/0x440
[714117.745395]  [<ffffffff900b3ac6>] worker_thread+0x126/0x3c0
[714117.746120]  [<ffffffff900b39a0>] ? manage_workers.isra.24+0x2a0/0x2a0
[714117.746856]  [<ffffffff900bae31>] kthread+0xd1/0xe0
[714117.747600]  [<ffffffff900bad60>] ? insert_kthread_work+0x40/0x40
[714117.748352]  [<ffffffff9071f637>] ret_from_fork_nospec_begin+0x21/0x21
[714117.749112]  [<ffffffff900bad60>] ? insert_kthread_work+0x40/0x40
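
Since Kdump is loaded on this host, one way to start digging is to open the resulting vmcore with the crash utility. A minimal sketch follows (package names and dump paths are assumptions, typical of a RHEL 7 kdump setup); note that RDI is 0 in the register dump above, which suggests rdma_disconnect() was handed a NULL cm_id, with the fault address 0x2b0 being an offset into that structure:

  # install matching kernel debuginfo and open the dump
  debuginfo-install kernel-3.10.0-862.el7
  crash /usr/lib/debug/lib/modules/3.10.0-862.el7.x86_64/vmlinux /var/crash/*/vmcore

  # then, at the crash> prompt:
  #   bt                                       confirm the backtrace above
  #   dis -rl rdma_disconnect+0xd              disassemble up to the faulting instruction
  #   dis -l nvme_rdma_stop_and_free_queue     see the call site that passed the NULL pointer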

Shahar

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-01-15  8:52 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-07 15:11 [SPDK] Kernel panic on redhat 7.5 host Howell, Seth
  -- strict thread matches above, loose matches on Subject: below --
2019-01-15  8:52 Shahar Salzman
2019-01-10 14:36 Sasha Kotchubievsky
2019-01-09 16:24 Harris, James R
2019-01-09 14:48 Shahar Salzman
2019-01-03 13:26 Shahar Salzman
2019-01-02 19:40 Shahar Salzman
