* Re: [SPDK] uio crash when hot-remove NVME SSD disk
@ 2018-08-06  9:48 Wodkowski, PawelX
  0 siblings, 0 replies; 8+ messages in thread
From: Wodkowski, PawelX @ 2018-08-06  9:48 UTC (permalink / raw)
  To: spdk

There are more issues with UIO on the mainline kernel. I really advise you to use VFIO.

Pawel

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Vincent
Sent: Monday, August 6, 2018 11:13 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] uio crash when hot-remove NVME SSD disk

Hello  DariuszX,

   Thank you for your response.

I had already seen the patch, but its crash call stack is different from mine.

This is the crash call stack from the patch:

[  251.173789]  [<ffffffffa0c9b2d3>] dev_attr_show+0x23/0x60

[  251.174356]  [<ffffffffa0f561b2>] ? mutex_lock+0x12/0x2f

[  251.174892]  [<ffffffffa0ac6d9f>] sysfs_kf_seq_show+0xcf/0x1f0

[  251.175433]  [<ffffffffa0ac54e6>] kernfs_seq_show+0x26/0x30

[  251.175981]  [<ffffffffa0a63be0>] seq_read+0x110/0x3f0

This is my crash call stack.

[ 3813.935843]  ? uio_release+0x37/0x60 [uio]
[ 3813.935859]  __fput+0xea/0x220
[ 3813.935871]  ____fput+0xe/0x10
[ 3813.935881]  task_work_run+0x8c/0xb0
[ 3813.935895]  exit_to_usermode_loop+0x6b/0x95


So I think these may be two different, unrelated issues.

Anyway, I will try it.

Thank you so much for your response.

--
Vincent


2018-08-06 16:52 GMT+08:00 Stojaczyk, DariuszX <dariuszx.stojaczyk(a)intel.com>:
Hi,

@baruch on github recently gave us an update that there's already a fix in the mainline kernel.
https://github.com/spdk/spdk/issues/231#issuecomment-409976409

I believe 4.16.11-1.el7.elrepo.x86_64 doesn't have that patch yet.
D.

> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Vincent
> Sent: Thursday, August 2, 2018 11:04 AM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] uio crash when hot-remove NVME SSD disk
>
> Hello All,
>
>       Actually, we first saw this happen in February.
>
> This is the reply from Stojaczyk, DariuszX:
> ------------------------------------------------------------------------------------------------------
>
> 2018-02-07 22:52 GMT+08:00 Stojaczyk, DariuszX <dariuszx.stojaczyk(a)intel.com>:
>
> Hi Vincent,
>
> This is a known kernel bug. There is an SPDK GitHub issue regarding the same
> problem: https://github.com/spdk/spdk/issues/231.
>
> Your options are:
>
> a) update your kernel
>
> b) switch from uio to vfio-pci driver
>
>
> Regards,
>
>
> ------------------------------------------------------------------------------------------------------
>
> We upgraded the kernel at that time, and the problem seemed to disappear.
>
> But it seems the crash can still happen, with a small probability.
>
> So we want to try the second proposal in DariuszX's mail and switch from the
> uio driver to the vfio-pci driver.
>
> But how do we switch from uio to vfio-pci?
>
> Any hint is appreciated.
>
> Thank you so much.
> --
> Vincent
>
>
>
>
>
>
> 2018-08-02 15:35 GMT+08:00 Yan, Liang Z <liang.z.yan(a)intel.com>:
>
>
>       Hi Vincent,
>
>
>
>       What is the CentOS version you are using? We are trying to reproduce
> this issue.
>
>
>
>       Thanks.
>
>
>
>       Liang Yan
>
>
>
>       From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Vincent
>       Sent: Thursday, August 2, 2018 3:12 PM
>       To: Storage Performance Development Kit <spdk(a)lists.01.org>
>       Subject: [SPDK] uio crash when hot-remove NVME SSD disk
>
>
>
>       Hello all,
>
>
>
>       We are testing the hotplug function of SPDK.
>
>       The kernel sometimes crashes when we remove a disk.
>
>       The kdump crash call stack is attached below.
>
>       The kernel version is "4.16.11-1.el7.elrepo.x86_64".
>
>       Can anyone give us a hint on how to solve this problem?
>
>
>
>       Thank you so much.
>
>
>
>
>
>
>
>
>
>       [ 3813.935443] BUG: unable to handle kernel NULL pointer dereference at 0000000000000173
>       [ 3813.935466] IP: 0x173
>       [ 3813.935472] PGD 8000001fe7558067 P4D 8000001fe7558067 PUD 1ff332d067 PMD 0
>       [ 3813.935490] Oops: 0010 [#1] SMP PTI
>       [ 3813.935496] Modules linked in: virtio_pci virtio_ring virtio uio_pci_generic uio ipmi_si ipmi_devintf ipmi_msghandler sr_mod cdrom joydev uas usb_storage fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi binfmt_misc sha512_ssse3 sha512_generic skx_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper cryptd intel_cstate qat_c62x intel_rapl_perf intel_qat wdat_wdt pcspkr dh_generic input_leds cdc_acm authenc sg mei_me lpc_ich mei ioatdma
>       [ 3813.935638]  shpchp i2c_i801 mfd_core acpi_power_meter acpi_pad tpm_crb nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc32c_intel ast i40e drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops igb ttm ahci ptp libahci pps_core drm dca i2c_algo_bit libata dm_mirror dm_region_hash dm_log dm_mod dax [last unloaded: ipmi_msghandler]
>       [ 3813.935725] CPU: 35 PID: 21251 Comm: reactor_35 Not tainted 4.16.11-1.el7.elrepo.x86_64 #1
>       [ 3813.935731] Hardware name: AIC HA202-PV/PAVO, BIOS PAVH_0.02.1 02/27/2018
>       [ 3813.935738] RIP: 0010:0x173
>       [ 3813.935745] RSP: 0018:ffffc900216d7e40 EFLAGS: 00010202
>       [ 3813.935753] RAX: ffff883fe34a7850 RBX: ffff881fd5331570 RCX: 0000000000000001
>       [ 3813.935760] RDX: 0000000000000173 RSI: ffff881ff4835de8 RDI: ffff883fe34a7850
>       [ 3813.935766] RBP: ffffc900216d7e60 R08: 0000000000000000 R09: 0000000000000000
>       [ 3813.935773] R10: ffff881ff4835de8 R11: ffff881ff1c13310 R12: ffff883ff462ed98
>       [ 3813.935779] R13: 0000000000000000 R14: ffff881ff499c320 R15: ffff881fb97e9200
>       [ 3813.935788] FS:  00007efa9d5f9700(0000) GS:ffff883fff3c0000(0000) knlGS:0000000000000000
>       [ 3813.935795] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>       [ 3813.935802] CR2: 0000000000000173 CR3: 0000001fc27f8003 CR4: 00000000007606e0
>       [ 3813.935809] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>       [ 3813.935816] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>       [ 3813.935821] PKRU: 55555554
>       [ 3813.935826] Call Trace:
>       [ 3813.935843]  ? uio_release+0x37/0x60 [uio]
>       [ 3813.935859]  __fput+0xea/0x220
>       [ 3813.935871]  ____fput+0xe/0x10
>       [ 3813.935881]  task_work_run+0x8c/0xb0
>       [ 3813.935895]  exit_to_usermode_loop+0x6b/0x95
>       [ 3813.935905]  do_syscall_64+0x182/0x1b0
>       [ 3813.935920]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>       [ 3813.935927] RIP: 0033:0x7f02aa71c76d
>       [ 3813.935934] RSP: 002b:00007efa9d5f88b0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
>       [ 3813.935943] RAX: 0000000000000000 RBX: 00007efea2c2ec80 RCX: 00007f02aa71c76d
>
>
>       _______________________________________________
>       SPDK mailing list
>       SPDK(a)lists.01.org
>       https://lists.01.org/mailman/listinfo/spdk
>
>
>
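
A few fields in the quoted oops can be read mechanically. The Python sketch below decodes the x86 page-fault error code printed after "Oops:"; the table and the helper name decode_oops are made up for this illustration and are not taken from any kernel tool. For "Oops: 0010" it reports an instruction fetch in kernel mode from a non-present page. Together with RIP, CR2 and RDX all being 0x173, that is consistent with the CPU trying to execute at address 0x173, for example through a bad function pointer. This is a reading of the log only, not a confirmed root cause.

```python
# x86 page-fault error code bits: (mask, meaning if set, meaning if clear).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "not a write"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
    (0x20, "protection-key violation", None),
]


def decode_oops(code):
    """Return a list of human-readable facts encoded in an Oops error code."""
    facts = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            facts.append(if_set)
        elif if_clear is not None:
            facts.append(if_clear)
    return facts


# "Oops: 0010" from the log above (the value is hexadecimal).
print(decode_oops(0x0010))
```

ORIG_RAX in the same log is 3, which is the close() system call number on x86_64, so the fault happened while a file descriptor was being closed.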



* Re: [SPDK] uio crash when hot-remove NVME SSD disk
@ 2018-08-06  9:57 Vincent
  0 siblings, 0 replies; 8+ messages in thread
From: Vincent @ 2018-08-06  9:57 UTC (permalink / raw)
  To: spdk

Hello PawelX,


I am actually trying to replace the uio driver with the vfio-pci driver,
but I have run into some problems because the PCIe devices in my system
are complicated. I will keep trying.

Thank you again for your response.

--
Vincent
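
One common complication when moving to vfio-pci on a system with many PCIe devices is IOMMU grouping: VFIO works on whole IOMMU groups, so every device in a group has to be bound to vfio-pci (or left without a driver) before the group can be used. Whether that is the problem here is not stated in the thread. The Python sketch below only lists the groups from the standard sysfs location /sys/kernel/iommu_groups; the function names are made up for this illustration.

```python
from pathlib import Path


def iommu_groups(root=Path("/sys/kernel/iommu_groups")):
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = {}
    if not root.is_dir():
        # No IOMMU groups: the IOMMU is probably disabled or unsupported.
        return groups
    for group in root.iterdir():
        devices = group / "devices"
        groups[int(group.name)] = sorted(d.name for d in devices.iterdir())
    return groups


def crowded_groups(groups):
    """Return only the groups that hold more than one device."""
    return {n: devs for n, devs in groups.items() if len(devs) > 1}


if __name__ == "__main__":
    for number, devs in sorted(crowded_groups(iommu_groups()).items()):
        print(f"group {number}: {' '.join(devs)}")
```

An empty result from iommu_groups() usually means the IOMMU is not enabled (for example, intel_iommu=on is missing from the kernel command line on Intel platforms), in which case vfio-pci cannot be used in its normal mode.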












2018-08-06 17:48 GMT+08:00 Wodkowski, PawelX <pawelx.wodkowski(a)intel.com>:

> There are more issues with UIO on the mainline kernel. I really advise you to
> use VFIO.
>
>
>
> Pawel

* Re: [SPDK] uio crash when hot-remove NVME SSD disk
@ 2018-08-06  9:13 Vincent
  0 siblings, 0 replies; 8+ messages in thread
From: Vincent @ 2018-08-06  9:13 UTC (permalink / raw)
  To: spdk

Hello  DariuszX,

   Thank you for your response.

I had already seen the patch, but its crash call stack is different from mine.

This is the crash call stack from the patch:

[  251.173789]  [<ffffffffa0c9b2d3>] dev_attr_show+0x23/0x60
[  251.174356]  [<ffffffffa0f561b2>] ? mutex_lock+0x12/0x2f
[  251.174892]  [<ffffffffa0ac6d9f>] sysfs_kf_seq_show+0xcf/0x1f0
[  251.175433]  [<ffffffffa0ac54e6>] kernfs_seq_show+0x26/0x30
[  251.175981]  [<ffffffffa0a63be0>] seq_read+0x110/0x3f0


This is my crash call stack.

[ 3813.935843]  ? uio_release+0x37/0x60 [uio]

[ 3813.935859]  __fput+0xea/0x220

[ 3813.935871]  ____fput+0xe/0x10

[ 3813.935881]  task_work_run+0x8c/0xb0

[ 3813.935895]  exit_to_usermode_loop+0x6b/0x95


So I think these may be two different, unrelated issues.

Anyway, I will try it.

Thank you so much for your response.

--
Vincent
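
One way to make the comparison above concrete is to reduce each trace to its function names and check for overlap. The Python sketch below is only an illustration: the regular expression and the helper name frame_symbols are assumptions made for this example, not part of SPDK or any kernel tool. Disjoint frames suggest different code paths (here a sysfs attribute read versus a file release), although that alone does not rule out a shared root cause.

```python
import re

# One call-trace frame: "[ time] [<addr>] ? symbol+0xOFF/0xSIZE".
# The address column and the "?" (unreliable frame) marker are optional.
FRAME_RE = re.compile(
    r"\[\s*\d+\.\d+\]\s+(?:\[<[0-9a-f]+>\]\s+)?(?:\?\s+)?"
    r"([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def frame_symbols(trace):
    """Return the function names found in a call trace, top frame first."""
    return [m.group(1) for m in FRAME_RE.finditer(trace)]


PATCH_TRACE = """
[  251.173789]  [<ffffffffa0c9b2d3>] dev_attr_show+0x23/0x60
[  251.174356]  [<ffffffffa0f561b2>] ? mutex_lock+0x12/0x2f
[  251.174892]  [<ffffffffa0ac6d9f>] sysfs_kf_seq_show+0xcf/0x1f0
[  251.175433]  [<ffffffffa0ac54e6>] kernfs_seq_show+0x26/0x30
[  251.175981]  [<ffffffffa0a63be0>] seq_read+0x110/0x3f0
"""

MY_TRACE = """
[ 3813.935843]  ? uio_release+0x37/0x60 [uio]
[ 3813.935859]  __fput+0xea/0x220
[ 3813.935871]  ____fput+0xe/0x10
[ 3813.935881]  task_work_run+0x8c/0xb0
[ 3813.935895]  exit_to_usermode_loop+0x6b/0x95
"""

shared = set(frame_symbols(PATCH_TRACE)) & set(frame_symbols(MY_TRACE))
print(sorted(shared))  # [] for these two traces: no common frames
```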


2018-08-06 16:52 GMT+08:00 Stojaczyk, DariuszX <dariuszx.stojaczyk(a)intel.com>:

> Hi,
>
> @baruch on github recently gave us an update that there's already a fix in
> the mainline kernel.
> https://github.com/spdk/spdk/issues/231#issuecomment-409976409
>
> I believe 4.16.11-1.el7.elrepo.x86_64 doesn't have that patch yet.
> D.

* Re: [SPDK] uio crash when hot-remove NVME SSD disk
@ 2018-08-06  8:52 Stojaczyk, DariuszX
  0 siblings, 0 replies; 8+ messages in thread
From: Stojaczyk, DariuszX @ 2018-08-06  8:52 UTC (permalink / raw)
  To: spdk

Hi,

@baruch on github recently gave us an update that there's already a fix in the mainline kernel.
https://github.com/spdk/spdk/issues/231#issuecomment-409976409 

I believe 4.16.11-1.el7.elrepo.x86_64 doesn't have that patch yet.
D.

> -----Original Message-----
> From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Vincent
> Sent: Thursday, August 2, 2018 11:04 AM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: Re: [SPDK] uio crash when hot-remove NVME SSD disk
> 
> Hello All,
> 
>       Actually we  had found this happened in Feb.
> 
> this is the mail return from   Stojaczyk, DariuszX
> ------------------------------------------------------------------------------------------------------
> 
> 2018-02-07 22:52 GMT+08:00 Stojaczyk, DariuszX <dariuszx.stojaczyk(a)intel.com
> <mailto:dariuszx.stojaczyk(a)intel.com> >:
> 
> Hi Vincent,
> 
> This is known kernel bug. There is an SPDK github issue regarding the same
> problem https://github.com/spdk/spdk/issues/231.
> 
> Your options are:
> 
> a) update your kernel
> 
> b) switch from uio to vfio-pci driver
> 
> 
> Regards,
> 
> 
> ----------------------------------------------------------------------------------------------------------
> -------------
> 
> We upgrade the kernel at that time, and problem seems disappear.
> 
> But it seems that the crash problem still has small probability to happen.
> 
> So, if we want to try the second proposal in DariuszX's mail,  switch from uio to
> vfio-pci driver,
> 
> But how to switch from uio to vfio-pci driver ??
> 
> Any hint is appreciated.
> 
> Thank you so much.
> --
> Vincent
> 
> 
> 
> 
> 
> 
> 2018-08-02 15:35 GMT+08:00 Yan, Liang Z <liang.z.yan(a)intel.com
> <mailto:liang.z.yan(a)intel.com> >:
> 
> 
> 	Hi Vincent,
> 
> 
> 
> 	What is the CentOS version you are using? We are trying to reproduce
> this issue.
> 
> 
> 
> 	Thanks.
> 
> 
> 
> 	Liang Yan
> 
> 
> 
> 	From: SPDK [mailto:spdk-bounces(a)lists.01.org <mailto:spdk-
> bounces(a)lists.01.org> ] On Behalf Of Vincent
> 	Sent: Thursday, August 2, 2018 3:12 PM
> 	To: Storage Performance Development Kit <spdk(a)lists.01.org
> <mailto:spdk(a)lists.01.org> >
> 	Subject: [SPDK] uio crash when hot-remove NVME SSD disk
> 
> 
> 
> 	Hello all,
> 
> 
> 
> 	           We just test the hotplug function of SPDK,
> 
> 
> 
> 	The kernel will crash sometimes when we remove disk.
> 
> 
> 
> 	the kdump crash call stack is attached below.
> 
> 
> 
> 	The kernel version is   "4.16.11-1.el7.elrepo.x86_64"
> 
> 
> 
> 	Does anyone  can give us a hint of how to solve this problem ?
> 
> 
> 
> 	Thank you so much.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 	[ 3813.935443] BUG: unable to handle kernel NULL pointer dereference
> at 0000000000000173
> 
> 	[ 3813.935466] IP: 0x173
> 
> 	[ 3813.935472] PGD 8000001fe7558067 P4D 8000001fe7558067 PUD
> 1ff332d067 PMD 0
> 
> 	[ 3813.935490] Oops: 0010 [#1] SMP PTI
> 
> 	[ 3813.935496] Modules linked in: virtio_pci virtio_ring virtio
> uio_pci_generic uio ipmi_si ipmi_devintf ipmi_msghandler sr_mod cdrom joydev
> uas usb_storage fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE
> nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4
> nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun
> ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter iscsi_tcp
> libiscsi_tcp libiscsi scsi_transport_iscsi binfmt_misc sha512_ssse3 sha512_generic
> skx_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm
> irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel
> crypto_simd glue_helper cryptd intel_cstate qat_c62x intel_rapl_perf intel_qat
> wdat_wdt pcspkr dh_generic input_leds cdc_acm authenc sg mei_me lpc_ich mei
> ioatdma
> 
> 	[ 3813.935638]  shpchp i2c_i801 mfd_core acpi_power_meter acpi_pad
> tpm_crb nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c
> sd_mod crc32c_intel ast i40e drm_kms_helper syscopyarea sysfillrect sysimgblt
> fb_sys_fops igb ttm ahci ptp libahci pps_core drm dca i2c_algo_bit libata
> dm_mirror dm_region_hash dm_log dm_mod dax [last unloaded:
> ipmi_msghandler]
> 
> 	[ 3813.935725] CPU: 35 PID: 21251 Comm: reactor_35 Not tainted
> 4.16.11-1.el7.elrepo.x86_64 #1
> 
> 	[ 3813.935731] Hardware name: AIC HA202-PV/PAVO, BIOS PAVH_0.02.1
> 02/27/2018
> 
> 	[ 3813.935738] RIP: 0010:0x173
> 
> 	[ 3813.935745] RSP: 0018:ffffc900216d7e40 EFLAGS: 00010202
> 
> 	[ 3813.935753] RAX: ffff883fe34a7850 RBX: ffff881fd5331570 RCX:
> 0000000000000001
> 
> 	[ 3813.935760] RDX: 0000000000000173 RSI: ffff881ff4835de8 RDI:
> ffff883fe34a7850
> 
> 	[ 3813.935766] RBP: ffffc900216d7e60 R08: 0000000000000000 R09:
> 0000000000000000
> 
> 	[ 3813.935773] R10: ffff881ff4835de8 R11: ffff881ff1c13310 R12:
> ffff883ff462ed98
> 
> 	[ 3813.935779] R13: 0000000000000000 R14: ffff881ff499c320 R15:
> ffff881fb97e9200
> 
> 	[ 3813.935788] FS:  00007efa9d5f9700(0000) GS:ffff883fff3c0000(0000)
> knlGS:0000000000000000
> 
> 	[ 3813.935795] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 
> 	[ 3813.935802] CR2: 0000000000000173 CR3: 0000001fc27f8003 CR4:
> 00000000007606e0
> 
> 	[ 3813.935809] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> 
> 	[ 3813.935816] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> 
> 	[ 3813.935821] PKRU: 55555554
> 
> 	[ 3813.935826] Call Trace:
> 
> 	[ 3813.935843]  ? uio_release+0x37/0x60 [uio]
> 
> 	[ 3813.935859]  __fput+0xea/0x220
> 
> 	[ 3813.935871]  ____fput+0xe/0x10
> 
> 	[ 3813.935881]  task_work_run+0x8c/0xb0
> 
> 	[ 3813.935895]  exit_to_usermode_loop+0x6b/0x95
> 
> 	[ 3813.935905]  do_syscall_64+0x182/0x1b0
> 
> 	[ 3813.935920]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> 
> 	[ 3813.935927] RIP: 0033:0x7f02aa71c76d
> 
> 	[ 3813.935934] RSP: 002b:00007efa9d5f88b0 EFLAGS: 00000293
> ORIG_RAX: 0000000000000003
> 
> 	[ 3813.935943] RAX: 0000000000000000 RBX: 00007efea2c2ec80 RCX:
> 00007f02aa71c76d
> 
> 
> 	_______________________________________________
> 	SPDK mailing list
> 	SPDK(a)lists.01.org
> 	https://lists.01.org/mailman/listinfo/spdk
> 
> 
> 



* Re: [SPDK] uio crash when hot-remove NVME SSD disk
@ 2018-08-02  9:03 Vincent
  0 siblings, 0 replies; 8+ messages in thread
From: Vincent @ 2018-08-02  9:03 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 5418 bytes --]

Hello All,

Actually, we first ran into this back in February.

This is the reply we got from Stojaczyk, DariuszX at the time:
------------------------------------------------------------------------------------------------------

2018-02-07 22:52 GMT+08:00 Stojaczyk, DariuszX <dariuszx.stojaczyk(a)intel.com>:

Hi Vincent,

This is a known kernel bug. There is an SPDK GitHub issue about the same
problem: https://github.com/spdk/spdk/issues/231.

Your options are:

a) update your kernel

b) switch from uio to vfio-pci driver


Regards,

-----------------------------------------------------------------------------------------------------------------------

We upgraded the kernel at that time, and the problem seemed to disappear.

But it seems the crash can still happen, with low probability.

So we would like to try the second option in DariuszX's mail: switching from
uio to the vfio-pci driver.

How do we switch from uio to the vfio-pci driver?

Any hint is appreciated.

Thank you so much.
--
Vincent
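
For reference, rebinding a single PCI device from uio_pci_generic to vfio-pci
generally comes down to three sysfs writes. The sketch below only illustrates
that generic kernel mechanism; it is not SPDK's own tooling (SPDK ships
scripts/setup.sh for binding devices). The function name `rebind_to_vfio`,
the `sysfs` parameter, and the example PCI address are assumptions made for
illustration. It also assumes root privileges, a loaded vfio-pci module, and
an enabled IOMMU.

```python
from pathlib import Path


def rebind_to_vfio(bdf: str, sysfs: Path = Path("/sys")) -> None:
    """Rebind one PCI device (hypothetical address such as "0000:5e:00.0")
    to vfio-pci using the generic sysfs interface."""
    dev = sysfs / "bus" / "pci" / "devices" / bdf

    # Tell the PCI core to prefer vfio-pci the next time the device is probed.
    (dev / "driver_override").write_text("vfio-pci\n")

    # Detach the currently bound driver (uio_pci_generic here), if any.
    unbind = dev / "driver" / "unbind"
    if unbind.exists():
        unbind.write_text(bdf + "\n")

    # Ask the PCI core to probe the device again; driver_override decides
    # which driver is allowed to claim it.
    (sysfs / "bus" / "pci" / "drivers_probe").write_text(bdf + "\n")
```

Calling `rebind_to_vfio("0000:5e:00.0")` as root would apply this to one
device. The `sysfs` parameter exists only so the function can be pointed at a
scratch directory when trying it out without touching real hardware.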




2018-08-02 15:35 GMT+08:00 Yan, Liang Z <liang.z.yan(a)intel.com>:

> Hi Vincent,
>
>
>
> What is the CentOS version you are using? We are trying to reproduce this
> issue.
>
>
>
> Thanks.
>
>
>
> Liang Yan
>
>
>
> *From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Vincent
> *Sent:* Thursday, August 2, 2018 3:12 PM
> *To:* Storage Performance Development Kit <spdk(a)lists.01.org>
> *Subject:* [SPDK] uio crash when hot-remove NVME SSD disk
>
>
>
> Hello all,
>
> We just tested the hotplug function of SPDK.
>
> The kernel sometimes crashes when we remove a disk.
>
> The kdump crash call stack is attached below.
>
> The kernel version is "4.16.11-1.el7.elrepo.x86_64".
>
> Can anyone give us a hint on how to solve this problem?
>
> Thank you so much.
>
> [ 3813.935443] BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000173
>
> [ 3813.935466] IP: 0x173
>
> [ 3813.935472] PGD 8000001fe7558067 P4D 8000001fe7558067 PUD 1ff332d067
> PMD 0
>
> [ 3813.935490] Oops: 0010 [#1] SMP PTI
>
> [ 3813.935496] Modules linked in: virtio_pci virtio_ring virtio
> uio_pci_generic uio ipmi_si ipmi_devintf ipmi_msghandler sr_mod cdrom
> joydev uas usb_storage fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE
> nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4
> nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun
> ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter iscsi_tcp
> libiscsi_tcp libiscsi scsi_transport_iscsi binfmt_misc sha512_ssse3
> sha512_generic skx_edac x86_pkg_temp_thermal intel_powerclamp coretemp
> kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
> pcbc aesni_intel crypto_simd glue_helper cryptd intel_cstate qat_c62x
> intel_rapl_perf intel_qat wdat_wdt pcspkr dh_generic input_leds cdc_acm
> authenc sg mei_me lpc_ich mei ioatdma
>
> [ 3813.935638]  shpchp i2c_i801 mfd_core acpi_power_meter acpi_pad tpm_crb
> nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod
> crc32c_intel ast i40e drm_kms_helper syscopyarea sysfillrect sysimgblt
> fb_sys_fops igb ttm ahci ptp libahci pps_core drm dca i2c_algo_bit libata
> dm_mirror dm_region_hash dm_log dm_mod dax [last unloaded: ipmi_msghandler]
>
> [ 3813.935725] CPU: 35 PID: 21251 Comm: reactor_35 Not tainted
> 4.16.11-1.el7.elrepo.x86_64 #1
>
> [ 3813.935731] Hardware name: AIC HA202-PV/PAVO, BIOS PAVH_0.02.1
> 02/27/2018
>
> [ 3813.935738] RIP: 0010:0x173
>
> [ 3813.935745] RSP: 0018:ffffc900216d7e40 EFLAGS: 00010202
>
> [ 3813.935753] RAX: ffff883fe34a7850 RBX: ffff881fd5331570 RCX:
> 0000000000000001
>
> [ 3813.935760] RDX: 0000000000000173 RSI: ffff881ff4835de8 RDI:
> ffff883fe34a7850
>
> [ 3813.935766] RBP: ffffc900216d7e60 R08: 0000000000000000 R09:
> 0000000000000000
>
> [ 3813.935773] R10: ffff881ff4835de8 R11: ffff881ff1c13310 R12:
> ffff883ff462ed98
>
> [ 3813.935779] R13: 0000000000000000 R14: ffff881ff499c320 R15:
> ffff881fb97e9200
>
> [ 3813.935788] FS:  00007efa9d5f9700(0000) GS:ffff883fff3c0000(0000)
> knlGS:0000000000000000
>
> [ 3813.935795] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>
> [ 3813.935802] CR2: 0000000000000173 CR3: 0000001fc27f8003 CR4:
> 00000000007606e0
>
> [ 3813.935809] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
>
> [ 3813.935816] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
>
> [ 3813.935821] PKRU: 55555554
>
> [ 3813.935826] Call Trace:
>
> [ 3813.935843]  ? uio_release+0x37/0x60 [uio]
>
> [ 3813.935859]  __fput+0xea/0x220
>
> [ 3813.935871]  ____fput+0xe/0x10
>
> [ 3813.935881]  task_work_run+0x8c/0xb0
>
> [ 3813.935895]  exit_to_usermode_loop+0x6b/0x95
>
> [ 3813.935905]  do_syscall_64+0x182/0x1b0
>
> [ 3813.935920]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>
> [ 3813.935927] RIP: 0033:0x7f02aa71c76d
>
> [ 3813.935934] RSP: 002b:00007efa9d5f88b0 EFLAGS: 00000293 ORIG_RAX:
> 0000000000000003
>
> [ 3813.935943] RAX: 0000000000000000 RBX: 00007efea2c2ec80 RCX:
> 00007f02aa71c76d
>



* Re: [SPDK] uio crash when hot-remove NVME SSD disk
@ 2018-08-02  7:48 Vincent
  0 siblings, 0 replies; 8+ messages in thread
From: Vincent @ 2018-08-02  7:48 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 4512 bytes --]

Thank you for your quick response.

The host system is CentOS 7.4, with the kernel upgraded to 4.16.

This problem does not happen often.
Sometimes it takes 100 or 200 hotplug cycles to trigger it.

If you have any hint about this problem, please let me know.

Thank you so much.

Yan, Liang Z <liang.z.yan(a)intel.com> wrote on Thursday, August 2, 2018:

> Hi Vincent,
>
>
>
> What is the CentOS version you are using? We are trying to reproduce this
> issue.
>
>
>
> Thanks.
>
>
>
> Liang Yan
>
>
>
> *From:* SPDK [mailto:spdk-bounces(a)lists.01.org] *On Behalf Of *Vincent
> *Sent:* Thursday, August 2, 2018 3:12 PM
> *To:* Storage Performance Development Kit <spdk(a)lists.01.org>
> *Subject:* [SPDK] uio crash when hot-remove NVME SSD disk
>
>
>
> Hello all,
>
> We just tested the hotplug function of SPDK.
>
> The kernel sometimes crashes when we remove a disk.
>
> The kdump crash call stack is attached below.
>
> The kernel version is "4.16.11-1.el7.elrepo.x86_64".
>
> Can anyone give us a hint on how to solve this problem?
>
> Thank you so much.
>
> [ 3813.935443] BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000173
>
> [ 3813.935466] IP: 0x173
>
> [ 3813.935472] PGD 8000001fe7558067 P4D 8000001fe7558067 PUD 1ff332d067
> PMD 0
>
> [ 3813.935490] Oops: 0010 [#1] SMP PTI
>
> [ 3813.935496] Modules linked in: virtio_pci virtio_ring virtio
> uio_pci_generic uio ipmi_si ipmi_devintf ipmi_msghandler sr_mod cdrom
> joydev uas usb_storage fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE
> nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4
> nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun
> ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter iscsi_tcp
> libiscsi_tcp libiscsi scsi_transport_iscsi binfmt_misc sha512_ssse3
> sha512_generic skx_edac x86_pkg_temp_thermal intel_powerclamp coretemp
> kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
> pcbc aesni_intel crypto_simd glue_helper cryptd intel_cstate qat_c62x
> intel_rapl_perf intel_qat wdat_wdt pcspkr dh_generic input_leds cdc_acm
> authenc sg mei_me lpc_ich mei ioatdma
>
> [ 3813.935638]  shpchp i2c_i801 mfd_core acpi_power_meter acpi_pad tpm_crb
> nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod
> crc32c_intel ast i40e drm_kms_helper syscopyarea sysfillrect sysimgblt
> fb_sys_fops igb ttm ahci ptp libahci pps_core drm dca i2c_algo_bit libata
> dm_mirror dm_region_hash dm_log dm_mod dax [last unloaded: ipmi_msghandler]
>
> [ 3813.935725] CPU: 35 PID: 21251 Comm: reactor_35 Not tainted
> 4.16.11-1.el7.elrepo.x86_64 #1
>
> [ 3813.935731] Hardware name: AIC HA202-PV/PAVO, BIOS PAVH_0.02.1
> 02/27/2018
>
> [ 3813.935738] RIP: 0010:0x173
>
> [ 3813.935745] RSP: 0018:ffffc900216d7e40 EFLAGS: 00010202
>
> [ 3813.935753] RAX: ffff883fe34a7850 RBX: ffff881fd5331570 RCX:
> 0000000000000001
>
> [ 3813.935760] RDX: 0000000000000173 RSI: ffff881ff4835de8 RDI:
> ffff883fe34a7850
>
> [ 3813.935766] RBP: ffffc900216d7e60 R08: 0000000000000000 R09:
> 0000000000000000
>
> [ 3813.935773] R10: ffff881ff4835de8 R11: ffff881ff1c13310 R12:
> ffff883ff462ed98
>
> [ 3813.935779] R13: 0000000000000000 R14: ffff881ff499c320 R15:
> ffff881fb97e9200
>
> [ 3813.935788] FS:  00007efa9d5f9700(0000) GS:ffff883fff3c0000(0000)
> knlGS:0000000000000000
>
> [ 3813.935795] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>
> [ 3813.935802] CR2: 0000000000000173 CR3: 0000001fc27f8003 CR4:
> 00000000007606e0
>
> [ 3813.935809] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
>
> [ 3813.935816] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
>
> [ 3813.935821] PKRU: 55555554
>
> [ 3813.935826] Call Trace:
>
> [ 3813.935843]  ? uio_release+0x37/0x60 [uio]
>
> [ 3813.935859]  __fput+0xea/0x220
>
> [ 3813.935871]  ____fput+0xe/0x10
>
> [ 3813.935881]  task_work_run+0x8c/0xb0
>
> [ 3813.935895]  exit_to_usermode_loop+0x6b/0x95
>
> [ 3813.935905]  do_syscall_64+0x182/0x1b0
>
> [ 3813.935920]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
>
> [ 3813.935927] RIP: 0033:0x7f02aa71c76d
>
> [ 3813.935934] RSP: 002b:00007efa9d5f88b0 EFLAGS: 00000293 ORIG_RAX:
> 0000000000000003
>
> [ 3813.935943] RAX: 0000000000000000 RBX: 00007efea2c2ec80 RCX:
> 00007f02aa71c76d
>



* Re: [SPDK] uio crash when hot-remove NVME SSD disk
@ 2018-08-02  7:35 Yan, Liang Z
  0 siblings, 0 replies; 8+ messages in thread
From: Yan, Liang Z @ 2018-08-02  7:35 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 3763 bytes --]

Hi Vincent,

What is the CentOS version you are using? We are trying to reproduce this issue.

Thanks.

Liang Yan

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Vincent
Sent: Thursday, August 2, 2018 3:12 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: [SPDK] uio crash when hot-remove NVME SSD disk

Hello all,

We just tested the hotplug function of SPDK.

The kernel sometimes crashes when we remove a disk.

The kdump crash call stack is attached below.

The kernel version is "4.16.11-1.el7.elrepo.x86_64".

Can anyone give us a hint on how to solve this problem?

Thank you so much.




[ 3813.935443] BUG: unable to handle kernel NULL pointer dereference at 0000000000000173
[ 3813.935466] IP: 0x173
[ 3813.935472] PGD 8000001fe7558067 P4D 8000001fe7558067 PUD 1ff332d067 PMD 0
[ 3813.935490] Oops: 0010 [#1] SMP PTI
[ 3813.935496] Modules linked in: virtio_pci virtio_ring virtio uio_pci_generic uio ipmi_si ipmi_devintf ipmi_msghandler sr_mod cdrom joydev uas usb_storage fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi binfmt_misc sha512_ssse3 sha512_generic skx_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper cryptd intel_cstate qat_c62x intel_rapl_perf intel_qat wdat_wdt pcspkr dh_generic input_leds cdc_acm authenc sg mei_me lpc_ich mei ioatdma
[ 3813.935638]  shpchp i2c_i801 mfd_core acpi_power_meter acpi_pad tpm_crb nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc32c_intel ast i40e drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops igb ttm ahci ptp libahci pps_core drm dca i2c_algo_bit libata dm_mirror dm_region_hash dm_log dm_mod dax [last unloaded: ipmi_msghandler]
[ 3813.935725] CPU: 35 PID: 21251 Comm: reactor_35 Not tainted 4.16.11-1.el7.elrepo.x86_64 #1
[ 3813.935731] Hardware name: AIC HA202-PV/PAVO, BIOS PAVH_0.02.1 02/27/2018
[ 3813.935738] RIP: 0010:0x173
[ 3813.935745] RSP: 0018:ffffc900216d7e40 EFLAGS: 00010202
[ 3813.935753] RAX: ffff883fe34a7850 RBX: ffff881fd5331570 RCX: 0000000000000001
[ 3813.935760] RDX: 0000000000000173 RSI: ffff881ff4835de8 RDI: ffff883fe34a7850
[ 3813.935766] RBP: ffffc900216d7e60 R08: 0000000000000000 R09: 0000000000000000
[ 3813.935773] R10: ffff881ff4835de8 R11: ffff881ff1c13310 R12: ffff883ff462ed98
[ 3813.935779] R13: 0000000000000000 R14: ffff881ff499c320 R15: ffff881fb97e9200
[ 3813.935788] FS:  00007efa9d5f9700(0000) GS:ffff883fff3c0000(0000) knlGS:0000000000000000
[ 3813.935795] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3813.935802] CR2: 0000000000000173 CR3: 0000001fc27f8003 CR4: 00000000007606e0
[ 3813.935809] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3813.935816] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3813.935821] PKRU: 55555554
[ 3813.935826] Call Trace:
[ 3813.935843]  ? uio_release+0x37/0x60 [uio]
[ 3813.935859]  __fput+0xea/0x220
[ 3813.935871]  ____fput+0xe/0x10
[ 3813.935881]  task_work_run+0x8c/0xb0
[ 3813.935895]  exit_to_usermode_loop+0x6b/0x95
[ 3813.935905]  do_syscall_64+0x182/0x1b0
[ 3813.935920]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 3813.935927] RIP: 0033:0x7f02aa71c76d
[ 3813.935934] RSP: 002b:00007efa9d5f88b0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
[ 3813.935943] RAX: 0000000000000000 RBX: 00007efea2c2ec80 RCX: 00007f02aa71c76d



* [SPDK] uio crash when hot-remove NVME SSD disk
@ 2018-08-02  7:12 Vincent
  0 siblings, 0 replies; 8+ messages in thread
From: Vincent @ 2018-08-02  7:12 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 3437 bytes --]

Hello all,

We just tested the hotplug function of SPDK.

The kernel sometimes crashes when we remove a disk.

The kdump crash call stack is attached below.

The kernel version is "4.16.11-1.el7.elrepo.x86_64".

Can anyone give us a hint on how to solve this problem?

Thank you so much.




[ 3813.935443] BUG: unable to handle kernel NULL pointer dereference at
0000000000000173
[ 3813.935466] IP: 0x173
[ 3813.935472] PGD 8000001fe7558067 P4D 8000001fe7558067 PUD 1ff332d067 PMD
0
[ 3813.935490] Oops: 0010 [#1] SMP PTI
[ 3813.935496] Modules linked in: virtio_pci virtio_ring virtio
uio_pci_generic uio ipmi_si ipmi_devintf ipmi_msghandler sr_mod cdrom
joydev uas usb_storage fuse xt_CHECKSUM iptable_mangle ipt_MASQUERADE
nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4
nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun
ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter iscsi_tcp
libiscsi_tcp libiscsi scsi_transport_iscsi binfmt_misc sha512_ssse3
sha512_generic skx_edac x86_pkg_temp_thermal intel_powerclamp coretemp
kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
pcbc aesni_intel crypto_simd glue_helper cryptd intel_cstate qat_c62x
intel_rapl_perf intel_qat wdat_wdt pcspkr dh_generic input_leds cdc_acm
authenc sg mei_me lpc_ich mei ioatdma
[ 3813.935638]  shpchp i2c_i801 mfd_core acpi_power_meter acpi_pad tpm_crb
nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod
crc32c_intel ast i40e drm_kms_helper syscopyarea sysfillrect sysimgblt
fb_sys_fops igb ttm ahci ptp libahci pps_core drm dca i2c_algo_bit libata
dm_mirror dm_region_hash dm_log dm_mod dax [last unloaded: ipmi_msghandler]
[ 3813.935725] CPU: 35 PID: 21251 Comm: reactor_35 Not tainted
4.16.11-1.el7.elrepo.x86_64 #1
[ 3813.935731] Hardware name: AIC HA202-PV/PAVO, BIOS PAVH_0.02.1 02/27/2018
[ 3813.935738] RIP: 0010:0x173
[ 3813.935745] RSP: 0018:ffffc900216d7e40 EFLAGS: 00010202
[ 3813.935753] RAX: ffff883fe34a7850 RBX: ffff881fd5331570 RCX:
0000000000000001
[ 3813.935760] RDX: 0000000000000173 RSI: ffff881ff4835de8 RDI:
ffff883fe34a7850
[ 3813.935766] RBP: ffffc900216d7e60 R08: 0000000000000000 R09:
0000000000000000
[ 3813.935773] R10: ffff881ff4835de8 R11: ffff881ff1c13310 R12:
ffff883ff462ed98
[ 3813.935779] R13: 0000000000000000 R14: ffff881ff499c320 R15:
ffff881fb97e9200
[ 3813.935788] FS:  00007efa9d5f9700(0000) GS:ffff883fff3c0000(0000)
knlGS:0000000000000000
[ 3813.935795] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3813.935802] CR2: 0000000000000173 CR3: 0000001fc27f8003 CR4:
00000000007606e0
[ 3813.935809] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 3813.935816] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 3813.935821] PKRU: 55555554
[ 3813.935826] Call Trace:
[ 3813.935843]  ? uio_release+0x37/0x60 [uio]
[ 3813.935859]  __fput+0xea/0x220
[ 3813.935871]  ____fput+0xe/0x10
[ 3813.935881]  task_work_run+0x8c/0xb0
[ 3813.935895]  exit_to_usermode_loop+0x6b/0x95
[ 3813.935905]  do_syscall_64+0x182/0x1b0
[ 3813.935920]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 3813.935927] RIP: 0033:0x7f02aa71c76d
[ 3813.935934] RSP: 002b:00007efa9d5f88b0 EFLAGS: 00000293 ORIG_RAX:
0000000000000003
[ 3813.935943] RAX: 0000000000000000 RBX: 00007efea2c2ec80 RCX:
00007f02aa71c76d
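
When comparing an oops like the one above with other reports (for example
the traces in the GitHub issue), it can help to reduce each one to its list
of call-trace symbols. The following is a minimal sketch of that idea; the
names `FRAME` and `call_trace` are made up for illustration, and the regular
expression only targets the frame format seen in this particular log.

```python
import re

# Matches frames such as "[ 3813.935859]  __fput+0xea/0x220" and
# "[ 3813.935843]  ? uio_release+0x37/0x60 [uio]". A leading "?" marks a
# frame the kernel's unwinder could not confirm as reliable.
FRAME = re.compile(r"\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def call_trace(dmesg_text: str) -> list[str]:
    """Return the function names that follow 'Call Trace:' in an oops."""
    frames = []
    in_trace = False
    for line in dmesg_text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            match = FRAME.search(line)
            if match:
                frames.append(match.group(2))
    return frames
```

Two reports can then be compared by checking whether their symbol lists
match. Register dump lines such as "RIP: 0033:0x7f02aa71c76d" are skipped
because they carry no symbol+offset pair.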



Thread overview: 8+ messages
2018-08-06  9:48 [SPDK] uio crash when hot-remove NVME SSD disk Wodkowski, PawelX
2018-08-06  9:57 Vincent
2018-08-06  9:13 Vincent
2018-08-06  8:52 Stojaczyk, DariuszX
2018-08-02  9:03 Vincent
2018-08-02  7:48 Vincent
2018-08-02  7:35 Yan, Liang Z
2018-08-02  7:12 Vincent
