* [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-05 18:26 Victor Banh
  0 siblings, 0 replies; 24+ messages in thread
From: Victor Banh @ 2017-10-05 18:26 UTC (permalink / raw)
  To: spdk


Hi
I have SPDK NVMe-oF set up and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor


[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389]  nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486]  nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373]  nvme1n1: unable to read partition table







* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2018-03-05 19:32 Wodkowski, PawelX
  0 siblings, 0 replies; 24+ messages in thread
From: Wodkowski, PawelX @ 2018-03-05 19:32 UTC (permalink / raw)
  To: spdk


Hi Victor,

We do not test mainline kernels. Please use the official Ubuntu 16.04.3 kernel (currently I think it is 4.4.0-116).
If you need a kernel version >= 4.12.0, please use Ubuntu 17.10.

Pawel

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Saturday, March 3, 2018 5:23 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>; John F. Kim <johnk(a)mellanox.com>; Cao, Gang <gang.cao(a)intel.com>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio



Hi Cao
Have you tried DPDK 18.02 with SPDK V18.01?
Thanks
Victor


From: Victor Banh
Sent: Saturday, October 21, 2017 8:23 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; John F. Kim <johnk(a)mellanox.com<mailto:johnk(a)mellanox.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

I am using NVMe.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Saturday, October 21, 2017 1:21:03 AM
To: Victor Banh; Storage Performance Development Kit; Harris, James R; John F. Kim
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

I am also using SPDK v17.07.1 and DPDK 17.08 on my system to try your I/O configuration.

Let me check whether our colleague has this Ubuntu OS installed. If not, could you please try it on CentOS?

Also, what kind of backend device did you configure for the NVMe-oF target: Malloc, NVMe SSD, or something else?

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Saturday, October 21, 2017 6:14 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; John F. Kim <johnk(a)mellanox.com<mailto:johnk(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
Is it possible to try it on Ubuntu?
Which version of DPDK and SPDK are you using?
Where can I get them? GitHub?
Thanks
Victor

root(a)clientintel:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.3 LTS
Release:        16.04
Codename:       xenial
root(a)clientintel:~# uname -a
Linux clientintel 4.12.0-041200-generic #201707022031 SMP Mon Jul 3 00:32:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux



From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Wednesday, October 18, 2017 11:52 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

[root(a)node4 gangcao]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux

[root(a)node4 gangcao]# lsb_release -a
LSB Version:    :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description:    CentOS Linux release 7.2.1511 (Core)
Release:        7.2.1511
Codename:       Core

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 2:14 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Can you give the OS version and kernel version again for target and client?
I couldn't compile DPDK without installing the latest 4.12.x kernel on Ubuntu 16.04.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 10:59:19 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

This server does not have OFED installed and it is in loopback mode.

I found another server, also in loopback mode, with a ConnectX-3 and OFED installed.

By the way, what SSD are you using? Maybe it is related to the SSD? I've just run with 2048k for a short duration and saw no issue. I will run longer to see whether I can hit this error.
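(For reference, the 2048k run mentioned here would presumably be the fio command line Victor shared, quoted later in this thread, just with a larger block size; a sketch, assuming the same parameters and Gang's /dev/nvme0n1 device:)

fio --bs=2048k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1 --name=read-phase --rw=randwrite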

[root(a)slave3 fio]# lspci | grep -i mell
08:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]

[root(a)slave3 fio]# lsmod | grep -i mlx
mlx4_ib               159744  0
ib_core               208896  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,ib_srpt,ib_ucm,rdma_ucm,mlx4_ib
mlx4_en               114688  0
mlx4_core             307200  2 mlx4_en,mlx4_ib
ptp                    20480  3 ixgbe,igb,mlx4_en

[root(a)slave3 fio]# ofed_info
MLNX_OFED_LINUX-3.2-2.0.0.0 (OFED-3.2-2.0.0):

ar_mgr:
osm_plugins/ar_mgr/ar_mgr-1.0-0.30.ga1ea4b7.tar.gz

cc_mgr:
osm_plugins/cc_mgr/cc_mgr-1.0-0.29.ga1ea4b7.tar.gz

dapl:
dapl.git mlnx_ofed_3_1
commit c30fb6ce2cbc29d8ed4bde51437f7abb93378c78
dump_pr:
osm_plugins/dump_pr//dump_pr-1.0-0.25.ga1ea4b7.tar.gz

fabric-collector:
fabric_collector//fabric-collector-1.1.0.MLNX20140410.51b267e.tar.gz

fca:
mlnx_ofed_fca/fca-2.5.2431-1.src.rpm

hcoll:
mlnx_ofed_hcol/hcoll-3.4.807-1.src.rpm

ibacm:
mlnx_ofed/ibacm.git mlnx_ofed_3_2
commit 15ad8c13bdebbe62edea0b7df030710b65c14f7f
ibacm_ssa:
mlnx_ofed_ssa/acm/ibacm_ssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

ibdump:
sniffer/sniffer-4.0.0-2/ibdump/linux/ibdump-4.0.0-2.tgz

ibsim:
mlnx_ofed_ibsim/ibsim-0.6-0.8.g9d76581.tar.gz

ibssa:
mlnx_ofed_ssa/distrib/ibssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

ibutils:
ofed-1.5.3-rpms/ibutils/ibutils-1.5.7.1-0.12.gdcaeae2.tar.gz

ibutils2:
ibutils2/ibutils2-2.1.1-0.76.MLNX20160222.gd366c7b.tar.gz

infiniband-diags:
mlnx_ofed_infiniband_diags/infiniband-diags-1.6.6.MLNX20151130.7f0213e.tar.gz

infinipath-psm:
mlnx_ofed_infinipath-psm/infinipath-psm-3.3-2_g6f42cdb_open.tar.gz

iser:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

isert:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

kernel-mft:
mlnx_ofed_mft/kernel-mft-4.3.0-25.src.rpm

knem:
knem.git mellanox-master
commit f143ee19a575cd42a334422fa8bd329d671238db
libibcm:
mlnx_ofed/libibcm.git mlnx_ofed_3_0
commit d7d485df305e6536711485bd7e477668e77d8320
libibmad:
mlnx_ofed_libibmad/libibmad-1.3.12.MLNX20151122.d140cb1.tar.gz

libibprof:
mlnx_ofed_libibprof/libibprof-1.1.22-1.src.rpm

libibumad:
mlnx_ofed_libibumad/libibumad-1.3.10.2.MLNX20150406.966500d.tar.gz

libibverbs:
mlnx_ofed/libibverbs.git mlnx_ofed_3_2_2
commit 217a77686f4861229f0e4b94485a13f024634caf
libmlx4:
mlnx_ofed/libmlx4.git mlnx_ofed_3_2_2
commit dda6d7ae1e6e3779a485ebdd0a882f4bcbd027a6
libmlx5:
mlnx_ofed/libmlx5.git mlnx_ofed_3_2_2
commit d0c8645359e0f0aba0408b2d344f3b418d27019b
libopensmssa:
mlnx_ofed_ssa/plugin/libopensmssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

librdmacm:
mlnx_ofed/librdmacm.git mlnx_ofed_3_2_2
commit 6bd430fed9e7b3d57a1876c040431ce7295c7703
libsdp:
libsdp.git mlnx_ofed_3_0
commit fbd01dfff05f42d6b82506e7dbf4bc6b7e6a59a4
libvma:
vma/source_rpms//libvma-7.0.14-0.src.rpm

mlnx-ethtool:
upstream/ethtool.git for-upstream
commit ac0cf295abe0c0832f0711fed66ab9601c8b2513
mlnx-ofa_kernel:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

mpi-selector:
ofed-1.5.3-rpms/mpi-selector/mpi-selector-1.0.3-1.src.rpm

mpitests:
mlnx_ofed_mpitest/mpitests-3.2.17-e1c7f2f.src.rpm

mstflint:
mlnx_ofed_mstflint/mstflint-4.3.0-1.49.g9b9af70.tar.gz

multiperf:
mlnx_ofed_multiperf/multiperf-3.0-0.10.gda89e8c.tar.gz

mvapich2:
mlnx_ofed_mvapich2/mvapich2-2.2a-1.src.rpm

mxm:
mlnx_ofed_mxm/mxm-3.4.3079-1.src.rpm

ofed-docs:
docs.git mlnx_ofed-3.2
commit ea3386416f9f7130edd2c70fc3424cb2cda50f7d

openmpi:
mlnx_ofed_ompi_1.8/openmpi-1.10.3a1-1.src.rpm

opensm:
mlnx_ofed_opensm/opensm-4.6.1.MLNX20160112.774e977.tar.gz

perftest:
mlnx_ofed_perftest/perftest-3.0-0.18.gb464d59.tar.gz

qperf:
mlnx_ofed_qperf/qperf-0.4.9.tar.gz

rds-tools:
rds-tools.git mlnx_ofed_2_4
commit 299420ca25cf9996bc0748e3bc4b08748996ba49
sdpnetstat:
sdpnetstat.git mlnx_ofed_3_0
commit 3cf409a7cc07e5c71f9640eddbb801ece21b4169
sockperf:
sockperf/sockperf-2.7-43.git3ee62bd8107a.src.rpm

srp:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

srptools:
srptools/srptools-1.0.2-12.src.rpm


Installed Packages:
-------------------
infiniband-diags
librdmacm
libmlx4
libibverbs-utils
mpi-selector
libibmad-devel
sdpnetstat
knem
libibumad-devel
libsdp
mlnx-ethtool
libibverbs-debuginfo
mlnx-ofa_kernel-modules
srp
opensm
mstflint
cc_mgr
libibmad
libibverbs
kernel-mft
libibverbs-devel-static
libibumad
librdmacm-devel
mlnx-ofa_kernel
libsdp-devel
ibutils2
mlnxofed-docs
libibmad-static
iser
opensm-devel
dump_pr
libibumad-static
rds-tools
libmlx4-debuginfo
mlnx-ofa_kernel-devel
opensm-libs
opensm-static
ar_mgr
dapl-devel-static
infiniband-diags-compat
libibverbs-devel
ibsim

[root(a)slave3 fio]# ibstat
CA 'mlx4_0'
        CA type: MT4103
        Number of ports: 2
        Firmware version: 2.35.5100
        Hardware version: 0
        Node GUID: 0x248a0703006090e0
        System image GUID: 0x248a0703006090e0
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x268a07fffe6090e0
                Link layer: Ethernet
        Port 2:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x268a07fffe6090e1
                Link layer: Ethernet

[root(a)slave3 fio]# ibv_devinfo
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.35.5100
        node_guid:                      248a:0703:0060:90e0
        sys_image_guid:                 248a:0703:0060:90e0
        vendor_id:                      0x02c9
        vendor_part_id:                 4103
        hw_ver:                         0x0
        board_id:                       MT_1090111023
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

                port:   2
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 12:34 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Did you install Mellanox OFED on the target and client servers?
Can you run ibstat on both servers?
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 8:59:28 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

I've just tried the SPDK v17.07.1 and DPDK v17.08.

nvme version: 1.1.38.gfaab
fio version: 3.1

I tried the 512k and 1024k I/O sizes and there is no error. dmesg information is as follows.

So there may be some other difference here? It looks like you are using a ConnectX-5 while I am using a ConnectX-4?

Other related information:

[root(a)node4 fio]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

[root(a)node4 fio]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core

[root(a)node4 fio]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux

[577707.543326] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.100.8:4420
[577730.854540] detected loopback device
[577730.893761] nvme nvme0: creating 7 I/O queues.
[577730.893797] detected loopback device
[577730.898611] detected loopback device
[577730.908917] detected loopback device
[577730.919073] detected loopback device
[577730.928922] detected loopback device
[577730.938679] detected loopback device
[577730.948365] detected loopback device
[577731.146290] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.100.8:4420

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
Any update?
Do you see any error messages from "dmesg" when running fio with a 512k block size?
Thanks
Victor

From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you share which version of SPDK you are using when you see this error? Or maybe you could try the latest SPDK code?

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Cao
Do you see any messages from dmesg?

I tried this fio version and still saw these error messages from dmesg.

fio-3.1

[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338]  nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Thanks for your detailed information on the testing.

I've tried the latest SPDK code with the latest fio-3.1-20-g132b and with fio-2.19, and I don't see this kind of error.

Could you share which version of SPDK you are using when you see this error? Or maybe you could try the latest SPDK code?

fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1  --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017

My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio



From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

(cc Victor)

From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you provide a few more details?  This will help the list to provide some ideas.


1)      On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?


The kernel initiator; I run these commands on the client server.



modprobe mlx5_ib

modprobe nvme-rdma

nvme discover -t rdma -a 192.168.10.11 -s 4420

nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1  -a 192.168.10.11 -s 4420
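(After the connect, a quick way to confirm the namespace showed up on the client, assuming nvme-cli and standard tools, is something like:)

nvme list

dmesg | tail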




2)      Can you provide the fio configuration file or command line?  Just so we can have more specifics on "bigger block size".


fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1  --name=read-phase --rw=randwrite
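(For reference, a sketch of an equivalent fio job file with the same parameters as the command line above:)

[global]
bs=512k
numjobs=4
iodepth=16
loops=1
ioengine=libaio
direct=1
invalidate=1
fsync_on_close=1
randrepeat=1
norandommap
time_based
runtime=60

[read-phase]
filename=/dev/nvme1n1
rw=randwrite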


3)      Any details on the HW setup - specifically details on the RDMA NIC (or if you're using SW RoCE).

Nvmf.conf on target server

[Global]
  Comment "Global section"
    ReactorMask 0xff00

[Rpc]
  Enable No
  Listen 127.0.0.1

[Nvmf]
  MaxQueuesPerSession 8
  MaxQueueDepth 128

[Subsystem1]
  NQN nqn.2016-06.io.spdk:nvme-subsystem-1
  Core 9
  Mode Direct
  Listen RDMA 192.168.10.11:4420
  NVMe 0000:82:00.0
  SN S2PMNAAH400039


It is an RDMA NIC, a ConnectX-5. The CPU is an Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz.
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15
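(For reference, a config like the Nvmf.conf above would typically be passed to the nvmf target application when it is started; a minimal sketch, assuming SPDK 17.07-era paths and that the file is saved as nvmf.conf:)

./scripts/setup.sh
./app/nvmf_tgt/nvmf_tgt -c /path/to/nvmf.conf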




Thanks,

-Jim


From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio

Hi
I have SPDK NVMe-oF set up and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor


[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389]  nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486]  nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373]  nvme1n1: unable to read partition table







* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2018-03-03  4:22 Victor Banh
  0 siblings, 0 replies; 24+ messages in thread
From: Victor Banh @ 2018-03-03  4:22 UTC (permalink / raw)
  To: spdk




Hi Cao
Have you tried DPDK 18.02 with SPDK V18.01?
Thanks
Victor


From: Victor Banh
Sent: Saturday, October 21, 2017 8:23 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>; John F. Kim <johnk(a)mellanox.com>; Cao, Gang <gang.cao(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

I am using NVMe.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Saturday, October 21, 2017 1:21:03 AM
To: Victor Banh; Storage Performance Development Kit; Harris, James R; John F. Kim
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

I am also using SPDK v17.07.1 and DPDK 17.08 on my system to try your I/O configuration.

Let me check whether our colleague has this Ubuntu OS installed. If not, could you please try it on CentOS?

Also, what kind of backend device did you configure for the NVMe-oF target: Malloc, NVMe SSD, or something else?

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Saturday, October 21, 2017 6:14 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; John F. Kim <johnk(a)mellanox.com<mailto:johnk(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
Is it possible to try it on Ubuntu?
Which version of DPDK and SPDK are you using?
Where can I get them? GitHub?
Thanks
Victor

root(a)clientintel:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.3 LTS
Release:        16.04
Codename:       xenial
root(a)clientintel:~# uname -a
Linux clientintel 4.12.0-041200-generic #201707022031 SMP Mon Jul 3 00:32:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux



From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Wednesday, October 18, 2017 11:52 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

[root(a)node4 gangcao]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux

[root(a)node4 gangcao]# lsb_release -a
LSB Version:    :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description:    CentOS Linux release 7.2.1511 (Core)
Release:        7.2.1511
Codename:       Core

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 2:14 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Can you give the OS version and kernel version again for target and client?
I couldn't compile DPDK without installing the latest 4.12.x kernel on Ubuntu 16.04.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 10:59:19 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

This server does not have OFED installed and it is in loopback mode.

I found another server, also in loopback mode, with a ConnectX-3 and OFED installed.

By the way, what SSD are you using? Maybe it is related to the SSD? I've just run with 2048k for a short duration and saw no issue. I will run longer to see whether I can hit this error.

[root(a)slave3 fio]# lspci | grep -i mell
08:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]

[root(a)slave3 fio]# lsmod | grep -i mlx
mlx4_ib               159744  0
ib_core               208896  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,ib_srpt,ib_ucm,rdma_ucm,mlx4_ib
mlx4_en               114688  0
mlx4_core             307200  2 mlx4_en,mlx4_ib
ptp                    20480  3 ixgbe,igb,mlx4_en

[root(a)slave3 fio]# ofed_info
MLNX_OFED_LINUX-3.2-2.0.0.0 (OFED-3.2-2.0.0):

ar_mgr:
osm_plugins/ar_mgr/ar_mgr-1.0-0.30.ga1ea4b7.tar.gz

cc_mgr:
osm_plugins/cc_mgr/cc_mgr-1.0-0.29.ga1ea4b7.tar.gz

dapl:
dapl.git mlnx_ofed_3_1
commit c30fb6ce2cbc29d8ed4bde51437f7abb93378c78
dump_pr:
osm_plugins/dump_pr//dump_pr-1.0-0.25.ga1ea4b7.tar.gz

fabric-collector:
fabric_collector//fabric-collector-1.1.0.MLNX20140410.51b267e.tar.gz

fca:
mlnx_ofed_fca/fca-2.5.2431-1.src.rpm

hcoll:
mlnx_ofed_hcol/hcoll-3.4.807-1.src.rpm

ibacm:
mlnx_ofed/ibacm.git mlnx_ofed_3_2
commit 15ad8c13bdebbe62edea0b7df030710b65c14f7f
ibacm_ssa:
mlnx_ofed_ssa/acm/ibacm_ssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

ibdump:
sniffer/sniffer-4.0.0-2/ibdump/linux/ibdump-4.0.0-2.tgz

ibsim:
mlnx_ofed_ibsim/ibsim-0.6-0.8.g9d76581.tar.gz

ibssa:
mlnx_ofed_ssa/distrib/ibssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

ibutils:
ofed-1.5.3-rpms/ibutils/ibutils-1.5.7.1-0.12.gdcaeae2.tar.gz

ibutils2:
ibutils2/ibutils2-2.1.1-0.76.MLNX20160222.gd366c7b.tar.gz

infiniband-diags:
mlnx_ofed_infiniband_diags/infiniband-diags-1.6.6.MLNX20151130.7f0213e.tar.gz

infinipath-psm:
mlnx_ofed_infinipath-psm/infinipath-psm-3.3-2_g6f42cdb_open.tar.gz

iser:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

isert:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

kernel-mft:
mlnx_ofed_mft/kernel-mft-4.3.0-25.src.rpm

knem:
knem.git mellanox-master
commit f143ee19a575cd42a334422fa8bd329d671238db
libibcm:
mlnx_ofed/libibcm.git mlnx_ofed_3_0
commit d7d485df305e6536711485bd7e477668e77d8320
libibmad:
mlnx_ofed_libibmad/libibmad-1.3.12.MLNX20151122.d140cb1.tar.gz

libibprof:
mlnx_ofed_libibprof/libibprof-1.1.22-1.src.rpm

libibumad:
mlnx_ofed_libibumad/libibumad-1.3.10.2.MLNX20150406.966500d.tar.gz

libibverbs:
mlnx_ofed/libibverbs.git mlnx_ofed_3_2_2
commit 217a77686f4861229f0e4b94485a13f024634caf
libmlx4:
mlnx_ofed/libmlx4.git mlnx_ofed_3_2_2
commit dda6d7ae1e6e3779a485ebdd0a882f4bcbd027a6
libmlx5:
mlnx_ofed/libmlx5.git mlnx_ofed_3_2_2
commit d0c8645359e0f0aba0408b2d344f3b418d27019b
libopensmssa:
mlnx_ofed_ssa/plugin/libopensmssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

librdmacm:
mlnx_ofed/librdmacm.git mlnx_ofed_3_2_2
commit 6bd430fed9e7b3d57a1876c040431ce7295c7703
libsdp:
libsdp.git mlnx_ofed_3_0
commit fbd01dfff05f42d6b82506e7dbf4bc6b7e6a59a4
libvma:
vma/source_rpms//libvma-7.0.14-0.src.rpm

mlnx-ethtool:
upstream/ethtool.git for-upstream
commit ac0cf295abe0c0832f0711fed66ab9601c8b2513
mlnx-ofa_kernel:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

mpi-selector:
ofed-1.5.3-rpms/mpi-selector/mpi-selector-1.0.3-1.src.rpm

mpitests:
mlnx_ofed_mpitest/mpitests-3.2.17-e1c7f2f.src.rpm

mstflint:
mlnx_ofed_mstflint/mstflint-4.3.0-1.49.g9b9af70.tar.gz

multiperf:
mlnx_ofed_multiperf/multiperf-3.0-0.10.gda89e8c.tar.gz

mvapich2:
mlnx_ofed_mvapich2/mvapich2-2.2a-1.src.rpm

mxm:
mlnx_ofed_mxm/mxm-3.4.3079-1.src.rpm

ofed-docs:
docs.git mlnx_ofed-3.2
commit ea3386416f9f7130edd2c70fc3424cb2cda50f7d

openmpi:
mlnx_ofed_ompi_1.8/openmpi-1.10.3a1-1.src.rpm

opensm:
mlnx_ofed_opensm/opensm-4.6.1.MLNX20160112.774e977.tar.gz

perftest:
mlnx_ofed_perftest/perftest-3.0-0.18.gb464d59.tar.gz

qperf:
mlnx_ofed_qperf/qperf-0.4.9.tar.gz

rds-tools:
rds-tools.git mlnx_ofed_2_4
commit 299420ca25cf9996bc0748e3bc4b08748996ba49
sdpnetstat:
sdpnetstat.git mlnx_ofed_3_0
commit 3cf409a7cc07e5c71f9640eddbb801ece21b4169
sockperf:
sockperf/sockperf-2.7-43.git3ee62bd8107a.src.rpm

srp:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

srptools:
srptools/srptools-1.0.2-12.src.rpm


Installed Packages:
-------------------
infiniband-diags
librdmacm
libmlx4
libibverbs-utils
mpi-selector
libibmad-devel
sdpnetstat
knem
libibumad-devel
libsdp
mlnx-ethtool
libibverbs-debuginfo
mlnx-ofa_kernel-modules
srp
opensm
mstflint
cc_mgr
libibmad
libibverbs
kernel-mft
libibverbs-devel-static
libibumad
librdmacm-devel
mlnx-ofa_kernel
libsdp-devel
ibutils2
mlnxofed-docs
libibmad-static
iser
opensm-devel
dump_pr
libibumad-static
rds-tools
libmlx4-debuginfo
mlnx-ofa_kernel-devel
opensm-libs
opensm-static
ar_mgr
dapl-devel-static
infiniband-diags-compat
libibverbs-devel
ibsim

[root(a)slave3 fio]# ibstat
CA 'mlx4_0'
        CA type: MT4103
        Number of ports: 2
        Firmware version: 2.35.5100
        Hardware version: 0
        Node GUID: 0x248a0703006090e0
        System image GUID: 0x248a0703006090e0
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x268a07fffe6090e0
                Link layer: Ethernet
        Port 2:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x268a07fffe6090e1
                Link layer: Ethernet

[root(a)slave3 fio]# ibv_devinfo
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.35.5100
        node_guid:                      248a:0703:0060:90e0
        sys_image_guid:                 248a:0703:0060:90e0
        vendor_id:                      0x02c9
        vendor_part_id:                 4103
        hw_ver:                         0x0
        board_id:                       MT_1090111023
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

                port:   2
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 12:34 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Did you install Mellanox OFED on the target and client servers?
Can you run ibstat on both servers?
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 8:59:28 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

I've just tried the SPDK v17.07.1 and DPDK v17.08.

nvme version: 1.1.38.gfaab
fio version: 3.1

I tried the 512k and 1024k I/O sizes and there is no error. dmesg information is as follows.

So there may be some other difference here? It looks like you are using a ConnectX-5 while I am using a ConnectX-4?

Other related information:

[root(a)node4 fio]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

[root(a)node4 fio]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core

[root(a)node4 fio]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux

[577707.543326] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.100.8:4420
[577730.854540] detected loopback device
[577730.893761] nvme nvme0: creating 7 I/O queues.
[577730.893797] detected loopback device
[577730.898611] detected loopback device
[577730.908917] detected loopback device
[577730.919073] detected loopback device
[577730.928922] detected loopback device
[577730.938679] detected loopback device
[577730.948365] detected loopback device
[577731.146290] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.100.8:4420

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
Any update?
Do you see any error messages from "dmesg" when running fio with a 512k block size?
Thanks
Victor

From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you share which version of SPDK you are using when you see this error? Or maybe you could try the latest SPDK code?

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Cao
Do you see any messages from dmesg?

I tried this fio version and still saw these error messages from dmesg.

fio-3.1

[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338]  nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Thanks for your detailed information on the testing.

I've tried the latest SPDK code with the latest fio-3.1-20-g132b and with fio-2.19, and I don't see this kind of error.

Could you share which version of SPDK you are using when you see this error? Or maybe you could try the latest SPDK code?

fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1  --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017

My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio



From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

(cc Victor)

From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you provide a few more details?  This will help the list to provide some ideas.


1)      On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?


The kernel initiator; I run these commands on the client server.



modprobe mlx5_ib

modprobe nvme-rdma

nvme discover -t rdma -a 192.168.10.11 -s 4420

nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1  -a 192.168.10.11 -s 4420




2)      Can you provide the fio configuration file or command line?  Just so we can have more specifics on "bigger block size".


fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1  --name=read-phase --rw=randwrite


3)      Any details on the HW setup - specifically details on the RDMA NIC (or if you're using SW RoCE).

Nvmf.conf on target server

[Global]
  Comment "Global section"
    ReactorMask 0xff00

[Rpc]
  Enable No
  Listen 127.0.0.1

[Nvmf]
  MaxQueuesPerSession 8
  MaxQueueDepth 128

[Subsystem1]
  NQN nqn.2016-06.io.spdk:nvme-subsystem-1
  Core 9
  Mode Direct
  Listen RDMA 192.168.10.11:4420
  NVMe 0000:82:00.0
  SN S2PMNAAH400039


It is an RDMA NIC, a ConnectX-5. The CPU is an Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz.
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15




Thanks,

-Jim


From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio

Hi
I have SPDK NVMe-oF set up and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor


[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389]  nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486]  nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373]  nvme1n1: unable to read partition table







* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-24  5:04 Cao, Gang
  0 siblings, 0 replies; 24+ messages in thread
From: Cao, Gang @ 2017-10-24  5:04 UTC (permalink / raw)
  To: spdk


I have run the NVMe-oF target in loopback mode, with the host and target on the same server.
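(In this loopback setup the kernel initiator connects to the local SPDK target over the host's own RDMA-capable IP; a sketch, assuming the 192.168.100.8:4420 listen address and cnode2 NQN seen in the dmesg output quoted earlier in the thread:)

modprobe nvme-rdma
nvme discover -t rdma -a 192.168.100.8 -s 4420
nvme connect -t rdma -n nqn.2016-06.io.spdk:cnode2 -a 192.168.100.8 -s 4420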

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 24, 2017 12:57 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>; John F. Kim <johnk(a)mellanox.com>; Cao, Gang <gang.cao(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Can you run dmesg on client?
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Monday, October 23, 2017 9:25:16 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R; John F. Kim
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

I've just successfully run the SPDK NVMe-oF target in loopback mode with Ubuntu 16.04 and kernel 4.13.7. Detailed information is below.

I also tried a big 5120k I/O size from fio for 5 minutes.

root(a)waikikibeach111:~# uname -a
Linux waikikibeach111 4.13.7 #1 SMP Sun Oct 22 23:24:08 PDT 2017 x86_64 x86_64 x86_64 GNU/Linux

root(a)waikikibeach111:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.3 LTS
Release:        16.04
Codename:       xenial

root(a)waikikibeach111:~# lspci | grep -i mellanox
03:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
03:00.1 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]

root(a)waikikibeach111:~# ibstat
CA 'mlx5_0'
        CA type: MT4115
        Number of ports: 1
        Firmware version: 12.18.1000
        Hardware version: 0
        Node GUID: 0x248a07030049d218
        System image GUID: 0x248a07030049d218
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 100
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x268a07fffe49d218
                Link layer: Ethernet
CA 'mlx5_1'
        CA type: MT4115
        Number of ports: 1
        Firmware version: 12.18.1000
        Hardware version: 0
        Node GUID: 0x248a07030049d219
        System image GUID: 0x248a07030049d218
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 100
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x268a07fffe49d219
                Link layer: Ethernet

root(a)waikikibeach111:~# cd /home/gangcao/spdk
root(a)waikikibeach111:/home/gangcao/spdk# git branch
* (HEAD detached at v17.07.1)  --> DPDK 17.08
  master

root(a)waikikibeach111:/home/gangcao/spdk# nvme --version
nvme version 1.2

root(a)waikikibeach111:/home/gangcao/spdk# /home/gangcao/fio/fio --version
fio-3.1

root(a)waikikibeach111:/home/gangcao/spdk# lspci | grep -i volatile
84:00.0 Non-Volatile memory controller: Intel Corporation PCIe Data Center SSD (rev 01)

root(a)waikikibeach111:/home/gangcao/spdk# lspci -s 84:00.0 -v
84:00.0 Non-Volatile memory controller: Intel Corporation PCIe Data Center SSD (rev 01) (prog-if 02 [NVM Express])
        Subsystem: Intel Corporation DC P3700 SSD [2.5" SFF]
        Physical Slot: 37
        Flags: bus master, fast devsel, latency 0, IRQ 39
        Memory at fbc10000 (64-bit, non-prefetchable) [size=16K]
        Expansion ROM at fbc00000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI-X: Enable- Count=32 Masked-
        Capabilities: [60] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [150] Virtual Channel
        Capabilities: [180] Power Budgeting <?>
        Capabilities: [190] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [270] Device Serial Number 55-cd-2e-41-4d-2f-5b-5c
        Capabilities: [2a0] #19
        Kernel driver in use: uio_pci_generic
        Kernel modules: nvme

root(a)waikikibeach111:/home/gangcao/spdk# cat test/nvmf/fio/nvmf.conf
# NVMf Target Configuration File
#
# Please write all parameters using ASCII.
# The parameter must be quoted if it includes whitespace.
#
# Configuration syntax:
# Leading whitespace is ignored.
# Lines starting with '#' are comments.
# Lines ending with '\' are concatenated with the next line.
# Bracketed ([]) names define sections

[Global]
  # Users can restrict work items to only run on certain cores by
  #  specifying a ReactorMask.  Default ReactorMask mask is defined as
  #  -c option in the 'ealargs' setting at beginning of file nvmf_tgt.c.
  #ReactorMask 0x00FF

  # Tracepoint group mask for spdk trace buffers
  # Default: 0x0 (all tracepoint groups disabled)
  # Set to 0xFFFFFFFFFFFFFFFF to enable all tracepoint groups.
  #TpointGroupMask 0x0

  # syslog facility
  LogFacility "local7"

[Rpc]
  # Defines whether to enable configuration via RPC.
  # Default is disabled.  Note that the RPC interface is not
  # authenticated, so users should be careful about enabling
  # RPC in non-trusted environments.
  Enable no
  # Listen address for the RPC service.
  # May be an IP address or an absolute path to a Unix socket.
  Listen 192.168.5.11

# Users may change this section to create a different number or size of
#  malloc LUNs.
# This will generate 8 LUNs with a malloc-allocated backend.
# Each LUN will be size 64MB and these will be named
# Malloc0 through Malloc7.  Not all LUNs defined here are necessarily
#  used below.
#[Malloc]
  #NumberOfLuns 8
  #LunSizeInMB 64

# Users must change this section to match the /dev/sdX devices to be virtual
# NVMe devices. The devices are accessed using Linux AIO.
#[AIO]
  #AIO /dev/sdb
  #AIO /dev/sdc

# Define NVMf protocol global options
[Nvmf]
  # Set the maximum number of submission and completion queues per session.
  # Setting this to '8', for example, allows for 8 submission and 8 completion queues
  # per session.
  MaxQueuesPerSession 8

  # Set the maximum number of outstanding I/O per queue.
  MaxQueueDepth 512

  # Set the maximum in-capsule data size. Must be a multiple of 16.
  #InCapsuleDataSize 4096

  # Set the maximum I/O size. Must be a multiple of 4096.
  #MaxIOSize 131072

  # Set the global acceptor lcore ID, lcores are numbered starting at 0.
  #AcceptorCore 0

  # Set how often the acceptor polls for incoming connections. The acceptor is also
  # responsible for polling existing connections that have gone idle. 0 means continuously
  # poll. Units in microseconds.
  AcceptorPollRate 10000

[Nvme]
  # Registers the application to receive timeout callback and to reset the controller.
  #ResetControllerOnTimeout Yes
  # Timeout value.
  #NvmeTimeoutValue 30
  # Set how often the admin queue is polled for asynchronous events
  # Units in microseconds.
  #AdminPollRate 100000
  HotplugEnable no

# The Split virtual block device slices block devices into multiple smaller bdevs.
#[Split]
  # Syntax:
  #   Split <bdev> <count> [<size_in_megabytes>]

  # Split Malloc2 into two equally-sized portions, Malloc2p0 and Malloc2p1
  #Split Malloc2 2

  # Split Malloc3 into eight 1-megabyte portions, Malloc3p0 ... Malloc3p7,
  # leaving the rest of the device inaccessible
  #Split Malloc3 8 1

# Define an NVMf Subsystem.
# - NQN is required and must be unique.
# - Core may be set or not. If set, the specified subsystem will run on
#   it, otherwise each subsystem will use a round-robin method to allocate
#   core from available cores,  lcores are numbered starting at 0.
# - Mode may be either "Direct" or "Virtual". Direct means that physical
#   devices attached to the target will be presented to hosts as if they
#   were directly attached to the host. No software emulation or command
#   validation is performed. Virtual means that an NVMe controller is
#   emulated in software and the namespaces it contains map to block devices
#   on the target system. These block devices do not need to be NVMe devices.
#   Only Direct mode is currently supported.
# - Between 1 and 255 Listen directives are allowed. This defines
#   the addresses on which new connections may be accepted. The format
#   is Listen <type> <address> where type currently can only be RDMA.
# - Between 0 and 255 Host directives are allowed. This defines the
#   NQNs of allowed hosts. If no Host directive is specified, all hosts
#   are allowed to connect.
# - Exactly 1 NVMe directive specifying an NVMe device by PCI BDF. The
#   PCI domain:bus:device.function can be replaced by "*" to indicate
#   any PCI device.

# Direct controller
[Subsystem1]
  NQN nqn.2016-06.io.spdk:cnode2
  Core 0
  Mode Direct
  Listen rdma 192.168.5.11:4420
  #Host nqn.2016-06.io.spdk:init
  NVMe 0000:84:00.0
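
For reference, the initiator-side steps that produce the "new ctrl" lines in the dmesg output below would look roughly like this with the kernel host (a minimal sketch; the NQN and listen address are taken from the [Subsystem1] section above):

modprobe nvme-rdma
nvme discover -t rdma -a 192.168.5.11 -s 4420
nvme connect -t rdma -n nqn.2016-06.io.spdk:cnode2 -a 192.168.5.11 -s 4420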

root(a)waikikibeach111:~# dmesg -w -T
[Mon Oct 23 19:45:21 2017] mlx5_core 0000:03:00.0: firmware version: 12.18.1000
[Mon Oct 23 19:45:21 2017] (0000:03:00.0): E-Switch: Total vports 1, l2 table size(65536), per vport: max uc(1024) max mc(16384)
[Mon Oct 23 19:45:21 2017] mlx5_core 0000:03:00.0: Port module event: module 0, Cable plugged
[Mon Oct 23 19:45:21 2017] mlx5_core 0000:03:00.1: firmware version: 12.18.1000
[Mon Oct 23 19:45:22 2017] (0000:03:00.1): E-Switch: Total vports 1, l2 table size(65536), per vport: max uc(1024) max mc(16384)
[Mon Oct 23 19:45:22 2017] mlx5_core 0000:03:00.1: Port module event: module 1, Cable plugged
[Mon Oct 23 19:45:22 2017] mlx5_core 0000:03:00.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(1) RxCqeCmprss(0)
[Mon Oct 23 19:45:22 2017] mlx5_core 0000:03:00.1: MLX5E: StrdRq(0) RqSz(1024) StrdSz(1) RxCqeCmprss(0)
[Mon Oct 23 19:45:22 2017] mlx5_core 0000:03:00.0 ens1f0: renamed from eth0
[Mon Oct 23 19:45:23 2017] mlx5_core 0000:03:00.1 ens1f1: renamed from eth0
[Mon Oct 23 19:45:23 2017] mlx5_ib: Mellanox Connect-IB Infiniband driver v5.0-0
[Mon Oct 23 19:47:01 2017] mlx5_core 0000:03:00.0 ens1f0: Link up
[Mon Oct 23 19:47:03 2017] mlx5_core 0000:03:00.1 ens1f1: Link up
[Mon Oct 23 19:54:26 2017] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.5.11:4420
[Mon Oct 23 19:54:49 2017] nvme nvme0: creating 7 I/O queues.
[Mon Oct 23 19:54:50 2017] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.5.11:4420

root(a)waikikibeach111:/home/gangcao/fio# ./fio --bs=5120k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=300 --filename=/dev/nvme0n1  --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 5120KiB-5120KiB, (W) 5120KiB-5120KiB, (T) 5120KiB-5120KiB, ioengine=libaio, iodepth=16
...
fio-3.1
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=220MiB/s][r=0,w=44 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=17076: Mon Oct 23 20:05:47 2017
  write: IOPS=19, BW=96.2MiB/s (101MB/s)(28.2GiB/300399msec)
    slat (usec): min=387, max=350086, avg=51937.15, stdev=38959.27
    clat (msec): min=199, max=2162, avg=779.50, stdev=553.60
     lat (msec): min=214, max=2287, avg=831.44, stdev=589.70
    clat percentiles (msec):
     |  1.00th=[  209],  5.00th=[  275], 10.00th=[  279], 20.00th=[  296],
     | 30.00th=[  321], 40.00th=[  397], 50.00th=[  489], 60.00th=[  709],
     | 70.00th=[ 1133], 80.00th=[ 1485], 90.00th=[ 1586], 95.00th=[ 1787],
     | 99.00th=[ 1989], 99.50th=[ 2039], 99.90th=[ 2123], 99.95th=[ 2140],
     | 99.99th=[ 2165]
   bw (  KiB/s): min=10260, max=368640, per=25.13%, avg=98423.58, stdev=77390.22, samples=600
   iops        : min=    2, max=   72, avg=19.20, stdev=15.10, samples=600
  lat (msec)   : 250=3.70%, 500=47.17%, 750=11.84%, 1000=5.59%, 2000=30.83%
  lat (msec)   : >=2000=0.87%
  cpu          : usr=0.61%, sys=0.48%, ctx=27957, majf=0, minf=91
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.7%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,5777,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16
read-phase: (groupid=0, jobs=1): err= 0: pid=17077: Mon Oct 23 20:05:47 2017
  write: IOPS=18, BW=94.9MiB/s (99.5MB/s)(27.8GiB/300346msec)
    slat (usec): min=386, max=407847, avg=52605.77, stdev=39162.39
    clat (msec): min=197, max=2168, avg=789.72, stdev=551.07
     lat (msec): min=220, max=2291, avg=842.33, stdev=587.00
    clat percentiles (msec):
     |  1.00th=[  255],  5.00th=[  275], 10.00th=[  284], 20.00th=[  300],
     | 30.00th=[  342], 40.00th=[  405], 50.00th=[  493], 60.00th=[  709],
     | 70.00th=[ 1167], 80.00th=[ 1485], 90.00th=[ 1603], 95.00th=[ 1787],
     | 99.00th=[ 1989], 99.50th=[ 2056], 99.90th=[ 2140], 99.95th=[ 2165],
     | 99.99th=[ 2165]
   bw (  KiB/s): min=10240, max=337920, per=24.78%, avg=97039.87, stdev=73967.65, samples=600
   iops        : min=    2, max=   66, avg=18.94, stdev=14.46, samples=600
  lat (msec)   : 250=0.98%, 500=49.23%, 750=12.01%, 1000=5.68%, 2000=31.20%
  lat (msec)   : >=2000=0.89%
  cpu          : usr=0.60%, sys=0.50%, ctx=27504, majf=0, minf=90
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.7%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,5702,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16
read-phase: (groupid=0, jobs=1): err= 0: pid=17078: Mon Oct 23 20:05:47 2017
  write: IOPS=18, BW=94.5MiB/s (99.1MB/s)(27.7GiB/300403msec)
    slat (usec): min=385, max=454024, avg=52851.88, stdev=39734.33
    clat (msec): min=174, max=2136, avg=793.55, stdev=550.47
     lat (msec): min=175, max=2272, avg=846.41, stdev=586.35
    clat percentiles (msec):
     |  1.00th=[  275],  5.00th=[  279], 10.00th=[  284], 20.00th=[  300],
     | 30.00th=[  355], 40.00th=[  418], 50.00th=[  498], 60.00th=[  709],
     | 70.00th=[ 1167], 80.00th=[ 1485], 90.00th=[ 1620], 95.00th=[ 1787],
     | 99.00th=[ 1989], 99.50th=[ 2056], 99.90th=[ 2123], 99.95th=[ 2140],
     | 99.99th=[ 2140]
   bw (  KiB/s): min=10240, max=286720, per=24.69%, avg=96665.08, stdev=73206.19, samples=600
   iops        : min=    2, max=   56, avg=18.87, stdev=14.29, samples=600
  lat (msec)   : 250=0.12%, 500=49.97%, 750=11.96%, 1000=5.67%, 2000=31.30%
  lat (msec)   : >=2000=0.97%
  cpu          : usr=0.60%, sys=0.50%, ctx=27456, majf=0, minf=91
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.7%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,5677,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16
read-phase: (groupid=0, jobs=1): err= 0: pid=17079: Mon Oct 23 20:05:47 2017
  write: IOPS=19, BW=96.9MiB/s (102MB/s)(28.4GiB/300394msec)
    slat (usec): min=364, max=435971, avg=51560.26, stdev=39426.52
    clat (msec): min=176, max=2179, avg=774.17, stdev=555.08
     lat (msec): min=177, max=2299, avg=825.73, stdev=591.31
    clat percentiles (msec):
     |  1.00th=[  207],  5.00th=[  232], 10.00th=[  279], 20.00th=[  296],
     | 30.00th=[  313], 40.00th=[  393], 50.00th=[  485], 60.00th=[  701],
     | 70.00th=[ 1116], 80.00th=[ 1485], 90.00th=[ 1586], 95.00th=[ 1787],
     | 99.00th=[ 1989], 99.50th=[ 2039], 99.90th=[ 2140], 99.95th=[ 2165],
     | 99.99th=[ 2165]
   bw (  KiB/s): min=10240, max=378880, per=25.30%, avg=99076.15, stdev=78461.72, samples=600
   iops        : min=    2, max=   74, avg=19.34, stdev=15.32, samples=600
  lat (msec)   : 250=5.07%, 500=46.18%, 750=11.84%, 1000=5.41%, 2000=30.73%
  lat (msec)   : >=2000=0.77%
  cpu          : usr=0.57%, sys=0.56%, ctx=28034, majf=0, minf=97
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.7%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,5819,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  WRITE: bw=382MiB/s (401MB/s), 94.5MiB/s-96.9MiB/s (99.1MB/s-102MB/s), io=112GiB (120GB), run=300346-300403msec

Disk stats (read/write):
  nvme0n1: ios=44/918923, merge=0/0, ticks=8/147977388, in_queue=148007124, util=100.00%
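
A quick arithmetic check of the summary above (a rough shell sketch):

awk 'BEGIN { print 96.2 + 94.9 + 94.5 + 96.9 }'   # per-job bandwidths sum to ~382 MiB/s
awk 'BEGIN { print 382 * 300 / 1024 }'            # ~382 MiB/s over ~300 s is ~112 GiB, matching io=112GiB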

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Saturday, October 21, 2017 11:23 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; John F. Kim <johnk(a)mellanox.com<mailto:johnk(a)mellanox.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

I am using NVMe.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Saturday, October 21, 2017 1:21:03 AM
To: Victor Banh; Storage Performance Development Kit; Harris, James R; John F. Kim
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

I am also using SPDK v17.07.1 and DPDK 17.08 on my system to try it based on your I/O configuration.

Let me check whether our colleague has this Ubuntu OS installed. If not, could you please try on CentOS?

Also, what kind of backend device did you configure for the NVMe-oF target, like Malloc, NVMe SSD, or other?

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Saturday, October 21, 2017 6:14 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; John F. Kim <johnk(a)mellanox.com<mailto:johnk(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
Is it possible to try on Ubuntu OS?
Which version of DPDK and SPDK are you using?
Where can I get them? Github?
Thanks
Victor

root(a)clientintel:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.3 LTS
Release:        16.04
Codename:       xenial
root(a)clientintel:~# uname -a
Linux clientintel 4.12.0-041200-generic #201707022031 SMP Mon Jul 3 00:32:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux



From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Wednesday, October 18, 2017 11:52 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

[root(a)node4 gangcao]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux

[root(a)node4 gangcao]# lsb_release -a
LSB Version:    :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description:    CentOS Linux release 7.2.1511 (Core)
Release:        7.2.1511
Codename:       Core

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 2:14 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Can you give the OS version and kernel version again for the target and client?
I couldn't compile DPDK without installing the latest 4.12.x kernel on Ubuntu 16.04.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 10:59:19 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

This server does not have OFED installed and it is in the loopback mode.

Found another server, also in loopback mode, with ConnectX-3 and OFED installed.

By the way, what SSD are you using? Maybe it is related to the SSD? I've just run with 2048k for a short duration and there seems to be no issue. I will run longer to see whether this error can be hit.

[root(a)slave3 fio]# lspci | grep -i mell
08:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]

[root(a)slave3 fio]# lsmod | grep -i mlx
mlx4_ib               159744  0
ib_core               208896  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,ib_srpt,ib_ucm,rdma_ucm,mlx4_ib
mlx4_en               114688  0
mlx4_core             307200  2 mlx4_en,mlx4_ib
ptp                    20480  3 ixgbe,igb,mlx4_en

[root(a)slave3 fio]# ofed_info
MLNX_OFED_LINUX-3.2-2.0.0.0 (OFED-3.2-2.0.0):

ar_mgr:
osm_plugins/ar_mgr/ar_mgr-1.0-0.30.ga1ea4b7.tar.gz

cc_mgr:
osm_plugins/cc_mgr/cc_mgr-1.0-0.29.ga1ea4b7.tar.gz

dapl:
dapl.git mlnx_ofed_3_1
commit c30fb6ce2cbc29d8ed4bde51437f7abb93378c78
dump_pr:
osm_plugins/dump_pr//dump_pr-1.0-0.25.ga1ea4b7.tar.gz

fabric-collector:
fabric_collector//fabric-collector-1.1.0.MLNX20140410.51b267e.tar.gz

fca:
mlnx_ofed_fca/fca-2.5.2431-1.src.rpm

hcoll:
mlnx_ofed_hcol/hcoll-3.4.807-1.src.rpm

ibacm:
mlnx_ofed/ibacm.git mlnx_ofed_3_2
commit 15ad8c13bdebbe62edea0b7df030710b65c14f7f
ibacm_ssa:
mlnx_ofed_ssa/acm/ibacm_ssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

ibdump:
sniffer/sniffer-4.0.0-2/ibdump/linux/ibdump-4.0.0-2.tgz

ibsim:
mlnx_ofed_ibsim/ibsim-0.6-0.8.g9d76581.tar.gz

ibssa:
mlnx_ofed_ssa/distrib/ibssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

ibutils:
ofed-1.5.3-rpms/ibutils/ibutils-1.5.7.1-0.12.gdcaeae2.tar.gz

ibutils2:
ibutils2/ibutils2-2.1.1-0.76.MLNX20160222.gd366c7b.tar.gz

infiniband-diags:
mlnx_ofed_infiniband_diags/infiniband-diags-1.6.6.MLNX20151130.7f0213e.tar.gz

infinipath-psm:
mlnx_ofed_infinipath-psm/infinipath-psm-3.3-2_g6f42cdb_open.tar.gz

iser:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

isert:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

kernel-mft:
mlnx_ofed_mft/kernel-mft-4.3.0-25.src.rpm

knem:
knem.git mellanox-master
commit f143ee19a575cd42a334422fa8bd329d671238db
libibcm:
mlnx_ofed/libibcm.git mlnx_ofed_3_0
commit d7d485df305e6536711485bd7e477668e77d8320
libibmad:
mlnx_ofed_libibmad/libibmad-1.3.12.MLNX20151122.d140cb1.tar.gz

libibprof:
mlnx_ofed_libibprof/libibprof-1.1.22-1.src.rpm

libibumad:
mlnx_ofed_libibumad/libibumad-1.3.10.2.MLNX20150406.966500d.tar.gz

libibverbs:
mlnx_ofed/libibverbs.git mlnx_ofed_3_2_2
commit 217a77686f4861229f0e4b94485a13f024634caf
libmlx4:
mlnx_ofed/libmlx4.git mlnx_ofed_3_2_2
commit dda6d7ae1e6e3779a485ebdd0a882f4bcbd027a6
libmlx5:
mlnx_ofed/libmlx5.git mlnx_ofed_3_2_2
commit d0c8645359e0f0aba0408b2d344f3b418d27019b
libopensmssa:
mlnx_ofed_ssa/plugin/libopensmssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

librdmacm:
mlnx_ofed/librdmacm.git mlnx_ofed_3_2_2
commit 6bd430fed9e7b3d57a1876c040431ce7295c7703
libsdp:
libsdp.git mlnx_ofed_3_0
commit fbd01dfff05f42d6b82506e7dbf4bc6b7e6a59a4
libvma:
vma/source_rpms//libvma-7.0.14-0.src.rpm

mlnx-ethtool:
upstream/ethtool.git for-upstream
commit ac0cf295abe0c0832f0711fed66ab9601c8b2513
mlnx-ofa_kernel:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

mpi-selector:
ofed-1.5.3-rpms/mpi-selector/mpi-selector-1.0.3-1.src.rpm

mpitests:
mlnx_ofed_mpitest/mpitests-3.2.17-e1c7f2f.src.rpm

mstflint:
mlnx_ofed_mstflint/mstflint-4.3.0-1.49.g9b9af70.tar.gz

multiperf:
mlnx_ofed_multiperf/multiperf-3.0-0.10.gda89e8c.tar.gz

mvapich2:
mlnx_ofed_mvapich2/mvapich2-2.2a-1.src.rpm

mxm:
mlnx_ofed_mxm/mxm-3.4.3079-1.src.rpm

ofed-docs:
docs.git mlnx_ofed-3.2
commit ea3386416f9f7130edd2c70fc3424cb2cda50f7d

openmpi:
mlnx_ofed_ompi_1.8/openmpi-1.10.3a1-1.src.rpm

opensm:
mlnx_ofed_opensm/opensm-4.6.1.MLNX20160112.774e977.tar.gz

perftest:
mlnx_ofed_perftest/perftest-3.0-0.18.gb464d59.tar.gz

qperf:
mlnx_ofed_qperf/qperf-0.4.9.tar.gz

rds-tools:
rds-tools.git mlnx_ofed_2_4
commit 299420ca25cf9996bc0748e3bc4b08748996ba49
sdpnetstat:
sdpnetstat.git mlnx_ofed_3_0
commit 3cf409a7cc07e5c71f9640eddbb801ece21b4169
sockperf:
sockperf/sockperf-2.7-43.git3ee62bd8107a.src.rpm

srp:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

srptools:
srptools/srptools-1.0.2-12.src.rpm


Installed Packages:
-------------------
infiniband-diags
librdmacm
libmlx4
libibverbs-utils
mpi-selector
libibmad-devel
sdpnetstat
knem
libibumad-devel
libsdp
mlnx-ethtool
libibverbs-debuginfo
mlnx-ofa_kernel-modules
srp
opensm
mstflint
cc_mgr
libibmad
libibverbs
kernel-mft
libibverbs-devel-static
libibumad
librdmacm-devel
mlnx-ofa_kernel
libsdp-devel
ibutils2
mlnxofed-docs
libibmad-static
iser
opensm-devel
dump_pr
libibumad-static
rds-tools
libmlx4-debuginfo
mlnx-ofa_kernel-devel
opensm-libs
opensm-static
ar_mgr
dapl-devel-static
infiniband-diags-compat
libibverbs-devel
ibsim

[root(a)slave3 fio]# ibstat
CA 'mlx4_0'
        CA type: MT4103
        Number of ports: 2
        Firmware version: 2.35.5100
        Hardware version: 0
        Node GUID: 0x248a0703006090e0
        System image GUID: 0x248a0703006090e0
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x268a07fffe6090e0
                Link layer: Ethernet
        Port 2:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x268a07fffe6090e1
                Link layer: Ethernet

[root(a)slave3 fio]# ibv_devinfo
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.35.5100
        node_guid:                      248a:0703:0060:90e0
        sys_image_guid:                 248a:0703:0060:90e0
        vendor_id:                      0x02c9
        vendor_part_id:                 4103
        hw_ver:                         0x0
        board_id:                       MT_1090111023
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

                port:   2
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 12:34 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Did you install Mellanox OFED on the target and client servers?
Can you run ibstat on both servers?
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 8:59:28 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

I've just tried the SPDK v17.07.1 and DPDK v17.08.

nvme version: 1.1.38.gfaab
fio version: 3.1

Tried the 512k and 1024k I/O sizes and there is no error. dmesg information is as follows.

So there may be some other difference here? It looks like you are using ConnectX-5 while I am using ConnectX-4.

Other related information:

[root(a)node4 fio]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

[root(a)node4 fio]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core

[root(a)node4 fio]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux

[577707.543326] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.100.8:4420
[577730.854540] detected loopback device
[577730.893761] nvme nvme0: creating 7 I/O queues.
[577730.893797] detected loopback device
[577730.898611] detected loopback device
[577730.908917] detected loopback device
[577730.919073] detected loopback device
[577730.928922] detected loopback device
[577730.938679] detected loopback device
[577730.948365] detected loopback device
[577731.146290] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.100.8:4420

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
Any update?
Do you see any error message from "dmesg" with 512k block size running fio?
Thanks
Victor

From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you share with us which version of SPDK you are using when seeing this error? Or maybe you can try the latest SPDK code?

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Cao
Do you see any message from dmesg?

I tried this fio version and still saw these error messages from dmesg.

fio-3.1

[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338]  nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Thanks for your detailed information on the testing.

I've tried the latest SPDK code with the latest fio-3.1-20-g132b and with fio-2.19. It seems there is no error of this kind.

Could you share with us which version of SPDK you are using when seeing this error? Or maybe you can try the latest SPDK code?

fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1  --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017

My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio



From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

(cc Victor)

From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you provide a few more details?  This will help the list to provide some ideas.


1)      On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?


The kernel initiator; I run these commands on the client server:



modprobe mlx5_ib

modprobe nvme-rdma

nvme discover -t rdma -a 192.168.10.11 -s 4420

nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1  -a 192.168.10.11 -s 4420
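
To verify the attach and to tear it down between runs, a minimal nvme-cli sketch (the NQN is the same one used in the connect command above):

nvme list                                                 # the remote namespace should show up (here as /dev/nvme1n1, per the fio command below)
nvme disconnect -n nqn.2016-06.io.spdk:nvme-subsystem-1   # remove the controller when finished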




2)      Can you provide the fio configuration file or command line?  Just so we can have more specifics on "bigger block size".


fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1  --name=read-phase --rw=randwrite
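
The same options expressed as a fio job file (an equivalent sketch; any file name works, e.g. randwrite.fio, run with "fio randwrite.fio"):

[read-phase]
bs=512k
numjobs=4
iodepth=16
loops=1
ioengine=libaio
direct=1
invalidate=1
fsync_on_close=1
randrepeat=1
norandommap
time_based
runtime=60
filename=/dev/nvme1n1
rw=randwrite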


3)      Any details on the HW setup - specifically details on the RDMA NIC (or if you're using SW RoCE).

Nvmf.conf on target server

[Global]
  Comment "Global section"
    ReactorMask 0xff00

[Rpc]
  Enable No
  Listen 127.0.0.1

[Nvmf]
  MaxQueuesPerSession 8
  MaxQueueDepth 128

[Subsystem1]
  NQN nqn.2016-06.io.spdk:nvme-subsystem-1
  Core 9
  Mode Direct
  Listen RDMA 192.168.10.11:4420
  NVMe 0000:82:00.0
  SN S2PMNAAH400039


It is an RDMA NIC, ConnectX-5, with an Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz:
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15
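
As a quick sketch of what the core assignment above means: bit i of the mask selects lcore i, so ReactorMask 0xff00 in the [Global] section selects cores 8-15 (the NUMA node1 CPUs listed above), and Core 9 for Subsystem1 falls inside that set. In plain bash arithmetic:

for i in $(seq 0 15); do [ $(( (0xff00 >> i) & 1 )) -eq 1 ] && echo "reactor on core $i"; done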




Thanks,

-Jim


From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio

Hi
I have SPDK NVMeoF and keep getting error with bigger block size with fio on randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor


[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389]  nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486]  nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373]  nvme1n1: unable to read partition table





[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 173003 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-24  4:57 Victor Banh
  0 siblings, 0 replies; 24+ messages in thread
From: Victor Banh @ 2017-10-24  4:57 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 41097 bytes --]

Can you run dmesg on the client?
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com>
Sent: Monday, October 23, 2017 9:25:16 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R; John F. Kim
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

I've just successfully run the SPDK NVMe-oF target in loopback mode with Ubuntu 16.04 and kernel 4.13.7. Detailed information is as follows.

I also tried a big 5120K I/O size from fio for 5 minutes.

root(a)waikikibeach111:~# uname -a
Linux waikikibeach111 4.13.7 #1 SMP Sun Oct 22 23:24:08 PDT 2017 x86_64 x86_64 x86_64 GNU/Linux

root(a)waikikibeach111:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.3 LTS
Release:        16.04
Codename:       xenial

root(a)waikikibeach111:~# lspci | grep -i mellanox
03:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
03:00.1 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]

root(a)waikikibeach111:~# ibstat
CA 'mlx5_0'
        CA type: MT4115
        Number of ports: 1
        Firmware version: 12.18.1000
        Hardware version: 0
        Node GUID: 0x248a07030049d218
        System image GUID: 0x248a07030049d218
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 100
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x268a07fffe49d218
                Link layer: Ethernet
CA 'mlx5_1'
        CA type: MT4115
        Number of ports: 1
        Firmware version: 12.18.1000
        Hardware version: 0
        Node GUID: 0x248a07030049d219
        System image GUID: 0x248a07030049d218
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 100
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x268a07fffe49d219
                Link layer: Ethernet

root(a)waikikibeach111:~# cd /home/gangcao/spdk
root(a)waikikibeach111:/home/gangcao/spdk# git branch
* (HEAD detached at v17.07.1)  --> DPDK 17.08
  master

root(a)waikikibeach111:/home/gangcao/spdk# nvme --version
nvme version 1.2

root(a)waikikibeach111:/home/gangcao/spdk# /home/gangcao/fio/fio --version
fio-3.1

root(a)waikikibeach111:/home/gangcao/spdk# lspci | grep -i volatile
84:00.0 Non-Volatile memory controller: Intel Corporation PCIe Data Center SSD (rev 01)

root(a)waikikibeach111:/home/gangcao/spdk# lspci -s 84:00.0 -v
84:00.0 Non-Volatile memory controller: Intel Corporation PCIe Data Center SSD (rev 01) (prog-if 02 [NVM Express])
        Subsystem: Intel Corporation DC P3700 SSD [2.5" SFF]
        Physical Slot: 37
        Flags: bus master, fast devsel, latency 0, IRQ 39
        Memory at fbc10000 (64-bit, non-prefetchable) [size=16K]
        Expansion ROM at fbc00000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI-X: Enable- Count=32 Masked-
        Capabilities: [60] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [150] Virtual Channel
        Capabilities: [180] Power Budgeting <?>
        Capabilities: [190] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [270] Device Serial Number 55-cd-2e-41-4d-2f-5b-5c
        Capabilities: [2a0] #19
        Kernel driver in use: uio_pci_generic
        Kernel modules: nvme

root(a)waikikibeach111:/home/gangcao/spdk# cat test/nvmf/fio/nvmf.conf
# NVMf Target Configuration File
#
# Please write all parameters using ASCII.
# The parameter must be quoted if it includes whitespace.
#
# Configuration syntax:
# Leading whitespace is ignored.
# Lines starting with '#' are comments.
# Lines ending with '\' are concatenated with the next line.
# Bracketed ([]) names define sections

[Global]
  # Users can restrict work items to only run on certain cores by
  #  specifying a ReactorMask.  Default ReactorMask mask is defined as
  #  -c option in the 'ealargs' setting at beginning of file nvmf_tgt.c.
  #ReactorMask 0x00FF

  # Tracepoint group mask for spdk trace buffers
  # Default: 0x0 (all tracepoint groups disabled)
  # Set to 0xFFFFFFFFFFFFFFFF to enable all tracepoint groups.
  #TpointGroupMask 0x0

  # syslog facility
  LogFacility "local7"

[Rpc]
  # Defines whether to enable configuration via RPC.
  # Default is disabled.  Note that the RPC interface is not
  # authenticated, so users should be careful about enabling
  # RPC in non-trusted environments.
  Enable no
  # Listen address for the RPC service.
  # May be an IP address or an absolute path to a Unix socket.
  Listen 192.168.5.11

# Users may change this section to create a different number or size of
#  malloc LUNs.
# This will generate 8 LUNs with a malloc-allocated backend.
# Each LUN will be size 64MB and these will be named
# Malloc0 through Malloc7.  Not all LUNs defined here are necessarily
#  used below.
#[Malloc]
  #NumberOfLuns 8
  #LunSizeInMB 64

# Users must change this section to match the /dev/sdX devices to be virtual
# NVMe devices. The devices are accessed using Linux AIO.
#[AIO]
  #AIO /dev/sdb
  #AIO /dev/sdc

# Define NVMf protocol global options
[Nvmf]
  # Set the maximum number of submission and completion queues per session.
  # Setting this to '8', for example, allows for 8 submission and 8 completion queues
  # per session.
  MaxQueuesPerSession 8

  # Set the maximum number of outstanding I/O per queue.
  MaxQueueDepth 512

  # Set the maximum in-capsule data size. Must be a multiple of 16.
  #InCapsuleDataSize 4096

  # Set the maximum I/O size. Must be a multiple of 4096.
  #MaxIOSize 131072

  # Set the global acceptor lcore ID, lcores are numbered starting at 0.
  #AcceptorCore 0

  # Set how often the acceptor polls for incoming connections. The acceptor is also
  # responsible for polling existing connections that have gone idle. 0 means continuously
  # poll. Units in microseconds.
  AcceptorPollRate 10000

[Nvme]
  # Registers the application to receive timeout callback and to reset the controller.
  #ResetControllerOnTimeout Yes
  # Timeout value.
  #NvmeTimeoutValue 30
  # Set how often the admin queue is polled for asynchronous events
  # Units in microseconds.
  #AdminPollRate 100000
  HotplugEnable no

# The Split virtual block device slices block devices into multiple smaller bdevs.
#[Split]
  # Syntax:
  #   Split <bdev> <count> [<size_in_megabytes>]

  # Split Malloc2 into two equally-sized portions, Malloc2p0 and Malloc2p1
  #Split Malloc2 2

  # Split Malloc3 into eight 1-megabyte portions, Malloc3p0 ... Malloc3p7,
  # leaving the rest of the device inaccessible
  #Split Malloc3 8 1

# Define an NVMf Subsystem.
# - NQN is required and must be unique.
# - Core may be set or not. If set, the specified subsystem will run on
#   it, otherwise each subsystem will use a round-robin method to allocate
#   core from available cores,  lcores are numbered starting at 0.
# - Mode may be either "Direct" or "Virtual". Direct means that physical
#   devices attached to the target will be presented to hosts as if they
#   were directly attached to the host. No software emulation or command
#   validation is performed. Virtual means that an NVMe controller is
#   emulated in software and the namespaces it contains map to block devices
#   on the target system. These block devices do not need to be NVMe devices.
#   Only Direct mode is currently supported.
# - Between 1 and 255 Listen directives are allowed. This defines
#   the addresses on which new connections may be accepted. The format
#   is Listen <type> <address> where type currently can only be RDMA.
# - Between 0 and 255 Host directives are allowed. This defines the
#   NQNs of allowed hosts. If no Host directive is specified, all hosts
#   are allowed to connect.
# - Exactly 1 NVMe directive specifying an NVMe device by PCI BDF. The
#   PCI domain:bus:device.function can be replaced by "*" to indicate
#   any PCI device.

# Direct controller
[Subsystem1]
  NQN nqn.2016-06.io.spdk:cnode2
  Core 0
  Mode Direct
  Listen rdma 192.168.5.11:4420
  #Host nqn.2016-06.io.spdk:init
  NVMe 0000:84:00.0

root(a)waikikibeach111:~# dmesg -w -T
[Mon Oct 23 19:45:21 2017] mlx5_core 0000:03:00.0: firmware version: 12.18.1000
[Mon Oct 23 19:45:21 2017] (0000:03:00.0): E-Switch: Total vports 1, l2 table size(65536), per vport: max uc(1024) max mc(16384)
[Mon Oct 23 19:45:21 2017] mlx5_core 0000:03:00.0: Port module event: module 0, Cable plugged
[Mon Oct 23 19:45:21 2017] mlx5_core 0000:03:00.1: firmware version: 12.18.1000
[Mon Oct 23 19:45:22 2017] (0000:03:00.1): E-Switch: Total vports 1, l2 table size(65536), per vport: max uc(1024) max mc(16384)
[Mon Oct 23 19:45:22 2017] mlx5_core 0000:03:00.1: Port module event: module 1, Cable plugged
[Mon Oct 23 19:45:22 2017] mlx5_core 0000:03:00.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(1) RxCqeCmprss(0)
[Mon Oct 23 19:45:22 2017] mlx5_core 0000:03:00.1: MLX5E: StrdRq(0) RqSz(1024) StrdSz(1) RxCqeCmprss(0)
[Mon Oct 23 19:45:22 2017] mlx5_core 0000:03:00.0 ens1f0: renamed from eth0
[Mon Oct 23 19:45:23 2017] mlx5_core 0000:03:00.1 ens1f1: renamed from eth0
[Mon Oct 23 19:45:23 2017] mlx5_ib: Mellanox Connect-IB Infiniband driver v5.0-0
[Mon Oct 23 19:47:01 2017] mlx5_core 0000:03:00.0 ens1f0: Link up
[Mon Oct 23 19:47:03 2017] mlx5_core 0000:03:00.1 ens1f1: Link up
[Mon Oct 23 19:54:26 2017] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.5.11:4420
[Mon Oct 23 19:54:49 2017] nvme nvme0: creating 7 I/O queues.
[Mon Oct 23 19:54:50 2017] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.5.11:4420

root(a)waikikibeach111:/home/gangcao/fio# ./fio --bs=5120k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=300 --filename=/dev/nvme0n1  --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 5120KiB-5120KiB, (W) 5120KiB-5120KiB, (T) 5120KiB-5120KiB, ioengine=libaio, iodepth=16
...
fio-3.1
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=220MiB/s][r=0,w=44 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=17076: Mon Oct 23 20:05:47 2017
  write: IOPS=19, BW=96.2MiB/s (101MB/s)(28.2GiB/300399msec)
    slat (usec): min=387, max=350086, avg=51937.15, stdev=38959.27
    clat (msec): min=199, max=2162, avg=779.50, stdev=553.60
     lat (msec): min=214, max=2287, avg=831.44, stdev=589.70
    clat percentiles (msec):
     |  1.00th=[  209],  5.00th=[  275], 10.00th=[  279], 20.00th=[  296],
     | 30.00th=[  321], 40.00th=[  397], 50.00th=[  489], 60.00th=[  709],
     | 70.00th=[ 1133], 80.00th=[ 1485], 90.00th=[ 1586], 95.00th=[ 1787],
     | 99.00th=[ 1989], 99.50th=[ 2039], 99.90th=[ 2123], 99.95th=[ 2140],
     | 99.99th=[ 2165]
   bw (  KiB/s): min=10260, max=368640, per=25.13%, avg=98423.58, stdev=77390.22, samples=600
   iops        : min=    2, max=   72, avg=19.20, stdev=15.10, samples=600
  lat (msec)   : 250=3.70%, 500=47.17%, 750=11.84%, 1000=5.59%, 2000=30.83%
  lat (msec)   : >=2000=0.87%
  cpu          : usr=0.61%, sys=0.48%, ctx=27957, majf=0, minf=91
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.7%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,5777,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16
read-phase: (groupid=0, jobs=1): err= 0: pid=17077: Mon Oct 23 20:05:47 2017
  write: IOPS=18, BW=94.9MiB/s (99.5MB/s)(27.8GiB/300346msec)
    slat (usec): min=386, max=407847, avg=52605.77, stdev=39162.39
    clat (msec): min=197, max=2168, avg=789.72, stdev=551.07
     lat (msec): min=220, max=2291, avg=842.33, stdev=587.00
    clat percentiles (msec):
     |  1.00th=[  255],  5.00th=[  275], 10.00th=[  284], 20.00th=[  300],
     | 30.00th=[  342], 40.00th=[  405], 50.00th=[  493], 60.00th=[  709],
     | 70.00th=[ 1167], 80.00th=[ 1485], 90.00th=[ 1603], 95.00th=[ 1787],
     | 99.00th=[ 1989], 99.50th=[ 2056], 99.90th=[ 2140], 99.95th=[ 2165],
     | 99.99th=[ 2165]
   bw (  KiB/s): min=10240, max=337920, per=24.78%, avg=97039.87, stdev=73967.65, samples=600
   iops        : min=    2, max=   66, avg=18.94, stdev=14.46, samples=600
  lat (msec)   : 250=0.98%, 500=49.23%, 750=12.01%, 1000=5.68%, 2000=31.20%
  lat (msec)   : >=2000=0.89%
  cpu          : usr=0.60%, sys=0.50%, ctx=27504, majf=0, minf=90
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.7%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,5702,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16
read-phase: (groupid=0, jobs=1): err= 0: pid=17078: Mon Oct 23 20:05:47 2017
  write: IOPS=18, BW=94.5MiB/s (99.1MB/s)(27.7GiB/300403msec)
    slat (usec): min=385, max=454024, avg=52851.88, stdev=39734.33
    clat (msec): min=174, max=2136, avg=793.55, stdev=550.47
     lat (msec): min=175, max=2272, avg=846.41, stdev=586.35
    clat percentiles (msec):
     |  1.00th=[  275],  5.00th=[  279], 10.00th=[  284], 20.00th=[  300],
     | 30.00th=[  355], 40.00th=[  418], 50.00th=[  498], 60.00th=[  709],
     | 70.00th=[ 1167], 80.00th=[ 1485], 90.00th=[ 1620], 95.00th=[ 1787],
     | 99.00th=[ 1989], 99.50th=[ 2056], 99.90th=[ 2123], 99.95th=[ 2140],
     | 99.99th=[ 2140]
   bw (  KiB/s): min=10240, max=286720, per=24.69%, avg=96665.08, stdev=73206.19, samples=600
   iops        : min=    2, max=   56, avg=18.87, stdev=14.29, samples=600
  lat (msec)   : 250=0.12%, 500=49.97%, 750=11.96%, 1000=5.67%, 2000=31.30%
  lat (msec)   : >=2000=0.97%
  cpu          : usr=0.60%, sys=0.50%, ctx=27456, majf=0, minf=91
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.7%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,5677,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16
read-phase: (groupid=0, jobs=1): err= 0: pid=17079: Mon Oct 23 20:05:47 2017
  write: IOPS=19, BW=96.9MiB/s (102MB/s)(28.4GiB/300394msec)
    slat (usec): min=364, max=435971, avg=51560.26, stdev=39426.52
    clat (msec): min=176, max=2179, avg=774.17, stdev=555.08
     lat (msec): min=177, max=2299, avg=825.73, stdev=591.31
    clat percentiles (msec):
     |  1.00th=[  207],  5.00th=[  232], 10.00th=[  279], 20.00th=[  296],
     | 30.00th=[  313], 40.00th=[  393], 50.00th=[  485], 60.00th=[  701],
     | 70.00th=[ 1116], 80.00th=[ 1485], 90.00th=[ 1586], 95.00th=[ 1787],
     | 99.00th=[ 1989], 99.50th=[ 2039], 99.90th=[ 2140], 99.95th=[ 2165],
     | 99.99th=[ 2165]
   bw (  KiB/s): min=10240, max=378880, per=25.30%, avg=99076.15, stdev=78461.72, samples=600
   iops        : min=    2, max=   74, avg=19.34, stdev=15.32, samples=600
  lat (msec)   : 250=5.07%, 500=46.18%, 750=11.84%, 1000=5.41%, 2000=30.73%
  lat (msec)   : >=2000=0.77%
  cpu          : usr=0.57%, sys=0.56%, ctx=28034, majf=0, minf=97
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.7%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,5819,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  WRITE: bw=382MiB/s (401MB/s), 94.5MiB/s-96.9MiB/s (99.1MB/s-102MB/s), io=112GiB (120GB), run=300346-300403msec

Disk stats (read/write):
  nvme0n1: ios=44/918923, merge=0/0, ticks=8/147977388, in_queue=148007124, util=100.00%

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Saturday, October 21, 2017 11:23 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>; John F. Kim <johnk(a)mellanox.com>; Cao, Gang <gang.cao(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

I am using NVMe.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Saturday, October 21, 2017 1:21:03 AM
To: Victor Banh; Storage Performance Development Kit; Harris, James R; John F. Kim
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

I am also using SPDK v17.07.1 and DPDK 17.08 on my system to try it based on your I/O configuration.

Let me check whether our colleague has this Ubuntu OS installed. If not, could you please try on CentOS?

Also, what kind of backend device did you configure for the NVMe-oF target, like Malloc, NVMe SSD, or other?

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Saturday, October 21, 2017 6:14 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; John F. Kim <johnk(a)mellanox.com<mailto:johnk(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
Is it possible to try on Ubuntu OS?
Which version of DPDK and SPDK are you using?
Where can I get them? Github?
Thanks
Victor

root(a)clientintel:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.3 LTS
Release:        16.04
Codename:       xenial
root(a)clientintel:~# uname -a
Linux clientintel 4.12.0-041200-generic #201707022031 SMP Mon Jul 3 00:32:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux



From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Wednesday, October 18, 2017 11:52 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

[root(a)node4 gangcao]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux

[root(a)node4 gangcao]# lsb_release -a
LSB Version:    :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description:    CentOS Linux release 7.2.1511 (Core)
Release:        7.2.1511
Codename:       Core

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 2:14 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Can you give the OS version and kernel version again for the target and client?
I couldn't compile DPDK without installing the latest 4.12.x kernel on Ubuntu 16.04.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 10:59:19 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

This server does not have OFED installed and it is in the loopback mode.

Found another server also in the loopback mode with ConnectX-3 and have OFED installed.

By the way, what SSD are you using? Maybe it is related to the SSD? I've just run with 2048k for a short duration and saw no issue. I will run longer to see whether I can hit this error.

[root(a)slave3 fio]# lspci | grep -i mell
08:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]

[root(a)slave3 fio]# lsmod | grep -i mlx
mlx4_ib               159744  0
ib_core               208896  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,ib_srpt,ib_ucm,rdma_ucm,mlx4_ib
mlx4_en               114688  0
mlx4_core             307200  2 mlx4_en,mlx4_ib
ptp                    20480  3 ixgbe,igb,mlx4_en

[root(a)slave3 fio]# ofed_info
MLNX_OFED_LINUX-3.2-2.0.0.0 (OFED-3.2-2.0.0):

ar_mgr:
osm_plugins/ar_mgr/ar_mgr-1.0-0.30.ga1ea4b7.tar.gz

cc_mgr:
osm_plugins/cc_mgr/cc_mgr-1.0-0.29.ga1ea4b7.tar.gz

dapl:
dapl.git mlnx_ofed_3_1
commit c30fb6ce2cbc29d8ed4bde51437f7abb93378c78
dump_pr:
osm_plugins/dump_pr//dump_pr-1.0-0.25.ga1ea4b7.tar.gz

fabric-collector:
fabric_collector//fabric-collector-1.1.0.MLNX20140410.51b267e.tar.gz

fca:
mlnx_ofed_fca/fca-2.5.2431-1.src.rpm

hcoll:
mlnx_ofed_hcol/hcoll-3.4.807-1.src.rpm

ibacm:
mlnx_ofed/ibacm.git mlnx_ofed_3_2
commit 15ad8c13bdebbe62edea0b7df030710b65c14f7f
ibacm_ssa:
mlnx_ofed_ssa/acm/ibacm_ssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

ibdump:
sniffer/sniffer-4.0.0-2/ibdump/linux/ibdump-4.0.0-2.tgz

ibsim:
mlnx_ofed_ibsim/ibsim-0.6-0.8.g9d76581.tar.gz

ibssa:
mlnx_ofed_ssa/distrib/ibssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

ibutils:
ofed-1.5.3-rpms/ibutils/ibutils-1.5.7.1-0.12.gdcaeae2.tar.gz

ibutils2:
ibutils2/ibutils2-2.1.1-0.76.MLNX20160222.gd366c7b.tar.gz

infiniband-diags:
mlnx_ofed_infiniband_diags/infiniband-diags-1.6.6.MLNX20151130.7f0213e.tar.gz

infinipath-psm:
mlnx_ofed_infinipath-psm/infinipath-psm-3.3-2_g6f42cdb_open.tar.gz

iser:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

isert:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

kernel-mft:
mlnx_ofed_mft/kernel-mft-4.3.0-25.src.rpm

knem:
knem.git mellanox-master
commit f143ee19a575cd42a334422fa8bd329d671238db
libibcm:
mlnx_ofed/libibcm.git mlnx_ofed_3_0
commit d7d485df305e6536711485bd7e477668e77d8320
libibmad:
mlnx_ofed_libibmad/libibmad-1.3.12.MLNX20151122.d140cb1.tar.gz

libibprof:
mlnx_ofed_libibprof/libibprof-1.1.22-1.src.rpm

libibumad:
mlnx_ofed_libibumad/libibumad-1.3.10.2.MLNX20150406.966500d.tar.gz

libibverbs:
mlnx_ofed/libibverbs.git mlnx_ofed_3_2_2
commit 217a77686f4861229f0e4b94485a13f024634caf
libmlx4:
mlnx_ofed/libmlx4.git mlnx_ofed_3_2_2
commit dda6d7ae1e6e3779a485ebdd0a882f4bcbd027a6
libmlx5:
mlnx_ofed/libmlx5.git mlnx_ofed_3_2_2
commit d0c8645359e0f0aba0408b2d344f3b418d27019b
libopensmssa:
mlnx_ofed_ssa/plugin/libopensmssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

librdmacm:
mlnx_ofed/librdmacm.git mlnx_ofed_3_2_2
commit 6bd430fed9e7b3d57a1876c040431ce7295c7703
libsdp:
libsdp.git mlnx_ofed_3_0
commit fbd01dfff05f42d6b82506e7dbf4bc6b7e6a59a4
libvma:
vma/source_rpms//libvma-7.0.14-0.src.rpm

mlnx-ethtool:
upstream/ethtool.git for-upstream
commit ac0cf295abe0c0832f0711fed66ab9601c8b2513
mlnx-ofa_kernel:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

mpi-selector:
ofed-1.5.3-rpms/mpi-selector/mpi-selector-1.0.3-1.src.rpm

mpitests:
mlnx_ofed_mpitest/mpitests-3.2.17-e1c7f2f.src.rpm

mstflint:
mlnx_ofed_mstflint/mstflint-4.3.0-1.49.g9b9af70.tar.gz

multiperf:
mlnx_ofed_multiperf/multiperf-3.0-0.10.gda89e8c.tar.gz

mvapich2:
mlnx_ofed_mvapich2/mvapich2-2.2a-1.src.rpm

mxm:
mlnx_ofed_mxm/mxm-3.4.3079-1.src.rpm

ofed-docs:
docs.git mlnx_ofed-3.2
commit ea3386416f9f7130edd2c70fc3424cb2cda50f7d

openmpi:
mlnx_ofed_ompi_1.8/openmpi-1.10.3a1-1.src.rpm

opensm:
mlnx_ofed_opensm/opensm-4.6.1.MLNX20160112.774e977.tar.gz

perftest:
mlnx_ofed_perftest/perftest-3.0-0.18.gb464d59.tar.gz

qperf:
mlnx_ofed_qperf/qperf-0.4.9.tar.gz

rds-tools:
rds-tools.git mlnx_ofed_2_4
commit 299420ca25cf9996bc0748e3bc4b08748996ba49
sdpnetstat:
sdpnetstat.git mlnx_ofed_3_0
commit 3cf409a7cc07e5c71f9640eddbb801ece21b4169
sockperf:
sockperf/sockperf-2.7-43.git3ee62bd8107a.src.rpm

srp:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

srptools:
srptools/srptools-1.0.2-12.src.rpm


Installed Packages:
-------------------
infiniband-diags
librdmacm
libmlx4
libibverbs-utils
mpi-selector
libibmad-devel
sdpnetstat
knem
libibumad-devel
libsdp
mlnx-ethtool
libibverbs-debuginfo
mlnx-ofa_kernel-modules
srp
opensm
mstflint
cc_mgr
libibmad
libibverbs
kernel-mft
libibverbs-devel-static
libibumad
librdmacm-devel
mlnx-ofa_kernel
libsdp-devel
ibutils2
mlnxofed-docs
libibmad-static
iser
opensm-devel
dump_pr
libibumad-static
rds-tools
libmlx4-debuginfo
mlnx-ofa_kernel-devel
opensm-libs
opensm-static
ar_mgr
dapl-devel-static
infiniband-diags-compat
libibverbs-devel
ibsim

[root(a)slave3 fio]# ibstat
CA 'mlx4_0'
        CA type: MT4103
        Number of ports: 2
        Firmware version: 2.35.5100
        Hardware version: 0
        Node GUID: 0x248a0703006090e0
        System image GUID: 0x248a0703006090e0
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x268a07fffe6090e0
                Link layer: Ethernet
        Port 2:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x268a07fffe6090e1
                Link layer: Ethernet

[root(a)slave3 fio]# ibv_devinfo
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.35.5100
        node_guid:                      248a:0703:0060:90e0
        sys_image_guid:                 248a:0703:0060:90e0
        vendor_id:                      0x02c9
        vendor_part_id:                 4103
        hw_ver:                         0x0
        board_id:                       MT_1090111023
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

                port:   2
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 12:34 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Do you install Mellanox OFED on the Target and Client server?
Can you run ibstat on both servers?
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 8:59:28 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

I’ve just tried the SPDK v17.07.1 and DPDK v17.08.

nvme version: 1.1.38.gfaab
fio version: 3.1

Tried the 512k and 1024k I/O sizes and there is no error. dmesg information is as follows.

So there may be some other difference here? It looks like you are using ConnectX-5 while I am using ConnectX-4?

Other related information:

[root(a)node4 fio]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

[root(a)node4 fio]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core

[root(a)node4 fio]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux

[577707.543326] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.100.8:4420
[577730.854540] detected loopback device
[577730.893761] nvme nvme0: creating 7 I/O queues.
[577730.893797] detected loopback device
[577730.898611] detected loopback device
[577730.908917] detected loopback device
[577730.919073] detected loopback device
[577730.928922] detected loopback device
[577730.938679] detected loopback device
[577730.948365] detected loopback device
[577731.146290] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.100.8:4420

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
Any update?
Do you see any error message from “dmesg” with 512k block size running fio?
Thanks
Victor

From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you share which version of SPDK you are using when seeing this error? Or maybe you can try the latest SPDK code?

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Cao
Do you see any message from dmesg?

I tried this fio version and still saw these error messages from dmesg.

fio-3.1

[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338]  nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Thanks for your detailed information on the testing.

I've tried the latest SPDK code with the latest fio-3.1-20-g132b and with fio-2.19. I don't see this kind of error.

Could you share which version of SPDK you are using when seeing this error? Or maybe you can try the latest SPDK code?

fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1  --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017

My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio



From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

(cc Victor)

From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you provide a few more details?  This will help the list to provide some ideas.


1)      On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?


Kernel initiator; I run these commands on the client server.



modprobe mlx5_ib

modprobe nvme-rdma

nvme discover -t rdma -a 192.168.10.11 -s 4420

nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1  -a 192.168.10.11 -s 4420
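
(For reference, a minimal sketch of how the connection can be verified before starting fio; this is an illustration, not part of the original setup, and the /dev/nvme1n1 name is an assumption taken from the dmesg output below.)

nvme list                          # the SPDK subsystem should appear as a new /dev/nvmeXnY device
lsblk /dev/nvme1n1                 # confirm the expected capacity is visible
dmesg | grep -i nvme | tail -n 20  # check for reconnects or I/O errors before starting fio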




2)      Can you provide the fio configuration file or command line?  Just so we can have more specifics on “bigger block size”.


fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1  --name=read-phase --rw=randwrite


3)      Any details on the HW setup – specifically details on the RDMA NIC (or if you’re using SW RoCE).

Nvmf.conf on target server

[Global]
  Comment "Global section"
    ReactorMask 0xff00

[Rpc]
  Enable No
  Listen 127.0.0.1

[Nvmf]
  MaxQueuesPerSession 8
  MaxQueueDepth 128

[Subsystem1]
  NQN nqn.2016-06.io.spdk:nvme-subsystem-1
  Core 9
  Mode Direct
  Listen RDMA 192.168.10.11:4420
  NVMe 0000:82:00.0
  SN S2PMNAAH400039


           It is an RDMA NIC (ConnectX-5); the CPU is Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15
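
(Editor's note: since the config above pins the reactors to cores 8-15 via ReactorMask 0xff00 and the subsystem to core 9, a quick way to confirm that the NIC and the NVMe SSD sit on the same NUMA node is sketched below; the RDMA interface name is a placeholder, not taken from this thread.)

cat /sys/bus/pci/devices/0000:82:00.0/numa_node       # NVMe device from the [Subsystem1] section
cat /sys/class/net/<rdma_interface>/device/numa_node  # ConnectX-5 port serving 192.168.10.11
lscpu | grep "NUMA node"                              # map the reported node to its CPU list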




Thanks,

-Jim


From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio

Hi
I have SPDK NVMeoF and keep getting error with bigger block size with fio on randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor


[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389]  nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486]  nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373]  nvme1n1: unable to read partition table





[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 151450 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-24  4:25 Cao, Gang
  0 siblings, 0 replies; 24+ messages in thread
From: Cao, Gang @ 2017-10-24  4:25 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 40757 bytes --]

Hi Victor,

I've just successfully run the SPDK NVMe-oF target in loopback mode with Ubuntu 16.04 and kernel 4.13.7. Detailed information follows.

Also tried a big 5120k I/O size from fio for 5 minutes.

root(a)waikikibeach111:~# uname -a
Linux waikikibeach111 4.13.7 #1 SMP Sun Oct 22 23:24:08 PDT 2017 x86_64 x86_64 x86_64 GNU/Linux

root(a)waikikibeach111:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.3 LTS
Release:        16.04
Codename:       xenial

root(a)waikikibeach111:~# lspci | grep -i mellanox
03:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
03:00.1 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]

root(a)waikikibeach111:~# ibstat
CA 'mlx5_0'
        CA type: MT4115
        Number of ports: 1
        Firmware version: 12.18.1000
        Hardware version: 0
        Node GUID: 0x248a07030049d218
        System image GUID: 0x248a07030049d218
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 100
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x268a07fffe49d218
                Link layer: Ethernet
CA 'mlx5_1'
        CA type: MT4115
        Number of ports: 1
        Firmware version: 12.18.1000
        Hardware version: 0
        Node GUID: 0x248a07030049d219
        System image GUID: 0x248a07030049d218
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 100
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x268a07fffe49d219
                Link layer: Ethernet

root(a)waikikibeach111:~# cd /home/gangcao/spdk
root(a)waikikibeach111:/home/gangcao/spdk# git branch
* (HEAD detached at v17.07.1)  --> DPDK 17.08
  master

root(a)waikikibeach111:/home/gangcao/spdk# nvme --version
nvme version 1.2

root(a)waikikibeach111:/home/gangcao/spdk# /home/gangcao/fio/fio --version
fio-3.1

root(a)waikikibeach111:/home/gangcao/spdk# lspci | grep -i volatile
84:00.0 Non-Volatile memory controller: Intel Corporation PCIe Data Center SSD (rev 01)

root(a)waikikibeach111:/home/gangcao/spdk# lspci -s 84:00.0 -v
84:00.0 Non-Volatile memory controller: Intel Corporation PCIe Data Center SSD (rev 01) (prog-if 02 [NVM Express])
        Subsystem: Intel Corporation DC P3700 SSD [2.5" SFF]
        Physical Slot: 37
        Flags: bus master, fast devsel, latency 0, IRQ 39
        Memory at fbc10000 (64-bit, non-prefetchable) [size=16K]
        Expansion ROM at fbc00000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI-X: Enable- Count=32 Masked-
        Capabilities: [60] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [150] Virtual Channel
        Capabilities: [180] Power Budgeting <?>
        Capabilities: [190] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [270] Device Serial Number 55-cd-2e-41-4d-2f-5b-5c
        Capabilities: [2a0] #19
        Kernel driver in use: uio_pci_generic
        Kernel modules: nvme

root(a)waikikibeach111:/home/gangcao/spdk# cat test/nvmf/fio/nvmf.conf
# NVMf Target Configuration File
#
# Please write all parameters using ASCII.
# The parameter must be quoted if it includes whitespace.
#
# Configuration syntax:
# Leading whitespace is ignored.
# Lines starting with '#' are comments.
# Lines ending with '\' are concatenated with the next line.
# Bracketed ([]) names define sections

[Global]
  # Users can restrict work items to only run on certain cores by
  #  specifying a ReactorMask.  Default ReactorMask mask is defined as
  #  -c option in the 'ealargs' setting at beginning of file nvmf_tgt.c.
  #ReactorMask 0x00FF

  # Tracepoint group mask for spdk trace buffers
  # Default: 0x0 (all tracepoint groups disabled)
  # Set to 0xFFFFFFFFFFFFFFFF to enable all tracepoint groups.
  #TpointGroupMask 0x0

  # syslog facility
  LogFacility "local7"

[Rpc]
  # Defines whether to enable configuration via RPC.
  # Default is disabled.  Note that the RPC interface is not
  # authenticated, so users should be careful about enabling
  # RPC in non-trusted environments.
  Enable no
  # Listen address for the RPC service.
  # May be an IP address or an absolute path to a Unix socket.
  Listen 192.168.5.11

# Users may change this section to create a different number or size of
#  malloc LUNs.
# This will generate 8 LUNs with a malloc-allocated backend.
# Each LUN will be size 64MB and these will be named
# Malloc0 through Malloc7.  Not all LUNs defined here are necessarily
#  used below.
#[Malloc]
  #NumberOfLuns 8
  #LunSizeInMB 64

# Users must change this section to match the /dev/sdX devices to be virtual
# NVMe devices. The devices are accessed using Linux AIO.
#[AIO]
  #AIO /dev/sdb
  #AIO /dev/sdc

# Define NVMf protocol global options
[Nvmf]
  # Set the maximum number of submission and completion queues per session.
  # Setting this to '8', for example, allows for 8 submission and 8 completion queues
  # per session.
  MaxQueuesPerSession 8

  # Set the maximum number of outstanding I/O per queue.
  MaxQueueDepth 512

  # Set the maximum in-capsule data size. Must be a multiple of 16.
  #InCapsuleDataSize 4096

  # Set the maximum I/O size. Must be a multiple of 4096.
  #MaxIOSize 131072

  # Set the global acceptor lcore ID, lcores are numbered starting at 0.
  #AcceptorCore 0

  # Set how often the acceptor polls for incoming connections. The acceptor is also
  # responsible for polling existing connections that have gone idle. 0 means continuously
  # poll. Units in microseconds.
  AcceptorPollRate 10000

[Nvme]
  # Registers the application to receive timeout callback and to reset the controller.
  #ResetControllerOnTimeout Yes
  # Timeout value.
  #NvmeTimeoutValue 30
  # Set how often the admin queue is polled for asynchronous events
  # Units in microseconds.
  #AdminPollRate 100000
  HotplugEnable no

# The Split virtual block device slices block devices into multiple smaller bdevs.
#[Split]
  # Syntax:
  #   Split <bdev> <count> [<size_in_megabytes>]

  # Split Malloc2 into two equally-sized portions, Malloc2p0 and Malloc2p1
  #Split Malloc2 2

  # Split Malloc3 into eight 1-megabyte portions, Malloc3p0 ... Malloc3p7,
  # leaving the rest of the device inaccessible
  #Split Malloc3 8 1

# Define an NVMf Subsystem.
# - NQN is required and must be unique.
# - Core may be set or not. If set, the specified subsystem will run on
#   it, otherwise each subsystem will use a round-robin method to allocate
#   core from available cores,  lcores are numbered starting at 0.
# - Mode may be either "Direct" or "Virtual". Direct means that physical
#   devices attached to the target will be presented to hosts as if they
#   were directly attached to the host. No software emulation or command
#   validation is performed. Virtual means that an NVMe controller is
#   emulated in software and the namespaces it contains map to block devices
#   on the target system. These block devices do not need to be NVMe devices.
#   Only Direct mode is currently supported.
# - Between 1 and 255 Listen directives are allowed. This defines
#   the addresses on which new connections may be accepted. The format
#   is Listen <type> <address> where type currently can only be RDMA.
# - Between 0 and 255 Host directives are allowed. This defines the
#   NQNs of allowed hosts. If no Host directive is specified, all hosts
#   are allowed to connect.
# - Exactly 1 NVMe directive specifying an NVMe device by PCI BDF. The
#   PCI domain:bus:device.function can be replaced by "*" to indicate
#   any PCI device.

# Direct controller
[Subsystem1]
  NQN nqn.2016-06.io.spdk:cnode2
  Core 0
  Mode Direct
  Listen rdma 192.168.5.11:4420
  #Host nqn.2016-06.io.spdk:init
  NVMe 0000:84:00.0
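
(For completeness, a rough sketch of how a target like this is typically launched; the script and binary paths are assumptions based on the SPDK 17.07 source layout checked out under /home/gangcao/spdk above, not commands quoted from this thread.)

cd /home/gangcao/spdk
./scripts/setup.sh                                  # bind 0000:84:00.0 to uio_pci_generic (matches the lspci output above)
./app/nvmf_tgt/nvmf_tgt -c test/nvmf/fio/nvmf.conf  # start the NVMe-oF target with the config shown above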

root(a)waikikibeach111:~# dmesg -w -T
[Mon Oct 23 19:45:21 2017] mlx5_core 0000:03:00.0: firmware version: 12.18.1000
[Mon Oct 23 19:45:21 2017] (0000:03:00.0): E-Switch: Total vports 1, l2 table size(65536), per vport: max uc(1024) max mc(16384)
[Mon Oct 23 19:45:21 2017] mlx5_core 0000:03:00.0: Port module event: module 0, Cable plugged
[Mon Oct 23 19:45:21 2017] mlx5_core 0000:03:00.1: firmware version: 12.18.1000
[Mon Oct 23 19:45:22 2017] (0000:03:00.1): E-Switch: Total vports 1, l2 table size(65536), per vport: max uc(1024) max mc(16384)
[Mon Oct 23 19:45:22 2017] mlx5_core 0000:03:00.1: Port module event: module 1, Cable plugged
[Mon Oct 23 19:45:22 2017] mlx5_core 0000:03:00.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(1) RxCqeCmprss(0)
[Mon Oct 23 19:45:22 2017] mlx5_core 0000:03:00.1: MLX5E: StrdRq(0) RqSz(1024) StrdSz(1) RxCqeCmprss(0)
[Mon Oct 23 19:45:22 2017] mlx5_core 0000:03:00.0 ens1f0: renamed from eth0
[Mon Oct 23 19:45:23 2017] mlx5_core 0000:03:00.1 ens1f1: renamed from eth0
[Mon Oct 23 19:45:23 2017] mlx5_ib: Mellanox Connect-IB Infiniband driver v5.0-0
[Mon Oct 23 19:47:01 2017] mlx5_core 0000:03:00.0 ens1f0: Link up
[Mon Oct 23 19:47:03 2017] mlx5_core 0000:03:00.1 ens1f1: Link up
[Mon Oct 23 19:54:26 2017] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.5.11:4420
[Mon Oct 23 19:54:49 2017] nvme nvme0: creating 7 I/O queues.
[Mon Oct 23 19:54:50 2017] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.5.11:4420

root(a)waikikibeach111:/home/gangcao/fio# ./fio --bs=5120k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=300 --filename=/dev/nvme0n1  --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 5120KiB-5120KiB, (W) 5120KiB-5120KiB, (T) 5120KiB-5120KiB, ioengine=libaio, iodepth=16
...
fio-3.1
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=220MiB/s][r=0,w=44 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=17076: Mon Oct 23 20:05:47 2017
  write: IOPS=19, BW=96.2MiB/s (101MB/s)(28.2GiB/300399msec)
    slat (usec): min=387, max=350086, avg=51937.15, stdev=38959.27
    clat (msec): min=199, max=2162, avg=779.50, stdev=553.60
     lat (msec): min=214, max=2287, avg=831.44, stdev=589.70
    clat percentiles (msec):
     |  1.00th=[  209],  5.00th=[  275], 10.00th=[  279], 20.00th=[  296],
     | 30.00th=[  321], 40.00th=[  397], 50.00th=[  489], 60.00th=[  709],
     | 70.00th=[ 1133], 80.00th=[ 1485], 90.00th=[ 1586], 95.00th=[ 1787],
     | 99.00th=[ 1989], 99.50th=[ 2039], 99.90th=[ 2123], 99.95th=[ 2140],
     | 99.99th=[ 2165]
   bw (  KiB/s): min=10260, max=368640, per=25.13%, avg=98423.58, stdev=77390.22, samples=600
   iops        : min=    2, max=   72, avg=19.20, stdev=15.10, samples=600
  lat (msec)   : 250=3.70%, 500=47.17%, 750=11.84%, 1000=5.59%, 2000=30.83%
  lat (msec)   : >=2000=0.87%
  cpu          : usr=0.61%, sys=0.48%, ctx=27957, majf=0, minf=91
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.7%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,5777,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16
read-phase: (groupid=0, jobs=1): err= 0: pid=17077: Mon Oct 23 20:05:47 2017
  write: IOPS=18, BW=94.9MiB/s (99.5MB/s)(27.8GiB/300346msec)
    slat (usec): min=386, max=407847, avg=52605.77, stdev=39162.39
    clat (msec): min=197, max=2168, avg=789.72, stdev=551.07
     lat (msec): min=220, max=2291, avg=842.33, stdev=587.00
    clat percentiles (msec):
     |  1.00th=[  255],  5.00th=[  275], 10.00th=[  284], 20.00th=[  300],
     | 30.00th=[  342], 40.00th=[  405], 50.00th=[  493], 60.00th=[  709],
     | 70.00th=[ 1167], 80.00th=[ 1485], 90.00th=[ 1603], 95.00th=[ 1787],
     | 99.00th=[ 1989], 99.50th=[ 2056], 99.90th=[ 2140], 99.95th=[ 2165],
     | 99.99th=[ 2165]
   bw (  KiB/s): min=10240, max=337920, per=24.78%, avg=97039.87, stdev=73967.65, samples=600
   iops        : min=    2, max=   66, avg=18.94, stdev=14.46, samples=600
  lat (msec)   : 250=0.98%, 500=49.23%, 750=12.01%, 1000=5.68%, 2000=31.20%
  lat (msec)   : >=2000=0.89%
  cpu          : usr=0.60%, sys=0.50%, ctx=27504, majf=0, minf=90
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.7%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,5702,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16
read-phase: (groupid=0, jobs=1): err= 0: pid=17078: Mon Oct 23 20:05:47 2017
  write: IOPS=18, BW=94.5MiB/s (99.1MB/s)(27.7GiB/300403msec)
    slat (usec): min=385, max=454024, avg=52851.88, stdev=39734.33
    clat (msec): min=174, max=2136, avg=793.55, stdev=550.47
     lat (msec): min=175, max=2272, avg=846.41, stdev=586.35
    clat percentiles (msec):
     |  1.00th=[  275],  5.00th=[  279], 10.00th=[  284], 20.00th=[  300],
     | 30.00th=[  355], 40.00th=[  418], 50.00th=[  498], 60.00th=[  709],
     | 70.00th=[ 1167], 80.00th=[ 1485], 90.00th=[ 1620], 95.00th=[ 1787],
     | 99.00th=[ 1989], 99.50th=[ 2056], 99.90th=[ 2123], 99.95th=[ 2140],
     | 99.99th=[ 2140]
   bw (  KiB/s): min=10240, max=286720, per=24.69%, avg=96665.08, stdev=73206.19, samples=600
   iops        : min=    2, max=   56, avg=18.87, stdev=14.29, samples=600
  lat (msec)   : 250=0.12%, 500=49.97%, 750=11.96%, 1000=5.67%, 2000=31.30%
  lat (msec)   : >=2000=0.97%
  cpu          : usr=0.60%, sys=0.50%, ctx=27456, majf=0, minf=91
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.7%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,5677,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16
read-phase: (groupid=0, jobs=1): err= 0: pid=17079: Mon Oct 23 20:05:47 2017
  write: IOPS=19, BW=96.9MiB/s (102MB/s)(28.4GiB/300394msec)
    slat (usec): min=364, max=435971, avg=51560.26, stdev=39426.52
    clat (msec): min=176, max=2179, avg=774.17, stdev=555.08
     lat (msec): min=177, max=2299, avg=825.73, stdev=591.31
    clat percentiles (msec):
     |  1.00th=[  207],  5.00th=[  232], 10.00th=[  279], 20.00th=[  296],
     | 30.00th=[  313], 40.00th=[  393], 50.00th=[  485], 60.00th=[  701],
     | 70.00th=[ 1116], 80.00th=[ 1485], 90.00th=[ 1586], 95.00th=[ 1787],
     | 99.00th=[ 1989], 99.50th=[ 2039], 99.90th=[ 2140], 99.95th=[ 2165],
     | 99.99th=[ 2165]
   bw (  KiB/s): min=10240, max=378880, per=25.30%, avg=99076.15, stdev=78461.72, samples=600
   iops        : min=    2, max=   74, avg=19.34, stdev=15.32, samples=600
  lat (msec)   : 250=5.07%, 500=46.18%, 750=11.84%, 1000=5.41%, 2000=30.73%
  lat (msec)   : >=2000=0.77%
  cpu          : usr=0.57%, sys=0.56%, ctx=28034, majf=0, minf=97
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.7%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,5819,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  WRITE: bw=382MiB/s (401MB/s), 94.5MiB/s-96.9MiB/s (99.1MB/s-102MB/s), io=112GiB (120GB), run=300346-300403msec

Disk stats (read/write):
  nvme0n1: ios=44/918923, merge=0/0, ticks=8/147977388, in_queue=148007124, util=100.00%

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Saturday, October 21, 2017 11:23 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>; John F. Kim <johnk(a)mellanox.com>; Cao, Gang <gang.cao(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

I am using NVMe.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Saturday, October 21, 2017 1:21:03 AM
To: Victor Banh; Storage Performance Development Kit; Harris, James R; John F. Kim
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

I am also using SPDK v17.07.1 and DPDK 17.08 on my system and am trying it based on your I/O configuration.

Let me check whether our colleague has this Ubuntu OS installed. If not, could you please have a try on CentOS?

Also, what kind of backend device did you configure for the NVMe-oF target, like Malloc, NVMe SSD or other?

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Saturday, October 21, 2017 6:14 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; John F. Kim <johnk(a)mellanox.com<mailto:johnk(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
Is it possible to try on Ubuntu OS?
Which version of DPDK and SPDK are you using?
Where can I get them? Github?
Thanks
Victor

root(a)clientintel:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.3 LTS
Release:        16.04
Codename:       xenial
root(a)clientintel:~# uname -a
Linux clientintel 4.12.0-041200-generic #201707022031 SMP Mon Jul 3 00:32:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux



From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Wednesday, October 18, 2017 11:52 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

[root(a)node4 gangcao]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux

[root(a)node4 gangcao]# lsb_release -a
LSB Version:    :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description:    CentOS Linux release 7.2.1511 (Core)
Release:        7.2.1511
Codename:       Core

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 2:14 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Can you give the OS version and kernel version again for the target and client?
I couldn't compile DPDK without installing the latest 4.12.x kernel on Ubuntu 16.04.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 10:59:19 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

This server does not have OFED installed and it is in the loopback mode.

Found another server also in the loopback mode with ConnectX-3 and have OFED installed.

By the way, what SSD are you using? Maybe it is related to the SSD? I've just run with 2048k for a short duration and saw no issue. I will run longer to see whether I can hit this error.

[root(a)slave3 fio]# lspci | grep -i mell
08:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]

[root(a)slave3 fio]# lsmod | grep -i mlx
mlx4_ib               159744  0
ib_core               208896  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,ib_srpt,ib_ucm,rdma_ucm,mlx4_ib
mlx4_en               114688  0
mlx4_core             307200  2 mlx4_en,mlx4_ib
ptp                    20480  3 ixgbe,igb,mlx4_en

[root(a)slave3 fio]# ofed_info
MLNX_OFED_LINUX-3.2-2.0.0.0 (OFED-3.2-2.0.0):

ar_mgr:
osm_plugins/ar_mgr/ar_mgr-1.0-0.30.ga1ea4b7.tar.gz

cc_mgr:
osm_plugins/cc_mgr/cc_mgr-1.0-0.29.ga1ea4b7.tar.gz

dapl:
dapl.git mlnx_ofed_3_1
commit c30fb6ce2cbc29d8ed4bde51437f7abb93378c78
dump_pr:
osm_plugins/dump_pr//dump_pr-1.0-0.25.ga1ea4b7.tar.gz

fabric-collector:
fabric_collector//fabric-collector-1.1.0.MLNX20140410.51b267e.tar.gz

fca:
mlnx_ofed_fca/fca-2.5.2431-1.src.rpm

hcoll:
mlnx_ofed_hcol/hcoll-3.4.807-1.src.rpm

ibacm:
mlnx_ofed/ibacm.git mlnx_ofed_3_2
commit 15ad8c13bdebbe62edea0b7df030710b65c14f7f
ibacm_ssa:
mlnx_ofed_ssa/acm/ibacm_ssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

ibdump:
sniffer/sniffer-4.0.0-2/ibdump/linux/ibdump-4.0.0-2.tgz

ibsim:
mlnx_ofed_ibsim/ibsim-0.6-0.8.g9d76581.tar.gz

ibssa:
mlnx_ofed_ssa/distrib/ibssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

ibutils:
ofed-1.5.3-rpms/ibutils/ibutils-1.5.7.1-0.12.gdcaeae2.tar.gz

ibutils2:
ibutils2/ibutils2-2.1.1-0.76.MLNX20160222.gd366c7b.tar.gz

infiniband-diags:
mlnx_ofed_infiniband_diags/infiniband-diags-1.6.6.MLNX20151130.7f0213e.tar.gz

infinipath-psm:
mlnx_ofed_infinipath-psm/infinipath-psm-3.3-2_g6f42cdb_open.tar.gz

iser:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

isert:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

kernel-mft:
mlnx_ofed_mft/kernel-mft-4.3.0-25.src.rpm

knem:
knem.git mellanox-master
commit f143ee19a575cd42a334422fa8bd329d671238db
libibcm:
mlnx_ofed/libibcm.git mlnx_ofed_3_0
commit d7d485df305e6536711485bd7e477668e77d8320
libibmad:
mlnx_ofed_libibmad/libibmad-1.3.12.MLNX20151122.d140cb1.tar.gz

libibprof:
mlnx_ofed_libibprof/libibprof-1.1.22-1.src.rpm

libibumad:
mlnx_ofed_libibumad/libibumad-1.3.10.2.MLNX20150406.966500d.tar.gz

libibverbs:
mlnx_ofed/libibverbs.git mlnx_ofed_3_2_2
commit 217a77686f4861229f0e4b94485a13f024634caf
libmlx4:
mlnx_ofed/libmlx4.git mlnx_ofed_3_2_2
commit dda6d7ae1e6e3779a485ebdd0a882f4bcbd027a6
libmlx5:
mlnx_ofed/libmlx5.git mlnx_ofed_3_2_2
commit d0c8645359e0f0aba0408b2d344f3b418d27019b
libopensmssa:
mlnx_ofed_ssa/plugin/libopensmssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

librdmacm:
mlnx_ofed/librdmacm.git mlnx_ofed_3_2_2
commit 6bd430fed9e7b3d57a1876c040431ce7295c7703
libsdp:
libsdp.git mlnx_ofed_3_0
commit fbd01dfff05f42d6b82506e7dbf4bc6b7e6a59a4
libvma:
vma/source_rpms//libvma-7.0.14-0.src.rpm

mlnx-ethtool:
upstream/ethtool.git for-upstream
commit ac0cf295abe0c0832f0711fed66ab9601c8b2513
mlnx-ofa_kernel:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

mpi-selector:
ofed-1.5.3-rpms/mpi-selector/mpi-selector-1.0.3-1.src.rpm

mpitests:
mlnx_ofed_mpitest/mpitests-3.2.17-e1c7f2f.src.rpm

mstflint:
mlnx_ofed_mstflint/mstflint-4.3.0-1.49.g9b9af70.tar.gz

multiperf:
mlnx_ofed_multiperf/multiperf-3.0-0.10.gda89e8c.tar.gz

mvapich2:
mlnx_ofed_mvapich2/mvapich2-2.2a-1.src.rpm

mxm:
mlnx_ofed_mxm/mxm-3.4.3079-1.src.rpm

ofed-docs:
docs.git mlnx_ofed-3.2
commit ea3386416f9f7130edd2c70fc3424cb2cda50f7d

openmpi:
mlnx_ofed_ompi_1.8/openmpi-1.10.3a1-1.src.rpm

opensm:
mlnx_ofed_opensm/opensm-4.6.1.MLNX20160112.774e977.tar.gz

perftest:
mlnx_ofed_perftest/perftest-3.0-0.18.gb464d59.tar.gz

qperf:
mlnx_ofed_qperf/qperf-0.4.9.tar.gz

rds-tools:
rds-tools.git mlnx_ofed_2_4
commit 299420ca25cf9996bc0748e3bc4b08748996ba49
sdpnetstat:
sdpnetstat.git mlnx_ofed_3_0
commit 3cf409a7cc07e5c71f9640eddbb801ece21b4169
sockperf:
sockperf/sockperf-2.7-43.git3ee62bd8107a.src.rpm

srp:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

srptools:
srptools/srptools-1.0.2-12.src.rpm


Installed Packages:
-------------------
infiniband-diags
librdmacm
libmlx4
libibverbs-utils
mpi-selector
libibmad-devel
sdpnetstat
knem
libibumad-devel
libsdp
mlnx-ethtool
libibverbs-debuginfo
mlnx-ofa_kernel-modules
srp
opensm
mstflint
cc_mgr
libibmad
libibverbs
kernel-mft
libibverbs-devel-static
libibumad
librdmacm-devel
mlnx-ofa_kernel
libsdp-devel
ibutils2
mlnxofed-docs
libibmad-static
iser
opensm-devel
dump_pr
libibumad-static
rds-tools
libmlx4-debuginfo
mlnx-ofa_kernel-devel
opensm-libs
opensm-static
ar_mgr
dapl-devel-static
infiniband-diags-compat
libibverbs-devel
ibsim

[root(a)slave3 fio]# ibstat
CA 'mlx4_0'
        CA type: MT4103
        Number of ports: 2
        Firmware version: 2.35.5100
        Hardware version: 0
        Node GUID: 0x248a0703006090e0
        System image GUID: 0x248a0703006090e0
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x268a07fffe6090e0
                Link layer: Ethernet
        Port 2:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x268a07fffe6090e1
                Link layer: Ethernet

[root(a)slave3 fio]# ibv_devinfo
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.35.5100
        node_guid:                      248a:0703:0060:90e0
        sys_image_guid:                 248a:0703:0060:90e0
        vendor_id:                      0x02c9
        vendor_part_id:                 4103
        hw_ver:                         0x0
        board_id:                       MT_1090111023
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

                port:   2
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 12:34 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Do you install Mellanox OFED on the Target and Client server?
Can you run ibstat on both servers?
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 8:59:28 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

I've just tried the SPDK v17.07.1 and DPDK v17.08.

nvme version: 1.1.38.gfaab
fio version: 3.1

Tried the 512k and 1024k I/O sizes and there is no error. dmesg information is as follows.

So there may be some other difference here? It looks like you are using ConnectX-5 while I am using ConnectX-4?

Other related information:

[root(a)node4 fio]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

[root(a)node4 fio]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core

[root(a)node4 fio]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux

[577707.543326] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.100.8:4420
[577730.854540] detected loopback device
[577730.893761] nvme nvme0: creating 7 I/O queues.
[577730.893797] detected loopback device
[577730.898611] detected loopback device
[577730.908917] detected loopback device
[577730.919073] detected loopback device
[577730.928922] detected loopback device
[577730.938679] detected loopback device
[577730.948365] detected loopback device
[577731.146290] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.100.8:4420

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
Any update?
Do you see any error message from "dmesg" with 512k block size running fio?
Thanks
Victor

From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you share which version of SPDK you are using when seeing this error? Or maybe you can try the latest SPDK code?

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Cao
Do you see any message from dmesg?

I tried this fio version and still saw these error messages from dmesg.

fio-3.1

[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338]  nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Thanks for your detailed information on the testing.

I've tried the latest SPDK code with the latest fio-3.1-20-g132b and with fio-2.19. I don't see this kind of error.

Could you share which version of SPDK you are using when seeing this error? Or maybe you can try the latest SPDK code?

fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1  --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017

My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio



From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

(cc Victor)

From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you provide a few more details?  This will help the list to provide some ideas.


1)      On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?


Kernel initiator; I run these commands on the client server.



modprobe mlx5_ib

modprobe nvme-rdma

nvme discover -t rdma -a 192.168.10.11 -s 4420

nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1  -a 192.168.10.11 -s 4420




2)      Can you provide the fio configuration file or command line?  Just so we can have more specifics on "bigger block size".


fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1  --name=read-phase --rw=randwrite


3)      Any details on the HW setup - specifically details on the RDMA NIC (or if you're using SW RoCE).

Nvmf.conf on target server

[Global]
  Comment "Global section"
    ReactorMask 0xff00

[Rpc]
  Enable No
  Listen 127.0.0.1

[Nvmf]
  MaxQueuesPerSession 8
  MaxQueueDepth 128

[Subsystem1]
  NQN nqn.2016-06.io.spdk:nvme-subsystem-1
  Core 9
  Mode Direct
  Listen RDMA 192.168.10.11:4420
  NVMe 0000:82:00.0
  SN S2PMNAAH400039
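
For completeness, a configuration like this is loaded when starting the NVMe-oF target application; a minimal sketch, assuming the SPDK 17.07 source-tree layout (adjust the path to wherever you built it):

# from the root of the SPDK tree, after building
app/nvmf_tgt/nvmf_tgt -c /path/to/nvmf.conf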


           It is an RDMA NIC (ConnectX-5); the CPU is an Intel(R) Xeon(R) E5-2680 0 @ 2.70GHz
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15
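
As a quick cross-check, the ReactorMask above is consistent with this topology: 0xff00 sets bits 8 through 15, i.e. the NUMA node1 cores, which is also where Subsystem1's Core 9 lives. A shell one-liner to confirm the mask value:

printf '0x%x\n' $(( (1 << 16) - (1 << 8) ))    # prints 0xff00, the mask for cores 8-15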




Thanks,

-Jim


From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio

Hi
I have SPDK NVMeoF and keep getting error with bigger block size with fio on randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor


[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389]  nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486]  nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373]  nvme1n1: unable to read partition table





[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 161485 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-21 15:22 Victor Banh
  0 siblings, 0 replies; 24+ messages in thread
From: Victor Banh @ 2017-10-21 15:22 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 24074 bytes --]

I am using NVMe.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com>
Sent: Saturday, October 21, 2017 1:21:03 AM
To: Victor Banh; Storage Performance Development Kit; Harris, James R; John F. Kim
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

I am also using SPDK v17.07.1 and DPDK 17.08 on my system and will give it a try based on your I/O configuration.

Let me check whether our colleague has this Ubuntu OS installed. If not, could you please try it on CentOS?

Also, what kind of backend device did you configure for the NVMe-oF target: Malloc, an NVMe SSD, or something else?

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Saturday, October 21, 2017 6:14 AM
To: Cao, Gang <gang.cao(a)intel.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>; John F. Kim <johnk(a)mellanox.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
Is it possible to try it on Ubuntu?
Which version of DPDK and SPDK are you using?
Where can I get them? GitHub?
Thanks
Victor

root(a)clientintel:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.3 LTS
Release:        16.04
Codename:       xenial
root(a)clientintel:~# uname -a
Linux clientintel 4.12.0-041200-generic #201707022031 SMP Mon Jul 3 00:32:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux



From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Wednesday, October 18, 2017 11:52 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

[root(a)node4 gangcao]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux

[root(a)node4 gangcao]# lsb_release -a
LSB Version:    :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description:    CentOS Linux release 7.2.1511 (Core)
Release:        7.2.1511
Codename:       Core

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 2:14 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Can you give the OS version and kernel version again for target and client?
I couldn't compile DPDK without installing the latest 4.12.x kernel on Ubuntu 16.04.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 10:59:19 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

This server does not have OFED installed and it is in the loopback mode.

Found another server, also in loopback mode, with a ConnectX-3 and OFED installed.

By the way, what SSD are you using? Maybe it is related to the SSD? I've just run with 2048k for a short duration and it seems there is no issue. I will run it longer to see whether I can hit this error.

[root(a)slave3 fio]# lspci | grep -i mell
08:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]

[root(a)slave3 fio]# lsmod | grep -i mlx
mlx4_ib               159744  0
ib_core               208896  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,ib_srpt,ib_ucm,rdma_ucm,mlx4_ib
mlx4_en               114688  0
mlx4_core             307200  2 mlx4_en,mlx4_ib
ptp                    20480  3 ixgbe,igb,mlx4_en

[root(a)slave3 fio]# ofed_info
MLNX_OFED_LINUX-3.2-2.0.0.0 (OFED-3.2-2.0.0):

ar_mgr:
osm_plugins/ar_mgr/ar_mgr-1.0-0.30.ga1ea4b7.tar.gz

cc_mgr:
osm_plugins/cc_mgr/cc_mgr-1.0-0.29.ga1ea4b7.tar.gz

dapl:
dapl.git mlnx_ofed_3_1
commit c30fb6ce2cbc29d8ed4bde51437f7abb93378c78
dump_pr:
osm_plugins/dump_pr//dump_pr-1.0-0.25.ga1ea4b7.tar.gz

fabric-collector:
fabric_collector//fabric-collector-1.1.0.MLNX20140410.51b267e.tar.gz

fca:
mlnx_ofed_fca/fca-2.5.2431-1.src.rpm

hcoll:
mlnx_ofed_hcol/hcoll-3.4.807-1.src.rpm

ibacm:
mlnx_ofed/ibacm.git mlnx_ofed_3_2
commit 15ad8c13bdebbe62edea0b7df030710b65c14f7f
ibacm_ssa:
mlnx_ofed_ssa/acm/ibacm_ssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

ibdump:
sniffer/sniffer-4.0.0-2/ibdump/linux/ibdump-4.0.0-2.tgz

ibsim:
mlnx_ofed_ibsim/ibsim-0.6-0.8.g9d76581.tar.gz

ibssa:
mlnx_ofed_ssa/distrib/ibssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

ibutils:
ofed-1.5.3-rpms/ibutils/ibutils-1.5.7.1-0.12.gdcaeae2.tar.gz

ibutils2:
ibutils2/ibutils2-2.1.1-0.76.MLNX20160222.gd366c7b.tar.gz

infiniband-diags:
mlnx_ofed_infiniband_diags/infiniband-diags-1.6.6.MLNX20151130.7f0213e.tar.gz

infinipath-psm:
mlnx_ofed_infinipath-psm/infinipath-psm-3.3-2_g6f42cdb_open.tar.gz

iser:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

isert:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

kernel-mft:
mlnx_ofed_mft/kernel-mft-4.3.0-25.src.rpm

knem:
knem.git mellanox-master
commit f143ee19a575cd42a334422fa8bd329d671238db
libibcm:
mlnx_ofed/libibcm.git mlnx_ofed_3_0
commit d7d485df305e6536711485bd7e477668e77d8320
libibmad:
mlnx_ofed_libibmad/libibmad-1.3.12.MLNX20151122.d140cb1.tar.gz

libibprof:
mlnx_ofed_libibprof/libibprof-1.1.22-1.src.rpm

libibumad:
mlnx_ofed_libibumad/libibumad-1.3.10.2.MLNX20150406.966500d.tar.gz

libibverbs:
mlnx_ofed/libibverbs.git mlnx_ofed_3_2_2
commit 217a77686f4861229f0e4b94485a13f024634caf
libmlx4:
mlnx_ofed/libmlx4.git mlnx_ofed_3_2_2
commit dda6d7ae1e6e3779a485ebdd0a882f4bcbd027a6
libmlx5:
mlnx_ofed/libmlx5.git mlnx_ofed_3_2_2
commit d0c8645359e0f0aba0408b2d344f3b418d27019b
libopensmssa:
mlnx_ofed_ssa/plugin/libopensmssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

librdmacm:
mlnx_ofed/librdmacm.git mlnx_ofed_3_2_2
commit 6bd430fed9e7b3d57a1876c040431ce7295c7703
libsdp:
libsdp.git mlnx_ofed_3_0
commit fbd01dfff05f42d6b82506e7dbf4bc6b7e6a59a4
libvma:
vma/source_rpms//libvma-7.0.14-0.src.rpm

mlnx-ethtool:
upstream/ethtool.git for-upstream
commit ac0cf295abe0c0832f0711fed66ab9601c8b2513
mlnx-ofa_kernel:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

mpi-selector:
ofed-1.5.3-rpms/mpi-selector/mpi-selector-1.0.3-1.src.rpm

mpitests:
mlnx_ofed_mpitest/mpitests-3.2.17-e1c7f2f.src.rpm

mstflint:
mlnx_ofed_mstflint/mstflint-4.3.0-1.49.g9b9af70.tar.gz

multiperf:
mlnx_ofed_multiperf/multiperf-3.0-0.10.gda89e8c.tar.gz

mvapich2:
mlnx_ofed_mvapich2/mvapich2-2.2a-1.src.rpm

mxm:
mlnx_ofed_mxm/mxm-3.4.3079-1.src.rpm

ofed-docs:
docs.git mlnx_ofed-3.2
commit ea3386416f9f7130edd2c70fc3424cb2cda50f7d

openmpi:
mlnx_ofed_ompi_1.8/openmpi-1.10.3a1-1.src.rpm

opensm:
mlnx_ofed_opensm/opensm-4.6.1.MLNX20160112.774e977.tar.gz

perftest:
mlnx_ofed_perftest/perftest-3.0-0.18.gb464d59.tar.gz

qperf:
mlnx_ofed_qperf/qperf-0.4.9.tar.gz

rds-tools:
rds-tools.git mlnx_ofed_2_4
commit 299420ca25cf9996bc0748e3bc4b08748996ba49
sdpnetstat:
sdpnetstat.git mlnx_ofed_3_0
commit 3cf409a7cc07e5c71f9640eddbb801ece21b4169
sockperf:
sockperf/sockperf-2.7-43.git3ee62bd8107a.src.rpm

srp:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

srptools:
srptools/srptools-1.0.2-12.src.rpm


Installed Packages:
-------------------
infiniband-diags
librdmacm
libmlx4
libibverbs-utils
mpi-selector
libibmad-devel
sdpnetstat
knem
libibumad-devel
libsdp
mlnx-ethtool
libibverbs-debuginfo
mlnx-ofa_kernel-modules
srp
opensm
mstflint
cc_mgr
libibmad
libibverbs
kernel-mft
libibverbs-devel-static
libibumad
librdmacm-devel
mlnx-ofa_kernel
libsdp-devel
ibutils2
mlnxofed-docs
libibmad-static
iser
opensm-devel
dump_pr
libibumad-static
rds-tools
libmlx4-debuginfo
mlnx-ofa_kernel-devel
opensm-libs
opensm-static
ar_mgr
dapl-devel-static
infiniband-diags-compat
libibverbs-devel
ibsim

[root(a)slave3 fio]# ibstat
CA 'mlx4_0'
        CA type: MT4103
        Number of ports: 2
        Firmware version: 2.35.5100
        Hardware version: 0
        Node GUID: 0x248a0703006090e0
        System image GUID: 0x248a0703006090e0
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x268a07fffe6090e0
                Link layer: Ethernet
        Port 2:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x268a07fffe6090e1
                Link layer: Ethernet

[root(a)slave3 fio]# ibv_devinfo
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.35.5100
        node_guid:                      248a:0703:0060:90e0
        sys_image_guid:                 248a:0703:0060:90e0
        vendor_id:                      0x02c9
        vendor_part_id:                 4103
        hw_ver:                         0x0
        board_id:                       MT_1090111023
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

                port:   2
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 12:34 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Did you install Mellanox OFED on the target and client servers?
Can you run ibstat on both servers?
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 8:59:28 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

I’ve just tried the SPDK v17.07.1 and DPDK v17.08.

nvme version: 1.1.38.gfaab
fio version: 3.1
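
For anyone reproducing this setup, both versions can be read back directly from the tools:

nvme version
fio --version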

Tried the 512k and 1024k I/O sizes and there is no error. dmesg information is as follows.

So there may be some other difference here? It looks like you are using ConnectX-5 while I am using ConnectX-4.

Other related information:

[root(a)node4 fio]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

[root(a)node4 fio]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core

[root(a)node4 fio]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux

[577707.543326] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.100.8:4420
[577730.854540] detected loopback device
[577730.893761] nvme nvme0: creating 7 I/O queues.
[577730.893797] detected loopback device
[577730.898611] detected loopback device
[577730.908917] detected loopback device
[577730.919073] detected loopback device
[577730.928922] detected loopback device
[577730.938679] detected loopback device
[577730.948365] detected loopback device
[577731.146290] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.100.8:4420

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
Any update?
Do you see any error messages from "dmesg" when running fio with a 512k block size?
Thanks
Victor

From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you share with us which version of SPDK you are using when seeing this error? Or maybe you can try the latest SPDK code?

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Cao
Do you see any message from dmesg?

I tried this fio version and still saw these error messages from dmesg.

fio-3.1

[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338]  nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Thanks for your detailed information on the testing.

I've tried the latest SPDK code with the latest fio-3.1-20-g132b and also with fio-2.19. It seems there is no error of this kind.

Could you share with us which version of SPDK you are using when seeing this error? Or maybe you can try the latest SPDK code?

fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1  --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017

My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio



From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

(cc Victor)

From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you provide a few more details?  This will help the list to provide some ideas.


1)      On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?


Kernel initiator; I run these commands on the client server.



modprobe mlx5_ib

modprobe nvme-rdma

nvme discover -t rdma -a 192.168.10.11 -s 4420

nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1  -a 192.168.10.11 -s 4420




2)      Can you provide the fio configuration file or command line?  Just so we can have more specifics on “bigger block size”.


fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1  --name=read-phase --rw=randwrite


3)      Any details on the HW setup – specifically details on the RDMA NIC (or if you’re using SW RoCE).

Nvmf.conf on target server

[Global]
  Comment "Global section"
    ReactorMask 0xff00

[Rpc]
  Enable No
  Listen 127.0.0.1

[Nvmf]
  MaxQueuesPerSession 8
  MaxQueueDepth 128

[Subsystem1]
  NQN nqn.2016-06.io.spdk:nvme-subsystem-1
  Core 9
  Mode Direct
  Listen RDMA 192.168.10.11:4420
  NVMe 0000:82:00.0
  SN S2PMNAAH400039


           It is an RDMA NIC (ConnectX-5); the CPU is an Intel(R) Xeon(R) E5-2680 0 @ 2.70GHz
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15




Thanks,

-Jim


From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio

Hi
I have SPDK NVMeoF and keep getting error with bigger block size with fio on randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor


[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389]  nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486]  nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373]  nvme1n1: unable to read partition table





[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 96437 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-21  8:21 Cao, Gang
  0 siblings, 0 replies; 24+ messages in thread
From: Cao, Gang @ 2017-10-21  8:21 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 23746 bytes --]

I am also using SPDK v17.07.1 and DPDK 17.08 on my system and will give it a try based on your I/O configuration.

Let me check whether our colleague has this Ubuntu OS installed. If not, could you please try it on CentOS?

Also, what kind of backend device did you configure for the NVMe-oF target: Malloc, an NVMe SSD, or something else?

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Saturday, October 21, 2017 6:14 AM
To: Cao, Gang <gang.cao(a)intel.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>; John F. Kim <johnk(a)mellanox.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
Is it possible to try it on Ubuntu?
Which version of DPDK and SPDK are you using?
Where can I get them? GitHub?
Thanks
Victor

root(a)clientintel:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.3 LTS
Release:        16.04
Codename:       xenial
root(a)clientintel:~# uname -a
Linux clientintel 4.12.0-041200-generic #201707022031 SMP Mon Jul 3 00:32:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux



From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Wednesday, October 18, 2017 11:52 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

[root(a)node4 gangcao]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux

[root(a)node4 gangcao]# lsb_release -a
LSB Version:    :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description:    CentOS Linux release 7.2.1511 (Core)
Release:        7.2.1511
Codename:       Core

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 2:14 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Can you give the OS version and kernel version again for target and client?
I couldn't compile DPDK without installing the latest 4.12.x kernel on Ubuntu 16.04.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 10:59:19 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

This server does not have OFED installed and it is in the loopback mode.

Found another server, also in loopback mode, with a ConnectX-3 and OFED installed.

By the way, what SSD are you using? Maybe it is related to the SSD? I've just run with 2048k for a short duration and it seems there is no issue. I will run it longer to see whether I can hit this error.

[root(a)slave3 fio]# lspci | grep -i mell
08:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]

[root(a)slave3 fio]# lsmod | grep -i mlx
mlx4_ib               159744  0
ib_core               208896  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,ib_srpt,ib_ucm,rdma_ucm,mlx4_ib
mlx4_en               114688  0
mlx4_core             307200  2 mlx4_en,mlx4_ib
ptp                    20480  3 ixgbe,igb,mlx4_en

[root(a)slave3 fio]# ofed_info
MLNX_OFED_LINUX-3.2-2.0.0.0 (OFED-3.2-2.0.0):

ar_mgr:
osm_plugins/ar_mgr/ar_mgr-1.0-0.30.ga1ea4b7.tar.gz

cc_mgr:
osm_plugins/cc_mgr/cc_mgr-1.0-0.29.ga1ea4b7.tar.gz

dapl:
dapl.git mlnx_ofed_3_1
commit c30fb6ce2cbc29d8ed4bde51437f7abb93378c78
dump_pr:
osm_plugins/dump_pr//dump_pr-1.0-0.25.ga1ea4b7.tar.gz

fabric-collector:
fabric_collector//fabric-collector-1.1.0.MLNX20140410.51b267e.tar.gz

fca:
mlnx_ofed_fca/fca-2.5.2431-1.src.rpm

hcoll:
mlnx_ofed_hcol/hcoll-3.4.807-1.src.rpm

ibacm:
mlnx_ofed/ibacm.git mlnx_ofed_3_2
commit 15ad8c13bdebbe62edea0b7df030710b65c14f7f
ibacm_ssa:
mlnx_ofed_ssa/acm/ibacm_ssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

ibdump:
sniffer/sniffer-4.0.0-2/ibdump/linux/ibdump-4.0.0-2.tgz

ibsim:
mlnx_ofed_ibsim/ibsim-0.6-0.8.g9d76581.tar.gz

ibssa:
mlnx_ofed_ssa/distrib/ibssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

ibutils:
ofed-1.5.3-rpms/ibutils/ibutils-1.5.7.1-0.12.gdcaeae2.tar.gz

ibutils2:
ibutils2/ibutils2-2.1.1-0.76.MLNX20160222.gd366c7b.tar.gz

infiniband-diags:
mlnx_ofed_infiniband_diags/infiniband-diags-1.6.6.MLNX20151130.7f0213e.tar.gz

infinipath-psm:
mlnx_ofed_infinipath-psm/infinipath-psm-3.3-2_g6f42cdb_open.tar.gz

iser:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

isert:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

kernel-mft:
mlnx_ofed_mft/kernel-mft-4.3.0-25.src.rpm

knem:
knem.git mellanox-master
commit f143ee19a575cd42a334422fa8bd329d671238db
libibcm:
mlnx_ofed/libibcm.git mlnx_ofed_3_0
commit d7d485df305e6536711485bd7e477668e77d8320
libibmad:
mlnx_ofed_libibmad/libibmad-1.3.12.MLNX20151122.d140cb1.tar.gz

libibprof:
mlnx_ofed_libibprof/libibprof-1.1.22-1.src.rpm

libibumad:
mlnx_ofed_libibumad/libibumad-1.3.10.2.MLNX20150406.966500d.tar.gz

libibverbs:
mlnx_ofed/libibverbs.git mlnx_ofed_3_2_2
commit 217a77686f4861229f0e4b94485a13f024634caf
libmlx4:
mlnx_ofed/libmlx4.git mlnx_ofed_3_2_2
commit dda6d7ae1e6e3779a485ebdd0a882f4bcbd027a6
libmlx5:
mlnx_ofed/libmlx5.git mlnx_ofed_3_2_2
commit d0c8645359e0f0aba0408b2d344f3b418d27019b
libopensmssa:
mlnx_ofed_ssa/plugin/libopensmssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

librdmacm:
mlnx_ofed/librdmacm.git mlnx_ofed_3_2_2
commit 6bd430fed9e7b3d57a1876c040431ce7295c7703
libsdp:
libsdp.git mlnx_ofed_3_0
commit fbd01dfff05f42d6b82506e7dbf4bc6b7e6a59a4
libvma:
vma/source_rpms//libvma-7.0.14-0.src.rpm

mlnx-ethtool:
upstream/ethtool.git for-upstream
commit ac0cf295abe0c0832f0711fed66ab9601c8b2513
mlnx-ofa_kernel:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

mpi-selector:
ofed-1.5.3-rpms/mpi-selector/mpi-selector-1.0.3-1.src.rpm

mpitests:
mlnx_ofed_mpitest/mpitests-3.2.17-e1c7f2f.src.rpm

mstflint:
mlnx_ofed_mstflint/mstflint-4.3.0-1.49.g9b9af70.tar.gz

multiperf:
mlnx_ofed_multiperf/multiperf-3.0-0.10.gda89e8c.tar.gz

mvapich2:
mlnx_ofed_mvapich2/mvapich2-2.2a-1.src.rpm

mxm:
mlnx_ofed_mxm/mxm-3.4.3079-1.src.rpm

ofed-docs:
docs.git mlnx_ofed-3.2
commit ea3386416f9f7130edd2c70fc3424cb2cda50f7d

openmpi:
mlnx_ofed_ompi_1.8/openmpi-1.10.3a1-1.src.rpm

opensm:
mlnx_ofed_opensm/opensm-4.6.1.MLNX20160112.774e977.tar.gz

perftest:
mlnx_ofed_perftest/perftest-3.0-0.18.gb464d59.tar.gz

qperf:
mlnx_ofed_qperf/qperf-0.4.9.tar.gz

rds-tools:
rds-tools.git mlnx_ofed_2_4
commit 299420ca25cf9996bc0748e3bc4b08748996ba49
sdpnetstat:
sdpnetstat.git mlnx_ofed_3_0
commit 3cf409a7cc07e5c71f9640eddbb801ece21b4169
sockperf:
sockperf/sockperf-2.7-43.git3ee62bd8107a.src.rpm

srp:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

srptools:
srptools/srptools-1.0.2-12.src.rpm


Installed Packages:
-------------------
infiniband-diags
librdmacm
libmlx4
libibverbs-utils
mpi-selector
libibmad-devel
sdpnetstat
knem
libibumad-devel
libsdp
mlnx-ethtool
libibverbs-debuginfo
mlnx-ofa_kernel-modules
srp
opensm
mstflint
cc_mgr
libibmad
libibverbs
kernel-mft
libibverbs-devel-static
libibumad
librdmacm-devel
mlnx-ofa_kernel
libsdp-devel
ibutils2
mlnxofed-docs
libibmad-static
iser
opensm-devel
dump_pr
libibumad-static
rds-tools
libmlx4-debuginfo
mlnx-ofa_kernel-devel
opensm-libs
opensm-static
ar_mgr
dapl-devel-static
infiniband-diags-compat
libibverbs-devel
ibsim

[root(a)slave3 fio]# ibstat
CA 'mlx4_0'
        CA type: MT4103
        Number of ports: 2
        Firmware version: 2.35.5100
        Hardware version: 0
        Node GUID: 0x248a0703006090e0
        System image GUID: 0x248a0703006090e0
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x268a07fffe6090e0
                Link layer: Ethernet
        Port 2:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x268a07fffe6090e1
                Link layer: Ethernet

[root(a)slave3 fio]# ibv_devinfo
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.35.5100
        node_guid:                      248a:0703:0060:90e0
        sys_image_guid:                 248a:0703:0060:90e0
        vendor_id:                      0x02c9
        vendor_part_id:                 4103
        hw_ver:                         0x0
        board_id:                       MT_1090111023
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

                port:   2
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 12:34 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Did you install Mellanox OFED on the target and client servers?
Can you run ibstat on both servers?
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 8:59:28 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

I've just tried the SPDK v17.07.1 and DPDK v17.08.

nvme version: 1.1.38.gfaab
fio version: 3.1

Tried the 512k and 1024k I/O sizes and there is no error. dmesg information is as follows.

So there may be some other difference here? It looks like you are using ConnectX-5 while I am using ConnectX-4.

Other related information:

[root(a)node4 fio]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

[root(a)node4 fio]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core

[root(a)node4 fio]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux

[577707.543326] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.100.8:4420
[577730.854540] detected loopback device
[577730.893761] nvme nvme0: creating 7 I/O queues.
[577730.893797] detected loopback device
[577730.898611] detected loopback device
[577730.908917] detected loopback device
[577730.919073] detected loopback device
[577730.928922] detected loopback device
[577730.938679] detected loopback device
[577730.948365] detected loopback device
[577731.146290] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.100.8:4420

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
Any update?
Do you see any error messages from "dmesg" when running fio with a 512k block size?
Thanks
Victor

From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you share with us which version of SPDK you are using when seeing this error? Or maybe you can try the latest SPDK code?

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Cao
Do you see any message from dmesg?

I tried this fio version and still saw these error messages from dmesg.

fio-3.1

[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338]  nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Thanks for your detailed information on the testing.

I've tried the latest SPDK code with the latest fio-3.1-20-g132b and also with fio-2.19. It seems there is no error of this kind.

Could you share with us which version of SPDK you are using when seeing this error? Or maybe you can try the latest SPDK code?

fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1  --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017

My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio



From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

(cc Victor)

From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you provide a few more details?  This will help the list to provide some ideas.


1)      On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?


Kernel initiator; I run these commands on the client server.



modprobe mlx5_ib

modprobe nvme-rdma

nvme discover -t rdma -a 192.168.10.11 -s 4420

nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1  -a 192.168.10.11 -s 4420




2)      Can you provide the fio configuration file or command line?  Just so we can have more specifics on "bigger block size".


fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1  --name=read-phase --rw=randwrite


3)      Any details on the HW setup - specifically details on the RDMA NIC (or if you're using SW RoCE).

Nvmf.conf on target server

[Global]
  Comment "Global section"
    ReactorMask 0xff00

[Rpc]
  Enable No
  Listen 127.0.0.1

[Nvmf]
  MaxQueuesPerSession 8
  MaxQueueDepth 128

[Subsystem1]
  NQN nqn.2016-06.io.spdk:nvme-subsystem-1
  Core 9
  Mode Direct
  Listen RDMA 192.168.10.11:4420
  NVMe 0000:82:00.0
  SN S2PMNAAH400039


           It is an RDMA NIC (ConnectX-5); the CPU is an Intel(R) Xeon(R) E5-2680 0 @ 2.70GHz
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15




Thanks,

-Jim


From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio

Hi
I have SPDK NVMeoF and keep getting error with bigger block size with fio on randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor


[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389]  nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486]  nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373]  nvme1n1: unable to read partition table





[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 102950 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-20 22:13 Victor Banh
  0 siblings, 0 replies; 24+ messages in thread
From: Victor Banh @ 2017-10-20 22:13 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 22955 bytes --]

Hi Gang
Is it possible to try it on Ubuntu?
Which version of DPDK and SPDK are you using?
Where can I get them? GitHub?
Thanks
Victor

root(a)clientintel:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.3 LTS
Release:        16.04
Codename:       xenial
root(a)clientintel:~# uname -a
Linux clientintel 4.12.0-041200-generic #201707022031 SMP Mon Jul 3 00:32:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux



From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Wednesday, October 18, 2017 11:52 PM
To: Victor Banh <victorb(a)mellanox.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

[root(a)node4 gangcao]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux

[root(a)node4 gangcao]# lsb_release -a
LSB Version:    :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description:    CentOS Linux release 7.2.1511 (Core)
Release:        7.2.1511
Codename:       Core

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 2:14 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Can you give the OS version and kernel version again for target and client?
I couldn't compile DPDK without installing the latest 4.12.x kernel on Ubuntu 16.04.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 10:59:19 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

This server does not have OFED installed and it is in the loopback mode.

Found another server, also in loopback mode, with a ConnectX-3 and OFED installed.

By the way, what SSD are you using? Maybe it is related to the SSD? I've just run with 2048k for a short duration and it seems there is no issue. I will run it longer to see whether I can hit this error.

[root(a)slave3 fio]# lspci | grep -i mell
08:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]

[root(a)slave3 fio]# lsmod | grep -i mlx
mlx4_ib               159744  0
ib_core               208896  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,ib_srpt,ib_ucm,rdma_ucm,mlx4_ib
mlx4_en               114688  0
mlx4_core             307200  2 mlx4_en,mlx4_ib
ptp                    20480  3 ixgbe,igb,mlx4_en

[root(a)slave3 fio]# ofed_info
MLNX_OFED_LINUX-3.2-2.0.0.0 (OFED-3.2-2.0.0):

ar_mgr:
osm_plugins/ar_mgr/ar_mgr-1.0-0.30.ga1ea4b7.tar.gz

cc_mgr:
osm_plugins/cc_mgr/cc_mgr-1.0-0.29.ga1ea4b7.tar.gz

dapl:
dapl.git mlnx_ofed_3_1
commit c30fb6ce2cbc29d8ed4bde51437f7abb93378c78
dump_pr:
osm_plugins/dump_pr//dump_pr-1.0-0.25.ga1ea4b7.tar.gz

fabric-collector:
fabric_collector//fabric-collector-1.1.0.MLNX20140410.51b267e.tar.gz

fca:
mlnx_ofed_fca/fca-2.5.2431-1.src.rpm

hcoll:
mlnx_ofed_hcol/hcoll-3.4.807-1.src.rpm

ibacm:
mlnx_ofed/ibacm.git mlnx_ofed_3_2
commit 15ad8c13bdebbe62edea0b7df030710b65c14f7f
ibacm_ssa:
mlnx_ofed_ssa/acm/ibacm_ssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

ibdump:
sniffer/sniffer-4.0.0-2/ibdump/linux/ibdump-4.0.0-2.tgz

ibsim:
mlnx_ofed_ibsim/ibsim-0.6-0.8.g9d76581.tar.gz

ibssa:
mlnx_ofed_ssa/distrib/ibssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

ibutils:
ofed-1.5.3-rpms/ibutils/ibutils-1.5.7.1-0.12.gdcaeae2.tar.gz

ibutils2:
ibutils2/ibutils2-2.1.1-0.76.MLNX20160222.gd366c7b.tar.gz

infiniband-diags:
mlnx_ofed_infiniband_diags/infiniband-diags-1.6.6.MLNX20151130.7f0213e.tar.gz

infinipath-psm:
mlnx_ofed_infinipath-psm/infinipath-psm-3.3-2_g6f42cdb_open.tar.gz

iser:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

isert:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

kernel-mft:
mlnx_ofed_mft/kernel-mft-4.3.0-25.src.rpm

knem:
knem.git mellanox-master
commit f143ee19a575cd42a334422fa8bd329d671238db
libibcm:
mlnx_ofed/libibcm.git mlnx_ofed_3_0
commit d7d485df305e6536711485bd7e477668e77d8320
libibmad:
mlnx_ofed_libibmad/libibmad-1.3.12.MLNX20151122.d140cb1.tar.gz

libibprof:
mlnx_ofed_libibprof/libibprof-1.1.22-1.src.rpm

libibumad:
mlnx_ofed_libibumad/libibumad-1.3.10.2.MLNX20150406.966500d.tar.gz

libibverbs:
mlnx_ofed/libibverbs.git mlnx_ofed_3_2_2
commit 217a77686f4861229f0e4b94485a13f024634caf
libmlx4:
mlnx_ofed/libmlx4.git mlnx_ofed_3_2_2
commit dda6d7ae1e6e3779a485ebdd0a882f4bcbd027a6
libmlx5:
mlnx_ofed/libmlx5.git mlnx_ofed_3_2_2
commit d0c8645359e0f0aba0408b2d344f3b418d27019b
libopensmssa:
mlnx_ofed_ssa/plugin/libopensmssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

librdmacm:
mlnx_ofed/librdmacm.git mlnx_ofed_3_2_2
commit 6bd430fed9e7b3d57a1876c040431ce7295c7703
libsdp:
libsdp.git mlnx_ofed_3_0
commit fbd01dfff05f42d6b82506e7dbf4bc6b7e6a59a4
libvma:
vma/source_rpms//libvma-7.0.14-0.src.rpm

mlnx-ethtool:
upstream/ethtool.git for-upstream
commit ac0cf295abe0c0832f0711fed66ab9601c8b2513
mlnx-ofa_kernel:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

mpi-selector:
ofed-1.5.3-rpms/mpi-selector/mpi-selector-1.0.3-1.src.rpm

mpitests:
mlnx_ofed_mpitest/mpitests-3.2.17-e1c7f2f.src.rpm

mstflint:
mlnx_ofed_mstflint/mstflint-4.3.0-1.49.g9b9af70.tar.gz

multiperf:
mlnx_ofed_multiperf/multiperf-3.0-0.10.gda89e8c.tar.gz

mvapich2:
mlnx_ofed_mvapich2/mvapich2-2.2a-1.src.rpm

mxm:
mlnx_ofed_mxm/mxm-3.4.3079-1.src.rpm

ofed-docs:
docs.git mlnx_ofed-3.2
commit ea3386416f9f7130edd2c70fc3424cb2cda50f7d

openmpi:
mlnx_ofed_ompi_1.8/openmpi-1.10.3a1-1.src.rpm

opensm:
mlnx_ofed_opensm/opensm-4.6.1.MLNX20160112.774e977.tar.gz

perftest:
mlnx_ofed_perftest/perftest-3.0-0.18.gb464d59.tar.gz

qperf:
mlnx_ofed_qperf/qperf-0.4.9.tar.gz

rds-tools:
rds-tools.git mlnx_ofed_2_4
commit 299420ca25cf9996bc0748e3bc4b08748996ba49
sdpnetstat:
sdpnetstat.git mlnx_ofed_3_0
commit 3cf409a7cc07e5c71f9640eddbb801ece21b4169
sockperf:
sockperf/sockperf-2.7-43.git3ee62bd8107a.src.rpm

srp:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

srptools:
srptools/srptools-1.0.2-12.src.rpm


Installed Packages:
-------------------
infiniband-diags
librdmacm
libmlx4
libibverbs-utils
mpi-selector
libibmad-devel
sdpnetstat
knem
libibumad-devel
libsdp
mlnx-ethtool
libibverbs-debuginfo
mlnx-ofa_kernel-modules
srp
opensm
mstflint
cc_mgr
libibmad
libibverbs
kernel-mft
libibverbs-devel-static
libibumad
librdmacm-devel
mlnx-ofa_kernel
libsdp-devel
ibutils2
mlnxofed-docs
libibmad-static
iser
opensm-devel
dump_pr
libibumad-static
rds-tools
libmlx4-debuginfo
mlnx-ofa_kernel-devel
opensm-libs
opensm-static
ar_mgr
dapl-devel-static
infiniband-diags-compat
libibverbs-devel
ibsim

[root(a)slave3 fio]# ibstat
CA 'mlx4_0'
        CA type: MT4103
        Number of ports: 2
        Firmware version: 2.35.5100
        Hardware version: 0
        Node GUID: 0x248a0703006090e0
        System image GUID: 0x248a0703006090e0
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x268a07fffe6090e0
                Link layer: Ethernet
        Port 2:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x268a07fffe6090e1
                Link layer: Ethernet

[root(a)slave3 fio]# ibv_devinfo
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.35.5100
        node_guid:                      248a:0703:0060:90e0
        sys_image_guid:                 248a:0703:0060:90e0
        vendor_id:                      0x02c9
        vendor_part_id:                 4103
        hw_ver:                         0x0
        board_id:                       MT_1090111023
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

                port:   2
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 12:34 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Do you have Mellanox OFED installed on the target and client servers?
Can you run ibstat on both servers?
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 8:59:28 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

I've just tried the SPDK v17.07.1 and DPDK v17.08.

nvme version: 1.1.38.gfaab
fio version: 3.1

Tried the 512k and 1024k IO sizes and there is no error. dmesg information is as follows.

So there may be some other difference here? It looks like you are using a ConnectX-5 while I am using a ConnectX-4?

Other related information:

[root(a)node4 fio]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

[root(a)node4 fio]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core

[root(a)node4 fio]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux

[577707.543326] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.100.8:4420
[577730.854540] detected loopback device
[577730.893761] nvme nvme0: creating 7 I/O queues.
[577730.893797] detected loopback device
[577730.898611] detected loopback device
[577730.908917] detected loopback device
[577730.919073] detected loopback device
[577730.928922] detected loopback device
[577730.938679] detected loopback device
[577730.948365] detected loopback device
[577731.146290] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.100.8:4420

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
Any update?
Do you see any error messages from "dmesg" with a 512k block size running fio?
Thanks
Victor

From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you share with us which version of SPDK you are using when seeing this error? Or maybe you can try with the latest SPDK code?

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Cao
Do you see any message from dmesg?

I tried this fio version and still saw these error messages from dmesg.

fio-3.1

[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338]  nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Thanks for your detailed information on the testing.

I've tried the latest SPDK code with the latest fio-3.1-20-g132b and with fio-2.19. Neither seems to hit this kind of error.

Could you share with us which version of SPDK you are using when seeing this error? Or maybe you can try with the latest SPDK code?

fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1  --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017

My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio



From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

(cc Victor)

From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you provide a few more details?  This will help the list to provide some ideas.


1)      On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?


Kernel initiator; I run these commands on the client server.



modprobe mlx5_ib

modprobe nvme-rdma

nvme discover -t rdma -a 192.168.10.11 -s 4420

nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1  -a 192.168.10.11 -s 4420
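
A quick sanity check after the connect, assuming nvme-cli is installed and the namespace enumerates as /dev/nvme1n1 (the device name may differ on other setups):

nvme list                                                  # the exported SPDK namespace should be listed here
dmesg | tail -n 20                                         # look for "creating N I/O queues" and any Buffer I/O errors
nvme disconnect -n nqn.2016-06.io.spdk:nvme-subsystem-1    # clean teardown before re-running fio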




2)      Can you provide the fio configuration file or command line?  Just so we can have more specifics on "bigger block size".


fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1  --name=read-phase --rw=randwrite
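
The same command line can also be expressed as a fio job file, which is sometimes easier to share and tweak; a minimal equivalent sketch, assuming the same /dev/nvme1n1 device:

[global]
ioengine=libaio
direct=1
invalidate=1
fsync_on_close=1
randrepeat=1
norandommap
time_based
runtime=60
loops=1
bs=512k
iodepth=16
numjobs=4
rw=randwrite
filename=/dev/nvme1n1

[read-phase]

Saved as, say, randwrite-512k.fio (hypothetical name) and run with "fio randwrite-512k.fio", it should issue the same 512k random-write workload.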


3)      Any details on the HW setup - specifically details on the RDMA NIC (or if you're using SW RoCE).

Nvmf.conf on target server

[Global]
  Comment "Global section"
    ReactorMask 0xff00

[Rpc]
  Enable No
  Listen 127.0.0.1

[Nvmf]
  MaxQueuesPerSession 8
  MaxQueueDepth 128

[Subsystem1]
  NQN nqn.2016-06.io.spdk:nvme-subsystem-1
  Core 9
  Mode Direct
  Listen RDMA 192.168.10.11:4420
  NVMe 0000:82:00.0
  SN S2PMNAAH400039
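
For context on how this config is consumed: with an SPDK 17.07-era source tree the target is usually started against it roughly as below (the binary path and setup script behaviour are assumptions based on that release layout, not verified here):

sudo scripts/setup.sh                       # reserves hugepages and unbinds 0000:82:00.0 from the kernel nvme driver
sudo ./app/nvmf_tgt/nvmf_tgt -c nvmf.conf   # ReactorMask 0xff00 keeps the reactors on cores 8-15 (NUMA node1 in the layout below)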


           It is an RDMA NIC (ConnectX-5); the CPU is an Intel(R) Xeon(R) E5-2680 0 @ 2.70GHz
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15




Thanks,

-Jim


From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio

Hi
I have SPDK NVMeoF and keep getting error with bigger block size with fio on randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor


[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389]  nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486]  nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373]  nvme1n1: unable to read partition table





[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 117117 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-19  7:16 Victor Banh
  0 siblings, 0 replies; 24+ messages in thread
From: Victor Banh @ 2017-10-19  7:16 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 22406 bytes --]

I am using Ubuntu 16.04 and kernel 4.12.X.
________________________________
From: Cao, Gang <gang.cao(a)intel.com>
Sent: Wednesday, October 18, 2017 11:51:39 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

[root(a)node4 gangcao]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux

[root(a)node4 gangcao]# lsb_release -a
LSB Version:    :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description:    CentOS Linux release 7.2.1511 (Core)
Release:        7.2.1511
Codename:       Core

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 2:14 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>; Cao, Gang <gang.cao(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Can you give the OS version and kernel version again for target and client?
I couldn't compile DPDK without installing the latest kernel (4.12.x) on Ubuntu 16.04.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 10:59:19 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

This server does not have OFED installed and it is in the loopback mode.

Found another server, also in loopback mode, with a ConnectX-3 that has OFED installed.

By the way, what SSD are you using? Maybe it is related to the SSD? I've just run with 2048k for a short duration and it seems there is no issue. I will run for longer to see whether I can hit this error.

[root(a)slave3 fio]# lspci | grep -i mell
08:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]

[root(a)slave3 fio]# lsmod | grep -i mlx
mlx4_ib               159744  0
ib_core               208896  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,ib_srpt,ib_ucm,rdma_ucm,mlx4_ib
mlx4_en               114688  0
mlx4_core             307200  2 mlx4_en,mlx4_ib
ptp                    20480  3 ixgbe,igb,mlx4_en

[root(a)slave3 fio]# ofed_info
MLNX_OFED_LINUX-3.2-2.0.0.0 (OFED-3.2-2.0.0):

ar_mgr:
osm_plugins/ar_mgr/ar_mgr-1.0-0.30.ga1ea4b7.tar.gz

cc_mgr:
osm_plugins/cc_mgr/cc_mgr-1.0-0.29.ga1ea4b7.tar.gz

dapl:
dapl.git mlnx_ofed_3_1
commit c30fb6ce2cbc29d8ed4bde51437f7abb93378c78
dump_pr:
osm_plugins/dump_pr//dump_pr-1.0-0.25.ga1ea4b7.tar.gz

fabric-collector:
fabric_collector//fabric-collector-1.1.0.MLNX20140410.51b267e.tar.gz

fca:
mlnx_ofed_fca/fca-2.5.2431-1.src.rpm

hcoll:
mlnx_ofed_hcol/hcoll-3.4.807-1.src.rpm

ibacm:
mlnx_ofed/ibacm.git mlnx_ofed_3_2
commit 15ad8c13bdebbe62edea0b7df030710b65c14f7f
ibacm_ssa:
mlnx_ofed_ssa/acm/ibacm_ssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

ibdump:
sniffer/sniffer-4.0.0-2/ibdump/linux/ibdump-4.0.0-2.tgz

ibsim:
mlnx_ofed_ibsim/ibsim-0.6-0.8.g9d76581.tar.gz

ibssa:
mlnx_ofed_ssa/distrib/ibssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

ibutils:
ofed-1.5.3-rpms/ibutils/ibutils-1.5.7.1-0.12.gdcaeae2.tar.gz

ibutils2:
ibutils2/ibutils2-2.1.1-0.76.MLNX20160222.gd366c7b.tar.gz

infiniband-diags:
mlnx_ofed_infiniband_diags/infiniband-diags-1.6.6.MLNX20151130.7f0213e.tar.gz

infinipath-psm:
mlnx_ofed_infinipath-psm/infinipath-psm-3.3-2_g6f42cdb_open.tar.gz

iser:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

isert:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

kernel-mft:
mlnx_ofed_mft/kernel-mft-4.3.0-25.src.rpm

knem:
knem.git mellanox-master
commit f143ee19a575cd42a334422fa8bd329d671238db
libibcm:
mlnx_ofed/libibcm.git mlnx_ofed_3_0
commit d7d485df305e6536711485bd7e477668e77d8320
libibmad:
mlnx_ofed_libibmad/libibmad-1.3.12.MLNX20151122.d140cb1.tar.gz

libibprof:
mlnx_ofed_libibprof/libibprof-1.1.22-1.src.rpm

libibumad:
mlnx_ofed_libibumad/libibumad-1.3.10.2.MLNX20150406.966500d.tar.gz

libibverbs:
mlnx_ofed/libibverbs.git mlnx_ofed_3_2_2
commit 217a77686f4861229f0e4b94485a13f024634caf
libmlx4:
mlnx_ofed/libmlx4.git mlnx_ofed_3_2_2
commit dda6d7ae1e6e3779a485ebdd0a882f4bcbd027a6
libmlx5:
mlnx_ofed/libmlx5.git mlnx_ofed_3_2_2
commit d0c8645359e0f0aba0408b2d344f3b418d27019b
libopensmssa:
mlnx_ofed_ssa/plugin/libopensmssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

librdmacm:
mlnx_ofed/librdmacm.git mlnx_ofed_3_2_2
commit 6bd430fed9e7b3d57a1876c040431ce7295c7703
libsdp:
libsdp.git mlnx_ofed_3_0
commit fbd01dfff05f42d6b82506e7dbf4bc6b7e6a59a4
libvma:
vma/source_rpms//libvma-7.0.14-0.src.rpm

mlnx-ethtool:
upstream/ethtool.git for-upstream
commit ac0cf295abe0c0832f0711fed66ab9601c8b2513
mlnx-ofa_kernel:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

mpi-selector:
ofed-1.5.3-rpms/mpi-selector/mpi-selector-1.0.3-1.src.rpm

mpitests:
mlnx_ofed_mpitest/mpitests-3.2.17-e1c7f2f.src.rpm

mstflint:
mlnx_ofed_mstflint/mstflint-4.3.0-1.49.g9b9af70.tar.gz

multiperf:
mlnx_ofed_multiperf/multiperf-3.0-0.10.gda89e8c.tar.gz

mvapich2:
mlnx_ofed_mvapich2/mvapich2-2.2a-1.src.rpm

mxm:
mlnx_ofed_mxm/mxm-3.4.3079-1.src.rpm

ofed-docs:
docs.git mlnx_ofed-3.2
commit ea3386416f9f7130edd2c70fc3424cb2cda50f7d

openmpi:
mlnx_ofed_ompi_1.8/openmpi-1.10.3a1-1.src.rpm

opensm:
mlnx_ofed_opensm/opensm-4.6.1.MLNX20160112.774e977.tar.gz

perftest:
mlnx_ofed_perftest/perftest-3.0-0.18.gb464d59.tar.gz

qperf:
mlnx_ofed_qperf/qperf-0.4.9.tar.gz

rds-tools:
rds-tools.git mlnx_ofed_2_4
commit 299420ca25cf9996bc0748e3bc4b08748996ba49
sdpnetstat:
sdpnetstat.git mlnx_ofed_3_0
commit 3cf409a7cc07e5c71f9640eddbb801ece21b4169
sockperf:
sockperf/sockperf-2.7-43.git3ee62bd8107a.src.rpm

srp:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

srptools:
srptools/srptools-1.0.2-12.src.rpm


Installed Packages:
-------------------
infiniband-diags
librdmacm
libmlx4
libibverbs-utils
mpi-selector
libibmad-devel
sdpnetstat
knem
libibumad-devel
libsdp
mlnx-ethtool
libibverbs-debuginfo
mlnx-ofa_kernel-modules
srp
opensm
mstflint
cc_mgr
libibmad
libibverbs
kernel-mft
libibverbs-devel-static
libibumad
librdmacm-devel
mlnx-ofa_kernel
libsdp-devel
ibutils2
mlnxofed-docs
libibmad-static
iser
opensm-devel
dump_pr
libibumad-static
rds-tools
libmlx4-debuginfo
mlnx-ofa_kernel-devel
opensm-libs
opensm-static
ar_mgr
dapl-devel-static
infiniband-diags-compat
libibverbs-devel
ibsim

[root(a)slave3 fio]# ibstat
CA 'mlx4_0'
        CA type: MT4103
        Number of ports: 2
        Firmware version: 2.35.5100
        Hardware version: 0
        Node GUID: 0x248a0703006090e0
        System image GUID: 0x248a0703006090e0
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x268a07fffe6090e0
                Link layer: Ethernet
        Port 2:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x268a07fffe6090e1
                Link layer: Ethernet

[root(a)slave3 fio]# ibv_devinfo
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.35.5100
        node_guid:                      248a:0703:0060:90e0
        sys_image_guid:                 248a:0703:0060:90e0
        vendor_id:                      0x02c9
        vendor_part_id:                 4103
        hw_ver:                         0x0
        board_id:                       MT_1090111023
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

                port:   2
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 12:34 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Do you have Mellanox OFED installed on the target and client servers?
Can you run ibstat on both servers?
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 8:59:28 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

I’ve just tried the SPDK v17.07.1 and DPDK v17.08.

nvme version: 1.1.38.gfaab
fio version: 3.1

Tried the 512k and 1024k IO sizes and there is no error. dmesg information is as follows.

So there may be some other difference here? It looks like you are using a ConnectX-5 while I am using a ConnectX-4?

Other related information:

[root(a)node4 fio]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

[root(a)node4 fio]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core

[root(a)node4 fio]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux

[577707.543326] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.100.8:4420
[577730.854540] detected loopback device
[577730.893761] nvme nvme0: creating 7 I/O queues.
[577730.893797] detected loopback device
[577730.898611] detected loopback device
[577730.908917] detected loopback device
[577730.919073] detected loopback device
[577730.928922] detected loopback device
[577730.938679] detected loopback device
[577730.948365] detected loopback device
[577731.146290] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.100.8:4420

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
Any update?
Do you see any error messages from “dmesg” with a 512k block size running fio?
Thanks
Victor

From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you share with us which version of SPDK you are using when seeing this error? Or maybe you can try with the latest SPDK code?

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Cao
Do you see any message from dmesg?

I tried this fio version and still saw these error messages from dmesg.

fio-3.1

[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338]  nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Thanks for your detailed information on the testing.

I've tried the latest SPDK code with the latest fio-3.1-20-g132b and with fio-2.19. Neither seems to hit this kind of error.

Could you share with us which version of SPDK you are using when seeing this error? Or maybe you can try with the latest SPDK code?

fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1  --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017

My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio



From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

(cc Victor)

From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you provide a few more details?  This will help the list to provide some ideas.


1)      On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?


Kernel initiator; I run these commands on the client server.



modprobe mlx5_ib

modprobe nvme-rdma

nvme discover -t rdma -a 192.168.10.11 -s 4420

nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1  -a 192.168.10.11 -s 4420




2)      Can you provide the fio configuration file or command line?  Just so we can have more specifics on “bigger block size”.


fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1  --name=read-phase --rw=randwrite


3)      Any details on the HW setup – specifically details on the RDMA NIC (or if you’re using SW RoCE).

Nvmf.conf on target server

[Global]
  Comment "Global section"
    ReactorMask 0xff00

[Rpc]
  Enable No
  Listen 127.0.0.1

[Nvmf]
  MaxQueuesPerSession 8
  MaxQueueDepth 128

[Subsystem1]
  NQN nqn.2016-06.io.spdk:nvme-subsystem-1
  Core 9
  Mode Direct
  Listen RDMA 192.168.10.11:4420
  NVMe 0000:82:00.0
  SN S2PMNAAH400039


           It is an RDMA NIC (ConnectX-5); the CPU is an Intel(R) Xeon(R) E5-2680 0 @ 2.70GHz
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15




Thanks,

-Jim


From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio

Hi
I have SPDK NVMeoF and keep getting error with bigger block size with fio on randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor


[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389]  nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486]  nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373]  nvme1n1: unable to read partition table





[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 91642 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-19  6:51 Cao, Gang
  0 siblings, 0 replies; 24+ messages in thread
From: Cao, Gang @ 2017-10-19  6:51 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 22079 bytes --]

[root(a)node4 gangcao]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux

[root(a)node4 gangcao]# lsb_release -a
LSB Version:    :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description:    CentOS Linux release 7.2.1511 (Core)
Release:        7.2.1511
Codename:       Core

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 2:14 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>; Cao, Gang <gang.cao(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Can you give the OS version and kernel version again for target and client?
I couldn't compile DPDK without installing the latest kernel (4.12.x) on Ubuntu 16.04.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 10:59:19 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

This server does not have OFED installed and it is in the loopback mode.

Found another server, also in loopback mode, with a ConnectX-3 that has OFED installed.

By the way, what SSD are you using? Maybe it is related to the SSD? I've just run with 2048k for a short duration and it seems there is no issue. I will run for longer to see whether I can hit this error.

[root(a)slave3 fio]# lspci | grep -i mell
08:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]

[root(a)slave3 fio]# lsmod | grep -i mlx
mlx4_ib               159744  0
ib_core               208896  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,ib_srpt,ib_ucm,rdma_ucm,mlx4_ib
mlx4_en               114688  0
mlx4_core             307200  2 mlx4_en,mlx4_ib
ptp                    20480  3 ixgbe,igb,mlx4_en

[root(a)slave3 fio]# ofed_info
MLNX_OFED_LINUX-3.2-2.0.0.0 (OFED-3.2-2.0.0):

ar_mgr:
osm_plugins/ar_mgr/ar_mgr-1.0-0.30.ga1ea4b7.tar.gz

cc_mgr:
osm_plugins/cc_mgr/cc_mgr-1.0-0.29.ga1ea4b7.tar.gz

dapl:
dapl.git mlnx_ofed_3_1
commit c30fb6ce2cbc29d8ed4bde51437f7abb93378c78
dump_pr:
osm_plugins/dump_pr//dump_pr-1.0-0.25.ga1ea4b7.tar.gz

fabric-collector:
fabric_collector//fabric-collector-1.1.0.MLNX20140410.51b267e.tar.gz

fca:
mlnx_ofed_fca/fca-2.5.2431-1.src.rpm

hcoll:
mlnx_ofed_hcol/hcoll-3.4.807-1.src.rpm

ibacm:
mlnx_ofed/ibacm.git mlnx_ofed_3_2
commit 15ad8c13bdebbe62edea0b7df030710b65c14f7f
ibacm_ssa:
mlnx_ofed_ssa/acm/ibacm_ssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

ibdump:
sniffer/sniffer-4.0.0-2/ibdump/linux/ibdump-4.0.0-2.tgz

ibsim:
mlnx_ofed_ibsim/ibsim-0.6-0.8.g9d76581.tar.gz

ibssa:
mlnx_ofed_ssa/distrib/ibssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

ibutils:
ofed-1.5.3-rpms/ibutils/ibutils-1.5.7.1-0.12.gdcaeae2.tar.gz

ibutils2:
ibutils2/ibutils2-2.1.1-0.76.MLNX20160222.gd366c7b.tar.gz

infiniband-diags:
mlnx_ofed_infiniband_diags/infiniband-diags-1.6.6.MLNX20151130.7f0213e.tar.gz

infinipath-psm:
mlnx_ofed_infinipath-psm/infinipath-psm-3.3-2_g6f42cdb_open.tar.gz

iser:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

isert:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

kernel-mft:
mlnx_ofed_mft/kernel-mft-4.3.0-25.src.rpm

knem:
knem.git mellanox-master
commit f143ee19a575cd42a334422fa8bd329d671238db
libibcm:
mlnx_ofed/libibcm.git mlnx_ofed_3_0
commit d7d485df305e6536711485bd7e477668e77d8320
libibmad:
mlnx_ofed_libibmad/libibmad-1.3.12.MLNX20151122.d140cb1.tar.gz

libibprof:
mlnx_ofed_libibprof/libibprof-1.1.22-1.src.rpm

libibumad:
mlnx_ofed_libibumad/libibumad-1.3.10.2.MLNX20150406.966500d.tar.gz

libibverbs:
mlnx_ofed/libibverbs.git mlnx_ofed_3_2_2
commit 217a77686f4861229f0e4b94485a13f024634caf
libmlx4:
mlnx_ofed/libmlx4.git mlnx_ofed_3_2_2
commit dda6d7ae1e6e3779a485ebdd0a882f4bcbd027a6
libmlx5:
mlnx_ofed/libmlx5.git mlnx_ofed_3_2_2
commit d0c8645359e0f0aba0408b2d344f3b418d27019b
libopensmssa:
mlnx_ofed_ssa/plugin/libopensmssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

librdmacm:
mlnx_ofed/librdmacm.git mlnx_ofed_3_2_2
commit 6bd430fed9e7b3d57a1876c040431ce7295c7703
libsdp:
libsdp.git mlnx_ofed_3_0
commit fbd01dfff05f42d6b82506e7dbf4bc6b7e6a59a4
libvma:
vma/source_rpms//libvma-7.0.14-0.src.rpm

mlnx-ethtool:
upstream/ethtool.git for-upstream
commit ac0cf295abe0c0832f0711fed66ab9601c8b2513
mlnx-ofa_kernel:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

mpi-selector:
ofed-1.5.3-rpms/mpi-selector/mpi-selector-1.0.3-1.src.rpm

mpitests:
mlnx_ofed_mpitest/mpitests-3.2.17-e1c7f2f.src.rpm

mstflint:
mlnx_ofed_mstflint/mstflint-4.3.0-1.49.g9b9af70.tar.gz

multiperf:
mlnx_ofed_multiperf/multiperf-3.0-0.10.gda89e8c.tar.gz

mvapich2:
mlnx_ofed_mvapich2/mvapich2-2.2a-1.src.rpm

mxm:
mlnx_ofed_mxm/mxm-3.4.3079-1.src.rpm

ofed-docs:
docs.git mlnx_ofed-3.2
commit ea3386416f9f7130edd2c70fc3424cb2cda50f7d

openmpi:
mlnx_ofed_ompi_1.8/openmpi-1.10.3a1-1.src.rpm

opensm:
mlnx_ofed_opensm/opensm-4.6.1.MLNX20160112.774e977.tar.gz

perftest:
mlnx_ofed_perftest/perftest-3.0-0.18.gb464d59.tar.gz

qperf:
mlnx_ofed_qperf/qperf-0.4.9.tar.gz

rds-tools:
rds-tools.git mlnx_ofed_2_4
commit 299420ca25cf9996bc0748e3bc4b08748996ba49
sdpnetstat:
sdpnetstat.git mlnx_ofed_3_0
commit 3cf409a7cc07e5c71f9640eddbb801ece21b4169
sockperf:
sockperf/sockperf-2.7-43.git3ee62bd8107a.src.rpm

srp:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

srptools:
srptools/srptools-1.0.2-12.src.rpm


Installed Packages:
-------------------
infiniband-diags
librdmacm
libmlx4
libibverbs-utils
mpi-selector
libibmad-devel
sdpnetstat
knem
libibumad-devel
libsdp
mlnx-ethtool
libibverbs-debuginfo
mlnx-ofa_kernel-modules
srp
opensm
mstflint
cc_mgr
libibmad
libibverbs
kernel-mft
libibverbs-devel-static
libibumad
librdmacm-devel
mlnx-ofa_kernel
libsdp-devel
ibutils2
mlnxofed-docs
libibmad-static
iser
opensm-devel
dump_pr
libibumad-static
rds-tools
libmlx4-debuginfo
mlnx-ofa_kernel-devel
opensm-libs
opensm-static
ar_mgr
dapl-devel-static
infiniband-diags-compat
libibverbs-devel
ibsim

[root(a)slave3 fio]# ibstat
CA 'mlx4_0'
        CA type: MT4103
        Number of ports: 2
        Firmware version: 2.35.5100
        Hardware version: 0
        Node GUID: 0x248a0703006090e0
        System image GUID: 0x248a0703006090e0
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x268a07fffe6090e0
                Link layer: Ethernet
        Port 2:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x268a07fffe6090e1
                Link layer: Ethernet

[root(a)slave3 fio]# ibv_devinfo
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.35.5100
        node_guid:                      248a:0703:0060:90e0
        sys_image_guid:                 248a:0703:0060:90e0
        vendor_id:                      0x02c9
        vendor_part_id:                 4103
        hw_ver:                         0x0
        board_id:                       MT_1090111023
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

                port:   2
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 12:34 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Do you have Mellanox OFED installed on the target and client servers?
Can you run ibstat on both servers?
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 8:59:28 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

I've just tried the SPDK v17.07.1 and DPDK v17.08.

nvme version: 1.1.38.gfaab
fio version: 3.1

Tried the 512k and 1024k IO sizes and there is no error. dmesg information is as follows.

So there may be some other difference here? It looks like you are using a ConnectX-5 while I am using a ConnectX-4?

Other related information:

[root(a)node4 fio]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

[root(a)node4 fio]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core

[root(a)node4 fio]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux

[577707.543326] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.100.8:4420
[577730.854540] detected loopback device
[577730.893761] nvme nvme0: creating 7 I/O queues.
[577730.893797] detected loopback device
[577730.898611] detected loopback device
[577730.908917] detected loopback device
[577730.919073] detected loopback device
[577730.928922] detected loopback device
[577730.938679] detected loopback device
[577730.948365] detected loopback device
[577731.146290] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.100.8:4420

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
Any update?
Do you see any error messages from "dmesg" with a 512k block size running fio?
Thanks
Victor

From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you share with us which version of SPDK you are using when seeing this error? Or maybe you can try with the latest SPDK code?

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Cao
Do you see any message from dmesg?

I tried this fio version and still saw these error messages from dmesg.

fio-3.1

[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338]  nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Thanks for your detailed information on the testing.

I've tried the latest SPDK code with the latest fio-3.1-20-g132b and with fio-2.19. Neither seems to hit this kind of error.

Could you share with us which version of SPDK you are using when seeing this error? Or maybe you can try with the latest SPDK code?

fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1  --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017

My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio



From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

(cc Victor)

From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you provide a few more details?  This will help the list to provide some ideas.


1)      On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?


Kernel initiator; I run these commands on the client server.



modprobe mlx5_ib

modprobe nvme-rdma

nvme discover -t rdma -a 192.168.10.11 -s 4420

nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1  -a 192.168.10.11 -s 4420




2)      Can you provide the fio configuration file or command line?  Just so we can have more specifics on "bigger block size".


fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1  --name=read-phase --rw=randwrite


3)      Any details on the HW setup - specifically details on the RDMA NIC (or if you're using SW RoCE).

Nvmf.conf on target server

[Global]
  Comment "Global section"
    ReactorMask 0xff00

[Rpc]
  Enable No
  Listen 127.0.0.1

[Nvmf]
  MaxQueuesPerSession 8
  MaxQueueDepth 128

[Subsystem1]
  NQN nqn.2016-06.io.spdk:nvme-subsystem-1
  Core 9
  Mode Direct
  Listen RDMA 192.168.10.11:4420
  NVMe 0000:82:00.0
  SN S2PMNAAH400039
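
(For context, a sketch of how a target using the config above is typically started in the SPDK 17.07 tree; the binary path and config file location are assumptions based on that release's layout, not taken from this setup:)

app/nvmf_tgt/nvmf_tgt -c /path/to/nvmf.conf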


It is an RDMA NIC (ConnectX-5). The CPU is Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz.
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15




Thanks,

-Jim


From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio

Hi
I have an SPDK NVMe-oF setup and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor


[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389]  nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486]  nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373]  nvme1n1: unable to read partition table





[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 97980 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-19  6:13 Victor Banh
  0 siblings, 0 replies; 24+ messages in thread
From: Victor Banh @ 2017-10-19  6:13 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 21181 bytes --]

Can you give the OS version and kernel version again for the target and the client?
I couldn't compile DPDK without installing the latest 4.12.x kernel on Ubuntu 16.04 (a sketch for checking the kernel and headers follows below).
Thanks
Victor
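
(A minimal sketch for checking that the running kernel and its headers match before building DPDK/SPDK on Ubuntu; the package names are the stock Ubuntu ones and are an assumption, not taken from this thread:)

uname -r                                        # running kernel, e.g. 4.12.0-041200-generic
sudo apt-get install build-essential libnuma-dev
sudo apt-get install linux-headers-$(uname -r)  # headers typically must match the running kernel for DPDK's kernel modules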
________________________________
From: Cao, Gang <gang.cao(a)intel.com>
Sent: Wednesday, October 18, 2017 10:59:19 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

This server does not have OFED installed and it is in loopback mode.

I found another server, also in loopback mode, with a ConnectX-3 and OFED installed.

By the way, what SSD are you using? Maybe it is related to the SSD? I've just run with 2048k for a short duration and it seems there is no issue. I will run longer to see whether this error can be hit.
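
(A quick way to answer the SSD question on the target side, as a sketch assuming nvme-cli is installed:)

nvme list                            # shows model, serial and firmware for each NVMe device
lspci -nn | grep -i 'non-volatile'   # PCI-level view of the NVMe controllers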

[root(a)slave3 fio]# lspci | grep -i mell
08:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]

[root(a)slave3 fio]# lsmod | grep -i mlx
mlx4_ib               159744  0
ib_core               208896  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,ib_srpt,ib_ucm,rdma_ucm,mlx4_ib
mlx4_en               114688  0
mlx4_core             307200  2 mlx4_en,mlx4_ib
ptp                    20480  3 ixgbe,igb,mlx4_en

[root(a)slave3 fio]# ofed_info
MLNX_OFED_LINUX-3.2-2.0.0.0 (OFED-3.2-2.0.0):

ar_mgr:
osm_plugins/ar_mgr/ar_mgr-1.0-0.30.ga1ea4b7.tar.gz

cc_mgr:
osm_plugins/cc_mgr/cc_mgr-1.0-0.29.ga1ea4b7.tar.gz

dapl:
dapl.git mlnx_ofed_3_1
commit c30fb6ce2cbc29d8ed4bde51437f7abb93378c78
dump_pr:
osm_plugins/dump_pr//dump_pr-1.0-0.25.ga1ea4b7.tar.gz

fabric-collector:
fabric_collector//fabric-collector-1.1.0.MLNX20140410.51b267e.tar.gz

fca:
mlnx_ofed_fca/fca-2.5.2431-1.src.rpm

hcoll:
mlnx_ofed_hcol/hcoll-3.4.807-1.src.rpm

ibacm:
mlnx_ofed/ibacm.git mlnx_ofed_3_2
commit 15ad8c13bdebbe62edea0b7df030710b65c14f7f
ibacm_ssa:
mlnx_ofed_ssa/acm/ibacm_ssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

ibdump:
sniffer/sniffer-4.0.0-2/ibdump/linux/ibdump-4.0.0-2.tgz

ibsim:
mlnx_ofed_ibsim/ibsim-0.6-0.8.g9d76581.tar.gz

ibssa:
mlnx_ofed_ssa/distrib/ibssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

ibutils:
ofed-1.5.3-rpms/ibutils/ibutils-1.5.7.1-0.12.gdcaeae2.tar.gz

ibutils2:
ibutils2/ibutils2-2.1.1-0.76.MLNX20160222.gd366c7b.tar.gz

infiniband-diags:
mlnx_ofed_infiniband_diags/infiniband-diags-1.6.6.MLNX20151130.7f0213e.tar.gz

infinipath-psm:
mlnx_ofed_infinipath-psm/infinipath-psm-3.3-2_g6f42cdb_open.tar.gz

iser:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

isert:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

kernel-mft:
mlnx_ofed_mft/kernel-mft-4.3.0-25.src.rpm

knem:
knem.git mellanox-master
commit f143ee19a575cd42a334422fa8bd329d671238db
libibcm:
mlnx_ofed/libibcm.git mlnx_ofed_3_0
commit d7d485df305e6536711485bd7e477668e77d8320
libibmad:
mlnx_ofed_libibmad/libibmad-1.3.12.MLNX20151122.d140cb1.tar.gz

libibprof:
mlnx_ofed_libibprof/libibprof-1.1.22-1.src.rpm

libibumad:
mlnx_ofed_libibumad/libibumad-1.3.10.2.MLNX20150406.966500d.tar.gz

libibverbs:
mlnx_ofed/libibverbs.git mlnx_ofed_3_2_2
commit 217a77686f4861229f0e4b94485a13f024634caf
libmlx4:
mlnx_ofed/libmlx4.git mlnx_ofed_3_2_2
commit dda6d7ae1e6e3779a485ebdd0a882f4bcbd027a6
libmlx5:
mlnx_ofed/libmlx5.git mlnx_ofed_3_2_2
commit d0c8645359e0f0aba0408b2d344f3b418d27019b
libopensmssa:
mlnx_ofed_ssa/plugin/libopensmssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

librdmacm:
mlnx_ofed/librdmacm.git mlnx_ofed_3_2_2
commit 6bd430fed9e7b3d57a1876c040431ce7295c7703
libsdp:
libsdp.git mlnx_ofed_3_0
commit fbd01dfff05f42d6b82506e7dbf4bc6b7e6a59a4
libvma:
vma/source_rpms//libvma-7.0.14-0.src.rpm

mlnx-ethtool:
upstream/ethtool.git for-upstream
commit ac0cf295abe0c0832f0711fed66ab9601c8b2513
mlnx-ofa_kernel:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

mpi-selector:
ofed-1.5.3-rpms/mpi-selector/mpi-selector-1.0.3-1.src.rpm

mpitests:
mlnx_ofed_mpitest/mpitests-3.2.17-e1c7f2f.src.rpm

mstflint:
mlnx_ofed_mstflint/mstflint-4.3.0-1.49.g9b9af70.tar.gz

multiperf:
mlnx_ofed_multiperf/multiperf-3.0-0.10.gda89e8c.tar.gz

mvapich2:
mlnx_ofed_mvapich2/mvapich2-2.2a-1.src.rpm

mxm:
mlnx_ofed_mxm/mxm-3.4.3079-1.src.rpm

ofed-docs:
docs.git mlnx_ofed-3.2
commit ea3386416f9f7130edd2c70fc3424cb2cda50f7d

openmpi:
mlnx_ofed_ompi_1.8/openmpi-1.10.3a1-1.src.rpm

opensm:
mlnx_ofed_opensm/opensm-4.6.1.MLNX20160112.774e977.tar.gz

perftest:
mlnx_ofed_perftest/perftest-3.0-0.18.gb464d59.tar.gz

qperf:
mlnx_ofed_qperf/qperf-0.4.9.tar.gz

rds-tools:
rds-tools.git mlnx_ofed_2_4
commit 299420ca25cf9996bc0748e3bc4b08748996ba49
sdpnetstat:
sdpnetstat.git mlnx_ofed_3_0
commit 3cf409a7cc07e5c71f9640eddbb801ece21b4169
sockperf:
sockperf/sockperf-2.7-43.git3ee62bd8107a.src.rpm

srp:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

srptools:
srptools/srptools-1.0.2-12.src.rpm


Installed Packages:
-------------------
infiniband-diags
librdmacm
libmlx4
libibverbs-utils
mpi-selector
libibmad-devel
sdpnetstat
knem
libibumad-devel
libsdp
mlnx-ethtool
libibverbs-debuginfo
mlnx-ofa_kernel-modules
srp
opensm
mstflint
cc_mgr
libibmad
libibverbs
kernel-mft
libibverbs-devel-static
libibumad
librdmacm-devel
mlnx-ofa_kernel
libsdp-devel
ibutils2
mlnxofed-docs
libibmad-static
iser
opensm-devel
dump_pr
libibumad-static
rds-tools
libmlx4-debuginfo
mlnx-ofa_kernel-devel
opensm-libs
opensm-static
ar_mgr
dapl-devel-static
infiniband-diags-compat
libibverbs-devel
ibsim

[root(a)slave3 fio]# ibstat
CA 'mlx4_0'
        CA type: MT4103
        Number of ports: 2
        Firmware version: 2.35.5100
        Hardware version: 0
        Node GUID: 0x248a0703006090e0
        System image GUID: 0x248a0703006090e0
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x268a07fffe6090e0
                Link layer: Ethernet
        Port 2:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x268a07fffe6090e1
                Link layer: Ethernet

[root(a)slave3 fio]# ibv_devinfo
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.35.5100
        node_guid:                      248a:0703:0060:90e0
        sys_image_guid:                 248a:0703:0060:90e0
        vendor_id:                      0x02c9
        vendor_part_id:                 4103
        hw_ver:                         0x0
        board_id:                       MT_1090111023
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

                port:   2
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 12:34 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>; Cao, Gang <gang.cao(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Did you install Mellanox OFED on the target and the client server?
Can you run ibstat on both servers?
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 8:59:28 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

I’ve just tried the SPDK v17.07.1 and DPDK v17.08.

nvme version: 1.1.38.gfaab
fio version: 3.1

Tried the 512k and 1024k IO sizes and there is no error. dmesg information follows.

So there may be some other difference here? It looks like you are using a ConnectX-5 while I am using a ConnectX-4.

Other related information:

[root(a)node4 fio]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

[root(a)node4 fio]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core

[root(a)node4 fio]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux

[577707.543326] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.100.8:4420
[577730.854540] detected loopback device
[577730.893761] nvme nvme0: creating 7 I/O queues.
[577730.893797] detected loopback device
[577730.898611] detected loopback device
[577730.908917] detected loopback device
[577730.919073] detected loopback device
[577730.928922] detected loopback device
[577730.938679] detected loopback device
[577730.948365] detected loopback device
[577731.146290] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.100.8:4420

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
Any update?
Do you see any error messages from "dmesg" with a 512k block size when running fio? (A sketch for watching dmesg during the run follows below.)
Thanks
Victor
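
(A sketch for watching the kernel log while fio runs, assuming a util-linux dmesg with follow support:)

dmesg --follow | grep -Ei 'nvme|i/o error'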

From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you share with us which version of SPDK you are using when seeing this error? Or maybe you can try the latest SPDK code?

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Cao
Do you see any messages from dmesg?

I tried this fio version and still saw these error messages from dmesg.

fio-3.1

[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338]  nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Thanks for your detailed information on the testing.

I've tried the latest SPDK code with the latest fio-3.1-20-g132b and with fio-2.19. It seems there is no error of this kind.

Could you share with us which version of SPDK you are using when seeing this error? Or maybe you can try the latest SPDK code?

fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1  --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017

My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio



From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

(cc Victor)

From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you provide a few more details?  This will help the list to provide some ideas.


1)      On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?


Kernel initiator; I run these commands on the client server:



modprobe mlx5_ib

modprobe nvme-rdma

nvme discover -t rdma -a 192.168.10.11 -s 4420

nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1  -a 192.168.10.11 -s 4420




2)      Can you provide the fio configuration file or command line?  Just so we can have more specifics on “bigger block size”.


fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1  --name=read-phase --rw=randwrite


3)      Any details on the HW setup – specifically details on the RDMA NIC (or if you’re using SW RoCE).

Nvmf.conf on target server

[Global]
  Comment "Global section"
    ReactorMask 0xff00

[Rpc]
  Enable No
  Listen 127.0.0.1

[Nvmf]
  MaxQueuesPerSession 8
  MaxQueueDepth 128

[Subsystem1]
  NQN nqn.2016-06.io.spdk:nvme-subsystem-1
  Core 9
  Mode Direct
  Listen RDMA 192.168.10.11:4420
  NVMe 0000:82:00.0
  SN S2PMNAAH400039


It is an RDMA NIC (ConnectX-5). The CPU is Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz.
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15




Thanks,

-Jim


From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio

Hi
I have an SPDK NVMe-oF setup and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor


[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389]  nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486]  nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373]  nvme1n1: unable to read partition table





[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 80871 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-19  5:59 Cao, Gang
  0 siblings, 0 replies; 24+ messages in thread
From: Cao, Gang @ 2017-10-19  5:59 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 20710 bytes --]

This server does not have OFED installed and it is in loopback mode.

I found another server, also in loopback mode, with a ConnectX-3 and OFED installed.

By the way, what SSD are you using? Maybe it is related to the SSD? I've just run with 2048k for a short duration and it seems there is no issue. I will run longer to see whether this error can be hit.

[root(a)slave3 fio]# lspci | grep -i mell
08:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]

[root(a)slave3 fio]# lsmod | grep -i mlx
mlx4_ib               159744  0
ib_core               208896  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,ib_srpt,ib_ucm,rdma_ucm,mlx4_ib
mlx4_en               114688  0
mlx4_core             307200  2 mlx4_en,mlx4_ib
ptp                    20480  3 ixgbe,igb,mlx4_en

[root(a)slave3 fio]# ofed_info
MLNX_OFED_LINUX-3.2-2.0.0.0 (OFED-3.2-2.0.0):

ar_mgr:
osm_plugins/ar_mgr/ar_mgr-1.0-0.30.ga1ea4b7.tar.gz

cc_mgr:
osm_plugins/cc_mgr/cc_mgr-1.0-0.29.ga1ea4b7.tar.gz

dapl:
dapl.git mlnx_ofed_3_1
commit c30fb6ce2cbc29d8ed4bde51437f7abb93378c78
dump_pr:
osm_plugins/dump_pr//dump_pr-1.0-0.25.ga1ea4b7.tar.gz

fabric-collector:
fabric_collector//fabric-collector-1.1.0.MLNX20140410.51b267e.tar.gz

fca:
mlnx_ofed_fca/fca-2.5.2431-1.src.rpm

hcoll:
mlnx_ofed_hcol/hcoll-3.4.807-1.src.rpm

ibacm:
mlnx_ofed/ibacm.git mlnx_ofed_3_2
commit 15ad8c13bdebbe62edea0b7df030710b65c14f7f
ibacm_ssa:
mlnx_ofed_ssa/acm/ibacm_ssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

ibdump:
sniffer/sniffer-4.0.0-2/ibdump/linux/ibdump-4.0.0-2.tgz

ibsim:
mlnx_ofed_ibsim/ibsim-0.6-0.8.g9d76581.tar.gz

ibssa:
mlnx_ofed_ssa/distrib/ibssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

ibutils:
ofed-1.5.3-rpms/ibutils/ibutils-1.5.7.1-0.12.gdcaeae2.tar.gz

ibutils2:
ibutils2/ibutils2-2.1.1-0.76.MLNX20160222.gd366c7b.tar.gz

infiniband-diags:
mlnx_ofed_infiniband_diags/infiniband-diags-1.6.6.MLNX20151130.7f0213e.tar.gz

infinipath-psm:
mlnx_ofed_infinipath-psm/infinipath-psm-3.3-2_g6f42cdb_open.tar.gz

iser:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

isert:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

kernel-mft:
mlnx_ofed_mft/kernel-mft-4.3.0-25.src.rpm

knem:
knem.git mellanox-master
commit f143ee19a575cd42a334422fa8bd329d671238db
libibcm:
mlnx_ofed/libibcm.git mlnx_ofed_3_0
commit d7d485df305e6536711485bd7e477668e77d8320
libibmad:
mlnx_ofed_libibmad/libibmad-1.3.12.MLNX20151122.d140cb1.tar.gz

libibprof:
mlnx_ofed_libibprof/libibprof-1.1.22-1.src.rpm

libibumad:
mlnx_ofed_libibumad/libibumad-1.3.10.2.MLNX20150406.966500d.tar.gz

libibverbs:
mlnx_ofed/libibverbs.git mlnx_ofed_3_2_2
commit 217a77686f4861229f0e4b94485a13f024634caf
libmlx4:
mlnx_ofed/libmlx4.git mlnx_ofed_3_2_2
commit dda6d7ae1e6e3779a485ebdd0a882f4bcbd027a6
libmlx5:
mlnx_ofed/libmlx5.git mlnx_ofed_3_2_2
commit d0c8645359e0f0aba0408b2d344f3b418d27019b
libopensmssa:
mlnx_ofed_ssa/plugin/libopensmssa-0.0.9.3.MLNX20151203.50eb579.tar.gz

librdmacm:
mlnx_ofed/librdmacm.git mlnx_ofed_3_2_2
commit 6bd430fed9e7b3d57a1876c040431ce7295c7703
libsdp:
libsdp.git mlnx_ofed_3_0
commit fbd01dfff05f42d6b82506e7dbf4bc6b7e6a59a4
libvma:
vma/source_rpms//libvma-7.0.14-0.src.rpm

mlnx-ethtool:
upstream/ethtool.git for-upstream
commit ac0cf295abe0c0832f0711fed66ab9601c8b2513
mlnx-ofa_kernel:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

mpi-selector:
ofed-1.5.3-rpms/mpi-selector/mpi-selector-1.0.3-1.src.rpm

mpitests:
mlnx_ofed_mpitest/mpitests-3.2.17-e1c7f2f.src.rpm

mstflint:
mlnx_ofed_mstflint/mstflint-4.3.0-1.49.g9b9af70.tar.gz

multiperf:
mlnx_ofed_multiperf/multiperf-3.0-0.10.gda89e8c.tar.gz

mvapich2:
mlnx_ofed_mvapich2/mvapich2-2.2a-1.src.rpm

mxm:
mlnx_ofed_mxm/mxm-3.4.3079-1.src.rpm

ofed-docs:
docs.git mlnx_ofed-3.2
commit ea3386416f9f7130edd2c70fc3424cb2cda50f7d

openmpi:
mlnx_ofed_ompi_1.8/openmpi-1.10.3a1-1.src.rpm

opensm:
mlnx_ofed_opensm/opensm-4.6.1.MLNX20160112.774e977.tar.gz

perftest:
mlnx_ofed_perftest/perftest-3.0-0.18.gb464d59.tar.gz

qperf:
mlnx_ofed_qperf/qperf-0.4.9.tar.gz

rds-tools:
rds-tools.git mlnx_ofed_2_4
commit 299420ca25cf9996bc0748e3bc4b08748996ba49
sdpnetstat:
sdpnetstat.git mlnx_ofed_3_0
commit 3cf409a7cc07e5c71f9640eddbb801ece21b4169
sockperf:
sockperf/sockperf-2.7-43.git3ee62bd8107a.src.rpm

srp:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d

srptools:
srptools/srptools-1.0.2-12.src.rpm


Installed Packages:
-------------------
infiniband-diags
librdmacm
libmlx4
libibverbs-utils
mpi-selector
libibmad-devel
sdpnetstat
knem
libibumad-devel
libsdp
mlnx-ethtool
libibverbs-debuginfo
mlnx-ofa_kernel-modules
srp
opensm
mstflint
cc_mgr
libibmad
libibverbs
kernel-mft
libibverbs-devel-static
libibumad
librdmacm-devel
mlnx-ofa_kernel
libsdp-devel
ibutils2
mlnxofed-docs
libibmad-static
iser
opensm-devel
dump_pr
libibumad-static
rds-tools
libmlx4-debuginfo
mlnx-ofa_kernel-devel
opensm-libs
opensm-static
ar_mgr
dapl-devel-static
infiniband-diags-compat
libibverbs-devel
ibsim

[root(a)slave3 fio]# ibstat
CA 'mlx4_0'
        CA type: MT4103
        Number of ports: 2
        Firmware version: 2.35.5100
        Hardware version: 0
        Node GUID: 0x248a0703006090e0
        System image GUID: 0x248a0703006090e0
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x268a07fffe6090e0
                Link layer: Ethernet
        Port 2:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x268a07fffe6090e1
                Link layer: Ethernet

[root(a)slave3 fio]# ibv_devinfo
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.35.5100
        node_guid:                      248a:0703:0060:90e0
        sys_image_guid:                 248a:0703:0060:90e0
        vendor_id:                      0x02c9
        vendor_part_id:                 4103
        hw_ver:                         0x0
        board_id:                       MT_1090111023
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

                port:   2
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 12:34 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>; Cao, Gang <gang.cao(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Did you install Mellanox OFED on the target and the client server?
Can you run ibstat on both servers?
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 8:59:28 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

I've just tried the SPDK v17.07.1 and DPDK v17.08.

nvme version: 1.1.38.gfaab
fio version: 3.1

Tried the 512k and 1024k IO sizes and there is no error. dmesg information follows.

So there may be some other difference here? It looks like you are using a ConnectX-5 while I am using a ConnectX-4.

Other related information:

[root(a)node4 fio]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

[root(a)node4 fio]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core

[root(a)node4 fio]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux

[577707.543326] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.100.8:4420
[577730.854540] detected loopback device
[577730.893761] nvme nvme0: creating 7 I/O queues.
[577730.893797] detected loopback device
[577730.898611] detected loopback device
[577730.908917] detected loopback device
[577730.919073] detected loopback device
[577730.928922] detected loopback device
[577730.938679] detected loopback device
[577730.948365] detected loopback device
[577731.146290] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.100.8:4420

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
Any update?
Do you see any error messages from "dmesg" with a 512k block size when running fio?
Thanks
Victor

From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you share with us which version of SPDK you are using when seeing this error? Or maybe you can try the latest SPDK code?

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Cao
Do you see any messages from dmesg?

I tried this fio version and still saw these error messages from dmesg.

fio-3.1

[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338]  nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Thanks for your detailed information on the testing.

I've tried the latest SPDK code with the latest fio-3.1-20-g132b and with fio-2.19. It seems there is no error of this kind.

Could you share with us which version of SPDK you are using when seeing this error? Or maybe you can try the latest SPDK code?

fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1  --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017

My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio



From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

(cc Victor)

From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you provide a few more details?  This will help the list to provide some ideas.


1)      On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?


Kernel initiator; I run these commands on the client server:



modprobe mlx5_ib

modprobe nvme-rdma

nvme discover -t rdma -a 192.168.10.11 -s 4420

nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1  -a 192.168.10.11 -s 4420




2)      Can you provide the fio configuration file or command line?  Just so we can have more specifics on "bigger block size".


fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1  --name=read-phase --rw=randwrite


3)      Any details on the HW setup - specifically details on the RDMA NIC (or if you're using SW RoCE).

Nvmf.conf on target server

[Global]
  Comment "Global section"
    ReactorMask 0xff00

[Rpc]
  Enable No
  Listen 127.0.0.1

[Nvmf]
  MaxQueuesPerSession 8
  MaxQueueDepth 128

[Subsystem1]
  NQN nqn.2016-06.io.spdk:nvme-subsystem-1
  Core 9
  Mode Direct
  Listen RDMA 192.168.10.11:4420
  NVMe 0000:82:00.0
  SN S2PMNAAH400039


It is an RDMA NIC (ConnectX-5). The CPU is Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz.
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15




Thanks,

-Jim


From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio

Hi
I have an SPDK NVMe-oF setup and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor


[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389]  nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486]  nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373]  nvme1n1: unable to read partition table





[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 86438 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-19  4:34 Victor Banh
  0 siblings, 0 replies; 24+ messages in thread
From: Victor Banh @ 2017-10-19  4:34 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 12711 bytes --]

Did you install Mellanox OFED on the target and the client server?

Can you run ibstat on both servers?
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com>
Sent: Wednesday, October 18, 2017 8:59:28 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

I’ve just tried the SPDK v17.07.1 and DPDK v17.08.

nvme version: 1.1.38.gfaab
fio version: 3.1

Tried the 512k and 1024k IO sizes and there is no error. dmesg information follows.

So there may be some other difference here? It looks like you are using a ConnectX-5 while I am using a ConnectX-4.

Other related information:

[root(a)node4 fio]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

[root(a)node4 fio]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core

[root(a)node4 fio]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux

[577707.543326] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.100.8:4420
[577730.854540] detected loopback device
[577730.893761] nvme nvme0: creating 7 I/O queues.
[577730.893797] detected loopback device
[577730.898611] detected loopback device
[577730.908917] detected loopback device
[577730.919073] detected loopback device
[577730.928922] detected loopback device
[577730.938679] detected loopback device
[577730.948365] detected loopback device
[577731.146290] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.100.8:4420

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao(a)intel.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
Any update?
Do you see any error messages from "dmesg" with a 512k block size when running fio?
Thanks
Victor

From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you share with us which version of SPDK you are using when seeing this error? Or maybe you can try the latest SPDK code?

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Cao
Do you see any messages from dmesg?

I tried this fio version and still saw these error messages from dmesg.

fio-3.1

[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338]  nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Thanks for your detailed information on the testing.

I've tried the latest SPDK code with the latest fio-3.1-20-g132b and with fio-2.19. It seems there is no error of this kind.

Could you share with us which version of SPDK you are using when seeing this error? Or maybe you can try the latest SPDK code?

fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1  --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017

My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio



From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

(cc Victor)

From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you provide a few more details?  This will help the list to provide some ideas.


1)      On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?


Kernel initiator; I run these commands on the client server:



modprobe mlx5_ib

modprobe nvme-rdma

nvme discover -t rdma -a 192.168.10.11 -s 4420

nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1  -a 192.168.10.11 -s 4420




2)      Can you provide the fio configuration file or command line?  Just so we can have more specifics on “bigger block size”.


fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1  --name=read-phase --rw=randwrite


3)      Any details on the HW setup – specifically details on the RDMA NIC (or if you’re using SW RoCE).

Nvmf.conf on target server

[Global]
  Comment "Global section"
    ReactorMask 0xff00

[Rpc]
  Enable No
  Listen 127.0.0.1

[Nvmf]
  MaxQueuesPerSession 8
  MaxQueueDepth 128

[Subsystem1]
  NQN nqn.2016-06.io.spdk:nvme-subsystem-1
  Core 9
  Mode Direct
  Listen RDMA 192.168.10.11:4420
  NVMe 0000:82:00.0
  SN S2PMNAAH400039


It is an RDMA NIC (ConnectX-5). The CPU is Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz.
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15




Thanks,

-Jim


From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio

Hi
I have an SPDK NVMe-oF setup and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor


[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389]  nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486]  nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373]  nvme1n1: unable to read partition table





[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 37548 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-19  3:59 Cao, Gang
  0 siblings, 0 replies; 24+ messages in thread
From: Cao, Gang @ 2017-10-19  3:59 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 12329 bytes --]

Hi Victor,

I’ve just tried SPDK v17.07.1 and DPDK v17.08.

nvme version: 1.1.38.gfaab
fio version: 3.1

I tried 512k and 1024k IO sizes and there is no error; the dmesg information is included below.

So there may be some other difference here. It looks like you are using ConnectX-5 while I am using ConnectX-4.

Other related information:

[root(a)node4 fio]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

[root(a)node4 fio]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core

[root(a)node4 fio]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux

[577707.543326] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.100.8:4420
[577730.854540] detected loopback device
[577730.893761] nvme nvme0: creating 7 I/O queues.
[577730.893797] detected loopback device
[577730.898611] detected loopback device
[577730.908917] detected loopback device
[577730.919073] detected loopback device
[577730.928922] detected loopback device
[577730.938679] detected loopback device
[577730.948365] detected loopback device
[577731.146290] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.100.8:4420

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao(a)intel.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
Any update?
Do you see any error messages from “dmesg” when running fio with a 512k block size?
Thanks
Victor

From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you share which version of SPDK you are using when you see this error? Or maybe you can try the latest SPDK code?

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Cao
Do you see any messages from dmesg?

I tried this fio version and still saw these error messages from dmesg.

fio-3.1

[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338]  nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Thanks for your detailed information on the testing.

I’ve tried the latest SPDK code with the latest fio-3.1-20-g132b and fio-2.19, and I don’t see this kind of error.

Could you share which version of SPDK you are using when you see this error? Or maybe you can try the latest SPDK code?

fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1  --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017

My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio



From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

(cc Victor)

From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you provide a few more details?  This will help the list to provide some ideas.


1)      On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?


Kernel initiator; I run these commands on the client server:



modprobe mlx5_ib

modprobe nvme-rdma

nvme discover -t rdma -a 192.168.10.11 -s 4420

nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1  -a 192.168.10.11 -s 4420




2)      Can you provide the fio configuration file or command line?  Just so we can have more specifics on “bigger block size”.


fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1  --name=read-phase --rw=randwrite


3)      Any details on the HW setup – specifically details on the RDMA NIC (or if you’re using SW RoCE).

Nvmf.conf on target server

[Global]
  Comment "Global section"
    ReactorMask 0xff00

[Rpc]
  Enable No
  Listen 127.0.0.1

[Nvmf]
  MaxQueuesPerSession 8
  MaxQueueDepth 128

[Subsystem1]
  NQN nqn.2016-06.io.spdk:nvme-subsystem-1
  Core 9
  Mode Direct
  Listen RDMA 192.168.10.11:4420
  NVMe 0000:82:00.0
  SN S2PMNAAH400039


           It is an RDMA NIC (ConnectX-5); the CPU is an Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz.
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15




Thanks,

-Jim


From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio

Hi
I have an SPDK NVMe-oF target and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor


[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389]  nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486]  nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373]  nvme1n1: unable to read partition table





[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 42044 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-19  3:05 Cao, Gang
  0 siblings, 0 replies; 24+ messages in thread
From: Cao, Gang @ 2017-10-19  3:05 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 10788 bytes --]

Let me try your version and check the dmesg output.

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao(a)intel.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
Any update?
Do you see any error messages from “dmesg” when running fio with a 512k block size?
Thanks
Victor

From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you share which version of SPDK you are using when you see this error? Or maybe you can try the latest SPDK code?

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Cao
Do you see any messages from dmesg?

I tried this fio version and still saw these error messages from dmesg.

fio-3.1

[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338]  nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Thanks for your detailed information on the testing.

I’ve tried the latest SPDK code with the latest fio-3.1-20-g132b and fio-2.19, and I don’t see this kind of error.

Could you share which version of SPDK you are using when you see this error? Or maybe you can try the latest SPDK code?

fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1  --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017

My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio



From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

(cc Victor)

From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you provide a few more details?  This will help the list to provide some ideas.


1)      On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?


Kernel initiator; I run these commands on the client server:



modprobe mlx5_ib

modprobe nvme-rdma

nvme discover -t rdma -a 192.168.10.11 -s 4420

nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1  -a 192.168.10.11 -s 4420




2)      Can you provide the fio configuration file or command line?  Just so we can have more specifics on “bigger block size”.


fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1  --name=read-phase --rw=randwrite


3)      Any details on the HW setup – specifically details on the RDMA NIC (or if you’re using SW RoCE).

Nvmf.conf on target server

[Global]
  Comment "Global section"
    ReactorMask 0xff00

[Rpc]
  Enable No
  Listen 127.0.0.1

[Nvmf]
  MaxQueuesPerSession 8
  MaxQueueDepth 128

[Subsystem1]
  NQN nqn.2016-06.io.spdk:nvme-subsystem-1
  Core 9
  Mode Direct
  Listen RDMA 192.168.10.11:4420
  NVMe 0000:82:00.0
  SN S2PMNAAH400039


           It is an RDMA NIC (ConnectX-5); the CPU is an Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz.
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15




Thanks,

-Jim


From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio

Hi
I have an SPDK NVMe-oF target and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor


[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389]  nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486]  nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373]  nvme1n1: unable to read partition table





[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 36462 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-19  1:43 Victor Banh
  0 siblings, 0 replies; 24+ messages in thread
From: Victor Banh @ 2017-10-19  1:43 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 10309 bytes --]

Hi Gang
Any update?
Do you see any error messages from “dmesg” when running fio with a 512k block size?
Thanks
Victor

From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you share which version of SPDK you are using when you see this error? Or maybe you can try the latest SPDK code?

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Cao
Do you see any messages from dmesg?

I tried this fio version and still saw these error messages from dmesg.

fio-3.1

[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338]  nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Thanks for your detailed information on the testing.

I’ve tried the latest SPDK code with the latest fio-3.1-20-g132b and fio-2.19, and I don’t see this kind of error.

Could you share which version of SPDK you are using when you see this error? Or maybe you can try the latest SPDK code?

fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1  --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017

My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio



From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

(cc Victor)

From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you provide a few more details?  This will help the list to provide some ideas.


1)     On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?


Kernel initiator; I run these commands on the client server:



modprobe mlx5_ib

modprobe nvme-rdma

nvme discover -t rdma -a 192.168.10.11 -s 4420

nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1  -a 192.168.10.11 -s 4420




2)     Can you provide the fio configuration file or command line?  Just so we can have more specifics on “bigger block size”.


fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1  --name=read-phase --rw=randwrite


3)     Any details on the HW setup – specifically details on the RDMA NIC (or if you’re using SW RoCE).

Nvmf.conf on target server

[Global]
  Comment "Global section"
    ReactorMask 0xff00

[Rpc]
  Enable No
  Listen 127.0.0.1

[Nvmf]
  MaxQueuesPerSession 8
  MaxQueueDepth 128

[Subsystem1]
  NQN nqn.2016-06.io.spdk:nvme-subsystem-1
  Core 9
  Mode Direct
  Listen RDMA 192.168.10.11:4420
  NVMe 0000:82:00.0
  SN S2PMNAAH400039


           It is an RDMA NIC (ConnectX-5); the CPU is an Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz.
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15




Thanks,

-Jim


From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio

Hi
I have an SPDK NVMe-oF target and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor


[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389]  nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486]  nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373]  nvme1n1: unable to read partition table





[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 38136 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-18  2:37 Victor Banh
  0 siblings, 0 replies; 24+ messages in thread
From: Victor Banh @ 2017-10-18  2:37 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 9816 bytes --]

Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you share which version of SPDK you are using when you see this error? Or maybe you can try the latest SPDK code?

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Cao
Do you see any messages from dmesg?

I tried this fio version and still saw these error messages from dmesg.

fio-3.1

[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338]  nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Thanks for your detailed information on the testing.

I’ve tried the latest SPDK code with the latest fio-3.1-20-g132b and fio-2.19, and I don’t see this kind of error.

Could you share which version of SPDK you are using when you see this error? Or maybe you can try the latest SPDK code?

fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1  --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017

My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio



From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

(cc Victor)

From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you provide a few more details?  This will help the list to provide some ideas.


1)     On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?


Kernel initiator; I run these commands on the client server:



modprobe mlx5_ib

modprobe nvme-rdma

nvme discover -t rdma -a 192.168.10.11 -s 4420

nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1  -a 192.168.10.11 -s 4420




2)     Can you provide the fio configuration file or command line?  Just so we can have more specifics on “bigger block size”.


fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1  --name=read-phase --rw=randwrite


3)     Any details on the HW setup – specifically details on the RDMA NIC (or if you’re using SW RoCE).

Nvmf.conf on target server

[Global]
  Comment "Global section"
    ReactorMask 0xff00

[Rpc]
  Enable No
  Listen 127.0.0.1

[Nvmf]
  MaxQueuesPerSession 8
  MaxQueueDepth 128

[Subsystem1]
  NQN nqn.2016-06.io.spdk:nvme-subsystem-1
  Core 9
  Mode Direct
  Listen RDMA 192.168.10.11:4420
  NVMe 0000:82:00.0
  SN S2PMNAAH400039


           It is an RDMA NIC (ConnectX-5); the CPU is an Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz.
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15




Thanks,

-Jim


From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio

Hi
I have an SPDK NVMe-oF target and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor


[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389]  nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486]  nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373]  nvme1n1: unable to read partition table





[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 37005 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-17  3:51 Cao, Gang
  0 siblings, 0 replies; 24+ messages in thread
From: Cao, Gang @ 2017-10-17  3:51 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 9364 bytes --]

Hi Victor,

Could you share which version of SPDK you are using when you see this error? Or maybe you can try the latest SPDK code?

Thanks,
Gang

From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Cao
Do you see any messages from dmesg?

I tried this fio version and still saw these error messages from dmesg.

fio-3.1

[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338]  nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Thanks for your detailed information on the testing.

I’ve tried the latest SPDK code with the latest fio-3.1-20-g132b and fio-2.19, and I don’t see this kind of error.

Could you share which version of SPDK you are using when you see this error? Or maybe you can try the latest SPDK code?

fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1  --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017

My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio



From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

(cc Victor)

From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you provide a few more details?  This will help the list to provide some ideas.


1)      On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?


Kernel initiator; I run these commands on the client server:



modprobe mlx5_ib

modprobe nvme-rdma

nvme discover -t rdma -a 192.168.10.11 -s 4420

nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1  -a 192.168.10.11 -s 4420




2)      Can you provide the fio configuration file or command line?  Just so we can have more specifics on “bigger block size”.


fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1  --name=read-phase --rw=randwrite


3)      Any details on the HW setup – specifically details on the RDMA NIC (or if you’re using SW RoCE).

Nvmf.conf on target server

[Global]
  Comment "Global section"
    ReactorMask 0xff00

[Rpc]
  Enable No
  Listen 127.0.0.1

[Nvmf]
  MaxQueuesPerSession 8
  MaxQueueDepth 128

[Subsystem1]
  NQN nqn.2016-06.io.spdk:nvme-subsystem-1
  Core 9
  Mode Direct
  Listen RDMA 192.168.10.11:4420
  NVMe 0000:82:00.0
  SN S2PMNAAH400039


           It is an RDMA NIC (ConnectX-5); the CPU is an Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz.
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15




Thanks,

-Jim


From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio

Hi
I have an SPDK NVMe-oF target and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor


[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389]  nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486]  nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373]  nvme1n1: unable to read partition table





[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 32455 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-16 21:30 Victor Banh
  0 siblings, 0 replies; 24+ messages in thread
From: Victor Banh @ 2017-10-16 21:30 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 8791 bytes --]

Hi Cao
Do you see any messages from dmesg?

I tried this fio version and still saw these error messages from dmesg.

fio-3.1

[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338]  nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0

From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>
Cc: Victor Banh <victorb(a)mellanox.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Thanks for your detailed information on the testing.

I’ve tried the latest SPDK code and with latest fio-3.1-20-g132b and fio-2.19. It seems like no this kind of error.

Could you share which version of SPDK you are using when you see this error? Or maybe you can try the latest SPDK code?

fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1  --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017

My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio



From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

(cc Victor)

From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you provide a few more details?  This will help the list to provide some ideas.


1)     On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?


Kernel initiator; I run these commands on the client server.



modprobe mlx5_ib

modprobe nvme-rdma

nvme discover -t rdma -a 192.168.10.11 -s 4420

nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1  -a 192.168.10.11 -s 4420




2)     Can you provide the fio configuration file or command line?  Just so we can have more specifics on “bigger block size”.


fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1  --name=read-phase --rw=randwrite


3)     Any details on the HW setup – specifically details on the RDMA NIC (or if you’re using SW RoCE).

Nvmf.conf on target server

[Global]
  Comment "Global section"
    ReactorMask 0xff00

[Rpc]
  Enable No
  Listen 127.0.0.1

[Nvmf]
  MaxQueuesPerSession 8
  MaxQueueDepth 128

[Subsystem1]
  NQN nqn.2016-06.io.spdk:nvme-subsystem-1
  Core 9
  Mode Direct
  Listen RDMA 192.168.10.11:4420
  NVMe 0000:82:00.0
  SN S2PMNAAH400039


It is an RDMA NIC (ConnectX-5); the host CPU is an Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz.
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15




Thanks,

-Jim


From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio

Hi
I have an SPDK NVMe-oF setup and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on both the target and the client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor


[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389]  nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486]  nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373]  nvme1n1: unable to read partition table





[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 32425 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-09 17:58 Cao, Gang
  0 siblings, 0 replies; 24+ messages in thread
From: Cao, Gang @ 2017-10-09 17:58 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 7053 bytes --]

Hi Victor,

Thanks for your detailed information on the testing.

I’ve tried the latest SPDK code with the latest fio-3.1-20-g132b and with fio-2.19, and I don’t see this kind of error.

Could you share which version of SPDK you are using when you see this error? Or maybe you can try the latest SPDK code?

fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1  --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017

My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib               172032  0
ib_core               200704  15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core             380928  1 mlx5_ib
ptp                    20480  3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
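
For a RoCE setup like this, the RDMA side of the NIC can be sanity-checked as well before running the test. A small sketch, assuming the ibverbs-utils package is installed (device names will differ):

ibv_devinfo | grep -E 'hca_id|state|link_layer'    # ports should be PORT_ACTIVE with link_layer Ethernet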

From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com>; Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio



From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

(cc Victor)

From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you provide a few more details?  This will help the list to provide some ideas.


1)      On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?


Kernel initiator; I run these commands on the client server.



modprobe mlx5_ib

modprobe nvme-rdma

nvme discover -t rdma -a 192.168.10.11 -s 4420

nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1  -a 192.168.10.11 -s 4420




2)      Can you provide the fio configuration file or command line?  Just so we can have more specifics on “bigger block size”.


fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1  --name=read-phase --rw=randwrite


3)      Any details on the HW setup – specifically details on the RDMA NIC (or if you’re using SW RoCE).

Nvmf.conf on target server

[Global]
  Comment "Global section"
    ReactorMask 0xff00

[Rpc]
  Enable No
  Listen 127.0.0.1

[Nvmf]
  MaxQueuesPerSession 8
  MaxQueueDepth 128

[Subsystem1]
  NQN nqn.2016-06.io.spdk:nvme-subsystem-1
  Core 9
  Mode Direct
  Listen RDMA 192.168.10.11:4420
  NVMe 0000:82:00.0
  SN S2PMNAAH400039


It is an RDMA NIC (ConnectX-5); the host CPU is an Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz.
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15




Thanks,

-Jim


From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio

Hi
I have an SPDK NVMe-oF setup and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on both the target and the client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor


[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389]  nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486]  nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373]  nvme1n1: unable to read partition table





[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 26594 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-06 21:40 Victor Banh
  0 siblings, 0 replies; 24+ messages in thread
From: Victor Banh @ 2017-10-06 21:40 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 5234 bytes --]



From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Cc: Victor Banh <victorb(a)mellanox.com>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

(cc Victor)

From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you provide a few more details?  This will help the list to provide some ideas.


1)     On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?


Kernel initiator; I run these commands on the client server.



modprobe mlx5_ib

modprobe nvme-rdma

nvme discover -t rdma -a 192.168.10.11 -s 4420

nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1  -a 192.168.10.11 -s 4420
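
Once connected, the namespace appears as a normal block device on the client. A quick sketch for confirming that and for tearing the connection down after a run, using the same nvme-cli tool:

nvme list                                                  # the new /dev/nvmeXnY should be listed here
nvme disconnect -n nqn.2016-06.io.spdk:nvme-subsystem-1    # remove the controller when finished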




2)     Can you provide the fio configuration file or command line?  Just so we can have more specifics on “bigger block size”.


fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1  --name=read-phase --rw=randwrite
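
For reproducibility, the same run expressed as a fio job file (intended to be equivalent to the command line above; the job name is arbitrary):

[randwrite-512k]
ioengine=libaio
rw=randwrite
bs=512k
numjobs=4
iodepth=16
loops=1
direct=1
invalidate=1
fsync_on_close=1
randrepeat=1
norandommap
time_based
runtime=60
filename=/dev/nvme1n1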


3)     Any details on the HW setup – specifically details on the RDMA NIC (or if you’re using SW RoCE).

Nvmf.conf on target server

[Global]
  Comment "Global section"
    ReactorMask 0xff00

[Rpc]
  Enable No
  Listen 127.0.0.1

[Nvmf]
  MaxQueuesPerSession 8
  MaxQueueDepth 128

[Subsystem1]
  NQN nqn.2016-06.io.spdk:nvme-subsystem-1
  Core 9
  Mode Direct
  Listen RDMA 192.168.10.11:4420
  NVMe 0000:82:00.0
  SN S2PMNAAH400039
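
A config like this is consumed by the SPDK NVMe-oF target application. A minimal sketch of launching it, assuming the stock SPDK 17.07 tree layout and the usual helper script (paths may differ in other setups):

sudo scripts/setup.sh                        # reserve hugepages and rebind the NVMe device to a userspace driver
sudo ./app/nvmf_tgt/nvmf_tgt -c nvmf.conf    # start the target with the config above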


It is an RDMA NIC (ConnectX-5); the host CPU is an Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz.
NUMA node0 CPU(s):     0-7
NUMA node1 CPU(s):     8-15
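
The ReactorMask 0xff00 and Core 9 in the config above select cores 8-15, i.e. NUMA node 1. The topology lines above are in lscpu's format, e.g.:

lscpu | grep -i 'numa node'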




Thanks,

-Jim


From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio

Hi
I have an SPDK NVMe-oF setup and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on both the target and the client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor


[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389]  nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486]  nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373]  nvme1n1: unable to read partition table





[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 17572 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-06 21:32 Harris, James R
  0 siblings, 0 replies; 24+ messages in thread
From: Harris, James R @ 2017-10-06 21:32 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 3750 bytes --]

(cc Victor)

From: James Harris <james.r.harris(a)intel.com>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio

Hi Victor,

Could you provide a few more details?  This will help the list to provide some ideas.


1)       On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?

2)       Can you provide the fio configuration file or command line?  Just so we can have more specifics on “bigger block size”.

3)       Any details on the HW setup – specifically details on the RDMA NIC (or if you’re using SW RoCE).

Thanks,

-Jim


From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Victor Banh <victorb(a)mellanox.com>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org" <spdk(a)lists.01.org>
Subject: [SPDK] Buffer I/O error on bigger block size running fio

Hi
I have an SPDK NVMe-oF setup and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on both the target and the client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor


[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389]  nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486]  nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373]  nvme1n1: unable to read partition table





[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 12470 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-05 20:59 Harris, James R
  0 siblings, 0 replies; 24+ messages in thread
From: Harris, James R @ 2017-10-05 20:59 UTC (permalink / raw)
  To: spdk

[-- Attachment #1: Type: text/plain, Size: 3506 bytes --]

Hi Victor,

Could you provide a few more details?  This will help the list to provide some ideas.


1)       On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?

2)       Can you provide the fio configuration file or command line?  Just so we can have more specifics on “bigger block size”.

3)       Any details on the HW setup – specifically details on the RDMA NIC (or if you’re using SW RoCE).

Thanks,

-Jim


From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Victor Banh <victorb(a)mellanox.com>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org" <spdk(a)lists.01.org>
Subject: [SPDK] Buffer I/O error on bigger block size running fio

Hi
I have an SPDK NVMe-oF setup and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on both the target and the client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor


[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389]  nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486]  nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373]  nvme1n1: unable to read partition table





[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 11289 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2018-03-05 19:32 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-05 18:26 [SPDK] Buffer I/O error on bigger block size running fio Victor Banh
2017-10-05 20:59 Harris, James R
2017-10-06 21:32 Harris, James R
2017-10-06 21:40 Victor Banh
2017-10-09 17:58 Cao, Gang
2017-10-16 21:30 Victor Banh
2017-10-17  3:51 Cao, Gang
2017-10-18  2:37 Victor Banh
2017-10-19  1:43 Victor Banh
2017-10-19  3:05 Cao, Gang
2017-10-19  3:59 Cao, Gang
2017-10-19  4:34 Victor Banh
2017-10-19  5:59 Cao, Gang
2017-10-19  6:13 Victor Banh
2017-10-19  6:51 Cao, Gang
2017-10-19  7:16 Victor Banh
2017-10-20 22:13 Victor Banh
2017-10-21  8:21 Cao, Gang
2017-10-21 15:22 Victor Banh
2017-10-24  4:25 Cao, Gang
2017-10-24  4:57 Victor Banh
2017-10-24  5:04 Cao, Gang
2018-03-03  4:22 Victor Banh
2018-03-05 19:32 Wodkowski, PawelX
