* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-09 17:58 Cao, Gang
0 siblings, 0 replies; 24+ messages in thread
From: Cao, Gang @ 2017-10-09 17:58 UTC (permalink / raw)
To: spdk
Hi Victor,
Thanks for your detailed information on the testing.
I’ve tried the latest SPDK code with the latest fio-3.1-20-g132b, and also with fio-2.19, and I do not see this kind of error.
Could you share which version of SPDK you are using when you see this error? Or could you try with the latest SPDK code?
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1 --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017
My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com>; Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Cc: Victor Banh <victorb(a)mellanox.com>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
(cc Victor)
From: James Harris <james.r.harris(a)intel.com>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you provide a few more details? This will help the list to provide some ideas.
1) On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?
Kernel initiator; I run these commands on the client server.
modprobe mlx5_ib
modprobe nvme-rdma
nvme discover -t rdma -a 192.168.10.11 -s 4420
nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1 -a 192.168.10.11 -s 4420
2) Can you provide the fio configuration file or command line? Just so we can have more specifics on “bigger block size”.
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1 --name=read-phase --rw=randwrite
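For clarity, the command line above maps onto an equivalent fio job file (a sketch for reference; the option names are standard fio, while the device path is specific to this setup):

```ini
; equivalent job file for the fio command line above
[global]
bs=512k
numjobs=4
iodepth=16
loops=1
ioengine=libaio
direct=1
invalidate=1
fsync_on_close=1
randrepeat=1
norandommap
time_based
runtime=60
filename=/dev/nvme1n1

[read-phase]
rw=randwrite
```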
3) Any details on the HW setup – specifically details on the RDMA NIC (or if you’re using SW RoCE).
Nvmf.conf on target server
[Global]
Comment "Global section"
ReactorMask 0xff00
[Rpc]
Enable No
Listen 127.0.0.1
[Nvmf]
MaxQueuesPerSession 8
MaxQueueDepth 128
[Subsystem1]
NQN nqn.2016-06.io.spdk:nvme-subsystem-1
Core 9
Mode Direct
Listen RDMA 192.168.10.11:4420
NVMe 0000:82:00.0
SN S2PMNAAH400039
It is an RDMA NIC, ConnectX-5; the CPU is an Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz.
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
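As a side note, the ReactorMask 0xff00 in the config above is a hex bitmask of CPU cores; a quick check (my own, not from the thread) shows it selects cores 8-15, i.e. NUMA node1, which is the usual pattern for keeping the target's reactors local to the NIC's socket:

```python
# Decode an SPDK ReactorMask-style hex core mask into a list of core indices.
def cores_from_mask(mask: int) -> list[int]:
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]

print(cores_from_mask(0xff00))  # [8, 9, 10, 11, 12, 13, 14, 15] -> NUMA node1
```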
Thanks,
-Jim
From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Victor Banh <victorb(a)mellanox.com>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org" <spdk(a)lists.01.org>
Subject: [SPDK] Buffer I/O error on bigger block size running fio
Hi
I have an SPDK NVMe-oF target and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor
[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389] nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486] nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373] nvme1n1: unable to read partition table
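Two quick sanity checks on the log above (my own arithmetic, not from the thread): the failed sectors all fall inside the 1600321314816-byte capacity the kernel reported before it dropped to 0, so these are not out-of-range requests; and the negative capacity is simply a garbage 64-bit value read back during reconnect, printed as signed:

```python
import struct

SECTOR_SIZE = 512  # Linux block-layer sector unit
capacity_bytes = 1600321314816  # capacity reported for nvme1n1 before the drop to 0

failed_sectors = [2507351968, 1301294496, 1947371168, 1891797568, 10833824,
                  614937152, 1872305088, 1504491040, 1182136128, 1662985792]

# Every failed I/O lands inside the device, so the errors are not out-of-range
# requests; they coincide with the controller reconnect instead.
assert all(s * SECTOR_SIZE < capacity_bytes for s in failed_sectors)

# "detected capacity change from 0 to -65647705833078784": reinterpret the
# signed value the kernel printed as the unsigned 64-bit size it read back.
signed = -65647705833078784
(unsigned,) = struct.unpack("<Q", struct.pack("<q", signed))
print(unsigned)  # ~1.8e19 bytes (~18 EB), i.e. garbage, not a real device size
```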
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2018-03-05 19:32 Wodkowski, PawelX
0 siblings, 0 replies; 24+ messages in thread
From: Wodkowski, PawelX @ 2018-03-05 19:32 UTC (permalink / raw)
To: spdk
Hi Victor,
We do not test mainline kernels. Please use the official Ubuntu 16.04.3 kernel (currently I think it is 4.4.0-116).
If you need kernel version >= 4.12.0 please use Ubuntu 17.10.
Pawel
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Saturday, March 3, 2018 5:23 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>; John F. Kim <johnk(a)mellanox.com>; Cao, Gang <gang.cao(a)intel.com>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
Hi Cao
Have you tried DPDK 18.02 with SPDK V18.01?
Thanks
Victor
From: Victor Banh
Sent: Saturday, October 21, 2017 8:23 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>; John F. Kim <johnk(a)mellanox.com>; Cao, Gang <gang.cao(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
I am using NVMe.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com>
Sent: Saturday, October 21, 2017 1:21:03 AM
To: Victor Banh; Storage Performance Development Kit; Harris, James R; John F. Kim
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
I am also using SPDK v17.07.1 and DPDK 17.08 on my system, and am trying with your I/O configuration.
Let me check whether our colleague has this Ubuntu OS installed. If not, could you please try on CentOS?
Also, what kind of backend device did you configure for the NVMe-oF target: Malloc, NVMe SSD, or something else?
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Saturday, October 21, 2017 6:14 AM
To: Cao, Gang <gang.cao(a)intel.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>; John F. Kim <johnk(a)mellanox.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
Is it possible to try on Ubuntu OS?
Which version of DPDK and SPDK are you using?
Where can I get them? Github?
Thanks
Victor
root(a)clientintel:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial
root(a)clientintel:~# uname -a
Linux clientintel 4.12.0-041200-generic #201707022031 SMP Mon Jul 3 00:32:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Wednesday, October 18, 2017 11:52 PM
To: Victor Banh <victorb(a)mellanox.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
[root(a)node4 gangcao]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux
[root(a)node4 gangcao]# lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.2.1511 (Core)
Release: 7.2.1511
Codename: Core
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 2:14 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>; Cao, Gang <gang.cao(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Can you give the OS version and kernel version again for target and client?
I couldn't compile DPDK without installing the latest 4.12.x kernel on Ubuntu 16.04.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com>
Sent: Wednesday, October 18, 2017 10:59:19 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
This server does not have OFED installed and is in loopback mode.
I found another server, also in loopback mode, with a ConnectX-3 and OFED installed.
By the way, what SSD are you using? Maybe the issue is related to the SSD? I've just run with a 2048k block size for a short duration and there seems to be no issue. I will run longer to see whether I can hit this error.
[root(a)slave3 fio]# lspci | grep -i mell
08:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
[root(a)slave3 fio]# lsmod | grep -i mlx
mlx4_ib 159744 0
ib_core 208896 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,ib_srpt,ib_ucm,rdma_ucm,mlx4_ib
mlx4_en 114688 0
mlx4_core 307200 2 mlx4_en,mlx4_ib
ptp 20480 3 ixgbe,igb,mlx4_en
[root(a)slave3 fio]# ofed_info
MLNX_OFED_LINUX-3.2-2.0.0.0 (OFED-3.2-2.0.0):
ar_mgr:
osm_plugins/ar_mgr/ar_mgr-1.0-0.30.ga1ea4b7.tar.gz
cc_mgr:
osm_plugins/cc_mgr/cc_mgr-1.0-0.29.ga1ea4b7.tar.gz
dapl:
dapl.git mlnx_ofed_3_1
commit c30fb6ce2cbc29d8ed4bde51437f7abb93378c78
dump_pr:
osm_plugins/dump_pr//dump_pr-1.0-0.25.ga1ea4b7.tar.gz
fabric-collector:
fabric_collector//fabric-collector-1.1.0.MLNX20140410.51b267e.tar.gz
fca:
mlnx_ofed_fca/fca-2.5.2431-1.src.rpm
hcoll:
mlnx_ofed_hcol/hcoll-3.4.807-1.src.rpm
ibacm:
mlnx_ofed/ibacm.git mlnx_ofed_3_2
commit 15ad8c13bdebbe62edea0b7df030710b65c14f7f
ibacm_ssa:
mlnx_ofed_ssa/acm/ibacm_ssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
ibdump:
sniffer/sniffer-4.0.0-2/ibdump/linux/ibdump-4.0.0-2.tgz
ibsim:
mlnx_ofed_ibsim/ibsim-0.6-0.8.g9d76581.tar.gz
ibssa:
mlnx_ofed_ssa/distrib/ibssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
ibutils:
ofed-1.5.3-rpms/ibutils/ibutils-1.5.7.1-0.12.gdcaeae2.tar.gz
ibutils2:
ibutils2/ibutils2-2.1.1-0.76.MLNX20160222.gd366c7b.tar.gz
infiniband-diags:
mlnx_ofed_infiniband_diags/infiniband-diags-1.6.6.MLNX20151130.7f0213e.tar.gz
infinipath-psm:
mlnx_ofed_infinipath-psm/infinipath-psm-3.3-2_g6f42cdb_open.tar.gz
iser:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
isert:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
kernel-mft:
mlnx_ofed_mft/kernel-mft-4.3.0-25.src.rpm
knem:
knem.git mellanox-master
commit f143ee19a575cd42a334422fa8bd329d671238db
libibcm:
mlnx_ofed/libibcm.git mlnx_ofed_3_0
commit d7d485df305e6536711485bd7e477668e77d8320
libibmad:
mlnx_ofed_libibmad/libibmad-1.3.12.MLNX20151122.d140cb1.tar.gz
libibprof:
mlnx_ofed_libibprof/libibprof-1.1.22-1.src.rpm
libibumad:
mlnx_ofed_libibumad/libibumad-1.3.10.2.MLNX20150406.966500d.tar.gz
libibverbs:
mlnx_ofed/libibverbs.git mlnx_ofed_3_2_2
commit 217a77686f4861229f0e4b94485a13f024634caf
libmlx4:
mlnx_ofed/libmlx4.git mlnx_ofed_3_2_2
commit dda6d7ae1e6e3779a485ebdd0a882f4bcbd027a6
libmlx5:
mlnx_ofed/libmlx5.git mlnx_ofed_3_2_2
commit d0c8645359e0f0aba0408b2d344f3b418d27019b
libopensmssa:
mlnx_ofed_ssa/plugin/libopensmssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
librdmacm:
mlnx_ofed/librdmacm.git mlnx_ofed_3_2_2
commit 6bd430fed9e7b3d57a1876c040431ce7295c7703
libsdp:
libsdp.git mlnx_ofed_3_0
commit fbd01dfff05f42d6b82506e7dbf4bc6b7e6a59a4
libvma:
vma/source_rpms//libvma-7.0.14-0.src.rpm
mlnx-ethtool:
upstream/ethtool.git for-upstream
commit ac0cf295abe0c0832f0711fed66ab9601c8b2513
mlnx-ofa_kernel:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
mpi-selector:
ofed-1.5.3-rpms/mpi-selector/mpi-selector-1.0.3-1.src.rpm
mpitests:
mlnx_ofed_mpitest/mpitests-3.2.17-e1c7f2f.src.rpm
mstflint:
mlnx_ofed_mstflint/mstflint-4.3.0-1.49.g9b9af70.tar.gz
multiperf:
mlnx_ofed_multiperf/multiperf-3.0-0.10.gda89e8c.tar.gz
mvapich2:
mlnx_ofed_mvapich2/mvapich2-2.2a-1.src.rpm
mxm:
mlnx_ofed_mxm/mxm-3.4.3079-1.src.rpm
ofed-docs:
docs.git mlnx_ofed-3.2
commit ea3386416f9f7130edd2c70fc3424cb2cda50f7d
openmpi:
mlnx_ofed_ompi_1.8/openmpi-1.10.3a1-1.src.rpm
opensm:
mlnx_ofed_opensm/opensm-4.6.1.MLNX20160112.774e977.tar.gz
perftest:
mlnx_ofed_perftest/perftest-3.0-0.18.gb464d59.tar.gz
qperf:
mlnx_ofed_qperf/qperf-0.4.9.tar.gz
rds-tools:
rds-tools.git mlnx_ofed_2_4
commit 299420ca25cf9996bc0748e3bc4b08748996ba49
sdpnetstat:
sdpnetstat.git mlnx_ofed_3_0
commit 3cf409a7cc07e5c71f9640eddbb801ece21b4169
sockperf:
sockperf/sockperf-2.7-43.git3ee62bd8107a.src.rpm
srp:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
srptools:
srptools/srptools-1.0.2-12.src.rpm
Installed Packages:
-------------------
infiniband-diags
librdmacm
libmlx4
libibverbs-utils
mpi-selector
libibmad-devel
sdpnetstat
knem
libibumad-devel
libsdp
mlnx-ethtool
libibverbs-debuginfo
mlnx-ofa_kernel-modules
srp
opensm
mstflint
cc_mgr
libibmad
libibverbs
kernel-mft
libibverbs-devel-static
libibumad
librdmacm-devel
mlnx-ofa_kernel
libsdp-devel
ibutils2
mlnxofed-docs
libibmad-static
iser
opensm-devel
dump_pr
libibumad-static
rds-tools
libmlx4-debuginfo
mlnx-ofa_kernel-devel
opensm-libs
opensm-static
ar_mgr
dapl-devel-static
infiniband-diags-compat
libibverbs-devel
ibsim
[root(a)slave3 fio]# ibstat
CA 'mlx4_0'
CA type: MT4103
Number of ports: 2
Firmware version: 2.35.5100
Hardware version: 0
Node GUID: 0x248a0703006090e0
System image GUID: 0x248a0703006090e0
Port 1:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffe6090e0
Link layer: Ethernet
Port 2:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffe6090e1
Link layer: Ethernet
[root(a)slave3 fio]# ibv_devinfo
hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.35.5100
node_guid: 248a:0703:0060:90e0
sys_image_guid: 248a:0703:0060:90e0
vendor_id: 0x02c9
vendor_part_id: 4103
hw_ver: 0x0
board_id: MT_1090111023
phys_port_cnt: 2
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
port: 2
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 12:34 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>; Cao, Gang <gang.cao(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Do you install Mellanox OFED on the Target and Client server?
Can you run ibstat on both servers?
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com>
Sent: Wednesday, October 18, 2017 8:59:28 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
I've just tried the SPDK v17.07.1 and DPDK v17.08.
nvme version: 1.1.38.gfaab
fio version: 3.1
I tried the 512k and 1024k I/O sizes and there is no error; the dmesg output is below.
So there may be some other difference here? It looks like you are using a ConnectX-5 while I am using a ConnectX-4.
Other related information:
[root(a)node4 fio]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
[root(a)node4 fio]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root(a)node4 fio]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux
[577707.543326] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.100.8:4420
[577730.854540] detected loopback device
[577730.893761] nvme nvme0: creating 7 I/O queues.
[577730.893797] detected loopback device
[577730.898611] detected loopback device
[577730.908917] detected loopback device
[577730.919073] detected loopback device
[577730.928922] detected loopback device
[577730.938679] detected loopback device
[577730.948365] detected loopback device
[577731.146290] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.100.8:4420
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao(a)intel.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
Any update?
Do you see any error message from "dmesg" with 512k block size running fio?
Thanks
Victor
From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you share which version of SPDK you are using when you see this error? Or could you try with the latest SPDK code?
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Cao
Do you see any message from dmesg?
I tried this fio version and still saw these error messages in dmesg.
fio-3.1
[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338] nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2018-03-03 4:22 Victor Banh
0 siblings, 0 replies; 24+ messages in thread
From: Victor Banh @ 2018-03-03 4:22 UTC (permalink / raw)
To: spdk
Hi Cao
Have you tried DPDK 18.02 with SPDK V18.01?
Thanks
Victor
From: Victor Banh
Sent: Saturday, October 21, 2017 8:23 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>; John F. Kim <johnk(a)mellanox.com>; Cao, Gang <gang.cao(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
I am using NVMe.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Saturday, October 21, 2017 1:21:03 AM
To: Victor Banh; Storage Performance Development Kit; Harris, James R; John F. Kim
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
I am also using the SPDK v17.07.1 and DPDK 17.08 on my system to have the trying based on your IO configuration.
Let me check whether our colleague has this Ubuntu OS installed. If not, could you please have a try on CentOS.
Also what kind of backend device did you configured for the NVMe-oF, like Malloc, NVMe SSD or other?
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Saturday, October 21, 2017 6:14 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; John F. Kim <johnk(a)mellanox.com<mailto:johnk(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
Is it possible to try on Ubuntu OS?
Which version of DPDK and SPDK are you using?
Where can I get them? Github?
Thanks
Victor
root(a)clientintel:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial
root(a)clientintel:~# uname -a
Linux clientintel 4.12.0-041200-generic #201707022031 SMP Mon Jul 3 00:32:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Wednesday, October 18, 2017 11:52 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
[root(a)node4 gangcao]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux
[root(a)node4 gangcao]# lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.2.1511 (Core)
Release: 7.2.1511
Codename: Core
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 2:14 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Can you give the OS version and kernel version again for target and client?
I couldn't compile DPDK without installing the latest 4.12.x kernel on Ubuntu 16.04.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 10:59:19 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
This server does not have OFED installed and it is in the loopback mode.
Found another server, also in loopback mode, with a ConnectX-3 and OFED installed.
By the way, what SSD are you using? Maybe it is related to the SSD? I've just run with a 2048k block size for a short duration and see no issue so far. I will run longer to see whether I can hit this error.
[root(a)slave3 fio]# lspci | grep -i mell
08:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
[root(a)slave3 fio]# lsmod | grep -i mlx
mlx4_ib 159744 0
ib_core 208896 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,ib_srpt,ib_ucm,rdma_ucm,mlx4_ib
mlx4_en 114688 0
mlx4_core 307200 2 mlx4_en,mlx4_ib
ptp 20480 3 ixgbe,igb,mlx4_en
[root(a)slave3 fio]# ofed_info
MLNX_OFED_LINUX-3.2-2.0.0.0 (OFED-3.2-2.0.0):
ar_mgr:
osm_plugins/ar_mgr/ar_mgr-1.0-0.30.ga1ea4b7.tar.gz
cc_mgr:
osm_plugins/cc_mgr/cc_mgr-1.0-0.29.ga1ea4b7.tar.gz
dapl:
dapl.git mlnx_ofed_3_1
commit c30fb6ce2cbc29d8ed4bde51437f7abb93378c78
dump_pr:
osm_plugins/dump_pr//dump_pr-1.0-0.25.ga1ea4b7.tar.gz
fabric-collector:
fabric_collector//fabric-collector-1.1.0.MLNX20140410.51b267e.tar.gz
fca:
mlnx_ofed_fca/fca-2.5.2431-1.src.rpm
hcoll:
mlnx_ofed_hcol/hcoll-3.4.807-1.src.rpm
ibacm:
mlnx_ofed/ibacm.git mlnx_ofed_3_2
commit 15ad8c13bdebbe62edea0b7df030710b65c14f7f
ibacm_ssa:
mlnx_ofed_ssa/acm/ibacm_ssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
ibdump:
sniffer/sniffer-4.0.0-2/ibdump/linux/ibdump-4.0.0-2.tgz
ibsim:
mlnx_ofed_ibsim/ibsim-0.6-0.8.g9d76581.tar.gz
ibssa:
mlnx_ofed_ssa/distrib/ibssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
ibutils:
ofed-1.5.3-rpms/ibutils/ibutils-1.5.7.1-0.12.gdcaeae2.tar.gz
ibutils2:
ibutils2/ibutils2-2.1.1-0.76.MLNX20160222.gd366c7b.tar.gz
infiniband-diags:
mlnx_ofed_infiniband_diags/infiniband-diags-1.6.6.MLNX20151130.7f0213e.tar.gz
infinipath-psm:
mlnx_ofed_infinipath-psm/infinipath-psm-3.3-2_g6f42cdb_open.tar.gz
iser:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
isert:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
kernel-mft:
mlnx_ofed_mft/kernel-mft-4.3.0-25.src.rpm
knem:
knem.git mellanox-master
commit f143ee19a575cd42a334422fa8bd329d671238db
libibcm:
mlnx_ofed/libibcm.git mlnx_ofed_3_0
commit d7d485df305e6536711485bd7e477668e77d8320
libibmad:
mlnx_ofed_libibmad/libibmad-1.3.12.MLNX20151122.d140cb1.tar.gz
libibprof:
mlnx_ofed_libibprof/libibprof-1.1.22-1.src.rpm
libibumad:
mlnx_ofed_libibumad/libibumad-1.3.10.2.MLNX20150406.966500d.tar.gz
libibverbs:
mlnx_ofed/libibverbs.git mlnx_ofed_3_2_2
commit 217a77686f4861229f0e4b94485a13f024634caf
libmlx4:
mlnx_ofed/libmlx4.git mlnx_ofed_3_2_2
commit dda6d7ae1e6e3779a485ebdd0a882f4bcbd027a6
libmlx5:
mlnx_ofed/libmlx5.git mlnx_ofed_3_2_2
commit d0c8645359e0f0aba0408b2d344f3b418d27019b
libopensmssa:
mlnx_ofed_ssa/plugin/libopensmssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
librdmacm:
mlnx_ofed/librdmacm.git mlnx_ofed_3_2_2
commit 6bd430fed9e7b3d57a1876c040431ce7295c7703
libsdp:
libsdp.git mlnx_ofed_3_0
commit fbd01dfff05f42d6b82506e7dbf4bc6b7e6a59a4
libvma:
vma/source_rpms//libvma-7.0.14-0.src.rpm
mlnx-ethtool:
upstream/ethtool.git for-upstream
commit ac0cf295abe0c0832f0711fed66ab9601c8b2513
mlnx-ofa_kernel:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
mpi-selector:
ofed-1.5.3-rpms/mpi-selector/mpi-selector-1.0.3-1.src.rpm
mpitests:
mlnx_ofed_mpitest/mpitests-3.2.17-e1c7f2f.src.rpm
mstflint:
mlnx_ofed_mstflint/mstflint-4.3.0-1.49.g9b9af70.tar.gz
multiperf:
mlnx_ofed_multiperf/multiperf-3.0-0.10.gda89e8c.tar.gz
mvapich2:
mlnx_ofed_mvapich2/mvapich2-2.2a-1.src.rpm
mxm:
mlnx_ofed_mxm/mxm-3.4.3079-1.src.rpm
ofed-docs:
docs.git mlnx_ofed-3.2
commit ea3386416f9f7130edd2c70fc3424cb2cda50f7d
openmpi:
mlnx_ofed_ompi_1.8/openmpi-1.10.3a1-1.src.rpm
opensm:
mlnx_ofed_opensm/opensm-4.6.1.MLNX20160112.774e977.tar.gz
perftest:
mlnx_ofed_perftest/perftest-3.0-0.18.gb464d59.tar.gz
qperf:
mlnx_ofed_qperf/qperf-0.4.9.tar.gz
rds-tools:
rds-tools.git mlnx_ofed_2_4
commit 299420ca25cf9996bc0748e3bc4b08748996ba49
sdpnetstat:
sdpnetstat.git mlnx_ofed_3_0
commit 3cf409a7cc07e5c71f9640eddbb801ece21b4169
sockperf:
sockperf/sockperf-2.7-43.git3ee62bd8107a.src.rpm
srp:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
srptools:
srptools/srptools-1.0.2-12.src.rpm
Installed Packages:
-------------------
infiniband-diags
librdmacm
libmlx4
libibverbs-utils
mpi-selector
libibmad-devel
sdpnetstat
knem
libibumad-devel
libsdp
mlnx-ethtool
libibverbs-debuginfo
mlnx-ofa_kernel-modules
srp
opensm
mstflint
cc_mgr
libibmad
libibverbs
kernel-mft
libibverbs-devel-static
libibumad
librdmacm-devel
mlnx-ofa_kernel
libsdp-devel
ibutils2
mlnxofed-docs
libibmad-static
iser
opensm-devel
dump_pr
libibumad-static
rds-tools
libmlx4-debuginfo
mlnx-ofa_kernel-devel
opensm-libs
opensm-static
ar_mgr
dapl-devel-static
infiniband-diags-compat
libibverbs-devel
ibsim
[root(a)slave3 fio]# ibstat
CA 'mlx4_0'
CA type: MT4103
Number of ports: 2
Firmware version: 2.35.5100
Hardware version: 0
Node GUID: 0x248a0703006090e0
System image GUID: 0x248a0703006090e0
Port 1:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffe6090e0
Link layer: Ethernet
Port 2:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffe6090e1
Link layer: Ethernet
[root(a)slave3 fio]# ibv_devinfo
hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.35.5100
node_guid: 248a:0703:0060:90e0
sys_image_guid: 248a:0703:0060:90e0
vendor_id: 0x02c9
vendor_part_id: 4103
hw_ver: 0x0
board_id: MT_1090111023
phys_port_cnt: 2
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
port: 2
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 12:34 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Do you install Mellanox OFED on the Target and Client server?
Can you run ibstat on both servers?
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 8:59:28 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
I've just tried the SPDK v17.07.1 and DPDK v17.08.
nvme version: 1.1.38.gfaab
fio version: 3.1
Tried the 512k and 1024k I/O sizes and there is no error; dmesg information follows.
So there may be some other difference here? It looks like you are using a ConnectX-5 while I am using a ConnectX-4?
Other related information:
[root(a)node4 fio]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
[root(a)node4 fio]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root(a)node4 fio]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux
[577707.543326] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.100.8:4420
[577730.854540] detected loopback device
[577730.893761] nvme nvme0: creating 7 I/O queues.
[577730.893797] detected loopback device
[577730.898611] detected loopback device
[577730.908917] detected loopback device
[577730.919073] detected loopback device
[577730.928922] detected loopback device
[577730.938679] detected loopback device
[577730.948365] detected loopback device
[577731.146290] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.100.8:4420
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
Any update?
Do you see any error message from "dmesg" with 512k block size running fio?
Thanks
Victor
From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you share which version of SPDK you were using when you saw this error? Or maybe you can try the latest SPDK code?
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Cao
Do you see any message from dmesg?
I tried this fio version and still saw these error messages from dmesg.
fio-3.1
[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338] nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0
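A side note on the capacity values in the log above: the kernel prints the detected capacity as a signed 64-bit integer, so the huge negative numbers are an unsigned 64-bit value from the (now unreachable) controller being displayed as signed; they are not plausible device sizes. A minimal sketch, with the values copied from the log:

```python
def as_u64(signed: int) -> int:
    """Reinterpret a signed 64-bit value as its unsigned equivalent."""
    return signed % (1 << 64)

# Capacity values from the "detected capacity change" line above.
for v in (-62111005559226368, -62042256479092736):
    print(f"{v} -> {as_u64(v)} (~{as_u64(v) / 2**60:.1f} EiB)")
```

Both values decode to roughly 16 EiB, i.e. garbage identify data rather than a real namespace size.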
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Thanks for your detailed information on the testing.
I've tried the latest SPDK code with the latest fio-3.1-20-g132b and with fio-2.19, and I don't see this kind of error.
Could you share which version of SPDK you were using when you saw this error? Or maybe you can try the latest SPDK code?
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1 --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017
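As a quick sanity check on the status line above (illustrative arithmetic only):

```python
# IOPS times block size should match the reported write bandwidth:
# the line shows w=1592MiB/s at w=3183 IOPS with a 512 KiB block size.
iops = 3183
bs_kib = 512
bw_mib_s = iops * bs_kib / 1024
print(f"{bw_mib_s:.1f} MiB/s")  # 1591.5 MiB/s, matching the ~1592 MiB/s shown
```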
My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
(cc Victor)
From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you provide a few more details? This will help the list to provide some ideas.
1) On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?
Kernel initiator; I run these commands on the client server.
modprobe mlx5_ib
modprobe nvme-rdma
nvme discover -t rdma -a 192.168.10.11 -s 4420
nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1 -a 192.168.10.11 -s 4420
2) Can you provide the fio configuration file or command line? Just so we can have more specifics on "bigger block size".
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1 --name=read-phase --rw=randwrite
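For reference, the same command line expressed as a fio job file (hypothetical file name read-phase.fio; options copied verbatim from the command line above):

```ini
; read-phase.fio -- equivalent to the command line above
[read-phase]
filename=/dev/nvme1n1
rw=randwrite
bs=512k
numjobs=4
iodepth=16
loops=1
ioengine=libaio
direct=1
invalidate=1
fsync_on_close=1
randrepeat=1
norandommap
time_based
runtime=60
```

Run with `fio read-phase.fio`.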
3) Any details on the HW setup - specifically details on the RDMA NIC (or if you're using SW RoCE).
Nvmf.conf on target server
[Global]
Comment "Global section"
ReactorMask 0xff00
[Rpc]
Enable No
Listen 127.0.0.1
[Nvmf]
MaxQueuesPerSession 8
MaxQueueDepth 128
[Subsystem1]
NQN nqn.2016-06.io.spdk:nvme-subsystem-1
Core 9
Mode Direct
Listen RDMA 192.168.10.11:4420
NVMe 0000:82:00.0
SN S2PMNAAH400039
It is RDMA NIC, ConnectX 5, Intel CPU Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
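One detail worth noting in the config and CPU info above: ReactorMask 0xff00 selects cores 8-15, which lines up with NUMA node1. A small sketch for decoding such a mask:

```python
def mask_to_cpus(mask: int) -> list[int]:
    """Expand a CPU/reactor bitmask into the list of set core indices."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]

print(mask_to_cpus(0xFF00))  # [8, 9, 10, 11, 12, 13, 14, 15] -> NUMA node1
```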
Thanks,
-Jim
From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio
Hi
I have SPDK NVMe-oF set up and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on both target and client.
DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor
[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389] nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486] nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373] nvme1n1: unable to read partition table
* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-24 5:04 Cao, Gang
0 siblings, 0 replies; 24+ messages in thread
From: Cao, Gang @ 2017-10-24 5:04 UTC (permalink / raw)
To: spdk
I have run the NVMe-oF target in loopback mode, with both host and target on the same server.
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 24, 2017 12:57 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>; John F. Kim <johnk(a)mellanox.com>; Cao, Gang <gang.cao(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Can you run dmesg on client?
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Monday, October 23, 2017 9:25:16 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R; John F. Kim
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
I've just successfully run the SPDK NVMe-oF target in loopback mode with Ubuntu 16.04 and kernel 4.13.7. Detailed information follows.
Also tried a large 5120k I/O size from fio for 5 minutes.
root(a)waikikibeach111:~# uname -a
Linux waikikibeach111 4.13.7 #1 SMP Sun Oct 22 23:24:08 PDT 2017 x86_64 x86_64 x86_64 GNU/Linux
root(a)waikikibeach111:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial
root(a)waikikibeach111:~# lspci | grep -i mellanox
03:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
03:00.1 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
root(a)waikikibeach111:~# ibstat
CA 'mlx5_0'
CA type: MT4115
Number of ports: 1
Firmware version: 12.18.1000
Hardware version: 0
Node GUID: 0x248a07030049d218
System image GUID: 0x248a07030049d218
Port 1:
State: Active
Physical state: LinkUp
Rate: 100
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffe49d218
Link layer: Ethernet
CA 'mlx5_1'
CA type: MT4115
Number of ports: 1
Firmware version: 12.18.1000
Hardware version: 0
Node GUID: 0x248a07030049d219
System image GUID: 0x248a07030049d218
Port 1:
State: Active
Physical state: LinkUp
Rate: 100
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffe49d219
Link layer: Ethernet
root(a)waikikibeach111:~# cd /home/gangcao/spdk
root(a)waikikibeach111:/home/gangcao/spdk# git branch
* (HEAD detached at v17.07.1) --> DPDK 17.08
master
root(a)waikikibeach111:/home/gangcao/spdk# nvme --version
nvme version 1.2
root(a)waikikibeach111:/home/gangcao/spdk# /home/gangcao/fio/fio --version
fio-3.1
root(a)waikikibeach111:/home/gangcao/spdk# lspci | grep -i volatile
84:00.0 Non-Volatile memory controller: Intel Corporation PCIe Data Center SSD (rev 01)
root(a)waikikibeach111:/home/gangcao/spdk# lspci -s 84:00.0 -v
84:00.0 Non-Volatile memory controller: Intel Corporation PCIe Data Center SSD (rev 01) (prog-if 02 [NVM Express])
Subsystem: Intel Corporation DC P3700 SSD [2.5" SFF]
Physical Slot: 37
Flags: bus master, fast devsel, latency 0, IRQ 39
Memory at fbc10000 (64-bit, non-prefetchable) [size=16K]
Expansion ROM at fbc00000 [disabled] [size=64K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI-X: Enable- Count=32 Masked-
Capabilities: [60] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Capabilities: [150] Virtual Channel
Capabilities: [180] Power Budgeting <?>
Capabilities: [190] Alternative Routing-ID Interpretation (ARI)
Capabilities: [270] Device Serial Number 55-cd-2e-41-4d-2f-5b-5c
Capabilities: [2a0] #19
Kernel driver in use: uio_pci_generic
Kernel modules: nvme
root(a)waikikibeach111:/home/gangcao/spdk# cat test/nvmf/fio/nvmf.conf
# NVMf Target Configuration File
#
# Please write all parameters using ASCII.
# The parameter must be quoted if it includes whitespace.
#
# Configuration syntax:
# Leading whitespace is ignored.
# Lines starting with '#' are comments.
# Lines ending with '\' are concatenated with the next line.
# Bracketed ([]) names define sections
[Global]
# Users can restrict work items to only run on certain cores by
# specifying a ReactorMask. Default ReactorMask mask is defined as
# -c option in the 'ealargs' setting at beginning of file nvmf_tgt.c.
#ReactorMask 0x00FF
# Tracepoint group mask for spdk trace buffers
# Default: 0x0 (all tracepoint groups disabled)
# Set to 0xFFFFFFFFFFFFFFFF to enable all tracepoint groups.
#TpointGroupMask 0x0
# syslog facility
LogFacility "local7"
[Rpc]
# Defines whether to enable configuration via RPC.
# Default is disabled. Note that the RPC interface is not
# authenticated, so users should be careful about enabling
# RPC in non-trusted environments.
Enable no
# Listen address for the RPC service.
# May be an IP address or an absolute path to a Unix socket.
Listen 192.168.5.11
# Users may change this section to create a different number or size of
# malloc LUNs.
# This will generate 8 LUNs with a malloc-allocated backend.
# Each LUN will be size 64MB and these will be named
# Malloc0 through Malloc7. Not all LUNs defined here are necessarily
# used below.
#[Malloc]
#NumberOfLuns 8
#LunSizeInMB 64
# Users must change this section to match the /dev/sdX devices to be virtual
# NVMe devices. The devices are accessed using Linux AIO.
#[AIO]
#AIO /dev/sdb
#AIO /dev/sdc
# Define NVMf protocol global options
[Nvmf]
# Set the maximum number of submission and completion queues per session.
# Setting this to '8', for example, allows for 8 submission and 8 completion queues
# per session.
MaxQueuesPerSession 8
# Set the maximum number of outstanding I/O per queue.
MaxQueueDepth 512
# Set the maximum in-capsule data size. Must be a multiple of 16.
#InCapsuleDataSize 4096
# Set the maximum I/O size. Must be a multiple of 4096.
#MaxIOSize 131072
# Set the global acceptor lcore ID, lcores are numbered starting at 0.
#AcceptorCore 0
# Set how often the acceptor polls for incoming connections. The acceptor is also
# responsible for polling existing connections that have gone idle. 0 means continuously
# poll. Units in microseconds.
AcceptorPollRate 10000
[Nvme]
# Registers the application to receive timeout callback and to reset the controller.
#ResetControllerOnTimeout Yes
# Timeout value.
#NvmeTimeoutValue 30
# Set how often the admin queue is polled for asynchronous events
# Units in microseconds.
#AdminPollRate 100000
HotplugEnable no
# The Split virtual block device slices block devices into multiple smaller bdevs.
#[Split]
# Syntax:
# Split <bdev> <count> [<size_in_megabytes>]
# Split Malloc2 into two equally-sized portions, Malloc2p0 and Malloc2p1
#Split Malloc2 2
# Split Malloc3 into eight 1-megabyte portions, Malloc3p0 ... Malloc3p7,
# leaving the rest of the device inaccessible
#Split Malloc3 8 1
# Define an NVMf Subsystem.
# - NQN is required and must be unique.
# - Core may be set or not. If set, the specified subsystem will run on
# it, otherwise each subsystem will use a round-robin method to allocate
# core from available cores, lcores are numbered starting at 0.
# - Mode may be either "Direct" or "Virtual". Direct means that physical
# devices attached to the target will be presented to hosts as if they
# were directly attached to the host. No software emulation or command
# validation is performed. Virtual means that an NVMe controller is
# emulated in software and the namespaces it contains map to block devices
# on the target system. These block devices do not need to be NVMe devices.
# Only Direct mode is currently supported.
# - Between 1 and 255 Listen directives are allowed. This defines
# the addresses on which new connections may be accepted. The format
# is Listen <type> <address> where type currently can only be RDMA.
# - Between 0 and 255 Host directives are allowed. This defines the
# NQNs of allowed hosts. If no Host directive is specified, all hosts
# are allowed to connect.
# - Exactly 1 NVMe directive specifying an NVMe device by PCI BDF. The
# PCI domain:bus:device.function can be replaced by "*" to indicate
# any PCI device.
# Direct controller
[Subsystem1]
NQN nqn.2016-06.io.spdk:cnode2
Core 0
Mode Direct
Listen rdma 192.168.5.11:4420
#Host nqn.2016-06.io.spdk:init
NVMe 0000:84:00.0
root(a)waikikibeach111:~# dmesg -w -T
[Mon Oct 23 19:45:21 2017] mlx5_core 0000:03:00.0: firmware version: 12.18.1000
[Mon Oct 23 19:45:21 2017] (0000:03:00.0): E-Switch: Total vports 1, l2 table size(65536), per vport: max uc(1024) max mc(16384)
[Mon Oct 23 19:45:21 2017] mlx5_core 0000:03:00.0: Port module event: module 0, Cable plugged
[Mon Oct 23 19:45:21 2017] mlx5_core 0000:03:00.1: firmware version: 12.18.1000
[Mon Oct 23 19:45:22 2017] (0000:03:00.1): E-Switch: Total vports 1, l2 table size(65536), per vport: max uc(1024) max mc(16384)
[Mon Oct 23 19:45:22 2017] mlx5_core 0000:03:00.1: Port module event: module 1, Cable plugged
[Mon Oct 23 19:45:22 2017] mlx5_core 0000:03:00.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(1) RxCqeCmprss(0)
[Mon Oct 23 19:45:22 2017] mlx5_core 0000:03:00.1: MLX5E: StrdRq(0) RqSz(1024) StrdSz(1) RxCqeCmprss(0)
[Mon Oct 23 19:45:22 2017] mlx5_core 0000:03:00.0 ens1f0: renamed from eth0
[Mon Oct 23 19:45:23 2017] mlx5_core 0000:03:00.1 ens1f1: renamed from eth0
[Mon Oct 23 19:45:23 2017] mlx5_ib: Mellanox Connect-IB Infiniband driver v5.0-0
[Mon Oct 23 19:47:01 2017] mlx5_core 0000:03:00.0 ens1f0: Link up
[Mon Oct 23 19:47:03 2017] mlx5_core 0000:03:00.1 ens1f1: Link up
[Mon Oct 23 19:54:26 2017] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.5.11:4420
[Mon Oct 23 19:54:49 2017] nvme nvme0: creating 7 I/O queues.
[Mon Oct 23 19:54:50 2017] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.5.11:4420
root(a)waikikibeach111:/home/gangcao/fio# ./fio --bs=5120k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=300 --filename=/dev/nvme0n1 --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 5120KiB-5120KiB, (W) 5120KiB-5120KiB, (T) 5120KiB-5120KiB, ioengine=libaio, iodepth=16
...
fio-3.1
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=220MiB/s][r=0,w=44 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=17076: Mon Oct 23 20:05:47 2017
write: IOPS=19, BW=96.2MiB/s (101MB/s)(28.2GiB/300399msec)
slat (usec): min=387, max=350086, avg=51937.15, stdev=38959.27
clat (msec): min=199, max=2162, avg=779.50, stdev=553.60
lat (msec): min=214, max=2287, avg=831.44, stdev=589.70
clat percentiles (msec):
| 1.00th=[ 209], 5.00th=[ 275], 10.00th=[ 279], 20.00th=[ 296],
| 30.00th=[ 321], 40.00th=[ 397], 50.00th=[ 489], 60.00th=[ 709],
| 70.00th=[ 1133], 80.00th=[ 1485], 90.00th=[ 1586], 95.00th=[ 1787],
| 99.00th=[ 1989], 99.50th=[ 2039], 99.90th=[ 2123], 99.95th=[ 2140],
| 99.99th=[ 2165]
bw ( KiB/s): min=10260, max=368640, per=25.13%, avg=98423.58, stdev=77390.22, samples=600
iops : min= 2, max= 72, avg=19.20, stdev=15.10, samples=600
lat (msec) : 250=3.70%, 500=47.17%, 750=11.84%, 1000=5.59%, 2000=30.83%
lat (msec) : >=2000=0.87%
cpu : usr=0.61%, sys=0.48%, ctx=27957, majf=0, minf=91
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.7%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=0,5777,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
read-phase: (groupid=0, jobs=1): err= 0: pid=17077: Mon Oct 23 20:05:47 2017
write: IOPS=18, BW=94.9MiB/s (99.5MB/s)(27.8GiB/300346msec)
slat (usec): min=386, max=407847, avg=52605.77, stdev=39162.39
clat (msec): min=197, max=2168, avg=789.72, stdev=551.07
lat (msec): min=220, max=2291, avg=842.33, stdev=587.00
clat percentiles (msec):
| 1.00th=[ 255], 5.00th=[ 275], 10.00th=[ 284], 20.00th=[ 300],
| 30.00th=[ 342], 40.00th=[ 405], 50.00th=[ 493], 60.00th=[ 709],
| 70.00th=[ 1167], 80.00th=[ 1485], 90.00th=[ 1603], 95.00th=[ 1787],
| 99.00th=[ 1989], 99.50th=[ 2056], 99.90th=[ 2140], 99.95th=[ 2165],
| 99.99th=[ 2165]
bw ( KiB/s): min=10240, max=337920, per=24.78%, avg=97039.87, stdev=73967.65, samples=600
iops : min= 2, max= 66, avg=18.94, stdev=14.46, samples=600
lat (msec) : 250=0.98%, 500=49.23%, 750=12.01%, 1000=5.68%, 2000=31.20%
lat (msec) : >=2000=0.89%
cpu : usr=0.60%, sys=0.50%, ctx=27504, majf=0, minf=90
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.7%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=0,5702,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
read-phase: (groupid=0, jobs=1): err= 0: pid=17078: Mon Oct 23 20:05:47 2017
write: IOPS=18, BW=94.5MiB/s (99.1MB/s)(27.7GiB/300403msec)
slat (usec): min=385, max=454024, avg=52851.88, stdev=39734.33
clat (msec): min=174, max=2136, avg=793.55, stdev=550.47
lat (msec): min=175, max=2272, avg=846.41, stdev=586.35
clat percentiles (msec):
| 1.00th=[ 275], 5.00th=[ 279], 10.00th=[ 284], 20.00th=[ 300],
| 30.00th=[ 355], 40.00th=[ 418], 50.00th=[ 498], 60.00th=[ 709],
| 70.00th=[ 1167], 80.00th=[ 1485], 90.00th=[ 1620], 95.00th=[ 1787],
| 99.00th=[ 1989], 99.50th=[ 2056], 99.90th=[ 2123], 99.95th=[ 2140],
| 99.99th=[ 2140]
bw ( KiB/s): min=10240, max=286720, per=24.69%, avg=96665.08, stdev=73206.19, samples=600
iops : min= 2, max= 56, avg=18.87, stdev=14.29, samples=600
lat (msec) : 250=0.12%, 500=49.97%, 750=11.96%, 1000=5.67%, 2000=31.30%
lat (msec) : >=2000=0.97%
cpu : usr=0.60%, sys=0.50%, ctx=27456, majf=0, minf=91
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.7%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=0,5677,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
read-phase: (groupid=0, jobs=1): err= 0: pid=17079: Mon Oct 23 20:05:47 2017
write: IOPS=19, BW=96.9MiB/s (102MB/s)(28.4GiB/300394msec)
slat (usec): min=364, max=435971, avg=51560.26, stdev=39426.52
clat (msec): min=176, max=2179, avg=774.17, stdev=555.08
lat (msec): min=177, max=2299, avg=825.73, stdev=591.31
clat percentiles (msec):
| 1.00th=[ 207], 5.00th=[ 232], 10.00th=[ 279], 20.00th=[ 296],
| 30.00th=[ 313], 40.00th=[ 393], 50.00th=[ 485], 60.00th=[ 701],
| 70.00th=[ 1116], 80.00th=[ 1485], 90.00th=[ 1586], 95.00th=[ 1787],
| 99.00th=[ 1989], 99.50th=[ 2039], 99.90th=[ 2140], 99.95th=[ 2165],
| 99.99th=[ 2165]
bw ( KiB/s): min=10240, max=378880, per=25.30%, avg=99076.15, stdev=78461.72, samples=600
iops : min= 2, max= 74, avg=19.34, stdev=15.32, samples=600
lat (msec) : 250=5.07%, 500=46.18%, 750=11.84%, 1000=5.41%, 2000=30.73%
lat (msec) : >=2000=0.77%
cpu : usr=0.57%, sys=0.56%, ctx=28034, majf=0, minf=97
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.7%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=0,5819,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
Run status group 0 (all jobs):
WRITE: bw=382MiB/s (401MB/s), 94.5MiB/s-96.9MiB/s (99.1MB/s-102MB/s), io=112GiB (120GB), run=300346-300403msec
Disk stats (read/write):
nvme0n1: ios=44/918923, merge=0/0, ticks=8/147977388, in_queue=148007124, util=100.00%
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Saturday, October 21, 2017 11:23 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>; John F. Kim <johnk(a)mellanox.com>; Cao, Gang <gang.cao(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
I am using NVMe.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com>
Sent: Saturday, October 21, 2017 1:21:03 AM
To: Victor Banh; Storage Performance Development Kit; Harris, James R; John F. Kim
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
I am also using SPDK v17.07.1 and DPDK 17.08 on my system and have tried your I/O configuration.
Let me check whether a colleague has Ubuntu installed. If not, could you please try CentOS?
Also, what kind of backend device did you configure for the NVMe-oF target: Malloc, an NVMe SSD, or something else?
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Saturday, October 21, 2017 6:14 AM
To: Cao, Gang <gang.cao(a)intel.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>; John F. Kim <johnk(a)mellanox.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
Is it possible to try on Ubuntu?
Which versions of DPDK and SPDK are you using?
Where can I get them? GitHub?
Thanks
Victor
root(a)clientintel:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial
root(a)clientintel:~# uname -a
Linux clientintel 4.12.0-041200-generic #201707022031 SMP Mon Jul 3 00:32:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Wednesday, October 18, 2017 11:52 PM
To: Victor Banh <victorb(a)mellanox.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
[root(a)node4 gangcao]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux
[root(a)node4 gangcao]# lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.2.1511 (Core)
Release: 7.2.1511
Codename: Core
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 2:14 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>; Cao, Gang <gang.cao(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Can you give the OS version and kernel version again for target and client?
I couldn't compile DPDK without installing the latest 4.12.x kernel on Ubuntu 16.04.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com>
Sent: Wednesday, October 18, 2017 10:59:19 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
This server does not have OFED installed and it is in the loopback mode.
I found another server, also in loopback mode, with a ConnectX-3 and OFED installed.
By the way, what SSD are you using? Maybe the issue is related to the SSD. I've just run with a 2048k block size for a short duration and saw no issue. I will run longer to see whether I can hit this error.
[root(a)slave3 fio]# lspci | grep -i mell
08:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
[root(a)slave3 fio]# lsmod | grep -i mlx
mlx4_ib 159744 0
ib_core 208896 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,ib_srpt,ib_ucm,rdma_ucm,mlx4_ib
mlx4_en 114688 0
mlx4_core 307200 2 mlx4_en,mlx4_ib
ptp 20480 3 ixgbe,igb,mlx4_en
[root(a)slave3 fio]# ofed_info
MLNX_OFED_LINUX-3.2-2.0.0.0 (OFED-3.2-2.0.0):
ar_mgr:
osm_plugins/ar_mgr/ar_mgr-1.0-0.30.ga1ea4b7.tar.gz
cc_mgr:
osm_plugins/cc_mgr/cc_mgr-1.0-0.29.ga1ea4b7.tar.gz
dapl:
dapl.git mlnx_ofed_3_1
commit c30fb6ce2cbc29d8ed4bde51437f7abb93378c78
dump_pr:
osm_plugins/dump_pr//dump_pr-1.0-0.25.ga1ea4b7.tar.gz
fabric-collector:
fabric_collector//fabric-collector-1.1.0.MLNX20140410.51b267e.tar.gz
fca:
mlnx_ofed_fca/fca-2.5.2431-1.src.rpm
hcoll:
mlnx_ofed_hcol/hcoll-3.4.807-1.src.rpm
ibacm:
mlnx_ofed/ibacm.git mlnx_ofed_3_2
commit 15ad8c13bdebbe62edea0b7df030710b65c14f7f
ibacm_ssa:
mlnx_ofed_ssa/acm/ibacm_ssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
ibdump:
sniffer/sniffer-4.0.0-2/ibdump/linux/ibdump-4.0.0-2.tgz
ibsim:
mlnx_ofed_ibsim/ibsim-0.6-0.8.g9d76581.tar.gz
ibssa:
mlnx_ofed_ssa/distrib/ibssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
ibutils:
ofed-1.5.3-rpms/ibutils/ibutils-1.5.7.1-0.12.gdcaeae2.tar.gz
ibutils2:
ibutils2/ibutils2-2.1.1-0.76.MLNX20160222.gd366c7b.tar.gz
infiniband-diags:
mlnx_ofed_infiniband_diags/infiniband-diags-1.6.6.MLNX20151130.7f0213e.tar.gz
infinipath-psm:
mlnx_ofed_infinipath-psm/infinipath-psm-3.3-2_g6f42cdb_open.tar.gz
iser:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
isert:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
kernel-mft:
mlnx_ofed_mft/kernel-mft-4.3.0-25.src.rpm
knem:
knem.git mellanox-master
commit f143ee19a575cd42a334422fa8bd329d671238db
libibcm:
mlnx_ofed/libibcm.git mlnx_ofed_3_0
commit d7d485df305e6536711485bd7e477668e77d8320
libibmad:
mlnx_ofed_libibmad/libibmad-1.3.12.MLNX20151122.d140cb1.tar.gz
libibprof:
mlnx_ofed_libibprof/libibprof-1.1.22-1.src.rpm
libibumad:
mlnx_ofed_libibumad/libibumad-1.3.10.2.MLNX20150406.966500d.tar.gz
libibverbs:
mlnx_ofed/libibverbs.git mlnx_ofed_3_2_2
commit 217a77686f4861229f0e4b94485a13f024634caf
libmlx4:
mlnx_ofed/libmlx4.git mlnx_ofed_3_2_2
commit dda6d7ae1e6e3779a485ebdd0a882f4bcbd027a6
libmlx5:
mlnx_ofed/libmlx5.git mlnx_ofed_3_2_2
commit d0c8645359e0f0aba0408b2d344f3b418d27019b
libopensmssa:
mlnx_ofed_ssa/plugin/libopensmssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
librdmacm:
mlnx_ofed/librdmacm.git mlnx_ofed_3_2_2
commit 6bd430fed9e7b3d57a1876c040431ce7295c7703
libsdp:
libsdp.git mlnx_ofed_3_0
commit fbd01dfff05f42d6b82506e7dbf4bc6b7e6a59a4
libvma:
vma/source_rpms//libvma-7.0.14-0.src.rpm
mlnx-ethtool:
upstream/ethtool.git for-upstream
commit ac0cf295abe0c0832f0711fed66ab9601c8b2513
mlnx-ofa_kernel:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
mpi-selector:
ofed-1.5.3-rpms/mpi-selector/mpi-selector-1.0.3-1.src.rpm
mpitests:
mlnx_ofed_mpitest/mpitests-3.2.17-e1c7f2f.src.rpm
mstflint:
mlnx_ofed_mstflint/mstflint-4.3.0-1.49.g9b9af70.tar.gz
multiperf:
mlnx_ofed_multiperf/multiperf-3.0-0.10.gda89e8c.tar.gz
mvapich2:
mlnx_ofed_mvapich2/mvapich2-2.2a-1.src.rpm
mxm:
mlnx_ofed_mxm/mxm-3.4.3079-1.src.rpm
ofed-docs:
docs.git mlnx_ofed-3.2
commit ea3386416f9f7130edd2c70fc3424cb2cda50f7d
openmpi:
mlnx_ofed_ompi_1.8/openmpi-1.10.3a1-1.src.rpm
opensm:
mlnx_ofed_opensm/opensm-4.6.1.MLNX20160112.774e977.tar.gz
perftest:
mlnx_ofed_perftest/perftest-3.0-0.18.gb464d59.tar.gz
qperf:
mlnx_ofed_qperf/qperf-0.4.9.tar.gz
rds-tools:
rds-tools.git mlnx_ofed_2_4
commit 299420ca25cf9996bc0748e3bc4b08748996ba49
sdpnetstat:
sdpnetstat.git mlnx_ofed_3_0
commit 3cf409a7cc07e5c71f9640eddbb801ece21b4169
sockperf:
sockperf/sockperf-2.7-43.git3ee62bd8107a.src.rpm
srp:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
srptools:
srptools/srptools-1.0.2-12.src.rpm
Installed Packages:
-------------------
infiniband-diags
librdmacm
libmlx4
libibverbs-utils
mpi-selector
libibmad-devel
sdpnetstat
knem
libibumad-devel
libsdp
mlnx-ethtool
libibverbs-debuginfo
mlnx-ofa_kernel-modules
srp
opensm
mstflint
cc_mgr
libibmad
libibverbs
kernel-mft
libibverbs-devel-static
libibumad
librdmacm-devel
mlnx-ofa_kernel
libsdp-devel
ibutils2
mlnxofed-docs
libibmad-static
iser
opensm-devel
dump_pr
libibumad-static
rds-tools
libmlx4-debuginfo
mlnx-ofa_kernel-devel
opensm-libs
opensm-static
ar_mgr
dapl-devel-static
infiniband-diags-compat
libibverbs-devel
ibsim
[root(a)slave3 fio]# ibstat
CA 'mlx4_0'
CA type: MT4103
Number of ports: 2
Firmware version: 2.35.5100
Hardware version: 0
Node GUID: 0x248a0703006090e0
System image GUID: 0x248a0703006090e0
Port 1:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffe6090e0
Link layer: Ethernet
Port 2:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffe6090e1
Link layer: Ethernet
[root(a)slave3 fio]# ibv_devinfo
hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.35.5100
node_guid: 248a:0703:0060:90e0
sys_image_guid: 248a:0703:0060:90e0
vendor_id: 0x02c9
vendor_part_id: 4103
hw_ver: 0x0
board_id: MT_1090111023
phys_port_cnt: 2
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
port: 2
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 12:34 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>; Cao, Gang <gang.cao(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Do you install Mellanox OFED on the Target and Client server?
Can you run ibstat on both servers?
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com>
Sent: Wednesday, October 18, 2017 8:59:28 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
I've just tried the SPDK v17.07.1 and DPDK v17.08.
nvme version: 1.1.38.gfaab
fio version: 3.1
I tried the 512k and 1024k I/O sizes and there was no error. dmesg information follows.
So there may be some other difference here? It looks like you are using a ConnectX-5 while I am using a ConnectX-4.
Other related information:
[root(a)node4 fio]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
[root(a)node4 fio]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root(a)node4 fio]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux
[577707.543326] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.100.8:4420
[577730.854540] detected loopback device
[577730.893761] nvme nvme0: creating 7 I/O queues.
[577730.893797] detected loopback device
[577730.898611] detected loopback device
[577730.908917] detected loopback device
[577730.919073] detected loopback device
[577730.928922] detected loopback device
[577730.938679] detected loopback device
[577730.948365] detected loopback device
[577731.146290] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.100.8:4420
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao(a)intel.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
Any update?
Do you see any error message from "dmesg" with 512k block size running fio?
Thanks
Victor
From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you share with us which version of SPDK you were using when you saw this error? Or maybe you can try the latest SPDK code?
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Cao
Do you see any message from dmesg?
I tried this fio version and still saw these error messages in dmesg.
fio-3.1
[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338] nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>
Cc: Victor Banh <victorb(a)mellanox.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Thanks for your detailed information on the testing.
I've tried the latest SPDK code with the latest fio-3.1-20-g132b and with fio-2.19. I do not see this kind of error.
Could you share with us which version of SPDK you were using when you saw this error? Or maybe you can try the latest SPDK code?
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1 --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017
My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com>; Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Cc: Victor Banh <victorb(a)mellanox.com>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
(cc Victor)
From: James Harris <james.r.harris(a)intel.com>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you provide a few more details? This will help the list to provide some ideas.
1) On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?
The kernel initiator; I run these commands on the client server.
modprobe mlx5_ib
modprobe nvme-rdma
nvme discover -t rdma -a 192.168.10.11 -s 4420
nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1 -a 192.168.10.11 -s 4420
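For reference, the four commands above can be wrapped in a small dry-run helper. This is only a sketch of the sequence quoted in this thread: the address, port, and NQN defaults are the ones posted above, not universal values, and the helper prints the commands instead of executing them (pipe the output to `sh` on a machine with nvme-cli and the Mellanox drivers loaded).

```shell
# Dry-run sketch of the kernel-initiator bring-up quoted above.
# Defaults match the values from this thread; override via arguments.
initiator_bringup() {
  addr=${1:-192.168.10.11}
  port=${2:-4420}
  nqn=${3:-nqn.2016-06.io.spdk:nvme-subsystem-1}
  echo "modprobe mlx5_ib"
  echo "modprobe nvme-rdma"
  echo "nvme discover -t rdma -a $addr -s $port"
  echo "nvme connect -t rdma -n $nqn -a $addr -s $port"
}
initiator_bringup
```

After a successful connect, the new namespace appears as a /dev/nvmeXnY block device on the client.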
2) Can you provide the fio configuration file or command line? Just so we can have more specifics on "bigger block size".
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1 --name=read-phase --rw=randwrite
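For readers who prefer a job file, the command line above should correspond to roughly this fio job (a sketch; option spellings follow fio's INI job-file syntax):

```ini
[read-phase]
bs=512k
numjobs=4
iodepth=16
loops=1
ioengine=libaio
direct=1
invalidate=1
fsync_on_close=1
randrepeat=1
norandommap
time_based
runtime=60
filename=/dev/nvme1n1
rw=randwrite
```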
3) Any details on the HW setup - specifically details on the RDMA NIC (or if you're using SW RoCE).
Nvmf.conf on target server
[Global]
Comment "Global section"
ReactorMask 0xff00
[Rpc]
Enable No
Listen 127.0.0.1
[Nvmf]
MaxQueuesPerSession 8
MaxQueueDepth 128
[Subsystem1]
NQN nqn.2016-06.io.spdk:nvme-subsystem-1
Core 9
Mode Direct
Listen RDMA 192.168.10.11:4420
NVMe 0000:82:00.0
SN S2PMNAAH400039
It is an RDMA NIC, a ConnectX-5; the CPU is an Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz.
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
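As a rough sketch of bringing up the target side with a config like the one above: the paths below assume an in-tree SPDK 17.07-era build (scripts/setup.sh to bind the NVMe device and reserve hugepages, then the nvmf_tgt app), which may differ on other layouts. Like the initiator helper, it only prints the commands:

```shell
# Dry-run sketch of starting the SPDK NVMe-oF target with a config file.
# Paths are assumptions based on an in-tree SPDK build.
target_bringup() {
  conf=${1:-nvmf.conf}
  echo "scripts/setup.sh"               # bind NVMe devices, reserve hugepages
  echo "app/nvmf_tgt/nvmf_tgt -c $conf" # start the target with the config
}
target_bringup
```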
Thanks,
-Jim
From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Victor Banh <victorb(a)mellanox.com>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org" <spdk(a)lists.01.org>
Subject: [SPDK] Buffer I/O error on bigger block size running fio
Hi
I have an SPDK NVMe-oF setup and keep getting errors with bigger block sizes in fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor
[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389] nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486] nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373] nvme1n1: unable to read partition table
* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-24 4:57 Victor Banh
0 siblings, 0 replies; 24+ messages in thread
From: Victor Banh @ 2017-10-24 4:57 UTC (permalink / raw)
To: spdk
Can you run dmesg on the client?
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com>
Sent: Monday, October 23, 2017 9:25:16 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R; John F. Kim
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
I’ve just successfully run the SPDK NVMe-oF target in loopback mode with Ubuntu 16.04 and kernel 4.13.7. Detailed information follows.
I also tried a large 5120k I/O size from fio for 5 minutes.
root(a)waikikibeach111:~# uname -a
Linux waikikibeach111 4.13.7 #1 SMP Sun Oct 22 23:24:08 PDT 2017 x86_64 x86_64 x86_64 GNU/Linux
root(a)waikikibeach111:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial
root(a)waikikibeach111:~# lspci | grep -i mellanox
03:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
03:00.1 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
root(a)waikikibeach111:~# ibstat
CA 'mlx5_0'
CA type: MT4115
Number of ports: 1
Firmware version: 12.18.1000
Hardware version: 0
Node GUID: 0x248a07030049d218
System image GUID: 0x248a07030049d218
Port 1:
State: Active
Physical state: LinkUp
Rate: 100
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffe49d218
Link layer: Ethernet
CA 'mlx5_1'
CA type: MT4115
Number of ports: 1
Firmware version: 12.18.1000
Hardware version: 0
Node GUID: 0x248a07030049d219
System image GUID: 0x248a07030049d218
Port 1:
State: Active
Physical state: LinkUp
Rate: 100
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffe49d219
Link layer: Ethernet
root(a)waikikibeach111:~# cd /home/gangcao/spdk
root(a)waikikibeach111:/home/gangcao/spdk# git branch
* (HEAD detached at v17.07.1) --> DPDK 17.08
master
root(a)waikikibeach111:/home/gangcao/spdk# nvme --version
nvme version 1.2
root(a)waikikibeach111:/home/gangcao/spdk# /home/gangcao/fio/fio --version
fio-3.1
root(a)waikikibeach111:/home/gangcao/spdk# lspci | grep -i volatile
84:00.0 Non-Volatile memory controller: Intel Corporation PCIe Data Center SSD (rev 01)
root(a)waikikibeach111:/home/gangcao/spdk# lspci -s 84:00.0 -v
84:00.0 Non-Volatile memory controller: Intel Corporation PCIe Data Center SSD (rev 01) (prog-if 02 [NVM Express])
Subsystem: Intel Corporation DC P3700 SSD [2.5" SFF]
Physical Slot: 37
Flags: bus master, fast devsel, latency 0, IRQ 39
Memory at fbc10000 (64-bit, non-prefetchable) [size=16K]
Expansion ROM at fbc00000 [disabled] [size=64K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI-X: Enable- Count=32 Masked-
Capabilities: [60] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Capabilities: [150] Virtual Channel
Capabilities: [180] Power Budgeting <?>
Capabilities: [190] Alternative Routing-ID Interpretation (ARI)
Capabilities: [270] Device Serial Number 55-cd-2e-41-4d-2f-5b-5c
Capabilities: [2a0] #19
Kernel driver in use: uio_pci_generic
Kernel modules: nvme
root(a)waikikibeach111:/home/gangcao/spdk# cat test/nvmf/fio/nvmf.conf
# NVMf Target Configuration File
#
# Please write all parameters using ASCII.
# The parameter must be quoted if it includes whitespace.
#
# Configuration syntax:
# Leading whitespace is ignored.
# Lines starting with '#' are comments.
# Lines ending with '\' are concatenated with the next line.
# Bracketed ([]) names define sections
[Global]
# Users can restrict work items to only run on certain cores by
# specifying a ReactorMask. Default ReactorMask mask is defined as
# -c option in the 'ealargs' setting at beginning of file nvmf_tgt.c.
#ReactorMask 0x00FF
# Tracepoint group mask for spdk trace buffers
# Default: 0x0 (all tracepoint groups disabled)
# Set to 0xFFFFFFFFFFFFFFFF to enable all tracepoint groups.
#TpointGroupMask 0x0
# syslog facility
LogFacility "local7"
[Rpc]
# Defines whether to enable configuration via RPC.
# Default is disabled. Note that the RPC interface is not
# authenticated, so users should be careful about enabling
# RPC in non-trusted environments.
Enable no
# Listen address for the RPC service.
# May be an IP address or an absolute path to a Unix socket.
Listen 192.168.5.11
# Users may change this section to create a different number or size of
# malloc LUNs.
# This will generate 8 LUNs with a malloc-allocated backend.
# Each LUN will be size 64MB and these will be named
# Malloc0 through Malloc7. Not all LUNs defined here are necessarily
# used below.
#[Malloc]
#NumberOfLuns 8
#LunSizeInMB 64
# Users must change this section to match the /dev/sdX devices to be virtual
# NVMe devices. The devices are accessed using Linux AIO.
#[AIO]
#AIO /dev/sdb
#AIO /dev/sdc
# Define NVMf protocol global options
[Nvmf]
# Set the maximum number of submission and completion queues per session.
# Setting this to '8', for example, allows for 8 submission and 8 completion queues
# per session.
MaxQueuesPerSession 8
# Set the maximum number of outstanding I/O per queue.
MaxQueueDepth 512
# Set the maximum in-capsule data size. Must be a multiple of 16.
#InCapsuleDataSize 4096
# Set the maximum I/O size. Must be a multiple of 4096.
#MaxIOSize 131072
# Set the global acceptor lcore ID, lcores are numbered starting at 0.
#AcceptorCore 0
# Set how often the acceptor polls for incoming connections. The acceptor is also
# responsible for polling existing connections that have gone idle. 0 means continuously
# poll. Units in microseconds.
AcceptorPollRate 10000
[Nvme]
# Registers the application to receive timeout callback and to reset the controller.
#ResetControllerOnTimeout Yes
# Timeout value.
#NvmeTimeoutValue 30
# Set how often the admin queue is polled for asynchronous events
# Units in microseconds.
#AdminPollRate 100000
HotplugEnable no
# The Split virtual block device slices block devices into multiple smaller bdevs.
#[Split]
# Syntax:
# Split <bdev> <count> [<size_in_megabytes>]
# Split Malloc2 into two equally-sized portions, Malloc2p0 and Malloc2p1
#Split Malloc2 2
# Split Malloc3 into eight 1-megabyte portions, Malloc3p0 ... Malloc3p7,
# leaving the rest of the device inaccessible
#Split Malloc3 8 1
# Define an NVMf Subsystem.
# - NQN is required and must be unique.
# - Core may be set or not. If set, the specified subsystem will run on
# it, otherwise each subsystem will use a round-robin method to allocate
# core from available cores, lcores are numbered starting at 0.
# - Mode may be either "Direct" or "Virtual". Direct means that physical
# devices attached to the target will be presented to hosts as if they
# were directly attached to the host. No software emulation or command
# validation is performed. Virtual means that an NVMe controller is
# emulated in software and the namespaces it contains map to block devices
# on the target system. These block devices do not need to be NVMe devices.
# Only Direct mode is currently supported.
# - Between 1 and 255 Listen directives are allowed. This defines
# the addresses on which new connections may be accepted. The format
# is Listen <type> <address> where type currently can only be RDMA.
# - Between 0 and 255 Host directives are allowed. This defines the
# NQNs of allowed hosts. If no Host directive is specified, all hosts
# are allowed to connect.
# - Exactly 1 NVMe directive specifying an NVMe device by PCI BDF. The
# PCI domain:bus:device.function can be replaced by "*" to indicate
# any PCI device.
# Direct controller
[Subsystem1]
NQN nqn.2016-06.io.spdk:cnode2
Core 0
Mode Direct
Listen rdma 192.168.5.11:4420
#Host nqn.2016-06.io.spdk:init
NVMe 0000:84:00.0
root(a)waikikibeach111:~# dmesg -w -T
[Mon Oct 23 19:45:21 2017] mlx5_core 0000:03:00.0: firmware version: 12.18.1000
[Mon Oct 23 19:45:21 2017] (0000:03:00.0): E-Switch: Total vports 1, l2 table size(65536), per vport: max uc(1024) max mc(16384)
[Mon Oct 23 19:45:21 2017] mlx5_core 0000:03:00.0: Port module event: module 0, Cable plugged
[Mon Oct 23 19:45:21 2017] mlx5_core 0000:03:00.1: firmware version: 12.18.1000
[Mon Oct 23 19:45:22 2017] (0000:03:00.1): E-Switch: Total vports 1, l2 table size(65536), per vport: max uc(1024) max mc(16384)
[Mon Oct 23 19:45:22 2017] mlx5_core 0000:03:00.1: Port module event: module 1, Cable plugged
[Mon Oct 23 19:45:22 2017] mlx5_core 0000:03:00.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(1) RxCqeCmprss(0)
[Mon Oct 23 19:45:22 2017] mlx5_core 0000:03:00.1: MLX5E: StrdRq(0) RqSz(1024) StrdSz(1) RxCqeCmprss(0)
[Mon Oct 23 19:45:22 2017] mlx5_core 0000:03:00.0 ens1f0: renamed from eth0
[Mon Oct 23 19:45:23 2017] mlx5_core 0000:03:00.1 ens1f1: renamed from eth0
[Mon Oct 23 19:45:23 2017] mlx5_ib: Mellanox Connect-IB Infiniband driver v5.0-0
[Mon Oct 23 19:47:01 2017] mlx5_core 0000:03:00.0 ens1f0: Link up
[Mon Oct 23 19:47:03 2017] mlx5_core 0000:03:00.1 ens1f1: Link up
[Mon Oct 23 19:54:26 2017] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.5.11:4420
[Mon Oct 23 19:54:49 2017] nvme nvme0: creating 7 I/O queues.
[Mon Oct 23 19:54:50 2017] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.5.11:4420
root(a)waikikibeach111:/home/gangcao/fio# ./fio --bs=5120k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=300 --filename=/dev/nvme0n1 --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 5120KiB-5120KiB, (W) 5120KiB-5120KiB, (T) 5120KiB-5120KiB, ioengine=libaio, iodepth=16
...
fio-3.1
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=220MiB/s][r=0,w=44 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=17076: Mon Oct 23 20:05:47 2017
write: IOPS=19, BW=96.2MiB/s (101MB/s)(28.2GiB/300399msec)
slat (usec): min=387, max=350086, avg=51937.15, stdev=38959.27
clat (msec): min=199, max=2162, avg=779.50, stdev=553.60
lat (msec): min=214, max=2287, avg=831.44, stdev=589.70
clat percentiles (msec):
| 1.00th=[ 209], 5.00th=[ 275], 10.00th=[ 279], 20.00th=[ 296],
| 30.00th=[ 321], 40.00th=[ 397], 50.00th=[ 489], 60.00th=[ 709],
| 70.00th=[ 1133], 80.00th=[ 1485], 90.00th=[ 1586], 95.00th=[ 1787],
| 99.00th=[ 1989], 99.50th=[ 2039], 99.90th=[ 2123], 99.95th=[ 2140],
| 99.99th=[ 2165]
bw ( KiB/s): min=10260, max=368640, per=25.13%, avg=98423.58, stdev=77390.22, samples=600
iops : min= 2, max= 72, avg=19.20, stdev=15.10, samples=600
lat (msec) : 250=3.70%, 500=47.17%, 750=11.84%, 1000=5.59%, 2000=30.83%
lat (msec) : >=2000=0.87%
cpu : usr=0.61%, sys=0.48%, ctx=27957, majf=0, minf=91
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.7%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=0,5777,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
read-phase: (groupid=0, jobs=1): err= 0: pid=17077: Mon Oct 23 20:05:47 2017
write: IOPS=18, BW=94.9MiB/s (99.5MB/s)(27.8GiB/300346msec)
slat (usec): min=386, max=407847, avg=52605.77, stdev=39162.39
clat (msec): min=197, max=2168, avg=789.72, stdev=551.07
lat (msec): min=220, max=2291, avg=842.33, stdev=587.00
clat percentiles (msec):
| 1.00th=[ 255], 5.00th=[ 275], 10.00th=[ 284], 20.00th=[ 300],
| 30.00th=[ 342], 40.00th=[ 405], 50.00th=[ 493], 60.00th=[ 709],
| 70.00th=[ 1167], 80.00th=[ 1485], 90.00th=[ 1603], 95.00th=[ 1787],
| 99.00th=[ 1989], 99.50th=[ 2056], 99.90th=[ 2140], 99.95th=[ 2165],
| 99.99th=[ 2165]
bw ( KiB/s): min=10240, max=337920, per=24.78%, avg=97039.87, stdev=73967.65, samples=600
iops : min= 2, max= 66, avg=18.94, stdev=14.46, samples=600
lat (msec) : 250=0.98%, 500=49.23%, 750=12.01%, 1000=5.68%, 2000=31.20%
lat (msec) : >=2000=0.89%
cpu : usr=0.60%, sys=0.50%, ctx=27504, majf=0, minf=90
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.7%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=0,5702,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
read-phase: (groupid=0, jobs=1): err= 0: pid=17078: Mon Oct 23 20:05:47 2017
write: IOPS=18, BW=94.5MiB/s (99.1MB/s)(27.7GiB/300403msec)
slat (usec): min=385, max=454024, avg=52851.88, stdev=39734.33
clat (msec): min=174, max=2136, avg=793.55, stdev=550.47
lat (msec): min=175, max=2272, avg=846.41, stdev=586.35
clat percentiles (msec):
| 1.00th=[ 275], 5.00th=[ 279], 10.00th=[ 284], 20.00th=[ 300],
| 30.00th=[ 355], 40.00th=[ 418], 50.00th=[ 498], 60.00th=[ 709],
| 70.00th=[ 1167], 80.00th=[ 1485], 90.00th=[ 1620], 95.00th=[ 1787],
| 99.00th=[ 1989], 99.50th=[ 2056], 99.90th=[ 2123], 99.95th=[ 2140],
| 99.99th=[ 2140]
bw ( KiB/s): min=10240, max=286720, per=24.69%, avg=96665.08, stdev=73206.19, samples=600
iops : min= 2, max= 56, avg=18.87, stdev=14.29, samples=600
lat (msec) : 250=0.12%, 500=49.97%, 750=11.96%, 1000=5.67%, 2000=31.30%
lat (msec) : >=2000=0.97%
cpu : usr=0.60%, sys=0.50%, ctx=27456, majf=0, minf=91
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.7%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=0,5677,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
read-phase: (groupid=0, jobs=1): err= 0: pid=17079: Mon Oct 23 20:05:47 2017
write: IOPS=19, BW=96.9MiB/s (102MB/s)(28.4GiB/300394msec)
slat (usec): min=364, max=435971, avg=51560.26, stdev=39426.52
clat (msec): min=176, max=2179, avg=774.17, stdev=555.08
lat (msec): min=177, max=2299, avg=825.73, stdev=591.31
clat percentiles (msec):
| 1.00th=[ 207], 5.00th=[ 232], 10.00th=[ 279], 20.00th=[ 296],
| 30.00th=[ 313], 40.00th=[ 393], 50.00th=[ 485], 60.00th=[ 701],
| 70.00th=[ 1116], 80.00th=[ 1485], 90.00th=[ 1586], 95.00th=[ 1787],
| 99.00th=[ 1989], 99.50th=[ 2039], 99.90th=[ 2140], 99.95th=[ 2165],
| 99.99th=[ 2165]
bw ( KiB/s): min=10240, max=378880, per=25.30%, avg=99076.15, stdev=78461.72, samples=600
iops : min= 2, max= 74, avg=19.34, stdev=15.32, samples=600
lat (msec) : 250=5.07%, 500=46.18%, 750=11.84%, 1000=5.41%, 2000=30.73%
lat (msec) : >=2000=0.77%
cpu : usr=0.57%, sys=0.56%, ctx=28034, majf=0, minf=97
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.7%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=0,5819,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
Run status group 0 (all jobs):
WRITE: bw=382MiB/s (401MB/s), 94.5MiB/s-96.9MiB/s (99.1MB/s-102MB/s), io=112GiB (120GB), run=300346-300403msec
Disk stats (read/write):
nvme0n1: ios=44/918923, merge=0/0, ticks=8/147977388, in_queue=148007124, util=100.00%
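As a quick sanity check on output like the above, the aggregate bandwidth in the run-status line ("WRITE: bw=382MiB/s") should equal the sum of the four per-job averages, give or take rounding; a minimal sketch:

```python
# Per-job average write bandwidth (MiB/s), read off the four job
# summaries above; the run-status line reports the aggregate.
per_job_bw = [96.2, 94.9, 94.5, 96.9]

aggregate = sum(per_job_bw)
print(f"aggregate write bandwidth: {aggregate:.1f} MiB/s")
```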
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Saturday, October 21, 2017 11:23 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>; John F. Kim <johnk(a)mellanox.com>; Cao, Gang <gang.cao(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
I am using NVMe.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Saturday, October 21, 2017 1:21:03 AM
To: Victor Banh; Storage Performance Development Kit; Harris, James R; John F. Kim
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
I am also using SPDK v17.07.1 and DPDK 17.08 on my system and am trying to reproduce with your IO configuration.
Let me check whether our colleague has this Ubuntu OS installed. If not, could you please try it on CentOS?
Also, what kind of backend device did you configure for the NVMe-oF target: Malloc, an NVMe SSD, or something else?
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Saturday, October 21, 2017 6:14 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; John F. Kim <johnk(a)mellanox.com<mailto:johnk(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
Is it possible to try it on Ubuntu?
Which version of DPDK and SPDK are you using?
Where can I get them? GitHub?
Thanks
Victor
root(a)clientintel:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial
root(a)clientintel:~# uname -a
Linux clientintel 4.12.0-041200-generic #201707022031 SMP Mon Jul 3 00:32:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Wednesday, October 18, 2017 11:52 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
[root(a)node4 gangcao]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux
[root(a)node4 gangcao]# lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.2.1511 (Core)
Release: 7.2.1511
Codename: Core
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 2:14 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Can you give the OS version and kernel version again for the target and client?
I couldn't compile DPDK without installing the latest 4.12.x kernel on Ubuntu 16.04.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 10:59:19 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
This server does not have OFED installed and is in loopback mode.
I found another server, also in loopback mode, with a ConnectX-3 and OFED installed.
By the way, what SSD are you using? Maybe the issue is related to the SSD? I’ve just run with 2048k for a short duration and saw no issue. I will run longer to see whether I can hit this error.
[root(a)slave3 fio]# lspci | grep -i mell
08:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
[root(a)slave3 fio]# lsmod | grep -i mlx
mlx4_ib 159744 0
ib_core 208896 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,ib_srpt,ib_ucm,rdma_ucm,mlx4_ib
mlx4_en 114688 0
mlx4_core 307200 2 mlx4_en,mlx4_ib
ptp 20480 3 ixgbe,igb,mlx4_en
[root(a)slave3 fio]# ofed_info
MLNX_OFED_LINUX-3.2-2.0.0.0 (OFED-3.2-2.0.0):
ar_mgr:
osm_plugins/ar_mgr/ar_mgr-1.0-0.30.ga1ea4b7.tar.gz
cc_mgr:
osm_plugins/cc_mgr/cc_mgr-1.0-0.29.ga1ea4b7.tar.gz
dapl:
dapl.git mlnx_ofed_3_1
commit c30fb6ce2cbc29d8ed4bde51437f7abb93378c78
dump_pr:
osm_plugins/dump_pr//dump_pr-1.0-0.25.ga1ea4b7.tar.gz
fabric-collector:
fabric_collector//fabric-collector-1.1.0.MLNX20140410.51b267e.tar.gz
fca:
mlnx_ofed_fca/fca-2.5.2431-1.src.rpm
hcoll:
mlnx_ofed_hcol/hcoll-3.4.807-1.src.rpm
ibacm:
mlnx_ofed/ibacm.git mlnx_ofed_3_2
commit 15ad8c13bdebbe62edea0b7df030710b65c14f7f
ibacm_ssa:
mlnx_ofed_ssa/acm/ibacm_ssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
ibdump:
sniffer/sniffer-4.0.0-2/ibdump/linux/ibdump-4.0.0-2.tgz
ibsim:
mlnx_ofed_ibsim/ibsim-0.6-0.8.g9d76581.tar.gz
ibssa:
mlnx_ofed_ssa/distrib/ibssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
ibutils:
ofed-1.5.3-rpms/ibutils/ibutils-1.5.7.1-0.12.gdcaeae2.tar.gz
ibutils2:
ibutils2/ibutils2-2.1.1-0.76.MLNX20160222.gd366c7b.tar.gz
infiniband-diags:
mlnx_ofed_infiniband_diags/infiniband-diags-1.6.6.MLNX20151130.7f0213e.tar.gz
infinipath-psm:
mlnx_ofed_infinipath-psm/infinipath-psm-3.3-2_g6f42cdb_open.tar.gz
iser:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
isert:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
kernel-mft:
mlnx_ofed_mft/kernel-mft-4.3.0-25.src.rpm
knem:
knem.git mellanox-master
commit f143ee19a575cd42a334422fa8bd329d671238db
libibcm:
mlnx_ofed/libibcm.git mlnx_ofed_3_0
commit d7d485df305e6536711485bd7e477668e77d8320
libibmad:
mlnx_ofed_libibmad/libibmad-1.3.12.MLNX20151122.d140cb1.tar.gz
libibprof:
mlnx_ofed_libibprof/libibprof-1.1.22-1.src.rpm
libibumad:
mlnx_ofed_libibumad/libibumad-1.3.10.2.MLNX20150406.966500d.tar.gz
libibverbs:
mlnx_ofed/libibverbs.git mlnx_ofed_3_2_2
commit 217a77686f4861229f0e4b94485a13f024634caf
libmlx4:
mlnx_ofed/libmlx4.git mlnx_ofed_3_2_2
commit dda6d7ae1e6e3779a485ebdd0a882f4bcbd027a6
libmlx5:
mlnx_ofed/libmlx5.git mlnx_ofed_3_2_2
commit d0c8645359e0f0aba0408b2d344f3b418d27019b
libopensmssa:
mlnx_ofed_ssa/plugin/libopensmssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
librdmacm:
mlnx_ofed/librdmacm.git mlnx_ofed_3_2_2
commit 6bd430fed9e7b3d57a1876c040431ce7295c7703
libsdp:
libsdp.git mlnx_ofed_3_0
commit fbd01dfff05f42d6b82506e7dbf4bc6b7e6a59a4
libvma:
vma/source_rpms//libvma-7.0.14-0.src.rpm
mlnx-ethtool:
upstream/ethtool.git for-upstream
commit ac0cf295abe0c0832f0711fed66ab9601c8b2513
mlnx-ofa_kernel:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
mpi-selector:
ofed-1.5.3-rpms/mpi-selector/mpi-selector-1.0.3-1.src.rpm
mpitests:
mlnx_ofed_mpitest/mpitests-3.2.17-e1c7f2f.src.rpm
mstflint:
mlnx_ofed_mstflint/mstflint-4.3.0-1.49.g9b9af70.tar.gz
multiperf:
mlnx_ofed_multiperf/multiperf-3.0-0.10.gda89e8c.tar.gz
mvapich2:
mlnx_ofed_mvapich2/mvapich2-2.2a-1.src.rpm
mxm:
mlnx_ofed_mxm/mxm-3.4.3079-1.src.rpm
ofed-docs:
docs.git mlnx_ofed-3.2
commit ea3386416f9f7130edd2c70fc3424cb2cda50f7d
openmpi:
mlnx_ofed_ompi_1.8/openmpi-1.10.3a1-1.src.rpm
opensm:
mlnx_ofed_opensm/opensm-4.6.1.MLNX20160112.774e977.tar.gz
perftest:
mlnx_ofed_perftest/perftest-3.0-0.18.gb464d59.tar.gz
qperf:
mlnx_ofed_qperf/qperf-0.4.9.tar.gz
rds-tools:
rds-tools.git mlnx_ofed_2_4
commit 299420ca25cf9996bc0748e3bc4b08748996ba49
sdpnetstat:
sdpnetstat.git mlnx_ofed_3_0
commit 3cf409a7cc07e5c71f9640eddbb801ece21b4169
sockperf:
sockperf/sockperf-2.7-43.git3ee62bd8107a.src.rpm
srp:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
srptools:
srptools/srptools-1.0.2-12.src.rpm
Installed Packages:
-------------------
infiniband-diags
librdmacm
libmlx4
libibverbs-utils
mpi-selector
libibmad-devel
sdpnetstat
knem
libibumad-devel
libsdp
mlnx-ethtool
libibverbs-debuginfo
mlnx-ofa_kernel-modules
srp
opensm
mstflint
cc_mgr
libibmad
libibverbs
kernel-mft
libibverbs-devel-static
libibumad
librdmacm-devel
mlnx-ofa_kernel
libsdp-devel
ibutils2
mlnxofed-docs
libibmad-static
iser
opensm-devel
dump_pr
libibumad-static
rds-tools
libmlx4-debuginfo
mlnx-ofa_kernel-devel
opensm-libs
opensm-static
ar_mgr
dapl-devel-static
infiniband-diags-compat
libibverbs-devel
ibsim
[root(a)slave3 fio]# ibstat
CA 'mlx4_0'
CA type: MT4103
Number of ports: 2
Firmware version: 2.35.5100
Hardware version: 0
Node GUID: 0x248a0703006090e0
System image GUID: 0x248a0703006090e0
Port 1:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffe6090e0
Link layer: Ethernet
Port 2:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffe6090e1
Link layer: Ethernet
[root(a)slave3 fio]# ibv_devinfo
hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.35.5100
node_guid: 248a:0703:0060:90e0
sys_image_guid: 248a:0703:0060:90e0
vendor_id: 0x02c9
vendor_part_id: 4103
hw_ver: 0x0
board_id: MT_1090111023
phys_port_cnt: 2
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
port: 2
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 12:34 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Do you install Mellanox OFED on the Target and Client server?
Can you run ibstat on both servers?
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 8:59:28 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
I’ve just tried the SPDK v17.07.1 and DPDK v17.08.
nvme version: 1.1.38.gfaab
fio version: 3.1
Tried the 512k and 1024k IO sizes and there is no error. dmesg information follows.
So there may be some other difference here? Looks like you are using a ConnectX-5 while I am using a ConnectX-4?
Other related information:
[root(a)node4 fio]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
[root(a)node4 fio]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root(a)node4 fio]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux
[577707.543326] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.100.8:4420
[577730.854540] detected loopback device
[577730.893761] nvme nvme0: creating 7 I/O queues.
[577730.893797] detected loopback device
[577730.898611] detected loopback device
[577730.908917] detected loopback device
[577730.919073] detected loopback device
[577730.928922] detected loopback device
[577730.938679] detected loopback device
[577730.948365] detected loopback device
[577731.146290] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.100.8:4420
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
Any update?
Do you see any error messages from “dmesg” when running fio with a 512k block size?
Thanks
Victor
From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you tell us which version of SPDK you were using when you saw this error? Or maybe you can try the latest SPDK code?
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Cao
Do you see any messages from dmesg?
I tried this fio version and still saw these error messages from dmesg.
fio-3.1
[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338] nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0
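When errors like these scroll past quickly, tallying them per device and logical block makes the pattern (e.g. every retry hitting block 0) easier to see. A minimal Python sketch, fed a captured dmesg excerpt; the regex assumes the stock kernel message format shown above:

```python
import re
from collections import Counter

# A captured excerpt in the stock kernel message format (an assumption;
# adjust the pattern if your kernel formats these lines differently).
dmesg = """\
[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
"""

pattern = re.compile(r"Buffer I/O error on dev (\S+), logical block (\d+)")

# Count (device, logical block) pairs so repeated retries stand out.
counts = Counter(pattern.findall(dmesg))
for (dev, block), n in sorted(counts.items()):
    print(f"{dev} block {block}: {n} error(s)")
```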
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Thanks for your detailed information on the testing.
I’ve tried the latest SPDK code with the latest fio-3.1-20-g132b and with fio-2.19, and I don't see this kind of error.
Could you tell us which version of SPDK you were using when you saw this error? Or maybe you can try the latest SPDK code?
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1 --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017
My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
(cc Victor)
From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you provide a few more details? This will help the list to provide some ideas.
1) On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?
Kernel initiator; run these commands on the client server.
modprobe mlx5_ib
modprobe nvme-rdma
nvme discover -t rdma -a 192.168.10.11 -s 4420
nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1 -a 192.168.10.11 -s 4420
2) Can you provide the fio configuration file or command line? Just so we can have more specifics on “bigger block size”.
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1 --name=read-phase --rw=randwrite
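The same workload can also be expressed as a fio job file, which is a direct transcription of the command line above (the [read-phase] section name mirrors the --name option):

```ini
[read-phase]
bs=512k
numjobs=4
iodepth=16
loops=1
ioengine=libaio
direct=1
invalidate=1
fsync_on_close=1
randrepeat=1
norandommap
time_based
runtime=60
filename=/dev/nvme1n1
rw=randwrite
```

Saved as e.g. read-phase.fio (a hypothetical filename), it would be run as `fio read-phase.fio`.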
3) Any details on the HW setup – specifically details on the RDMA NIC (or if you’re using SW RoCE).
Nvmf.conf on target server
[Global]
Comment "Global section"
ReactorMask 0xff00
[Rpc]
Enable No
Listen 127.0.0.1
[Nvmf]
MaxQueuesPerSession 8
MaxQueueDepth 128
[Subsystem1]
NQN nqn.2016-06.io.spdk:nvme-subsystem-1
Core 9
Mode Direct
Listen RDMA 192.168.10.11:4420
NVMe 0000:82:00.0
SN S2PMNAAH400039
It is RDMA NIC, ConnectX 5, Intel CPU Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
Thanks,
-Jim
From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio
Hi
I have SPDK NVMe-oF set up and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor
[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389] nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486] nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373] nvme1n1: unable to read partition table
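The sector numbers in the blk_update_request lines above are in 512-byte units (the kernel block layer's convention, independent of the namespace's logical block size). A quick sketch of turning them into byte offsets — an illustration added here, not code from the original thread:

```python
# Kernel block-layer error messages report offsets in 512-byte sectors,
# regardless of the device's logical block size.
SECTOR_SIZE = 512

def sector_to_byte_offset(sector: int) -> int:
    """Convert a kernel-reported sector number to a byte offset."""
    return sector * SECTOR_SIZE

# First failed sector from the log above:
offset = sector_to_byte_offset(2507351968)
print(offset)               # 1283764207616 bytes
print(offset // (1 << 30))  # 1195 -> roughly 1.2 TiB into the device
```

This mainly confirms the failures are scattered across the whole ~1.6 TB namespace (random writes), not clustered at one region.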
* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-24 4:25 Cao, Gang
0 siblings, 0 replies; 24+ messages in thread
From: Cao, Gang @ 2017-10-24 4:25 UTC (permalink / raw)
To: spdk
Hi Victor,
I've just successfully run the SPDK NVMe-oF target in loopback mode with Ubuntu 16.04 and kernel 4.13.7. Detailed information follows.
I also tried a large 5120K I/O size from fio for 5 minutes.
root(a)waikikibeach111:~# uname -a
Linux waikikibeach111 4.13.7 #1 SMP Sun Oct 22 23:24:08 PDT 2017 x86_64 x86_64 x86_64 GNU/Linux
root(a)waikikibeach111:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial
root(a)waikikibeach111:~# lspci | grep -i mellanox
03:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
03:00.1 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
root(a)waikikibeach111:~# ibstat
CA 'mlx5_0'
CA type: MT4115
Number of ports: 1
Firmware version: 12.18.1000
Hardware version: 0
Node GUID: 0x248a07030049d218
System image GUID: 0x248a07030049d218
Port 1:
State: Active
Physical state: LinkUp
Rate: 100
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffe49d218
Link layer: Ethernet
CA 'mlx5_1'
CA type: MT4115
Number of ports: 1
Firmware version: 12.18.1000
Hardware version: 0
Node GUID: 0x248a07030049d219
System image GUID: 0x248a07030049d218
Port 1:
State: Active
Physical state: LinkUp
Rate: 100
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffe49d219
Link layer: Ethernet
root(a)waikikibeach111:~# cd /home/gangcao/spdk
root(a)waikikibeach111:/home/gangcao/spdk# git branch
* (HEAD detached at v17.07.1) --> DPDK 17.08
master
root(a)waikikibeach111:/home/gangcao/spdk# nvme --version
nvme version 1.2
root(a)waikikibeach111:/home/gangcao/spdk# /home/gangcao/fio/fio --version
fio-3.1
root(a)waikikibeach111:/home/gangcao/spdk# lspci | grep -i volatile
84:00.0 Non-Volatile memory controller: Intel Corporation PCIe Data Center SSD (rev 01)
root(a)waikikibeach111:/home/gangcao/spdk# lspci -s 84:00.0 -v
84:00.0 Non-Volatile memory controller: Intel Corporation PCIe Data Center SSD (rev 01) (prog-if 02 [NVM Express])
Subsystem: Intel Corporation DC P3700 SSD [2.5" SFF]
Physical Slot: 37
Flags: bus master, fast devsel, latency 0, IRQ 39
Memory at fbc10000 (64-bit, non-prefetchable) [size=16K]
Expansion ROM at fbc00000 [disabled] [size=64K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI-X: Enable- Count=32 Masked-
Capabilities: [60] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Capabilities: [150] Virtual Channel
Capabilities: [180] Power Budgeting <?>
Capabilities: [190] Alternative Routing-ID Interpretation (ARI)
Capabilities: [270] Device Serial Number 55-cd-2e-41-4d-2f-5b-5c
Capabilities: [2a0] #19
Kernel driver in use: uio_pci_generic
Kernel modules: nvme
root(a)waikikibeach111:/home/gangcao/spdk# cat test/nvmf/fio/nvmf.conf
# NVMf Target Configuration File
#
# Please write all parameters using ASCII.
# The parameter must be quoted if it includes whitespace.
#
# Configuration syntax:
# Leading whitespace is ignored.
# Lines starting with '#' are comments.
# Lines ending with '\' are concatenated with the next line.
# Bracketed ([]) names define sections
[Global]
# Users can restrict work items to only run on certain cores by
# specifying a ReactorMask. Default ReactorMask mask is defined as
# -c option in the 'ealargs' setting at beginning of file nvmf_tgt.c.
#ReactorMask 0x00FF
# Tracepoint group mask for spdk trace buffers
# Default: 0x0 (all tracepoint groups disabled)
# Set to 0xFFFFFFFFFFFFFFFF to enable all tracepoint groups.
#TpointGroupMask 0x0
# syslog facility
LogFacility "local7"
[Rpc]
# Defines whether to enable configuration via RPC.
# Default is disabled. Note that the RPC interface is not
# authenticated, so users should be careful about enabling
# RPC in non-trusted environments.
Enable no
# Listen address for the RPC service.
# May be an IP address or an absolute path to a Unix socket.
Listen 192.168.5.11
# Users may change this section to create a different number or size of
# malloc LUNs.
# This will generate 8 LUNs with a malloc-allocated backend.
# Each LUN will be size 64MB and these will be named
# Malloc0 through Malloc7. Not all LUNs defined here are necessarily
# used below.
#[Malloc]
#NumberOfLuns 8
#LunSizeInMB 64
# Users must change this section to match the /dev/sdX devices to be virtual
# NVMe devices. The devices are accessed using Linux AIO.
#[AIO]
#AIO /dev/sdb
#AIO /dev/sdc
# Define NVMf protocol global options
[Nvmf]
# Set the maximum number of submission and completion queues per session.
# Setting this to '8', for example, allows for 8 submission and 8 completion queues
# per session.
MaxQueuesPerSession 8
# Set the maximum number of outstanding I/O per queue.
MaxQueueDepth 512
# Set the maximum in-capsule data size. Must be a multiple of 16.
#InCapsuleDataSize 4096
# Set the maximum I/O size. Must be a multiple of 4096.
#MaxIOSize 131072
# Set the global acceptor lcore ID, lcores are numbered starting at 0.
#AcceptorCore 0
# Set how often the acceptor polls for incoming connections. The acceptor is also
# responsible for polling existing connections that have gone idle. 0 means continuously
# poll. Units in microseconds.
AcceptorPollRate 10000
[Nvme]
# Registers the application to receive timeout callback and to reset the controller.
#ResetControllerOnTimeout Yes
# Timeout value.
#NvmeTimeoutValue 30
# Set how often the admin queue is polled for asynchronous events
# Units in microseconds.
#AdminPollRate 100000
HotplugEnable no
# The Split virtual block device slices block devices into multiple smaller bdevs.
#[Split]
# Syntax:
# Split <bdev> <count> [<size_in_megabytes>]
# Split Malloc2 into two equally-sized portions, Malloc2p0 and Malloc2p1
#Split Malloc2 2
# Split Malloc3 into eight 1-megabyte portions, Malloc3p0 ... Malloc3p7,
# leaving the rest of the device inaccessible
#Split Malloc3 8 1
# Define an NVMf Subsystem.
# - NQN is required and must be unique.
# - Core may be set or not. If set, the specified subsystem will run on
# it, otherwise each subsystem will use a round-robin method to allocate
# core from available cores, lcores are numbered starting at 0.
# - Mode may be either "Direct" or "Virtual". Direct means that physical
# devices attached to the target will be presented to hosts as if they
# were directly attached to the host. No software emulation or command
# validation is performed. Virtual means that an NVMe controller is
# emulated in software and the namespaces it contains map to block devices
# on the target system. These block devices do not need to be NVMe devices.
# Only Direct mode is currently supported.
# - Between 1 and 255 Listen directives are allowed. This defines
# the addresses on which new connections may be accepted. The format
# is Listen <type> <address> where type currently can only be RDMA.
# - Between 0 and 255 Host directives are allowed. This defines the
# NQNs of allowed hosts. If no Host directive is specified, all hosts
# are allowed to connect.
# - Exactly 1 NVMe directive specifying an NVMe device by PCI BDF. The
# PCI domain:bus:device.function can be replaced by "*" to indicate
# any PCI device.
# Direct controller
[Subsystem1]
NQN nqn.2016-06.io.spdk:cnode2
Core 0
Mode Direct
Listen rdma 192.168.5.11:4420
#Host nqn.2016-06.io.spdk:init
NVMe 0000:84:00.0
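One parameter worth noting in the config above is MaxIOSize (default 131072 bytes): each NVMe-oF command is capped at that size, so an initiator has to split a larger fio block into multiple commands (the limit is advertised to the host via MDTS). A rough sketch of the arithmetic — an illustration added here, not code from the thread:

```python
# MaxIOSize default from the config above, in bytes.
MAX_IO_SIZE = 131072

def commands_per_io(io_bytes: int, max_io: int = MAX_IO_SIZE) -> int:
    """How many transport-level commands one application I/O splits into."""
    return -(-io_bytes // max_io)  # ceiling division

print(commands_per_io(512 * 1024))   # the 512k fio block size  -> 4 commands
print(commands_per_io(5120 * 1024))  # the 5120k fio block size -> 40 commands
```

So the "bigger block size" cases in this thread exercise the I/O-splitting path far more heavily than small-block workloads do, which is one plausible reason the failure only shows up there.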
root(a)waikikibeach111:~# dmesg -w -T
[Mon Oct 23 19:45:21 2017] mlx5_core 0000:03:00.0: firmware version: 12.18.1000
[Mon Oct 23 19:45:21 2017] (0000:03:00.0): E-Switch: Total vports 1, l2 table size(65536), per vport: max uc(1024) max mc(16384)
[Mon Oct 23 19:45:21 2017] mlx5_core 0000:03:00.0: Port module event: module 0, Cable plugged
[Mon Oct 23 19:45:21 2017] mlx5_core 0000:03:00.1: firmware version: 12.18.1000
[Mon Oct 23 19:45:22 2017] (0000:03:00.1): E-Switch: Total vports 1, l2 table size(65536), per vport: max uc(1024) max mc(16384)
[Mon Oct 23 19:45:22 2017] mlx5_core 0000:03:00.1: Port module event: module 1, Cable plugged
[Mon Oct 23 19:45:22 2017] mlx5_core 0000:03:00.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(1) RxCqeCmprss(0)
[Mon Oct 23 19:45:22 2017] mlx5_core 0000:03:00.1: MLX5E: StrdRq(0) RqSz(1024) StrdSz(1) RxCqeCmprss(0)
[Mon Oct 23 19:45:22 2017] mlx5_core 0000:03:00.0 ens1f0: renamed from eth0
[Mon Oct 23 19:45:23 2017] mlx5_core 0000:03:00.1 ens1f1: renamed from eth0
[Mon Oct 23 19:45:23 2017] mlx5_ib: Mellanox Connect-IB Infiniband driver v5.0-0
[Mon Oct 23 19:47:01 2017] mlx5_core 0000:03:00.0 ens1f0: Link up
[Mon Oct 23 19:47:03 2017] mlx5_core 0000:03:00.1 ens1f1: Link up
[Mon Oct 23 19:54:26 2017] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.5.11:4420
[Mon Oct 23 19:54:49 2017] nvme nvme0: creating 7 I/O queues.
[Mon Oct 23 19:54:50 2017] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.5.11:4420
root(a)waikikibeach111:/home/gangcao/fio# ./fio --bs=5120k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=300 --filename=/dev/nvme0n1 --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 5120KiB-5120KiB, (W) 5120KiB-5120KiB, (T) 5120KiB-5120KiB, ioengine=libaio, iodepth=16
...
fio-3.1
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=220MiB/s][r=0,w=44 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=17076: Mon Oct 23 20:05:47 2017
write: IOPS=19, BW=96.2MiB/s (101MB/s)(28.2GiB/300399msec)
slat (usec): min=387, max=350086, avg=51937.15, stdev=38959.27
clat (msec): min=199, max=2162, avg=779.50, stdev=553.60
lat (msec): min=214, max=2287, avg=831.44, stdev=589.70
clat percentiles (msec):
| 1.00th=[ 209], 5.00th=[ 275], 10.00th=[ 279], 20.00th=[ 296],
| 30.00th=[ 321], 40.00th=[ 397], 50.00th=[ 489], 60.00th=[ 709],
| 70.00th=[ 1133], 80.00th=[ 1485], 90.00th=[ 1586], 95.00th=[ 1787],
| 99.00th=[ 1989], 99.50th=[ 2039], 99.90th=[ 2123], 99.95th=[ 2140],
| 99.99th=[ 2165]
bw ( KiB/s): min=10260, max=368640, per=25.13%, avg=98423.58, stdev=77390.22, samples=600
iops : min= 2, max= 72, avg=19.20, stdev=15.10, samples=600
lat (msec) : 250=3.70%, 500=47.17%, 750=11.84%, 1000=5.59%, 2000=30.83%
lat (msec) : >=2000=0.87%
cpu : usr=0.61%, sys=0.48%, ctx=27957, majf=0, minf=91
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.7%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=0,5777,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
read-phase: (groupid=0, jobs=1): err= 0: pid=17077: Mon Oct 23 20:05:47 2017
write: IOPS=18, BW=94.9MiB/s (99.5MB/s)(27.8GiB/300346msec)
slat (usec): min=386, max=407847, avg=52605.77, stdev=39162.39
clat (msec): min=197, max=2168, avg=789.72, stdev=551.07
lat (msec): min=220, max=2291, avg=842.33, stdev=587.00
clat percentiles (msec):
| 1.00th=[ 255], 5.00th=[ 275], 10.00th=[ 284], 20.00th=[ 300],
| 30.00th=[ 342], 40.00th=[ 405], 50.00th=[ 493], 60.00th=[ 709],
| 70.00th=[ 1167], 80.00th=[ 1485], 90.00th=[ 1603], 95.00th=[ 1787],
| 99.00th=[ 1989], 99.50th=[ 2056], 99.90th=[ 2140], 99.95th=[ 2165],
| 99.99th=[ 2165]
bw ( KiB/s): min=10240, max=337920, per=24.78%, avg=97039.87, stdev=73967.65, samples=600
iops : min= 2, max= 66, avg=18.94, stdev=14.46, samples=600
lat (msec) : 250=0.98%, 500=49.23%, 750=12.01%, 1000=5.68%, 2000=31.20%
lat (msec) : >=2000=0.89%
cpu : usr=0.60%, sys=0.50%, ctx=27504, majf=0, minf=90
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.7%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=0,5702,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
read-phase: (groupid=0, jobs=1): err= 0: pid=17078: Mon Oct 23 20:05:47 2017
write: IOPS=18, BW=94.5MiB/s (99.1MB/s)(27.7GiB/300403msec)
slat (usec): min=385, max=454024, avg=52851.88, stdev=39734.33
clat (msec): min=174, max=2136, avg=793.55, stdev=550.47
lat (msec): min=175, max=2272, avg=846.41, stdev=586.35
clat percentiles (msec):
| 1.00th=[ 275], 5.00th=[ 279], 10.00th=[ 284], 20.00th=[ 300],
| 30.00th=[ 355], 40.00th=[ 418], 50.00th=[ 498], 60.00th=[ 709],
| 70.00th=[ 1167], 80.00th=[ 1485], 90.00th=[ 1620], 95.00th=[ 1787],
| 99.00th=[ 1989], 99.50th=[ 2056], 99.90th=[ 2123], 99.95th=[ 2140],
| 99.99th=[ 2140]
bw ( KiB/s): min=10240, max=286720, per=24.69%, avg=96665.08, stdev=73206.19, samples=600
iops : min= 2, max= 56, avg=18.87, stdev=14.29, samples=600
lat (msec) : 250=0.12%, 500=49.97%, 750=11.96%, 1000=5.67%, 2000=31.30%
lat (msec) : >=2000=0.97%
cpu : usr=0.60%, sys=0.50%, ctx=27456, majf=0, minf=91
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.7%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=0,5677,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
read-phase: (groupid=0, jobs=1): err= 0: pid=17079: Mon Oct 23 20:05:47 2017
write: IOPS=19, BW=96.9MiB/s (102MB/s)(28.4GiB/300394msec)
slat (usec): min=364, max=435971, avg=51560.26, stdev=39426.52
clat (msec): min=176, max=2179, avg=774.17, stdev=555.08
lat (msec): min=177, max=2299, avg=825.73, stdev=591.31
clat percentiles (msec):
| 1.00th=[ 207], 5.00th=[ 232], 10.00th=[ 279], 20.00th=[ 296],
| 30.00th=[ 313], 40.00th=[ 393], 50.00th=[ 485], 60.00th=[ 701],
| 70.00th=[ 1116], 80.00th=[ 1485], 90.00th=[ 1586], 95.00th=[ 1787],
| 99.00th=[ 1989], 99.50th=[ 2039], 99.90th=[ 2140], 99.95th=[ 2165],
| 99.99th=[ 2165]
bw ( KiB/s): min=10240, max=378880, per=25.30%, avg=99076.15, stdev=78461.72, samples=600
iops : min= 2, max= 74, avg=19.34, stdev=15.32, samples=600
lat (msec) : 250=5.07%, 500=46.18%, 750=11.84%, 1000=5.41%, 2000=30.73%
lat (msec) : >=2000=0.77%
cpu : usr=0.57%, sys=0.56%, ctx=28034, majf=0, minf=97
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.7%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=0,5819,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
Run status group 0 (all jobs):
WRITE: bw=382MiB/s (401MB/s), 94.5MiB/s-96.9MiB/s (99.1MB/s-102MB/s), io=112GiB (120GB), run=300346-300403msec
Disk stats (read/write):
nvme0n1: ios=44/918923, merge=0/0, ticks=8/147977388, in_queue=148007124, util=100.00%
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Saturday, October 21, 2017 11:23 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>; John F. Kim <johnk(a)mellanox.com>; Cao, Gang <gang.cao(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
I am using NVMe.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Saturday, October 21, 2017 1:21:03 AM
To: Victor Banh; Storage Performance Development Kit; Harris, James R; John F. Kim
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
I am also using SPDK v17.07.1 and DPDK 17.08 on my system to try your I/O configuration.
Let me check whether a colleague has Ubuntu installed. If not, could you please try CentOS?
Also, what kind of backend device did you configure for the NVMe-oF target: Malloc, an NVMe SSD, or something else?
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Saturday, October 21, 2017 6:14 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; John F. Kim <johnk(a)mellanox.com<mailto:johnk(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
Is it possible to try it on Ubuntu?
Which version of DPDK and SPDK are you using?
Where can I get them? GitHub?
Thanks
Victor
root(a)clientintel:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial
root(a)clientintel:~# uname -a
Linux clientintel 4.12.0-041200-generic #201707022031 SMP Mon Jul 3 00:32:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Wednesday, October 18, 2017 11:52 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
[root(a)node4 gangcao]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux
[root(a)node4 gangcao]# lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.2.1511 (Core)
Release: 7.2.1511
Codename: Core
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 2:14 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Can you give the OS version and kernel version again for target and client?
I couldn't compile DPDK without installing the latest 4.12.x kernel on Ubuntu 16.04.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 10:59:19 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
This server does not have OFED installed and is in loopback mode.
I found another server, also in loopback mode, with a ConnectX-3 and OFED installed.
By the way, what SSD are you using? Maybe it is related to the SSD? I've just run with 2048k for a short duration and saw no issue. I will run longer to see whether I can hit this error.
[root(a)slave3 fio]# lspci | grep -i mell
08:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
[root(a)slave3 fio]# lsmod | grep -i mlx
mlx4_ib 159744 0
ib_core 208896 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,ib_srpt,ib_ucm,rdma_ucm,mlx4_ib
mlx4_en 114688 0
mlx4_core 307200 2 mlx4_en,mlx4_ib
ptp 20480 3 ixgbe,igb,mlx4_en
[root(a)slave3 fio]# ofed_info
MLNX_OFED_LINUX-3.2-2.0.0.0 (OFED-3.2-2.0.0):
ar_mgr:
osm_plugins/ar_mgr/ar_mgr-1.0-0.30.ga1ea4b7.tar.gz
cc_mgr:
osm_plugins/cc_mgr/cc_mgr-1.0-0.29.ga1ea4b7.tar.gz
dapl:
dapl.git mlnx_ofed_3_1
commit c30fb6ce2cbc29d8ed4bde51437f7abb93378c78
dump_pr:
osm_plugins/dump_pr//dump_pr-1.0-0.25.ga1ea4b7.tar.gz
fabric-collector:
fabric_collector//fabric-collector-1.1.0.MLNX20140410.51b267e.tar.gz
fca:
mlnx_ofed_fca/fca-2.5.2431-1.src.rpm
hcoll:
mlnx_ofed_hcol/hcoll-3.4.807-1.src.rpm
ibacm:
mlnx_ofed/ibacm.git mlnx_ofed_3_2
commit 15ad8c13bdebbe62edea0b7df030710b65c14f7f
ibacm_ssa:
mlnx_ofed_ssa/acm/ibacm_ssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
ibdump:
sniffer/sniffer-4.0.0-2/ibdump/linux/ibdump-4.0.0-2.tgz
ibsim:
mlnx_ofed_ibsim/ibsim-0.6-0.8.g9d76581.tar.gz
ibssa:
mlnx_ofed_ssa/distrib/ibssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
ibutils:
ofed-1.5.3-rpms/ibutils/ibutils-1.5.7.1-0.12.gdcaeae2.tar.gz
ibutils2:
ibutils2/ibutils2-2.1.1-0.76.MLNX20160222.gd366c7b.tar.gz
infiniband-diags:
mlnx_ofed_infiniband_diags/infiniband-diags-1.6.6.MLNX20151130.7f0213e.tar.gz
infinipath-psm:
mlnx_ofed_infinipath-psm/infinipath-psm-3.3-2_g6f42cdb_open.tar.gz
iser:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
isert:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
kernel-mft:
mlnx_ofed_mft/kernel-mft-4.3.0-25.src.rpm
knem:
knem.git mellanox-master
commit f143ee19a575cd42a334422fa8bd329d671238db
libibcm:
mlnx_ofed/libibcm.git mlnx_ofed_3_0
commit d7d485df305e6536711485bd7e477668e77d8320
libibmad:
mlnx_ofed_libibmad/libibmad-1.3.12.MLNX20151122.d140cb1.tar.gz
libibprof:
mlnx_ofed_libibprof/libibprof-1.1.22-1.src.rpm
libibumad:
mlnx_ofed_libibumad/libibumad-1.3.10.2.MLNX20150406.966500d.tar.gz
libibverbs:
mlnx_ofed/libibverbs.git mlnx_ofed_3_2_2
commit 217a77686f4861229f0e4b94485a13f024634caf
libmlx4:
mlnx_ofed/libmlx4.git mlnx_ofed_3_2_2
commit dda6d7ae1e6e3779a485ebdd0a882f4bcbd027a6
libmlx5:
mlnx_ofed/libmlx5.git mlnx_ofed_3_2_2
commit d0c8645359e0f0aba0408b2d344f3b418d27019b
libopensmssa:
mlnx_ofed_ssa/plugin/libopensmssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
librdmacm:
mlnx_ofed/librdmacm.git mlnx_ofed_3_2_2
commit 6bd430fed9e7b3d57a1876c040431ce7295c7703
libsdp:
libsdp.git mlnx_ofed_3_0
commit fbd01dfff05f42d6b82506e7dbf4bc6b7e6a59a4
libvma:
vma/source_rpms//libvma-7.0.14-0.src.rpm
mlnx-ethtool:
upstream/ethtool.git for-upstream
commit ac0cf295abe0c0832f0711fed66ab9601c8b2513
mlnx-ofa_kernel:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
mpi-selector:
ofed-1.5.3-rpms/mpi-selector/mpi-selector-1.0.3-1.src.rpm
mpitests:
mlnx_ofed_mpitest/mpitests-3.2.17-e1c7f2f.src.rpm
mstflint:
mlnx_ofed_mstflint/mstflint-4.3.0-1.49.g9b9af70.tar.gz
multiperf:
mlnx_ofed_multiperf/multiperf-3.0-0.10.gda89e8c.tar.gz
mvapich2:
mlnx_ofed_mvapich2/mvapich2-2.2a-1.src.rpm
mxm:
mlnx_ofed_mxm/mxm-3.4.3079-1.src.rpm
ofed-docs:
docs.git mlnx_ofed-3.2
commit ea3386416f9f7130edd2c70fc3424cb2cda50f7d
openmpi:
mlnx_ofed_ompi_1.8/openmpi-1.10.3a1-1.src.rpm
opensm:
mlnx_ofed_opensm/opensm-4.6.1.MLNX20160112.774e977.tar.gz
perftest:
mlnx_ofed_perftest/perftest-3.0-0.18.gb464d59.tar.gz
qperf:
mlnx_ofed_qperf/qperf-0.4.9.tar.gz
rds-tools:
rds-tools.git mlnx_ofed_2_4
commit 299420ca25cf9996bc0748e3bc4b08748996ba49
sdpnetstat:
sdpnetstat.git mlnx_ofed_3_0
commit 3cf409a7cc07e5c71f9640eddbb801ece21b4169
sockperf:
sockperf/sockperf-2.7-43.git3ee62bd8107a.src.rpm
srp:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
srptools:
srptools/srptools-1.0.2-12.src.rpm
Installed Packages:
-------------------
infiniband-diags
librdmacm
libmlx4
libibverbs-utils
mpi-selector
libibmad-devel
sdpnetstat
knem
libibumad-devel
libsdp
mlnx-ethtool
libibverbs-debuginfo
mlnx-ofa_kernel-modules
srp
opensm
mstflint
cc_mgr
libibmad
libibverbs
kernel-mft
libibverbs-devel-static
libibumad
librdmacm-devel
mlnx-ofa_kernel
libsdp-devel
ibutils2
mlnxofed-docs
libibmad-static
iser
opensm-devel
dump_pr
libibumad-static
rds-tools
libmlx4-debuginfo
mlnx-ofa_kernel-devel
opensm-libs
opensm-static
ar_mgr
dapl-devel-static
infiniband-diags-compat
libibverbs-devel
ibsim
[root(a)slave3 fio]# ibstat
CA 'mlx4_0'
CA type: MT4103
Number of ports: 2
Firmware version: 2.35.5100
Hardware version: 0
Node GUID: 0x248a0703006090e0
System image GUID: 0x248a0703006090e0
Port 1:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffe6090e0
Link layer: Ethernet
Port 2:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffe6090e1
Link layer: Ethernet
[root(a)slave3 fio]# ibv_devinfo
hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.35.5100
node_guid: 248a:0703:0060:90e0
sys_image_guid: 248a:0703:0060:90e0
vendor_id: 0x02c9
vendor_part_id: 4103
hw_ver: 0x0
board_id: MT_1090111023
phys_port_cnt: 2
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
port: 2
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 12:34 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Did you install Mellanox OFED on the target and client servers?
Can you run ibstat on both servers?
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 8:59:28 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
I've just tried the SPDK v17.07.1 and DPDK v17.08.
nvme version: 1.1.38.gfaab
fio version: 3.1
Tried 512k and 1024k I/O sizes and there is no error. dmesg output follows.
So there may be some other difference here? It looks like you are using a ConnectX-5 while I am using a ConnectX-4.
Other related information:
[root(a)node4 fio]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
[root(a)node4 fio]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root(a)node4 fio]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux
[577707.543326] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.100.8:4420
[577730.854540] detected loopback device
[577730.893761] nvme nvme0: creating 7 I/O queues.
[577730.893797] detected loopback device
[577730.898611] detected loopback device
[577730.908917] detected loopback device
[577730.919073] detected loopback device
[577730.928922] detected loopback device
[577730.938679] detected loopback device
[577730.948365] detected loopback device
[577731.146290] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.100.8:4420
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
Any update?
Do you see any error message from "dmesg" with 512k block size running fio?
Thanks
Victor
From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you share which version of SPDK you are using when seeing this error? Or maybe you can try with the latest SPDK code?
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Cao
Do you see any messages from dmesg?
I tried this fio version and still saw these error messages in dmesg.
fio-3.1
[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338] nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Thanks for your detailed information on the testing.
I've tried the latest SPDK code with both the latest fio-3.1-20-g132b and fio-2.19, and did not see this kind of error.
Could you share which version of SPDK you are using when seeing this error? Or maybe you can try with the latest SPDK code?
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1 --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017
My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
(cc Victor)
From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you provide a few more details? This will help the list to provide some ideas.
1) On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?
Kernel initiator; I run these commands on the client server:
modprobe mlx5_ib
modprobe nvme-rdma
nvme discover -t rdma -a 192.168.10.11 -s 4420
nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1 -a 192.168.10.11 -s 4420
2) Can you provide the fio configuration file or command line? Just so we can have more specifics on "bigger block size".
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1 --name=read-phase --rw=randwrite
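As a quick sanity check, the flags above determine how much data the initiator keeps in flight at peak (a back-of-the-envelope sketch, not part of the original mail):

```python
KIB = 1024

bs = 512 * KIB   # --bs=512k
iodepth = 16     # --iodepth=16
numjobs = 4      # --numjobs=4

# Peak outstanding data across all jobs: 512 KiB * 16 * 4 = 32 MiB.
in_flight = bs * iodepth * numjobs
print(in_flight // (1024 * 1024), "MiB")
```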
3) Any details on the HW setup - specifically details on the RDMA NIC (or if you're using SW RoCE).
Nvmf.conf on target server
[Global]
Comment "Global section"
ReactorMask 0xff00
[Rpc]
Enable No
Listen 127.0.0.1
[Nvmf]
MaxQueuesPerSession 8
MaxQueueDepth 128
[Subsystem1]
NQN nqn.2016-06.io.spdk:nvme-subsystem-1
Core 9
Mode Direct
Listen RDMA 192.168.10.11:4420
NVMe 0000:82:00.0
SN S2PMNAAH400039
It is an RDMA NIC (ConnectX-5); the CPU is an Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz.
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
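The ReactorMask and Core values in the config above are a CPU bitmask and a core ID: 0xff00 selects cores 8-15, which matches NUMA node1 in the listing, and the subsystem's Core 9 falls inside that set. A small sketch decoding such a mask (the helper name is mine):

```python
def mask_to_cores(mask: int) -> list:
    """Decode a CPU-affinity bitmask into the list of selected core IDs."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]

# ReactorMask 0xff00 from the nvmf.conf above.
print(mask_to_cores(0xFF00))  # [8, 9, 10, 11, 12, 13, 14, 15] -> NUMA node1
```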
Thanks,
-Jim
From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio
Hi
I have an SPDK NVMe-oF setup and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on both the target and the client.
DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor
[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389] nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486] nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373] nvme1n1: unable to read partition table
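The `blk_update_request` lines above report the failing sector in the kernel's fixed 512-byte units, so the byte offset of each failure is simply sector * 512. A small parsing sketch (the sample lines are copied from the log above):

```python
import re

dmesg = """\
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
"""

SECTOR = 512  # kernel sector unit, independent of the device's LBA size
offsets = [int(sector) * SECTOR for sector in re.findall(r"sector (\d+)", dmesg)]
for off in offsets:
    print(f"offset {off} bytes ({off / 2**30:.1f} GiB into the device)")
```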
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-21 15:22 Victor Banh
0 siblings, 0 replies; 24+ messages in thread
From: Victor Banh @ 2017-10-21 15:22 UTC (permalink / raw)
To: spdk
[-- Attachment #1: Type: text/plain, Size: 24074 bytes --]
I am using NVMe.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com>
Sent: Saturday, October 21, 2017 1:21:03 AM
To: Victor Banh; Storage Performance Development Kit; Harris, James R; John F. Kim
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
I am also using SPDK v17.07.1 and DPDK 17.08 on my system, trying to reproduce with your I/O configuration.
Let me check whether a colleague has this Ubuntu OS installed. If not, could you please try on CentOS?
Also, what kind of backend device did you configure for the NVMe-oF target: Malloc, an NVMe SSD, or something else?
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Saturday, October 21, 2017 6:14 AM
To: Cao, Gang <gang.cao(a)intel.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>; John F. Kim <johnk(a)mellanox.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
Is it possible to try on Ubuntu OS?
Which version of DPDK and SPDK are you using?
Where can I get them? GitHub?
Thanks
Victor
root(a)clientintel:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial
root(a)clientintel:~# uname -a
Linux clientintel 4.12.0-041200-generic #201707022031 SMP Mon Jul 3 00:32:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Wednesday, October 18, 2017 11:52 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
[root(a)node4 gangcao]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux
[root(a)node4 gangcao]# lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.2.1511 (Core)
Release: 7.2.1511
Codename: Core
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 2:14 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Can you give the OS version and kernel version again for the target and the client?
I couldn't compile DPDK without installing the latest 4.12.x kernel on Ubuntu 16.04.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 10:59:19 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
This server does not have OFED installed and it is in loopback mode.
I found another server, also in loopback mode, with a ConnectX-3 and OFED installed.
By the way, what SSD are you using? Maybe the issue is related to the SSD. I've just run with a 2048k block size for a short duration and see no issue. I will run longer to see whether I can hit this error.
[root(a)slave3 fio]# lspci | grep -i mell
08:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
[root(a)slave3 fio]# lsmod | grep -i mlx
mlx4_ib 159744 0
ib_core 208896 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,ib_srpt,ib_ucm,rdma_ucm,mlx4_ib
mlx4_en 114688 0
mlx4_core 307200 2 mlx4_en,mlx4_ib
ptp 20480 3 ixgbe,igb,mlx4_en
[root(a)slave3 fio]# ofed_info
MLNX_OFED_LINUX-3.2-2.0.0.0 (OFED-3.2-2.0.0):
ar_mgr:
osm_plugins/ar_mgr/ar_mgr-1.0-0.30.ga1ea4b7.tar.gz
cc_mgr:
osm_plugins/cc_mgr/cc_mgr-1.0-0.29.ga1ea4b7.tar.gz
dapl:
dapl.git mlnx_ofed_3_1
commit c30fb6ce2cbc29d8ed4bde51437f7abb93378c78
dump_pr:
osm_plugins/dump_pr//dump_pr-1.0-0.25.ga1ea4b7.tar.gz
fabric-collector:
fabric_collector//fabric-collector-1.1.0.MLNX20140410.51b267e.tar.gz
fca:
mlnx_ofed_fca/fca-2.5.2431-1.src.rpm
hcoll:
mlnx_ofed_hcol/hcoll-3.4.807-1.src.rpm
ibacm:
mlnx_ofed/ibacm.git mlnx_ofed_3_2
commit 15ad8c13bdebbe62edea0b7df030710b65c14f7f
ibacm_ssa:
mlnx_ofed_ssa/acm/ibacm_ssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
ibdump:
sniffer/sniffer-4.0.0-2/ibdump/linux/ibdump-4.0.0-2.tgz
ibsim:
mlnx_ofed_ibsim/ibsim-0.6-0.8.g9d76581.tar.gz
ibssa:
mlnx_ofed_ssa/distrib/ibssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
ibutils:
ofed-1.5.3-rpms/ibutils/ibutils-1.5.7.1-0.12.gdcaeae2.tar.gz
ibutils2:
ibutils2/ibutils2-2.1.1-0.76.MLNX20160222.gd366c7b.tar.gz
infiniband-diags:
mlnx_ofed_infiniband_diags/infiniband-diags-1.6.6.MLNX20151130.7f0213e.tar.gz
infinipath-psm:
mlnx_ofed_infinipath-psm/infinipath-psm-3.3-2_g6f42cdb_open.tar.gz
iser:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
isert:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
kernel-mft:
mlnx_ofed_mft/kernel-mft-4.3.0-25.src.rpm
knem:
knem.git mellanox-master
commit f143ee19a575cd42a334422fa8bd329d671238db
libibcm:
mlnx_ofed/libibcm.git mlnx_ofed_3_0
commit d7d485df305e6536711485bd7e477668e77d8320
libibmad:
mlnx_ofed_libibmad/libibmad-1.3.12.MLNX20151122.d140cb1.tar.gz
libibprof:
mlnx_ofed_libibprof/libibprof-1.1.22-1.src.rpm
libibumad:
mlnx_ofed_libibumad/libibumad-1.3.10.2.MLNX20150406.966500d.tar.gz
libibverbs:
mlnx_ofed/libibverbs.git mlnx_ofed_3_2_2
commit 217a77686f4861229f0e4b94485a13f024634caf
libmlx4:
mlnx_ofed/libmlx4.git mlnx_ofed_3_2_2
commit dda6d7ae1e6e3779a485ebdd0a882f4bcbd027a6
libmlx5:
mlnx_ofed/libmlx5.git mlnx_ofed_3_2_2
commit d0c8645359e0f0aba0408b2d344f3b418d27019b
libopensmssa:
mlnx_ofed_ssa/plugin/libopensmssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
librdmacm:
mlnx_ofed/librdmacm.git mlnx_ofed_3_2_2
commit 6bd430fed9e7b3d57a1876c040431ce7295c7703
libsdp:
libsdp.git mlnx_ofed_3_0
commit fbd01dfff05f42d6b82506e7dbf4bc6b7e6a59a4
libvma:
vma/source_rpms//libvma-7.0.14-0.src.rpm
mlnx-ethtool:
upstream/ethtool.git for-upstream
commit ac0cf295abe0c0832f0711fed66ab9601c8b2513
mlnx-ofa_kernel:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
mpi-selector:
ofed-1.5.3-rpms/mpi-selector/mpi-selector-1.0.3-1.src.rpm
mpitests:
mlnx_ofed_mpitest/mpitests-3.2.17-e1c7f2f.src.rpm
mstflint:
mlnx_ofed_mstflint/mstflint-4.3.0-1.49.g9b9af70.tar.gz
multiperf:
mlnx_ofed_multiperf/multiperf-3.0-0.10.gda89e8c.tar.gz
mvapich2:
mlnx_ofed_mvapich2/mvapich2-2.2a-1.src.rpm
mxm:
mlnx_ofed_mxm/mxm-3.4.3079-1.src.rpm
ofed-docs:
docs.git mlnx_ofed-3.2
commit ea3386416f9f7130edd2c70fc3424cb2cda50f7d
openmpi:
mlnx_ofed_ompi_1.8/openmpi-1.10.3a1-1.src.rpm
opensm:
mlnx_ofed_opensm/opensm-4.6.1.MLNX20160112.774e977.tar.gz
perftest:
mlnx_ofed_perftest/perftest-3.0-0.18.gb464d59.tar.gz
qperf:
mlnx_ofed_qperf/qperf-0.4.9.tar.gz
rds-tools:
rds-tools.git mlnx_ofed_2_4
commit 299420ca25cf9996bc0748e3bc4b08748996ba49
sdpnetstat:
sdpnetstat.git mlnx_ofed_3_0
commit 3cf409a7cc07e5c71f9640eddbb801ece21b4169
sockperf:
sockperf/sockperf-2.7-43.git3ee62bd8107a.src.rpm
srp:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
srptools:
srptools/srptools-1.0.2-12.src.rpm
Installed Packages:
-------------------
infiniband-diags
librdmacm
libmlx4
libibverbs-utils
mpi-selector
libibmad-devel
sdpnetstat
knem
libibumad-devel
libsdp
mlnx-ethtool
libibverbs-debuginfo
mlnx-ofa_kernel-modules
srp
opensm
mstflint
cc_mgr
libibmad
libibverbs
kernel-mft
libibverbs-devel-static
libibumad
librdmacm-devel
mlnx-ofa_kernel
libsdp-devel
ibutils2
mlnxofed-docs
libibmad-static
iser
opensm-devel
dump_pr
libibumad-static
rds-tools
libmlx4-debuginfo
mlnx-ofa_kernel-devel
opensm-libs
opensm-static
ar_mgr
dapl-devel-static
infiniband-diags-compat
libibverbs-devel
ibsim
[root(a)slave3 fio]# ibstat
CA 'mlx4_0'
CA type: MT4103
Number of ports: 2
Firmware version: 2.35.5100
Hardware version: 0
Node GUID: 0x248a0703006090e0
System image GUID: 0x248a0703006090e0
Port 1:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffe6090e0
Link layer: Ethernet
Port 2:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffe6090e1
Link layer: Ethernet
[root(a)slave3 fio]# ibv_devinfo
hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.35.5100
node_guid: 248a:0703:0060:90e0
sys_image_guid: 248a:0703:0060:90e0
vendor_id: 0x02c9
vendor_part_id: 4103
hw_ver: 0x0
board_id: MT_1090111023
phys_port_cnt: 2
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
port: 2
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 12:34 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Did you install Mellanox OFED on the target and client servers?
Can you run ibstat on both servers?
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 8:59:28 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
I’ve just tried the SPDK v17.07.1 and DPDK v17.08.
nvme version: 1.1.38.gfaab
fio version: 3.1
I tried 512k and 1024k I/O sizes and there is no error; dmesg information follows.
So there may be some other difference here. It looks like you are using a ConnectX-5 while I am using a ConnectX-4.
Other related information:
[root(a)node4 fio]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
[root(a)node4 fio]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root(a)node4 fio]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux
[577707.543326] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.100.8:4420
[577730.854540] detected loopback device
[577730.893761] nvme nvme0: creating 7 I/O queues.
[577730.893797] detected loopback device
[577730.898611] detected loopback device
[577730.908917] detected loopback device
[577730.919073] detected loopback device
[577730.928922] detected loopback device
[577730.938679] detected loopback device
[577730.948365] detected loopback device
[577731.146290] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.100.8:4420
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
Any update?
Do you see any error messages from dmesg when running fio with a 512k block size?
Thanks
Victor
From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you share which version of SPDK you are using when you see this error? Or maybe you can try the latest SPDK code?
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Cao
Do you see any message from dmesg?
I tried this fio version and still saw these error messages in dmesg.
fio-3.1
[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338] nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-21 8:21 Cao, Gang
0 siblings, 0 replies; 24+ messages in thread
From: Cao, Gang @ 2017-10-21 8:21 UTC (permalink / raw)
To: spdk
[-- Attachment #1: Type: text/plain, Size: 23746 bytes --]
I am also using SPDK v17.07.1 and DPDK 17.08 on my system, trying to reproduce with your I/O configuration.
Let me check whether a colleague has this Ubuntu OS installed. If not, could you please try on CentOS?
Also, what kind of backend device did you configure for the NVMe-oF target: Malloc, an NVMe SSD, or something else?
Thanks,
Gang
commit dda6d7ae1e6e3779a485ebdd0a882f4bcbd027a6
libmlx5:
mlnx_ofed/libmlx5.git mlnx_ofed_3_2_2
commit d0c8645359e0f0aba0408b2d344f3b418d27019b
libopensmssa:
mlnx_ofed_ssa/plugin/libopensmssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
librdmacm:
mlnx_ofed/librdmacm.git mlnx_ofed_3_2_2
commit 6bd430fed9e7b3d57a1876c040431ce7295c7703
libsdp:
libsdp.git mlnx_ofed_3_0
commit fbd01dfff05f42d6b82506e7dbf4bc6b7e6a59a4
libvma:
vma/source_rpms//libvma-7.0.14-0.src.rpm
mlnx-ethtool:
upstream/ethtool.git for-upstream
commit ac0cf295abe0c0832f0711fed66ab9601c8b2513
mlnx-ofa_kernel:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
mpi-selector:
ofed-1.5.3-rpms/mpi-selector/mpi-selector-1.0.3-1.src.rpm
mpitests:
mlnx_ofed_mpitest/mpitests-3.2.17-e1c7f2f.src.rpm
mstflint:
mlnx_ofed_mstflint/mstflint-4.3.0-1.49.g9b9af70.tar.gz
multiperf:
mlnx_ofed_multiperf/multiperf-3.0-0.10.gda89e8c.tar.gz
mvapich2:
mlnx_ofed_mvapich2/mvapich2-2.2a-1.src.rpm
mxm:
mlnx_ofed_mxm/mxm-3.4.3079-1.src.rpm
ofed-docs:
docs.git mlnx_ofed-3.2
commit ea3386416f9f7130edd2c70fc3424cb2cda50f7d
openmpi:
mlnx_ofed_ompi_1.8/openmpi-1.10.3a1-1.src.rpm
opensm:
mlnx_ofed_opensm/opensm-4.6.1.MLNX20160112.774e977.tar.gz
perftest:
mlnx_ofed_perftest/perftest-3.0-0.18.gb464d59.tar.gz
qperf:
mlnx_ofed_qperf/qperf-0.4.9.tar.gz
rds-tools:
rds-tools.git mlnx_ofed_2_4
commit 299420ca25cf9996bc0748e3bc4b08748996ba49
sdpnetstat:
sdpnetstat.git mlnx_ofed_3_0
commit 3cf409a7cc07e5c71f9640eddbb801ece21b4169
sockperf:
sockperf/sockperf-2.7-43.git3ee62bd8107a.src.rpm
srp:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
srptools:
srptools/srptools-1.0.2-12.src.rpm
Installed Packages:
-------------------
infiniband-diags
librdmacm
libmlx4
libibverbs-utils
mpi-selector
libibmad-devel
sdpnetstat
knem
libibumad-devel
libsdp
mlnx-ethtool
libibverbs-debuginfo
mlnx-ofa_kernel-modules
srp
opensm
mstflint
cc_mgr
libibmad
libibverbs
kernel-mft
libibverbs-devel-static
libibumad
librdmacm-devel
mlnx-ofa_kernel
libsdp-devel
ibutils2
mlnxofed-docs
libibmad-static
iser
opensm-devel
dump_pr
libibumad-static
rds-tools
libmlx4-debuginfo
mlnx-ofa_kernel-devel
opensm-libs
opensm-static
ar_mgr
dapl-devel-static
infiniband-diags-compat
libibverbs-devel
ibsim
[root(a)slave3 fio]# ibstat
CA 'mlx4_0'
CA type: MT4103
Number of ports: 2
Firmware version: 2.35.5100
Hardware version: 0
Node GUID: 0x248a0703006090e0
System image GUID: 0x248a0703006090e0
Port 1:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffe6090e0
Link layer: Ethernet
Port 2:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffe6090e1
Link layer: Ethernet
[root(a)slave3 fio]# ibv_devinfo
hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.35.5100
node_guid: 248a:0703:0060:90e0
sys_image_guid: 248a:0703:0060:90e0
vendor_id: 0x02c9
vendor_part_id: 4103
hw_ver: 0x0
board_id: MT_1090111023
phys_port_cnt: 2
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
port: 2
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 12:34 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Did you install Mellanox OFED on the target and client servers?
Can you run ibstat on both servers?
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 8:59:28 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
I've just tried the SPDK v17.07.1 and DPDK v17.08.
nvme version: 1.1.38.gfaab
fio version: 3.1
Tried 512k and 1024k I/O sizes and there is no error; dmesg output is as follows.
So there may be some other difference here? It looks like you are using a ConnectX-5 while I am using a ConnectX-4.
Other related information:
[root(a)node4 fio]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
[root(a)node4 fio]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root(a)node4 fio]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux
[577707.543326] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.100.8:4420
[577730.854540] detected loopback device
[577730.893761] nvme nvme0: creating 7 I/O queues.
[577730.893797] detected loopback device
[577730.898611] detected loopback device
[577730.908917] detected loopback device
[577730.919073] detected loopback device
[577730.928922] detected loopback device
[577730.938679] detected loopback device
[577730.948365] detected loopback device
[577731.146290] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.100.8:4420
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
Any update?
Do you see any error messages from "dmesg" when running fio with a 512k block size?
Thanks
Victor
From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you share which version of SPDK you were using when you saw this error? Or maybe you can try the latest SPDK code?
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Cao
Do you see any message from dmesg?
I tried this fio version and still saw these error messages in dmesg.
fio-3.1
[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338] nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0
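An aside on the impossible negative capacities in the log above: "detected capacity change from -62111005559226368" is the kernel printing a garbage size as a signed 64-bit integer after the read failed. Reinterpreting the same bit pattern as unsigned (a quick sketch; assumes bash's 64-bit printf builtin) shows a value near 2^64, i.e. junk rather than any plausible disk size:

```shell
# Print the signed value from dmesg as an unsigned 64-bit integer.
# bash's printf does the two's-complement reinterpretation for us.
printf '%u\n' -62111005559226368
# -> 18384633068150325248, close to 2^64 (18446744073709551616)
```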
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Thanks for your detailed information on the testing.
I've tried the latest SPDK code with the latest fio-3.1-20-g132b and with fio-2.19, and I don't see this kind of error.
Could you share which version of SPDK you were using when you saw this error? Or maybe you can try the latest SPDK code?
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1 --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017
My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
(cc Victor)
From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you provide a few more details? This will help the list to provide some ideas.
1) On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?
Kernel initiator; these commands are run on the client server:
modprobe mlx5_ib
modprobe nvme-rdma
nvme discover -t rdma -a 192.168.10.11 -s 4420
nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1 -a 192.168.10.11 -s 4420
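For reference, the connect sequence above can be wrapped with basic error checking (a non-authoritative sketch; the address, port, and NQN are the values from the commands in this thread, and the function only defines the steps rather than running them):

```shell
# Hedged sketch of the kernel-initiator connect sequence from this thread.
# Address/port/NQN match the commands above; adjust for your setup.
connect_nvmeof() {
    local addr="192.168.10.11" port="4420"
    local nqn="nqn.2016-06.io.spdk:nvme-subsystem-1"
    modprobe mlx5_ib || return 1    # Mellanox mlx5 InfiniBand/RoCE driver
    modprobe nvme-rdma || return 1  # NVMe-oF RDMA host driver
    nvme discover -t rdma -a "$addr" -s "$port" || return 1
    nvme connect -t rdma -n "$nqn" -a "$addr" -s "$port"
}
```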
2) Can you provide the fio configuration file or command line? Just so we can have more specifics on "bigger block size".
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1 --name=read-phase --rw=randwrite
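The same workload can also be kept as a fio job file, which is easier to share and rerun than a long command line (a sketch; every parameter is copied verbatim from the invocation above):

```shell
# Write the fio job file equivalent to the command line above,
# then run it with:  fio randwrite-512k.fio
cat > randwrite-512k.fio <<'EOF'
[read-phase]
filename=/dev/nvme1n1
rw=randwrite
bs=512k
numjobs=4
iodepth=16
loops=1
ioengine=libaio
direct=1
invalidate=1
fsync_on_close=1
randrepeat=1
norandommap
time_based
runtime=60
EOF
```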
3) Any details on the HW setup - specifically details on the RDMA NIC (or if you're using SW RoCE).
Nvmf.conf on target server
[Global]
Comment "Global section"
ReactorMask 0xff00
[Rpc]
Enable No
Listen 127.0.0.1
[Nvmf]
MaxQueuesPerSession 8
MaxQueueDepth 128
[Subsystem1]
NQN nqn.2016-06.io.spdk:nvme-subsystem-1
Core 9
Mode Direct
Listen RDMA 192.168.10.11:4420
NVMe 0000:82:00.0
SN S2PMNAAH400039
It is an RDMA NIC, ConnectX-5; the CPU is an Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz.
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
Thanks,
-Jim
From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio
Hi
I have an SPDK NVMe-oF setup and keep getting errors with bigger block sizes in fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor
[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389] nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486] nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373] nvme1n1: unable to read partition table
[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 102950 bytes --]
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-20 22:13 Victor Banh
0 siblings, 0 replies; 24+ messages in thread
From: Victor Banh @ 2017-10-20 22:13 UTC (permalink / raw)
To: spdk
[-- Attachment #1: Type: text/plain, Size: 22955 bytes --]
Hi Gang
Is it possible to try on Ubuntu?
Which version of DPDK and SPDK are you using?
Where can I get them? GitHub?
Thanks
Victor
root(a)clientintel:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial
root(a)clientintel:~# uname -a
Linux clientintel 4.12.0-041200-generic #201707022031 SMP Mon Jul 3 00:32:52 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Wednesday, October 18, 2017 11:52 PM
To: Victor Banh <victorb(a)mellanox.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
[root(a)node4 gangcao]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux
[root(a)node4 gangcao]# lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.2.1511 (Core)
Release: 7.2.1511
Codename: Core
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 2:14 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Can you give the OS version and kernel version again for target and client?
I couldn't compile DPDK without installing the latest 4.12.x kernel on Ubuntu 16.04.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 10:59:19 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
This server does not have OFED installed and it is in loopback mode.
I found another server, also in loopback mode, with a ConnectX-3 and OFED installed.
By the way, what SSD are you using? Maybe it is related to the SSD? I've just run with 2048k for a short duration and saw no issue. I'll run longer to see whether I can hit this error.
[root(a)slave3 fio]# lspci | grep -i mell
08:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
[root(a)slave3 fio]# lsmod | grep -i mlx
mlx4_ib 159744 0
ib_core 208896 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,ib_srpt,ib_ucm,rdma_ucm,mlx4_ib
mlx4_en 114688 0
mlx4_core 307200 2 mlx4_en,mlx4_ib
ptp 20480 3 ixgbe,igb,mlx4_en
[root(a)slave3 fio]# ofed_info
MLNX_OFED_LINUX-3.2-2.0.0.0 (OFED-3.2-2.0.0):
ar_mgr:
osm_plugins/ar_mgr/ar_mgr-1.0-0.30.ga1ea4b7.tar.gz
cc_mgr:
osm_plugins/cc_mgr/cc_mgr-1.0-0.29.ga1ea4b7.tar.gz
dapl:
dapl.git mlnx_ofed_3_1
commit c30fb6ce2cbc29d8ed4bde51437f7abb93378c78
dump_pr:
osm_plugins/dump_pr//dump_pr-1.0-0.25.ga1ea4b7.tar.gz
fabric-collector:
fabric_collector//fabric-collector-1.1.0.MLNX20140410.51b267e.tar.gz
fca:
mlnx_ofed_fca/fca-2.5.2431-1.src.rpm
hcoll:
mlnx_ofed_hcol/hcoll-3.4.807-1.src.rpm
ibacm:
mlnx_ofed/ibacm.git mlnx_ofed_3_2
commit 15ad8c13bdebbe62edea0b7df030710b65c14f7f
ibacm_ssa:
mlnx_ofed_ssa/acm/ibacm_ssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
ibdump:
sniffer/sniffer-4.0.0-2/ibdump/linux/ibdump-4.0.0-2.tgz
ibsim:
mlnx_ofed_ibsim/ibsim-0.6-0.8.g9d76581.tar.gz
ibssa:
mlnx_ofed_ssa/distrib/ibssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
ibutils:
ofed-1.5.3-rpms/ibutils/ibutils-1.5.7.1-0.12.gdcaeae2.tar.gz
ibutils2:
ibutils2/ibutils2-2.1.1-0.76.MLNX20160222.gd366c7b.tar.gz
infiniband-diags:
mlnx_ofed_infiniband_diags/infiniband-diags-1.6.6.MLNX20151130.7f0213e.tar.gz
infinipath-psm:
mlnx_ofed_infinipath-psm/infinipath-psm-3.3-2_g6f42cdb_open.tar.gz
iser:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
isert:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
kernel-mft:
mlnx_ofed_mft/kernel-mft-4.3.0-25.src.rpm
knem:
knem.git mellanox-master
commit f143ee19a575cd42a334422fa8bd329d671238db
libibcm:
mlnx_ofed/libibcm.git mlnx_ofed_3_0
commit d7d485df305e6536711485bd7e477668e77d8320
libibmad:
mlnx_ofed_libibmad/libibmad-1.3.12.MLNX20151122.d140cb1.tar.gz
libibprof:
mlnx_ofed_libibprof/libibprof-1.1.22-1.src.rpm
libibumad:
mlnx_ofed_libibumad/libibumad-1.3.10.2.MLNX20150406.966500d.tar.gz
libibverbs:
mlnx_ofed/libibverbs.git mlnx_ofed_3_2_2
commit 217a77686f4861229f0e4b94485a13f024634caf
libmlx4:
mlnx_ofed/libmlx4.git mlnx_ofed_3_2_2
commit dda6d7ae1e6e3779a485ebdd0a882f4bcbd027a6
libmlx5:
mlnx_ofed/libmlx5.git mlnx_ofed_3_2_2
commit d0c8645359e0f0aba0408b2d344f3b418d27019b
libopensmssa:
mlnx_ofed_ssa/plugin/libopensmssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
librdmacm:
mlnx_ofed/librdmacm.git mlnx_ofed_3_2_2
commit 6bd430fed9e7b3d57a1876c040431ce7295c7703
libsdp:
libsdp.git mlnx_ofed_3_0
commit fbd01dfff05f42d6b82506e7dbf4bc6b7e6a59a4
libvma:
vma/source_rpms//libvma-7.0.14-0.src.rpm
mlnx-ethtool:
upstream/ethtool.git for-upstream
commit ac0cf295abe0c0832f0711fed66ab9601c8b2513
mlnx-ofa_kernel:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
mpi-selector:
ofed-1.5.3-rpms/mpi-selector/mpi-selector-1.0.3-1.src.rpm
mpitests:
mlnx_ofed_mpitest/mpitests-3.2.17-e1c7f2f.src.rpm
mstflint:
mlnx_ofed_mstflint/mstflint-4.3.0-1.49.g9b9af70.tar.gz
multiperf:
mlnx_ofed_multiperf/multiperf-3.0-0.10.gda89e8c.tar.gz
mvapich2:
mlnx_ofed_mvapich2/mvapich2-2.2a-1.src.rpm
mxm:
mlnx_ofed_mxm/mxm-3.4.3079-1.src.rpm
ofed-docs:
docs.git mlnx_ofed-3.2
commit ea3386416f9f7130edd2c70fc3424cb2cda50f7d
openmpi:
mlnx_ofed_ompi_1.8/openmpi-1.10.3a1-1.src.rpm
opensm:
mlnx_ofed_opensm/opensm-4.6.1.MLNX20160112.774e977.tar.gz
perftest:
mlnx_ofed_perftest/perftest-3.0-0.18.gb464d59.tar.gz
qperf:
mlnx_ofed_qperf/qperf-0.4.9.tar.gz
rds-tools:
rds-tools.git mlnx_ofed_2_4
commit 299420ca25cf9996bc0748e3bc4b08748996ba49
sdpnetstat:
sdpnetstat.git mlnx_ofed_3_0
commit 3cf409a7cc07e5c71f9640eddbb801ece21b4169
sockperf:
sockperf/sockperf-2.7-43.git3ee62bd8107a.src.rpm
srp:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
srptools:
srptools/srptools-1.0.2-12.src.rpm
Installed Packages:
-------------------
infiniband-diags
librdmacm
libmlx4
libibverbs-utils
mpi-selector
libibmad-devel
sdpnetstat
knem
libibumad-devel
libsdp
mlnx-ethtool
libibverbs-debuginfo
mlnx-ofa_kernel-modules
srp
opensm
mstflint
cc_mgr
libibmad
libibverbs
kernel-mft
libibverbs-devel-static
libibumad
librdmacm-devel
mlnx-ofa_kernel
libsdp-devel
ibutils2
mlnxofed-docs
libibmad-static
iser
opensm-devel
dump_pr
libibumad-static
rds-tools
libmlx4-debuginfo
mlnx-ofa_kernel-devel
opensm-libs
opensm-static
ar_mgr
dapl-devel-static
infiniband-diags-compat
libibverbs-devel
ibsim
[root(a)slave3 fio]# ibstat
CA 'mlx4_0'
CA type: MT4103
Number of ports: 2
Firmware version: 2.35.5100
Hardware version: 0
Node GUID: 0x248a0703006090e0
System image GUID: 0x248a0703006090e0
Port 1:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffe6090e0
Link layer: Ethernet
Port 2:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffe6090e1
Link layer: Ethernet
[root(a)slave3 fio]# ibv_devinfo
hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.35.5100
node_guid: 248a:0703:0060:90e0
sys_image_guid: 248a:0703:0060:90e0
vendor_id: 0x02c9
vendor_part_id: 4103
hw_ver: 0x0
board_id: MT_1090111023
phys_port_cnt: 2
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
port: 2
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 12:34 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Did you install Mellanox OFED on the target and client servers?
Can you run ibstat on both servers?
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 8:59:28 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
I've just tried the SPDK v17.07.1 and DPDK v17.08.
nvme version: 1.1.38.gfaab
fio version: 3.1
Tried 512k and 1024k I/O sizes and there is no error; dmesg output is as follows.
So there may be some other difference here? It looks like you are using a ConnectX-5 while I am using a ConnectX-4.
Other related information:
[root(a)node4 fio]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
[root(a)node4 fio]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root(a)node4 fio]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux
[577707.543326] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.100.8:4420
[577730.854540] detected loopback device
[577730.893761] nvme nvme0: creating 7 I/O queues.
[577730.893797] detected loopback device
[577730.898611] detected loopback device
[577730.908917] detected loopback device
[577730.919073] detected loopback device
[577730.928922] detected loopback device
[577730.938679] detected loopback device
[577730.948365] detected loopback device
[577731.146290] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.100.8:4420
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
Any update?
Do you see any error messages from "dmesg" when running fio with a 512k block size?
Thanks
Victor
From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you share which version of SPDK you were using when you saw this error? Or maybe you can try the latest SPDK code?
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Cao
Do you see any message from dmesg?
I tried this fio version and still saw these error messages in dmesg.
fio-3.1
[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338] nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Thanks for your detailed information on the testing.
I've tried the latest SPDK code with the latest fio-3.1-20-g132b and with fio-2.19. It seems there is no error of this kind.
Could you share which version of SPDK you are using when you see this error? Or perhaps try the latest SPDK code?
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1 --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017
My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
(cc Victor)
From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you provide a few more details? This will help the list to provide some ideas.
1) On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?
Kernel initiator; I run these commands on the client server.
modprobe mlx5_ib
modprobe nvme-rdma
nvme discover -t rdma -a 192.168.10.11 -s 4420
nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1 -a 192.168.10.11 -s 4420
2) Can you provide the fio configuration file or command line? Just so we can have more specifics on "bigger block size".
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1 --name=read-phase --rw=randwrite
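As a back-of-envelope check (plain arithmetic from the options above, not a statement about SPDK internals), this command keeps 4 jobs × queue depth 16 = 64 writes of 512 KiB in flight at once:

```shell
# Aggregate outstanding I/O implied by the fio options above.
bs_kib=512; iodepth=16; numjobs=4
inflight_mib=$(( bs_kib * iodepth * numjobs / 1024 ))
echo "${inflight_mib} MiB outstanding across $(( iodepth * numjobs )) in-flight writes"
```

That sustained 32 MiB of outstanding large writes is exactly the kind of load that only shows up at "bigger block size".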
3) Any details on the HW setup - specifically details on the RDMA NIC (or if you're using SW RoCE).
Nvmf.conf on target server
[Global]
Comment "Global section"
ReactorMask 0xff00
[Rpc]
Enable No
Listen 127.0.0.1
[Nvmf]
MaxQueuesPerSession 8
MaxQueueDepth 128
[Subsystem1]
NQN nqn.2016-06.io.spdk:nvme-subsystem-1
Core 9
Mode Direct
Listen RDMA 192.168.10.11:4420
NVMe 0000:82:00.0
SN S2PMNAAH400039
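A quick sanity check on the [Nvmf] limits above (assuming, and this is my assumption, that they map directly to queue count × queue depth per session):

```shell
# Per-session command capacity implied by the [Nvmf] section above.
max_queues=8; max_qdepth=128
per_session=$(( max_queues * max_qdepth ))
echo "${per_session} commands per session"
```

So the 64 concurrent fio writes fit well within a single session's nominal command budget.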
It is an RDMA NIC, ConnectX-5. CPU: Intel(R) Xeon(R) E5-2680 0 @ 2.70GHz
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
Thanks,
-Jim
From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio
Hi
I have an SPDK NVMe-oF setup and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor
[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389] nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486] nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373] nvme1n1: unable to read partition table
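For anyone triaging similar reports, a minimal sketch to tally the Buffer I/O errors per device; the here-doc holds sample lines copied from the log above, and on a live system you would pipe `dmesg` in instead:

```shell
# Tally "Buffer I/O error" lines per device (sample input inline).
summary=$(awk '/Buffer I\/O error on dev/ { dev=$7; sub(/,/, "", dev); cnt[dev]++ }
               END { for (d in cnt) printf "%s: %d errors\n", d, cnt[d] }' <<'EOF'
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
EOF
)
echo "$summary"
```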
[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 117117 bytes --]
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-19 7:16 Victor Banh
0 siblings, 0 replies; 24+ messages in thread
From: Victor Banh @ 2017-10-19 7:16 UTC (permalink / raw)
To: spdk
[-- Attachment #1: Type: text/plain, Size: 22406 bytes --]
I am using Ubuntu 16.04 and kernel 4.12.X.
________________________________
From: Cao, Gang <gang.cao(a)intel.com>
Sent: Wednesday, October 18, 2017 11:51:39 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
[root(a)node4 gangcao]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux
[root(a)node4 gangcao]# lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.2.1511 (Core)
Release: 7.2.1511
Codename: Core
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 2:14 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>; Cao, Gang <gang.cao(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Can you give the OS version and kernel version again for target and client?
I couldn't compile DPDK without installing the latest 4.12.x kernel on Ubuntu 16.04.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 10:59:19 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
This server does not have OFED installed and is in loopback mode.
I found another server, also in loopback mode, with a ConnectX-3 and OFED installed.
By the way, what SSD are you using? Maybe it is related to the SSD? I've just run with 2048k for a short duration and see no issue. I will run longer to see whether I can hit this error.
[root(a)slave3 fio]# lspci | grep -i mell
08:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
[root(a)slave3 fio]# lsmod | grep -i mlx
mlx4_ib 159744 0
ib_core 208896 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,ib_srpt,ib_ucm,rdma_ucm,mlx4_ib
mlx4_en 114688 0
mlx4_core 307200 2 mlx4_en,mlx4_ib
ptp 20480 3 ixgbe,igb,mlx4_en
[root(a)slave3 fio]# ofed_info
MLNX_OFED_LINUX-3.2-2.0.0.0 (OFED-3.2-2.0.0):
ar_mgr:
osm_plugins/ar_mgr/ar_mgr-1.0-0.30.ga1ea4b7.tar.gz
cc_mgr:
osm_plugins/cc_mgr/cc_mgr-1.0-0.29.ga1ea4b7.tar.gz
dapl:
dapl.git mlnx_ofed_3_1
commit c30fb6ce2cbc29d8ed4bde51437f7abb93378c78
dump_pr:
osm_plugins/dump_pr//dump_pr-1.0-0.25.ga1ea4b7.tar.gz
fabric-collector:
fabric_collector//fabric-collector-1.1.0.MLNX20140410.51b267e.tar.gz
fca:
mlnx_ofed_fca/fca-2.5.2431-1.src.rpm
hcoll:
mlnx_ofed_hcol/hcoll-3.4.807-1.src.rpm
ibacm:
mlnx_ofed/ibacm.git mlnx_ofed_3_2
commit 15ad8c13bdebbe62edea0b7df030710b65c14f7f
ibacm_ssa:
mlnx_ofed_ssa/acm/ibacm_ssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
ibdump:
sniffer/sniffer-4.0.0-2/ibdump/linux/ibdump-4.0.0-2.tgz
ibsim:
mlnx_ofed_ibsim/ibsim-0.6-0.8.g9d76581.tar.gz
ibssa:
mlnx_ofed_ssa/distrib/ibssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
ibutils:
ofed-1.5.3-rpms/ibutils/ibutils-1.5.7.1-0.12.gdcaeae2.tar.gz
ibutils2:
ibutils2/ibutils2-2.1.1-0.76.MLNX20160222.gd366c7b.tar.gz
infiniband-diags:
mlnx_ofed_infiniband_diags/infiniband-diags-1.6.6.MLNX20151130.7f0213e.tar.gz
infinipath-psm:
mlnx_ofed_infinipath-psm/infinipath-psm-3.3-2_g6f42cdb_open.tar.gz
iser:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
isert:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
kernel-mft:
mlnx_ofed_mft/kernel-mft-4.3.0-25.src.rpm
knem:
knem.git mellanox-master
commit f143ee19a575cd42a334422fa8bd329d671238db
libibcm:
mlnx_ofed/libibcm.git mlnx_ofed_3_0
commit d7d485df305e6536711485bd7e477668e77d8320
libibmad:
mlnx_ofed_libibmad/libibmad-1.3.12.MLNX20151122.d140cb1.tar.gz
libibprof:
mlnx_ofed_libibprof/libibprof-1.1.22-1.src.rpm
libibumad:
mlnx_ofed_libibumad/libibumad-1.3.10.2.MLNX20150406.966500d.tar.gz
libibverbs:
mlnx_ofed/libibverbs.git mlnx_ofed_3_2_2
commit 217a77686f4861229f0e4b94485a13f024634caf
libmlx4:
mlnx_ofed/libmlx4.git mlnx_ofed_3_2_2
commit dda6d7ae1e6e3779a485ebdd0a882f4bcbd027a6
libmlx5:
mlnx_ofed/libmlx5.git mlnx_ofed_3_2_2
commit d0c8645359e0f0aba0408b2d344f3b418d27019b
libopensmssa:
mlnx_ofed_ssa/plugin/libopensmssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
librdmacm:
mlnx_ofed/librdmacm.git mlnx_ofed_3_2_2
commit 6bd430fed9e7b3d57a1876c040431ce7295c7703
libsdp:
libsdp.git mlnx_ofed_3_0
commit fbd01dfff05f42d6b82506e7dbf4bc6b7e6a59a4
libvma:
vma/source_rpms//libvma-7.0.14-0.src.rpm
mlnx-ethtool:
upstream/ethtool.git for-upstream
commit ac0cf295abe0c0832f0711fed66ab9601c8b2513
mlnx-ofa_kernel:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
mpi-selector:
ofed-1.5.3-rpms/mpi-selector/mpi-selector-1.0.3-1.src.rpm
mpitests:
mlnx_ofed_mpitest/mpitests-3.2.17-e1c7f2f.src.rpm
mstflint:
mlnx_ofed_mstflint/mstflint-4.3.0-1.49.g9b9af70.tar.gz
multiperf:
mlnx_ofed_multiperf/multiperf-3.0-0.10.gda89e8c.tar.gz
mvapich2:
mlnx_ofed_mvapich2/mvapich2-2.2a-1.src.rpm
mxm:
mlnx_ofed_mxm/mxm-3.4.3079-1.src.rpm
ofed-docs:
docs.git mlnx_ofed-3.2
commit ea3386416f9f7130edd2c70fc3424cb2cda50f7d
openmpi:
mlnx_ofed_ompi_1.8/openmpi-1.10.3a1-1.src.rpm
opensm:
mlnx_ofed_opensm/opensm-4.6.1.MLNX20160112.774e977.tar.gz
perftest:
mlnx_ofed_perftest/perftest-3.0-0.18.gb464d59.tar.gz
qperf:
mlnx_ofed_qperf/qperf-0.4.9.tar.gz
rds-tools:
rds-tools.git mlnx_ofed_2_4
commit 299420ca25cf9996bc0748e3bc4b08748996ba49
sdpnetstat:
sdpnetstat.git mlnx_ofed_3_0
commit 3cf409a7cc07e5c71f9640eddbb801ece21b4169
sockperf:
sockperf/sockperf-2.7-43.git3ee62bd8107a.src.rpm
srp:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
srptools:
srptools/srptools-1.0.2-12.src.rpm
Installed Packages:
-------------------
infiniband-diags
librdmacm
libmlx4
libibverbs-utils
mpi-selector
libibmad-devel
sdpnetstat
knem
libibumad-devel
libsdp
mlnx-ethtool
libibverbs-debuginfo
mlnx-ofa_kernel-modules
srp
opensm
mstflint
cc_mgr
libibmad
libibverbs
kernel-mft
libibverbs-devel-static
libibumad
librdmacm-devel
mlnx-ofa_kernel
libsdp-devel
ibutils2
mlnxofed-docs
libibmad-static
iser
opensm-devel
dump_pr
libibumad-static
rds-tools
libmlx4-debuginfo
mlnx-ofa_kernel-devel
opensm-libs
opensm-static
ar_mgr
dapl-devel-static
infiniband-diags-compat
libibverbs-devel
ibsim
[root(a)slave3 fio]# ibstat
CA 'mlx4_0'
CA type: MT4103
Number of ports: 2
Firmware version: 2.35.5100
Hardware version: 0
Node GUID: 0x248a0703006090e0
System image GUID: 0x248a0703006090e0
Port 1:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffe6090e0
Link layer: Ethernet
Port 2:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffe6090e1
Link layer: Ethernet
[root(a)slave3 fio]# ibv_devinfo
hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.35.5100
node_guid: 248a:0703:0060:90e0
sys_image_guid: 248a:0703:0060:90e0
vendor_id: 0x02c9
vendor_part_id: 4103
hw_ver: 0x0
board_id: MT_1090111023
phys_port_cnt: 2
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
port: 2
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 12:34 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Did you install Mellanox OFED on the target and client servers?
Can you run ibstat on both servers?
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 8:59:28 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
I’ve just tried SPDK v17.07.1 and DPDK v17.08.
nvme version: 1.1.38.gfaab
fio version: 3.1
I tried the 512k and 1024k I/O sizes and there is no error. dmesg information follows.
So there may be some other difference here? It looks like you are using a ConnectX-5 while I am using a ConnectX-4.
Other related information:
[root(a)node4 fio]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
[root(a)node4 fio]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root(a)node4 fio]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux
[577707.543326] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.100.8:4420
[577730.854540] detected loopback device
[577730.893761] nvme nvme0: creating 7 I/O queues.
[577730.893797] detected loopback device
[577730.898611] detected loopback device
[577730.908917] detected loopback device
[577730.919073] detected loopback device
[577730.928922] detected loopback device
[577730.938679] detected loopback device
[577730.948365] detected loopback device
[577731.146290] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.100.8:4420
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
Any update?
Do you see any error messages from dmesg when running fio with a 512k block size?
Thanks
Victor
From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you share which version of SPDK you are using when you see this error? Or perhaps try the latest SPDK code?
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Cao
Do you see any message from dmesg?
I tried this fio version and still saw these error messages from dmesg.
fio-3.1
[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338] nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Thanks for your detailed information on the testing.
I’ve tried the latest SPDK code with the latest fio-3.1-20-g132b and with fio-2.19. It seems there is no error of this kind.
Could you share which version of SPDK you are using when you see this error? Or perhaps try the latest SPDK code?
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1 --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017
My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
(cc Victor)
From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you provide a few more details? This will help the list to provide some ideas.
1) On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?
Kernel initiator; I run these commands on the client server.
modprobe mlx5_ib
modprobe nvme-rdma
nvme discover -t rdma -a 192.168.10.11 -s 4420
nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1 -a 192.168.10.11 -s 4420
2) Can you provide the fio configuration file or command line? Just so we can have more specifics on “bigger block size”.
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1 --name=read-phase --rw=randwrite
3) Any details on the HW setup – specifically details on the RDMA NIC (or if you’re using SW RoCE).
Nvmf.conf on target server
[Global]
Comment "Global section"
ReactorMask 0xff00
[Rpc]
Enable No
Listen 127.0.0.1
[Nvmf]
MaxQueuesPerSession 8
MaxQueueDepth 128
[Subsystem1]
NQN nqn.2016-06.io.spdk:nvme-subsystem-1
Core 9
Mode Direct
Listen RDMA 192.168.10.11:4420
NVMe 0000:82:00.0
SN S2PMNAAH400039
It is an RDMA NIC, ConnectX-5. CPU: Intel(R) Xeon(R) E5-2680 0 @ 2.70GHz
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
Thanks,
-Jim
From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio
Hi
I have an SPDK NVMe-oF setup and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor
[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389] nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486] nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373] nvme1n1: unable to read partition table
[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 91642 bytes --]
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-19 6:51 Cao, Gang
0 siblings, 0 replies; 24+ messages in thread
From: Cao, Gang @ 2017-10-19 6:51 UTC (permalink / raw)
To: spdk
[-- Attachment #1: Type: text/plain, Size: 22079 bytes --]
[root(a)node4 gangcao]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux
[root(a)node4 gangcao]# lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.2.1511 (Core)
Release: 7.2.1511
Codename: Core
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 2:14 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>; Cao, Gang <gang.cao(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Can you give the OS version and kernel version again for target and client?
I couldn't compile DPDK without installing the latest 4.12.x kernel on Ubuntu 16.04.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 10:59:19 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
This server does not have OFED installed and is in loopback mode.
I found another server, also in loopback mode, with a ConnectX-3 and OFED installed.
By the way, what SSD are you using? Maybe it is related to the SSD? I've just run with 2048k for a short duration and see no issue. I will run longer to see whether I can hit this error.
[root(a)slave3 fio]# lspci | grep -i mell
08:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
[root(a)slave3 fio]# lsmod | grep -i mlx
mlx4_ib 159744 0
ib_core 208896 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,ib_srpt,ib_ucm,rdma_ucm,mlx4_ib
mlx4_en 114688 0
mlx4_core 307200 2 mlx4_en,mlx4_ib
ptp 20480 3 ixgbe,igb,mlx4_en
[root(a)slave3 fio]# ofed_info
MLNX_OFED_LINUX-3.2-2.0.0.0 (OFED-3.2-2.0.0):
ar_mgr:
osm_plugins/ar_mgr/ar_mgr-1.0-0.30.ga1ea4b7.tar.gz
cc_mgr:
osm_plugins/cc_mgr/cc_mgr-1.0-0.29.ga1ea4b7.tar.gz
dapl:
dapl.git mlnx_ofed_3_1
commit c30fb6ce2cbc29d8ed4bde51437f7abb93378c78
dump_pr:
osm_plugins/dump_pr//dump_pr-1.0-0.25.ga1ea4b7.tar.gz
fabric-collector:
fabric_collector//fabric-collector-1.1.0.MLNX20140410.51b267e.tar.gz
fca:
mlnx_ofed_fca/fca-2.5.2431-1.src.rpm
hcoll:
mlnx_ofed_hcol/hcoll-3.4.807-1.src.rpm
ibacm:
mlnx_ofed/ibacm.git mlnx_ofed_3_2
commit 15ad8c13bdebbe62edea0b7df030710b65c14f7f
ibacm_ssa:
mlnx_ofed_ssa/acm/ibacm_ssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
ibdump:
sniffer/sniffer-4.0.0-2/ibdump/linux/ibdump-4.0.0-2.tgz
ibsim:
mlnx_ofed_ibsim/ibsim-0.6-0.8.g9d76581.tar.gz
ibssa:
mlnx_ofed_ssa/distrib/ibssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
ibutils:
ofed-1.5.3-rpms/ibutils/ibutils-1.5.7.1-0.12.gdcaeae2.tar.gz
ibutils2:
ibutils2/ibutils2-2.1.1-0.76.MLNX20160222.gd366c7b.tar.gz
infiniband-diags:
mlnx_ofed_infiniband_diags/infiniband-diags-1.6.6.MLNX20151130.7f0213e.tar.gz
infinipath-psm:
mlnx_ofed_infinipath-psm/infinipath-psm-3.3-2_g6f42cdb_open.tar.gz
iser:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
isert:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
kernel-mft:
mlnx_ofed_mft/kernel-mft-4.3.0-25.src.rpm
knem:
knem.git mellanox-master
commit f143ee19a575cd42a334422fa8bd329d671238db
libibcm:
mlnx_ofed/libibcm.git mlnx_ofed_3_0
commit d7d485df305e6536711485bd7e477668e77d8320
libibmad:
mlnx_ofed_libibmad/libibmad-1.3.12.MLNX20151122.d140cb1.tar.gz
libibprof:
mlnx_ofed_libibprof/libibprof-1.1.22-1.src.rpm
libibumad:
mlnx_ofed_libibumad/libibumad-1.3.10.2.MLNX20150406.966500d.tar.gz
libibverbs:
mlnx_ofed/libibverbs.git mlnx_ofed_3_2_2
commit 217a77686f4861229f0e4b94485a13f024634caf
libmlx4:
mlnx_ofed/libmlx4.git mlnx_ofed_3_2_2
commit dda6d7ae1e6e3779a485ebdd0a882f4bcbd027a6
libmlx5:
mlnx_ofed/libmlx5.git mlnx_ofed_3_2_2
commit d0c8645359e0f0aba0408b2d344f3b418d27019b
libopensmssa:
mlnx_ofed_ssa/plugin/libopensmssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
librdmacm:
mlnx_ofed/librdmacm.git mlnx_ofed_3_2_2
commit 6bd430fed9e7b3d57a1876c040431ce7295c7703
libsdp:
libsdp.git mlnx_ofed_3_0
commit fbd01dfff05f42d6b82506e7dbf4bc6b7e6a59a4
libvma:
vma/source_rpms//libvma-7.0.14-0.src.rpm
mlnx-ethtool:
upstream/ethtool.git for-upstream
commit ac0cf295abe0c0832f0711fed66ab9601c8b2513
mlnx-ofa_kernel:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
mpi-selector:
ofed-1.5.3-rpms/mpi-selector/mpi-selector-1.0.3-1.src.rpm
mpitests:
mlnx_ofed_mpitest/mpitests-3.2.17-e1c7f2f.src.rpm
mstflint:
mlnx_ofed_mstflint/mstflint-4.3.0-1.49.g9b9af70.tar.gz
multiperf:
mlnx_ofed_multiperf/multiperf-3.0-0.10.gda89e8c.tar.gz
mvapich2:
mlnx_ofed_mvapich2/mvapich2-2.2a-1.src.rpm
mxm:
mlnx_ofed_mxm/mxm-3.4.3079-1.src.rpm
ofed-docs:
docs.git mlnx_ofed-3.2
commit ea3386416f9f7130edd2c70fc3424cb2cda50f7d
openmpi:
mlnx_ofed_ompi_1.8/openmpi-1.10.3a1-1.src.rpm
opensm:
mlnx_ofed_opensm/opensm-4.6.1.MLNX20160112.774e977.tar.gz
perftest:
mlnx_ofed_perftest/perftest-3.0-0.18.gb464d59.tar.gz
qperf:
mlnx_ofed_qperf/qperf-0.4.9.tar.gz
rds-tools:
rds-tools.git mlnx_ofed_2_4
commit 299420ca25cf9996bc0748e3bc4b08748996ba49
sdpnetstat:
sdpnetstat.git mlnx_ofed_3_0
commit 3cf409a7cc07e5c71f9640eddbb801ece21b4169
sockperf:
sockperf/sockperf-2.7-43.git3ee62bd8107a.src.rpm
srp:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
srptools:
srptools/srptools-1.0.2-12.src.rpm
Installed Packages:
-------------------
infiniband-diags
librdmacm
libmlx4
libibverbs-utils
mpi-selector
libibmad-devel
sdpnetstat
knem
libibumad-devel
libsdp
mlnx-ethtool
libibverbs-debuginfo
mlnx-ofa_kernel-modules
srp
opensm
mstflint
cc_mgr
libibmad
libibverbs
kernel-mft
libibverbs-devel-static
libibumad
librdmacm-devel
mlnx-ofa_kernel
libsdp-devel
ibutils2
mlnxofed-docs
libibmad-static
iser
opensm-devel
dump_pr
libibumad-static
rds-tools
libmlx4-debuginfo
mlnx-ofa_kernel-devel
opensm-libs
opensm-static
ar_mgr
dapl-devel-static
infiniband-diags-compat
libibverbs-devel
ibsim
[root(a)slave3 fio]# ibstat
CA 'mlx4_0'
CA type: MT4103
Number of ports: 2
Firmware version: 2.35.5100
Hardware version: 0
Node GUID: 0x248a0703006090e0
System image GUID: 0x248a0703006090e0
Port 1:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffe6090e0
Link layer: Ethernet
Port 2:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffe6090e1
Link layer: Ethernet
[root(a)slave3 fio]# ibv_devinfo
hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.35.5100
node_guid: 248a:0703:0060:90e0
sys_image_guid: 248a:0703:0060:90e0
vendor_id: 0x02c9
vendor_part_id: 4103
hw_ver: 0x0
board_id: MT_1090111023
phys_port_cnt: 2
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
port: 2
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 12:34 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Did you install Mellanox OFED on the target and client servers?
Can you run ibstat on both servers?
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 8:59:28 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
I've just tried the SPDK v17.07.1 and DPDK v17.08.
nvme version: 1.1.38.gfaab
fio version: 3.1
Tried the 512k and 1024k I/O sizes and there is no error. dmesg information is as follows.
So there may be some other difference here? It looks like you are using a ConnectX-5 while I am using a ConnectX-4?
Other related information:
[root(a)node4 fio]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
[root(a)node4 fio]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root(a)node4 fio]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux
[577707.543326] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.100.8:4420
[577730.854540] detected loopback device
[577730.893761] nvme nvme0: creating 7 I/O queues.
[577730.893797] detected loopback device
[577730.898611] detected loopback device
[577730.908917] detected loopback device
[577730.919073] detected loopback device
[577730.928922] detected loopback device
[577730.938679] detected loopback device
[577730.948365] detected loopback device
[577731.146290] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.100.8:4420
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
Any update?
Do you see any error message from "dmesg" with 512k block size running fio?
Thanks
Victor
From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you share with us which version of SPDK you were using when you saw this error? Or maybe you can try the latest SPDK code?
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Cao
Do you see any message from dmesg?
I tried this fio version and still saw these error messages in dmesg.
fio-3.1
[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338] nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0
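A side note on the huge negative numbers in the "detected capacity change" lines above: they are an unsigned 64-bit capacity printed through a signed format specifier. Reinterpreting them as unsigned (a minimal sketch; `as_unsigned_u64` is just an illustrative helper, and the "bytes" reading is an assumption about the kernel's printout) shows a value near 2**64, i.e. garbage identify data after the controller dropped, not a real disk size:

```python
import struct

def as_unsigned_u64(printed: int) -> int:
    """Reinterpret a value the kernel printed as signed 64-bit back into unsigned."""
    return struct.unpack("<Q", struct.pack("<q", printed))[0]

# Capacity values from the dmesg output above
for signed_val in (-62111005559226368, -62042256479092736):
    raw = as_unsigned_u64(signed_val)
    # A capacity close to 2**64 is nonsense: the data read back from the
    # (disconnected) controller was corrupt, not a real capacity.
    print(f"{signed_val} -> {raw} (~{raw / 2**60:.0f} EiB if taken as bytes)")
```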
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Thanks for your detailed information on the testing.
I've tried the latest SPDK code with the latest fio-3.1-20-g132b and with fio-2.19. I don't see this kind of error.
Could you share with us which version of SPDK you were using when you saw this error? Or maybe you can try the latest SPDK code?
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1 --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017
My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
(cc Victor)
From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you provide a few more details? This will help the list to provide some ideas.
1) On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?
Kernel initiator; I run these commands on the client server.
modprobe mlx5_ib
modprobe nvme-rdma
nvme discover -t rdma -a 192.168.10.11 -s 4420
nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1 -a 192.168.10.11 -s 4420
2) Can you provide the fio configuration file or command line? Just so we can have more specifics on "bigger block size".
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1 --name=read-phase --rw=randwrite
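For reference, the one-line invocation above maps to this equivalent fio job file (a sketch; these are the same standard fio options, just written in job-file form):

```ini
; equivalent job file for the fio command line above
[global]
bs=512k
numjobs=4
iodepth=16
loops=1
ioengine=libaio
direct=1
invalidate=1
fsync_on_close=1
randrepeat=1
norandommap
time_based
runtime=60
filename=/dev/nvme1n1

[read-phase]
rw=randwrite
```

Run it with `fio <jobfile>`.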
3) Any details on the HW setup - specifically details on the RDMA NIC (or if you're using SW RoCE).
Nvmf.conf on target server
[Global]
Comment "Global section"
ReactorMask 0xff00
[Rpc]
Enable No
Listen 127.0.0.1
[Nvmf]
MaxQueuesPerSession 8
MaxQueueDepth 128
[Subsystem1]
NQN nqn.2016-06.io.spdk:nvme-subsystem-1
Core 9
Mode Direct
Listen RDMA 192.168.10.11:4420
NVMe 0000:82:00.0
SN S2PMNAAH400039
It is an RDMA NIC, ConnectX-5; the CPU is an Intel(R) Xeon(R) E5-2680 0 @ 2.70GHz.
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
Thanks,
-Jim
From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio
Hi
I have an SPDK NVMe-oF setup and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor
[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389] nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486] nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373] nvme1n1: unable to read partition table
[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 97980 bytes --]
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-19 6:13 Victor Banh
0 siblings, 0 replies; 24+ messages in thread
From: Victor Banh @ 2017-10-19 6:13 UTC (permalink / raw)
To: spdk
[-- Attachment #1: Type: text/plain, Size: 21181 bytes --]
Can you give the OS version and kernel version again for target and client?
I couldn't compile DPDK without installing the latest 4.12.x kernel on Ubuntu 16.04.
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com>
Sent: Wednesday, October 18, 2017 10:59:19 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
This server does not have OFED installed and it is in the loopback mode.
Found another server also in the loopback mode with ConnectX-3 and have OFED installed.
By the way, what SSD are you using? Maybe it is related to the SSD? I've just run with 2048k for a short duration and see no issue. I will run longer to see whether I can hit this error.
[root(a)slave3 fio]# lspci | grep -i mell
08:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
[root(a)slave3 fio]# lsmod | grep -i mlx
mlx4_ib 159744 0
ib_core 208896 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,ib_srpt,ib_ucm,rdma_ucm,mlx4_ib
mlx4_en 114688 0
mlx4_core 307200 2 mlx4_en,mlx4_ib
ptp 20480 3 ixgbe,igb,mlx4_en
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 12:34 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>; Cao, Gang <gang.cao(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Do you install Mellanox OFED on the Target and Client server?
Can you run ibstat on both servers?
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 8:59:28 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
I’ve just tried the SPDK v17.07.1 and DPDK v17.08.
nvme version: 1.1.38.gfaab
fio version: 3.1
Tried the 512k and 1024k IO size and there is no error. demsg information as following.
So that there may be other difference here? Looks like you are using ConnectX-5 while I am using ConnectX-4?
Other related information:
[root(a)node4 fio]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
[root(a)node4 fio]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root(a)node4 fio]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux
[577707.543326] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.100.8:4420
[577730.854540] detected loopback device
[577730.893761] nvme nvme0: creating 7 I/O queues.
[577730.893797] detected loopback device
[577730.898611] detected loopback device
[577730.908917] detected loopback device
[577730.919073] detected loopback device
[577730.928922] detected loopback device
[577730.938679] detected loopback device
[577730.948365] detected loopback device
[577731.146290] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.100.8:4420
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
Any update?
Do you see any error message from “dmesg” with 512k block size running fio?
Thanks
Victor
From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you share us which version of SPDK you are using when seeing this error? Or maybe you can have a try with the latest SPDK code?
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Cao
Do you see any message from dmesg?
I tried this fio version and still saw these error message from dmesg.
fio-3.1
[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338] nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Thanks for your detailed information on the testing.
I’ve tried the latest SPDK code and with latest fio-3.1-20-g132b and fio-2.19. It seems like no this kind of error.
Could you share us which version of SPDK you are using when seeing this error? Or maybe you can have a try with the latest SPDK code?
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1 --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017
My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
(cc Victor)
From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you provide a few more details? This will help the list to provide some ideas.
1) On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?
Kernel initiator, run these commands on client server.
modprobe mlx5_ib
modprobe nvme-rdma
nvme discover -t rdma -a 192.168.10.11 -s 4420
nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1 -a 192.168.10.11 -s 4420
2) Can you provide the fio configuration file or command line? Just so we can have more specifics on “bigger block size”.
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1 --name=read-phase --rw=randwrite
3) Any details on the HW setup – specifically details on the RDMA NIC (or if you’re using SW RoCE).
Nvmf.conf on target server
[Global]
Comment "Global section"
ReactorMask 0xff00
[Rpc]
Enable No
Listen 127.0.0.1
[Nvmf]
MaxQueuesPerSession 8
MaxQueueDepth 128
[Subsystem1]
NQN nqn.2016-06.io.spdk:nvme-subsystem-1
Core 9
Mode Direct
Listen RDMA 192.168.10.11:4420
NVMe 0000:82:00.0
SN S2PMNAAH400039
It is an RDMA NIC, ConnectX-5; the CPU is an Intel(R) Xeon(R) E5-2680 0 @ 2.70GHz.
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
Thanks,
-Jim
From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio
Hi
I have an SPDK NVMe-oF setup and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor
[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389] nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486] nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373] nvme1n1: unable to read partition table
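For what it's worth, a quick sanity check (a minimal sketch, assuming 512-byte logical sectors) shows that the failing sectors in the log above all lie within the 1600321314816-byte capacity the kernel reported, which suggests the I/O errors come from the transport dropping (note the "Reconnecting in 10 seconds" line) rather than from out-of-range requests:

```python
# Sanity check (assumes 512-byte logical sectors): do the failing sectors
# from the dmesg log fall within the device's reported capacity?
CAPACITY_BYTES = 1600321314816  # from "detected capacity change from 1600321314816 to 0"
SECTOR_SIZE = 512

failing_sectors = [
    2507351968, 1301294496, 1947371168, 1891797568, 10833824,
    614937152, 1872305088, 1504491040, 1182136128, 1662985792,
]

total_sectors = CAPACITY_BYTES // SECTOR_SIZE
for s in failing_sectors:
    # Every one of these is well inside the device, so the failures are
    # not caused by requests past the end of the namespace.
    print(f"sector {s}: byte offset {s * SECTOR_SIZE}, in range: {s < total_sectors}")
```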
[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 80871 bytes --]
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-19 5:59 Cao, Gang
0 siblings, 0 replies; 24+ messages in thread
From: Cao, Gang @ 2017-10-19 5:59 UTC (permalink / raw)
To: spdk
[-- Attachment #1: Type: text/plain, Size: 20710 bytes --]
This server does not have OFED installed and is in loopback mode.
I found another server, also in loopback mode, with ConnectX-3 and OFED installed.
By the way, what SSD are you using? Maybe it is related to the SSD? I've just run with 2048k for a short duration and saw no issue. I will run for longer to see whether I can hit this error.
[root(a)slave3 fio]# lspci | grep -i mell
08:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
[root(a)slave3 fio]# lsmod | grep -i mlx
mlx4_ib 159744 0
ib_core 208896 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,ib_srpt,ib_ucm,rdma_ucm,mlx4_ib
mlx4_en 114688 0
mlx4_core 307200 2 mlx4_en,mlx4_ib
ptp 20480 3 ixgbe,igb,mlx4_en
[root(a)slave3 fio]# ofed_info
MLNX_OFED_LINUX-3.2-2.0.0.0 (OFED-3.2-2.0.0):
ar_mgr:
osm_plugins/ar_mgr/ar_mgr-1.0-0.30.ga1ea4b7.tar.gz
cc_mgr:
osm_plugins/cc_mgr/cc_mgr-1.0-0.29.ga1ea4b7.tar.gz
dapl:
dapl.git mlnx_ofed_3_1
commit c30fb6ce2cbc29d8ed4bde51437f7abb93378c78
dump_pr:
osm_plugins/dump_pr//dump_pr-1.0-0.25.ga1ea4b7.tar.gz
fabric-collector:
fabric_collector//fabric-collector-1.1.0.MLNX20140410.51b267e.tar.gz
fca:
mlnx_ofed_fca/fca-2.5.2431-1.src.rpm
hcoll:
mlnx_ofed_hcol/hcoll-3.4.807-1.src.rpm
ibacm:
mlnx_ofed/ibacm.git mlnx_ofed_3_2
commit 15ad8c13bdebbe62edea0b7df030710b65c14f7f
ibacm_ssa:
mlnx_ofed_ssa/acm/ibacm_ssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
ibdump:
sniffer/sniffer-4.0.0-2/ibdump/linux/ibdump-4.0.0-2.tgz
ibsim:
mlnx_ofed_ibsim/ibsim-0.6-0.8.g9d76581.tar.gz
ibssa:
mlnx_ofed_ssa/distrib/ibssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
ibutils:
ofed-1.5.3-rpms/ibutils/ibutils-1.5.7.1-0.12.gdcaeae2.tar.gz
ibutils2:
ibutils2/ibutils2-2.1.1-0.76.MLNX20160222.gd366c7b.tar.gz
infiniband-diags:
mlnx_ofed_infiniband_diags/infiniband-diags-1.6.6.MLNX20151130.7f0213e.tar.gz
infinipath-psm:
mlnx_ofed_infinipath-psm/infinipath-psm-3.3-2_g6f42cdb_open.tar.gz
iser:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
isert:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
kernel-mft:
mlnx_ofed_mft/kernel-mft-4.3.0-25.src.rpm
knem:
knem.git mellanox-master
commit f143ee19a575cd42a334422fa8bd329d671238db
libibcm:
mlnx_ofed/libibcm.git mlnx_ofed_3_0
commit d7d485df305e6536711485bd7e477668e77d8320
libibmad:
mlnx_ofed_libibmad/libibmad-1.3.12.MLNX20151122.d140cb1.tar.gz
libibprof:
mlnx_ofed_libibprof/libibprof-1.1.22-1.src.rpm
libibumad:
mlnx_ofed_libibumad/libibumad-1.3.10.2.MLNX20150406.966500d.tar.gz
libibverbs:
mlnx_ofed/libibverbs.git mlnx_ofed_3_2_2
commit 217a77686f4861229f0e4b94485a13f024634caf
libmlx4:
mlnx_ofed/libmlx4.git mlnx_ofed_3_2_2
commit dda6d7ae1e6e3779a485ebdd0a882f4bcbd027a6
libmlx5:
mlnx_ofed/libmlx5.git mlnx_ofed_3_2_2
commit d0c8645359e0f0aba0408b2d344f3b418d27019b
libopensmssa:
mlnx_ofed_ssa/plugin/libopensmssa-0.0.9.3.MLNX20151203.50eb579.tar.gz
librdmacm:
mlnx_ofed/librdmacm.git mlnx_ofed_3_2_2
commit 6bd430fed9e7b3d57a1876c040431ce7295c7703
libsdp:
libsdp.git mlnx_ofed_3_0
commit fbd01dfff05f42d6b82506e7dbf4bc6b7e6a59a4
libvma:
vma/source_rpms//libvma-7.0.14-0.src.rpm
mlnx-ethtool:
upstream/ethtool.git for-upstream
commit ac0cf295abe0c0832f0711fed66ab9601c8b2513
mlnx-ofa_kernel:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
mpi-selector:
ofed-1.5.3-rpms/mpi-selector/mpi-selector-1.0.3-1.src.rpm
mpitests:
mlnx_ofed_mpitest/mpitests-3.2.17-e1c7f2f.src.rpm
mstflint:
mlnx_ofed_mstflint/mstflint-4.3.0-1.49.g9b9af70.tar.gz
multiperf:
mlnx_ofed_multiperf/multiperf-3.0-0.10.gda89e8c.tar.gz
mvapich2:
mlnx_ofed_mvapich2/mvapich2-2.2a-1.src.rpm
mxm:
mlnx_ofed_mxm/mxm-3.4.3079-1.src.rpm
ofed-docs:
docs.git mlnx_ofed-3.2
commit ea3386416f9f7130edd2c70fc3424cb2cda50f7d
openmpi:
mlnx_ofed_ompi_1.8/openmpi-1.10.3a1-1.src.rpm
opensm:
mlnx_ofed_opensm/opensm-4.6.1.MLNX20160112.774e977.tar.gz
perftest:
mlnx_ofed_perftest/perftest-3.0-0.18.gb464d59.tar.gz
qperf:
mlnx_ofed_qperf/qperf-0.4.9.tar.gz
rds-tools:
rds-tools.git mlnx_ofed_2_4
commit 299420ca25cf9996bc0748e3bc4b08748996ba49
sdpnetstat:
sdpnetstat.git mlnx_ofed_3_0
commit 3cf409a7cc07e5c71f9640eddbb801ece21b4169
sockperf:
sockperf/sockperf-2.7-43.git3ee62bd8107a.src.rpm
srp:
mlnx_ofed/mlnx_rdma.git mlnx_ofed_3_2_2
commit 378ff029c77bac76cd02a8b89c6f3109bbb11c3d
srptools:
srptools/srptools-1.0.2-12.src.rpm
Installed Packages:
-------------------
infiniband-diags
librdmacm
libmlx4
libibverbs-utils
mpi-selector
libibmad-devel
sdpnetstat
knem
libibumad-devel
libsdp
mlnx-ethtool
libibverbs-debuginfo
mlnx-ofa_kernel-modules
srp
opensm
mstflint
cc_mgr
libibmad
libibverbs
kernel-mft
libibverbs-devel-static
libibumad
librdmacm-devel
mlnx-ofa_kernel
libsdp-devel
ibutils2
mlnxofed-docs
libibmad-static
iser
opensm-devel
dump_pr
libibumad-static
rds-tools
libmlx4-debuginfo
mlnx-ofa_kernel-devel
opensm-libs
opensm-static
ar_mgr
dapl-devel-static
infiniband-diags-compat
libibverbs-devel
ibsim
[root(a)slave3 fio]# ibstat
CA 'mlx4_0'
CA type: MT4103
Number of ports: 2
Firmware version: 2.35.5100
Hardware version: 0
Node GUID: 0x248a0703006090e0
System image GUID: 0x248a0703006090e0
Port 1:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffe6090e0
Link layer: Ethernet
Port 2:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x268a07fffe6090e1
Link layer: Ethernet
[root(a)slave3 fio]# ibv_devinfo
hca_id: mlx4_0
transport: InfiniBand (0)
fw_ver: 2.35.5100
node_guid: 248a:0703:0060:90e0
sys_image_guid: 248a:0703:0060:90e0
vendor_id: 0x02c9
vendor_part_id: 4103
hw_ver: 0x0
board_id: MT_1090111023
phys_port_cnt: 2
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
port: 2
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 1024 (3)
sm_lid: 0
port_lid: 0
port_lmc: 0x00
link_layer: Ethernet
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 12:34 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>; Cao, Gang <gang.cao(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Did you install Mellanox OFED on the target and client servers?
Can you run ibstat on both servers?
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>
Sent: Wednesday, October 18, 2017 8:59:28 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
I've just tried SPDK v17.07.1 and DPDK v17.08.
nvme version: 1.1.38.gfaab
fio version: 3.1
Tried the 512k and 1024k I/O sizes and there is no error. dmesg information follows.
So there may be some other difference here? It looks like you are using ConnectX-5 while I am using ConnectX-4.
Other related information:
[root(a)node4 fio]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
[root(a)node4 fio]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root(a)node4 fio]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux
[577707.543326] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.100.8:4420
[577730.854540] detected loopback device
[577730.893761] nvme nvme0: creating 7 I/O queues.
[577730.893797] detected loopback device
[577730.898611] detected loopback device
[577730.908917] detected loopback device
[577730.919073] detected loopback device
[577730.928922] detected loopback device
[577730.938679] detected loopback device
[577730.948365] detected loopback device
[577731.146290] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.100.8:4420
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
Any update?
Do you see any error messages from "dmesg" when running fio with a 512k block size?
Thanks
Victor
From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you share which version of SPDK you are using when you see this error? Or maybe you can try the latest SPDK code?
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Cao
Do you see any message from dmesg?
I tried this fio version and still saw these error messages from dmesg.
fio-3.1
[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338] nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0
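Those negative capacity values look like garbage reads printed through a signed 64-bit format while the controller was unreachable. A minimal sketch (my assumption, not anything from the kernel source) reinterpreting them as unsigned 64-bit quantities:

```python
# The negative "detected capacity change" values are almost certainly raw
# bit patterns printed via a signed 64-bit format. Reinterpreting them as
# unsigned shows absurd sizes (tens of EiB), i.e. the reads returned garbage.
def as_unsigned64(v: int) -> int:
    return v % (1 << 64)

for v in (-62111005559226368, -62042256479092736):
    u = as_unsigned64(v)
    print(f"{v} -> {u} ({u / 2**60:.2f} EiB)")
```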
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Thanks for your detailed information on the testing.
I've tried the latest SPDK code and with latest fio-3.1-20-g132b and fio-2.19. It seems like no this kind of error.
Could you share us which version of SPDK you are using when seeing this error? Or maybe you can have a try with the latest SPDK code?
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1 --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017
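As a cross-check of the numbers in the status line above (a small sketch, nothing more): 3183 IOPS at a 512 KiB block size works out to the reported ~1592 MiB/s aggregate write bandwidth, so the run looks internally consistent:

```python
# Cross-check the fio status line: IOPS x block size should match bandwidth.
iops = 3183            # from "w=3183 IOPS"
block_size_kib = 512   # from --bs=512k

bandwidth_mib_s = iops * block_size_kib / 1024
# 1591.5 MiB/s, matching the reported w=1592MiB/s once rounded.
print(f"{iops} IOPS x {block_size_kib} KiB = {bandwidth_mib_s:.1f} MiB/s")
```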
My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
(cc Victor)
From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you provide a few more details? This will help the list to provide some ideas.
1) On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?
Kernel initiator; I run these commands on the client server.
modprobe mlx5_ib
modprobe nvme-rdma
nvme discover -t rdma -a 192.168.10.11 -s 4420
nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1 -a 192.168.10.11 -s 4420
2) Can you provide the fio configuration file or command line? Just so we can have more specifics on "bigger block size".
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1 --name=read-phase --rw=randwrite
3) Any details on the HW setup - specifically details on the RDMA NIC (or if you're using SW RoCE).
Nvmf.conf on target server
[Global]
Comment "Global section"
ReactorMask 0xff00
[Rpc]
Enable No
Listen 127.0.0.1
[Nvmf]
MaxQueuesPerSession 8
MaxQueueDepth 128
[Subsystem1]
NQN nqn.2016-06.io.spdk:nvme-subsystem-1
Core 9
Mode Direct
Listen RDMA 192.168.10.11:4420
NVMe 0000:82:00.0
SN S2PMNAAH400039
It is an RDMA NIC, ConnectX-5; the CPU is an Intel(R) Xeon(R) E5-2680 0 @ 2.70GHz.
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
Thanks,
-Jim
From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio
Hi
I have an SPDK NVMe-oF setup and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor
[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389] nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486] nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373] nvme1n1: unable to read partition table
[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 86438 bytes --]
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-19 4:34 Victor Banh
0 siblings, 0 replies; 24+ messages in thread
From: Victor Banh @ 2017-10-19 4:34 UTC (permalink / raw)
To: spdk
[-- Attachment #1: Type: text/plain, Size: 12711 bytes --]
Did you install Mellanox OFED on the target and client servers?
Can you run ibstat on both servers?
Thanks
Victor
________________________________
From: Cao, Gang <gang.cao(a)intel.com>
Sent: Wednesday, October 18, 2017 8:59:28 PM
To: Victor Banh; Storage Performance Development Kit; Harris, James R
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
I’ve just tried SPDK v17.07.1 and DPDK v17.08.
nvme version: 1.1.38.gfaab
fio version: 3.1
Tried the 512k and 1024k I/O sizes and there is no error. dmesg information follows.
So there may be some other difference here? It looks like you are using ConnectX-5 while I am using ConnectX-4.
Other related information:
[root(a)node4 fio]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
[root(a)node4 fio]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root(a)node4 fio]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux
[577707.543326] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.100.8:4420
[577730.854540] detected loopback device
[577730.893761] nvme nvme0: creating 7 I/O queues.
[577730.893797] detected loopback device
[577730.898611] detected loopback device
[577730.908917] detected loopback device
[577730.919073] detected loopback device
[577730.928922] detected loopback device
[577730.938679] detected loopback device
[577730.948365] detected loopback device
[577731.146290] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.100.8:4420
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao(a)intel.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
Any update?
Do you see any error messages from “dmesg” when running fio with a 512k block size?
Thanks
Victor
From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you share which version of SPDK you are using when you see this error? Or maybe you can try the latest SPDK code?
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Cao
Do you see any message from dmesg?
I tried this fio version and still saw these error messages from dmesg.
fio-3.1
[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338] nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Thanks for your detailed information on the testing.
I’ve tried the latest SPDK code with the latest fio-3.1-20-g132b and with fio-2.19. I did not see this kind of error.
Could you share which version of SPDK you are using when you see this error? Or maybe you can try the latest SPDK code?
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1 --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017
My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
(cc Victor)
From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you provide a few more details? This will help the list to provide some ideas.
1) On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?
Kernel initiator; I run these commands on the client server.
modprobe mlx5_ib
modprobe nvme-rdma
nvme discover -t rdma -a 192.168.10.11 -s 4420
nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1 -a 192.168.10.11 -s 4420
2) Can you provide the fio configuration file or command line? Just so we can have more specifics on “bigger block size”.
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1 --name=read-phase --rw=randwrite
3) Any details on the HW setup – specifically details on the RDMA NIC (or if you’re using SW RoCE).
Nvmf.conf on target server
[Global]
Comment "Global section"
ReactorMask 0xff00
[Rpc]
Enable No
Listen 127.0.0.1
[Nvmf]
MaxQueuesPerSession 8
MaxQueueDepth 128
[Subsystem1]
NQN nqn.2016-06.io.spdk:nvme-subsystem-1
Core 9
Mode Direct
Listen RDMA 192.168.10.11:4420
NVMe 0000:82:00.0
SN S2PMNAAH400039
It is an RDMA NIC, ConnectX-5; the CPU is an Intel(R) Xeon(R) E5-2680 0 @ 2.70GHz.
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
Thanks,
-Jim
From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio
Hi
I have an SPDK NVMe-oF setup and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor
[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389] nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486] nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373] nvme1n1: unable to read partition table
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-19 3:59 Cao, Gang
0 siblings, 0 replies; 24+ messages in thread
From: Cao, Gang @ 2017-10-19 3:59 UTC (permalink / raw)
To: spdk
[-- Attachment #1: Type: text/plain, Size: 12329 bytes --]
Hi Victor,
I’ve just tried the SPDK v17.07.1 and DPDK v17.08.
nvme version: 1.1.38.gfaab
fio version: 3.1
I tried the 512k and 1024k I/O sizes and there is no error. dmesg information is as follows.
So there may be some other difference here? It looks like you are using a ConnectX-5 while I am using a ConnectX-4.
Other related information:
[root(a)node4 fio]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
[root(a)node4 fio]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root(a)node4 fio]# uname -a
Linux node4 4.10.1 #1 SMP Fri Mar 10 15:59:57 CST 2017 x86_64 x86_64 x86_64 GNU/Linux
[577707.543326] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.100.8:4420
[577730.854540] detected loopback device
[577730.893761] nvme nvme0: creating 7 I/O queues.
[577730.893797] detected loopback device
[577730.898611] detected loopback device
[577730.908917] detected loopback device
[577730.919073] detected loopback device
[577730.928922] detected loopback device
[577730.938679] detected loopback device
[577730.948365] detected loopback device
[577731.146290] nvme nvme0: new ctrl: NQN "nqn.2016-06.io.spdk:cnode2", addr 192.168.100.8:4420
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao(a)intel.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
Any update?
Do you see any error messages from “dmesg” when running fio with a 512k block size?
Thanks
Victor
From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you share with us which version of SPDK you are using when you see this error? Or maybe you could try the latest SPDK code?
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Cao
Do you see any message from dmesg?
I tried this fio version and still saw these error messages from dmesg.
fio-3.1
[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338] nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Thanks for your detailed information on the testing.
I’ve tried the latest SPDK code with the latest fio-3.1-20-g132b and with fio-2.19. There seems to be no error of this kind.
Could you share with us which version of SPDK you are using when you see this error? Or maybe you could try the latest SPDK code?
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1 --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017
My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
(cc Victor)
From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you provide a few more details? This will help the list to provide some ideas.
1) On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?
Kernel initiator; we run these commands on the client server.
modprobe mlx5_ib
modprobe nvme-rdma
nvme discover -t rdma -a 192.168.10.11 -s 4420
nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1 -a 192.168.10.11 -s 4420
2) Can you provide the fio configuration file or command line? Just so we can have more specifics on “bigger block size”.
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1 --name=read-phase --rw=randwrite
3) Any details on the HW setup – specifically details on the RDMA NIC (or if you’re using SW RoCE).
Nvmf.conf on target server
[Global]
Comment "Global section"
ReactorMask 0xff00
[Rpc]
Enable No
Listen 127.0.0.1
[Nvmf]
MaxQueuesPerSession 8
MaxQueueDepth 128
[Subsystem1]
NQN nqn.2016-06.io.spdk:nvme-subsystem-1
Core 9
Mode Direct
Listen RDMA 192.168.10.11:4420
NVMe 0000:82:00.0
SN S2PMNAAH400039
It is an RDMA NIC (ConnectX-5). The CPU is an Intel(R) Xeon(R) E5-2680 0 @ 2.70GHz.
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
Thanks,
-Jim
From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio
Hi
I have an SPDK NVMe-oF setup and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor
[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389] nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486] nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373] nvme1n1: unable to read partition table
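When comparing runs, it can help to tally the blk_update_request errors per device from a saved dmesg capture. A small sketch (the capture file path is whatever you saved dmesg output to):

```shell
# Count blk_update_request I/O errors per device in a dmesg capture.
# Extracts the "dev <name>," field from each matching line and tallies.
count_io_errors() {
    grep 'blk_update_request: I/O error' "$1" \
        | sed 's/.*dev \([^,]*\),.*/\1/' \
        | sort | uniq -c
}
```

For the log above this would report all ten errors against nvme1n1, which matches them all hitting the one fabrics namespace rather than being spread across local disks.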
* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-19 3:05 Cao, Gang
0 siblings, 0 replies; 24+ messages in thread
From: Cao, Gang @ 2017-10-19 3:05 UTC (permalink / raw)
To: spdk
[-- Attachment #1: Type: text/plain, Size: 10788 bytes --]
Let me have a try with your versions and check dmesg.
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Thursday, October 19, 2017 9:43 AM
To: Cao, Gang <gang.cao(a)intel.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
Any update?
Do you see any error messages from “dmesg” when running fio with a 512k block size?
Thanks
Victor
From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you share with us which version of SPDK you are using when you see this error? Or maybe you could try the latest SPDK code?
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Cao
Do you see any message from dmesg?
I tried this fio version and still saw these error messages from dmesg.
fio-3.1
[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338] nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Thanks for your detailed information on the testing.
I’ve tried the latest SPDK code with the latest fio-3.1-20-g132b and with fio-2.19. There seems to be no error of this kind.
Could you share with us which version of SPDK you are using when you see this error? Or maybe you could try the latest SPDK code?
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1 --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017
My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
(cc Victor)
From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you provide a few more details? This will help the list to provide some ideas.
1) On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?
Kernel initiator; we run these commands on the client server.
modprobe mlx5_ib
modprobe nvme-rdma
nvme discover -t rdma -a 192.168.10.11 -s 4420
nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1 -a 192.168.10.11 -s 4420
2) Can you provide the fio configuration file or command line? Just so we can have more specifics on “bigger block size”.
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1 --name=read-phase --rw=randwrite
3) Any details on the HW setup – specifically details on the RDMA NIC (or if you’re using SW RoCE).
Nvmf.conf on target server
[Global]
Comment "Global section"
ReactorMask 0xff00
[Rpc]
Enable No
Listen 127.0.0.1
[Nvmf]
MaxQueuesPerSession 8
MaxQueueDepth 128
[Subsystem1]
NQN nqn.2016-06.io.spdk:nvme-subsystem-1
Core 9
Mode Direct
Listen RDMA 192.168.10.11:4420
NVMe 0000:82:00.0
SN S2PMNAAH400039
It is an RDMA NIC (ConnectX-5). The CPU is an Intel(R) Xeon(R) E5-2680 0 @ 2.70GHz.
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
Thanks,
-Jim
From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio
Hi
I have an SPDK NVMe-oF setup and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor
[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389] nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486] nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373] nvme1n1: unable to read partition table
* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-19 1:43 Victor Banh
0 siblings, 0 replies; 24+ messages in thread
From: Victor Banh @ 2017-10-19 1:43 UTC (permalink / raw)
To: spdk
[-- Attachment #1: Type: text/plain, Size: 10309 bytes --]
Hi Gang
Any update?
Do you see any error messages from “dmesg” when running fio with a 512k block size?
Thanks
Victor
From: Victor Banh
Sent: Tuesday, October 17, 2017 7:37 PM
To: 'Cao, Gang' <gang.cao(a)intel.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you share with us which version of SPDK you are using when you see this error? Or maybe you could try the latest SPDK code?
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Cao
Do you see any message from dmesg?
I tried this fio version and still saw these error messages from dmesg.
fio-3.1
[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338] nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Thanks for your detailed information on the testing.
I’ve tried the latest SPDK code with the latest fio-3.1-20-g132b and with fio-2.19. There seems to be no error of this kind.
Could you share with us which version of SPDK you are using when you see this error? Or maybe you could try the latest SPDK code?
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1 --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017
My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
(cc Victor)
From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you provide a few more details? This will help the list to provide some ideas.
1) On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?
Kernel initiator; we run these commands on the client server.
modprobe mlx5_ib
modprobe nvme-rdma
nvme discover -t rdma -a 192.168.10.11 -s 4420
nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1 -a 192.168.10.11 -s 4420
2) Can you provide the fio configuration file or command line? Just so we can have more specifics on “bigger block size”.
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1 --name=read-phase --rw=randwrite
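For readability, the same command line can be written as a fio job file (a sketch equivalent to the flags above; the device path is the one used in this message):

```ini
; equivalent of the fio command line above
[global]
bs=512k
numjobs=4
iodepth=16
loops=1
ioengine=libaio
direct=1
invalidate=1
fsync_on_close=1
randrepeat=1
norandommap
time_based
runtime=60
filename=/dev/nvme1n1

[read-phase]
rw=randwrite
```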
3) Any details on the HW setup – specifically details on the RDMA NIC (or if you’re using SW RoCE).
Nvmf.conf on target server
[Global]
Comment "Global section"
ReactorMask 0xff00
[Rpc]
Enable No
Listen 127.0.0.1
[Nvmf]
MaxQueuesPerSession 8
MaxQueueDepth 128
[Subsystem1]
NQN nqn.2016-06.io.spdk:nvme-subsystem-1
Core 9
Mode Direct
Listen RDMA 192.168.10.11:4420
NVMe 0000:82:00.0
SN S2PMNAAH400039
It is an RDMA NIC (ConnectX-5); the CPU is an Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz.
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
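The ReactorMask 0xff00 in the target config pins the SPDK reactors to cores 8-15, which is NUMA node1 above, and the subsystem's Core 9 falls inside that set. A minimal sketch of decoding such a bitmask (the helper name is ours, not an SPDK API):

```python
def cores_from_mask(mask):
    """Expand a CPU bitmask (as used by ReactorMask) into a sorted list of core IDs."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]

# ReactorMask 0xff00 selects exactly the NUMA node1 cores (8-15)
assert cores_from_mask(0xFF00) == [8, 9, 10, 11, 12, 13, 14, 15]
# the subsystem's "Core 9" lies inside the reactor mask
assert 9 in cores_from_mask(0xFF00)
```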
Thanks,
-Jim
From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio
Hi
I have an SPDK NVMe-oF setup and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor
[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389] nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486] nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373] nvme1n1: unable to read partition table
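The impossible negative capacity in the log ("detected capacity change from 0 to -65647705833078784") is the kernel printing a garbage unsigned 64-bit value as a signed integer after the controller dropped. A small sketch of that reinterpretation, using the value from the log above (this illustrates the printout, it does not diagnose the underlying I/O error):

```python
import struct

def as_signed64(u):
    """Reinterpret an unsigned 64-bit integer as signed (two's complement)."""
    return struct.unpack('<q', struct.pack('<Q', u & 0xFFFFFFFFFFFFFFFF))[0]

# The garbage capacity read after the target went away, as printed (signed):
signed = -65647705833078784
unsigned = signed + 2**64  # the raw 64-bit value the kernel actually read
assert as_signed64(unsigned) == signed

# The earlier, sane capacity (the ~1.6 TB drive) prints unchanged:
assert as_signed64(1600321314816) == 1600321314816
```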
[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 38136 bytes --]
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-18 2:37 Victor Banh
0 siblings, 0 replies; 24+ messages in thread
From: Victor Banh @ 2017-10-18 2:37 UTC (permalink / raw)
To: spdk
[-- Attachment #1: Type: text/plain, Size: 9816 bytes --]
Hi Gang
spdk-17.07.1 and dpdk-17.08
Thanks
Victor
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 16, 2017 8:51 PM
To: Victor Banh <victorb(a)mellanox.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you share which version of SPDK you are using when seeing this error? Or maybe you can try the latest SPDK code?
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com<mailto:gang.cao(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Cao
Do you see any messages from dmesg?
I tried this fio version and still saw these error messages from dmesg.
fio-3.1
[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338] nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Thanks for your detailed information on the testing.
I’ve tried the latest SPDK code with the latest fio-3.1-20-g132b and with fio-2.19, and I do not see this kind of error.
Could you share which version of SPDK you are using when seeing this error? Or maybe you can try the latest SPDK code?
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1 --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017
My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
(cc Victor)
From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you provide a few more details? This will help the list to provide some ideas.
1) On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?
Kernel initiator; I run these commands on the client server.
modprobe mlx5_ib
modprobe nvme-rdma
nvme discover -t rdma -a 192.168.10.11 -s 4420
nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1 -a 192.168.10.11 -s 4420
2) Can you provide the fio configuration file or command line? Just so we can have more specifics on “bigger block size”.
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1 --name=read-phase --rw=randwrite
3) Any details on the HW setup – specifically details on the RDMA NIC (or if you’re using SW RoCE).
Nvmf.conf on target server
[Global]
Comment "Global section"
ReactorMask 0xff00
[Rpc]
Enable No
Listen 127.0.0.1
[Nvmf]
MaxQueuesPerSession 8
MaxQueueDepth 128
[Subsystem1]
NQN nqn.2016-06.io.spdk:nvme-subsystem-1
Core 9
Mode Direct
Listen RDMA 192.168.10.11:4420
NVMe 0000:82:00.0
SN S2PMNAAH400039
It is an RDMA NIC (ConnectX-5); the CPU is an Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz.
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
Thanks,
-Jim
From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio
Hi
I have an SPDK NVMe-oF setup and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor
[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389] nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486] nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373] nvme1n1: unable to read partition table
[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 37005 bytes --]
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-17 3:51 Cao, Gang
0 siblings, 0 replies; 24+ messages in thread
From: Cao, Gang @ 2017-10-17 3:51 UTC (permalink / raw)
To: spdk
[-- Attachment #1: Type: text/plain, Size: 9364 bytes --]
Hi Victor,
Could you share which version of SPDK you are using when seeing this error? Or maybe you can try the latest SPDK code?
Thanks,
Gang
From: Victor Banh [mailto:victorb(a)mellanox.com]
Sent: Tuesday, October 17, 2017 5:30 AM
To: Cao, Gang <gang.cao(a)intel.com>; Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Cao
Do you see any messages from dmesg?
I tried this fio version and still saw these error messages from dmesg.
fio-3.1
[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338] nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>; Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Thanks for your detailed information on the testing.
I’ve tried the latest SPDK code with the latest fio-3.1-20-g132b and with fio-2.19, and I do not see this kind of error.
Could you share which version of SPDK you are using when seeing this error? Or maybe you can try the latest SPDK code?
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1 --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017
My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
(cc Victor)
From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you provide a few more details? This will help the list to provide some ideas.
1) On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?
Kernel initiator; I run these commands on the client server.
modprobe mlx5_ib
modprobe nvme-rdma
nvme discover -t rdma -a 192.168.10.11 -s 4420
nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1 -a 192.168.10.11 -s 4420
2) Can you provide the fio configuration file or command line? Just so we can have more specifics on “bigger block size”.
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1 --name=read-phase --rw=randwrite
3) Any details on the HW setup – specifically details on the RDMA NIC (or if you’re using SW RoCE).
Nvmf.conf on target server
[Global]
Comment "Global section"
ReactorMask 0xff00
[Rpc]
Enable No
Listen 127.0.0.1
[Nvmf]
MaxQueuesPerSession 8
MaxQueueDepth 128
[Subsystem1]
NQN nqn.2016-06.io.spdk:nvme-subsystem-1
Core 9
Mode Direct
Listen RDMA 192.168.10.11:4420
NVMe 0000:82:00.0
SN S2PMNAAH400039
It is an RDMA NIC (ConnectX-5); the CPU is an Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz.
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
Thanks,
-Jim
From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio
Hi
I have an SPDK NVMe-oF setup and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor
[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389] nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486] nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373] nvme1n1: unable to read partition table
[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 32455 bytes --]
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-16 21:30 Victor Banh
0 siblings, 0 replies; 24+ messages in thread
From: Victor Banh @ 2017-10-16 21:30 UTC (permalink / raw)
To: spdk
[-- Attachment #1: Type: text/plain, Size: 8791 bytes --]
Hi Cao
Do you see any messages from dmesg?
I tried this fio version and still saw these error messages from dmesg.
fio-3.1
[869053.218235] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218250] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218259] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218263] ldm_validate_partition_table(): Disk read failed.
[869053.218269] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218277] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218285] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218292] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218296] Dev nvme2n1: unable to read RDB block 0
[869053.218303] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218311] Buffer I/O error on dev nvme2n1, logical block 0, async page read
[869053.218323] Buffer I/O error on dev nvme2n1, logical block 3, async page read
[869053.218338] nvme2n1: unable to read partition table
[869053.246126] nvme2n1: detected capacity change from -62111005559226368 to -62042256479092736
[869053.246195] ldm_validate_partition_table(): Disk read failed.
[869053.246217] Dev nvme2n1: unable to read RDB block 0
From: Cao, Gang [mailto:gang.cao(a)intel.com]
Sent: Monday, October 09, 2017 10:59 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>; Harris, James R <james.r.harris(a)intel.com>
Cc: Victor Banh <victorb(a)mellanox.com>
Subject: RE: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Thanks for your detailed information on the testing.
I’ve tried the latest SPDK code with the latest fio-3.1-20-g132b and with fio-2.19, and I do not see this kind of error.
Could you share which version of SPDK you are using when seeing this error? Or maybe you can try the latest SPDK code?
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme0n1 --name=read-phase --rw=randwrite
read-phase: (g=0): rw=randwrite, bs=(R) 512KiB-512KiB, (W) 512KiB-512KiB, (T) 512KiB-512KiB, ioengine=libaio, iodepth=16
...
fio-3.1-20-g132b
Starting 4 processes
Jobs: 4 (f=4): [w(4)][100.0%][r=0KiB/s,w=1592MiB/s][r=0,w=3183 IOPS][eta 00m:00s]
read-phase: (groupid=0, jobs=1): err= 0: pid=46378: Tue Oct 10 01:23:39 2017
My NIC information:
[root(a)node4 nvme-cli-gerrit]# lsmod | grep -i mlx
mlx5_ib 172032 0
ib_core 200704 15 ib_iser,ib_cm,rdma_cm,ib_umad,ib_srp,nvme_rdma,ib_isert,ib_uverbs,rpcrdma,ib_ipoib,iw_cm,mlx5_ib,ib_srpt,ib_ucm,rdma_ucm
mlx5_core 380928 1 mlx5_ib
ptp 20480 3 ixgbe,igb,mlx5_core
[root(a)node4 nvme-cli-gerrit]# lspci | grep -i mell
81:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
81:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Victor Banh
Sent: Friday, October 6, 2017 2:41 PM
To: Harris, James R <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>; Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Cc: Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
(cc Victor)
From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you provide a few more details? This will help the list to provide some ideas.
1) On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?
Kernel initiator; I run these commands on the client server.
modprobe mlx5_ib
modprobe nvme-rdma
nvme discover -t rdma -a 192.168.10.11 -s 4420
nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1 -a 192.168.10.11 -s 4420
2) Can you provide the fio configuration file or command line? Just so we can have more specifics on “bigger block size”.
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1 --name=read-phase --rw=randwrite
3) Any details on the HW setup – specifically details on the RDMA NIC (or if you’re using SW RoCE).
Nvmf.conf on target server
[Global]
Comment "Global section"
ReactorMask 0xff00
[Rpc]
Enable No
Listen 127.0.0.1
[Nvmf]
MaxQueuesPerSession 8
MaxQueueDepth 128
[Subsystem1]
NQN nqn.2016-06.io.spdk:nvme-subsystem-1
Core 9
Mode Direct
Listen RDMA 192.168.10.11:4420
NVMe 0000:82:00.0
SN S2PMNAAH400039
It is an RDMA NIC (ConnectX-5); the CPU is an Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz.
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
Thanks,
-Jim
From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio
Hi
I have an SPDK NVMe-oF setup and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor
[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389] nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486] nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373] nvme1n1: unable to read partition table
[-- Attachment #2: attachment.html --]
[-- Type: text/html, Size: 32425 bytes --]
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-06 21:40 Victor Banh
0 siblings, 0 replies; 24+ messages in thread
From: Victor Banh @ 2017-10-06 21:40 UTC (permalink / raw)
To: spdk
[-- Attachment #1: Type: text/plain, Size: 5234 bytes --]
From: Harris, James R [mailto:james.r.harris(a)intel.com]
Sent: Friday, October 06, 2017 2:32 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Cc: Victor Banh <victorb(a)mellanox.com>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
(cc Victor)
From: James Harris <james.r.harris(a)intel.com<mailto:james.r.harris(a)intel.com>>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you provide a few more details? This will help the list to provide some ideas.
1) On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?
Kernel initiator; I run these commands on the client server:
modprobe mlx5_ib
modprobe nvme-rdma
nvme discover -t rdma -a 192.168.10.11 -s 4420
nvme connect -t rdma -n nqn.2016-06.io.spdk:nvme-subsystem-1 -a 192.168.10.11 -s 4420
2) Can you provide the fio configuration file or command line? Just so we can have more specifics on “bigger block size”.
fio --bs=512k --numjobs=4 --iodepth=16 --loops=1 --ioengine=libaio --direct=1 --invalidate=1 --fsync_on_close=1 --randrepeat=1 --norandommap --time_based --runtime=60 --filename=/dev/nvme1n1 --name=read-phase --rw=randwrite
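For context on why the block size matters here: a 512 KiB request is larger than many NVMe-oF data-transfer limits, so the host splits each I/O into several smaller commands on the wire. A rough sketch of the split count (the 128 KiB max transfer size is purely an assumed example, not a measured value for this setup):

```python
# Estimate how many wire-level commands one large I/O becomes, assuming a
# hypothetical 128 KiB controller max transfer size (MDTS-derived limit).
def split_count(io_bytes: int, max_xfer: int) -> int:
    return -(-io_bytes // max_xfer)   # ceiling division

print(split_count(512 * 1024, 128 * 1024))   # 4 sub-commands per 512 KiB write
```

Larger block sizes therefore exercise code paths (multi-command splitting, larger RDMA transfers) that small-block tests never touch, which is consistent with the failure only appearing at bigger block sizes.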
3) Any details on the HW setup – specifically details on the RDMA NIC (or if you’re using SW RoCE).
Nvmf.conf on the target server:
[Global]
Comment "Global section"
ReactorMask 0xff00
[Rpc]
Enable No
Listen 127.0.0.1
[Nvmf]
MaxQueuesPerSession 8
MaxQueueDepth 128
[Subsystem1]
NQN nqn.2016-06.io.spdk:nvme-subsystem-1
Core 9
Mode Direct
Listen RDMA 192.168.10.11:4420
NVMe 0000:82:00.0
SN S2PMNAAH400039
It is an RDMA NIC (ConnectX-5); the CPU is an Intel(R) Xeon(R) E5-2680 0 @ 2.70GHz.
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
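As a sanity check on the core placement above: ReactorMask 0xff00 selects cores 8-15, i.e. NUMA node1, and the subsystem's Core 9 falls inside that mask. A small sketch of the mask arithmetic (hypothetical helper, not SPDK code):

```python
# Decode a CPU core bitmask (e.g. SPDK's ReactorMask) into core IDs.
def mask_to_cores(mask: int) -> list[int]:
    return [core for core in range(64) if (mask >> core) & 1]

cores = mask_to_cores(0xFF00)
print(cores)        # [8, 9, 10, 11, 12, 13, 14, 15] -> NUMA node1
print(9 in cores)   # the subsystem's Core 9 is covered by the mask
```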
Thanks,
-Jim
From: SPDK <spdk-bounces(a)lists.01.org<mailto:spdk-bounces(a)lists.01.org>> on behalf of Victor Banh <victorb(a)mellanox.com<mailto:victorb(a)mellanox.com>>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>" <spdk(a)lists.01.org<mailto:spdk(a)lists.01.org>>
Subject: [SPDK] Buffer I/O error on bigger block size running fio
Hi
I have an SPDK NVMe-oF setup and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor
[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389] nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486] nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373] nvme1n1: unable to read partition table
* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-06 21:32 Harris, James R
0 siblings, 0 replies; 24+ messages in thread
From: Harris, James R @ 2017-10-06 21:32 UTC (permalink / raw)
To: spdk
(cc Victor)
From: James Harris <james.r.harris(a)intel.com>
Date: Thursday, October 5, 2017 at 1:59 PM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: Re: [SPDK] Buffer I/O error on bigger block size running fio
Hi Victor,
Could you provide a few more details? This will help the list to provide some ideas.
1) On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?
2) Can you provide the fio configuration file or command line? Just so we can have more specifics on “bigger block size”.
3) Any details on the HW setup – specifically details on the RDMA NIC (or if you’re using SW RoCE).
Thanks,
-Jim
From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Victor Banh <victorb(a)mellanox.com>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org" <spdk(a)lists.01.org>
Subject: [SPDK] Buffer I/O error on bigger block size running fio
Hi
I have an SPDK NVMe-oF setup and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor
[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389] nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486] nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373] nvme1n1: unable to read partition table
* Re: [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-05 20:59 Harris, James R
0 siblings, 0 replies; 24+ messages in thread
From: Harris, James R @ 2017-10-05 20:59 UTC (permalink / raw)
To: spdk
Hi Victor,
Could you provide a few more details? This will help the list to provide some ideas.
1) On the client, are you using the SPDK NVMe-oF initiator or the kernel initiator?
2) Can you provide the fio configuration file or command line? Just so we can have more specifics on “bigger block size”.
3) Any details on the HW setup – specifically details on the RDMA NIC (or if you’re using SW RoCE).
Thanks,
-Jim
From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Victor Banh <victorb(a)mellanox.com>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
Date: Thursday, October 5, 2017 at 11:26 AM
To: "spdk(a)lists.01.org" <spdk(a)lists.01.org>
Subject: [SPDK] Buffer I/O error on bigger block size running fio
Hi
I have an SPDK NVMe-oF setup and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor
[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389] nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486] nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373] nvme1n1: unable to read partition table
* [SPDK] Buffer I/O error on bigger block size running fio
@ 2017-10-05 18:26 Victor Banh
0 siblings, 0 replies; 24+ messages in thread
From: Victor Banh @ 2017-10-05 18:26 UTC (permalink / raw)
To: spdk
Hi
I have an SPDK NVMe-oF setup and keep getting errors with bigger block sizes when running fio randwrite tests.
I am using Ubuntu 16.04 with kernel version 4.12.0-041200-generic on target and client.
The DPDK is 17.08 and SPDK is 17.07.1.
Thanks
Victor
[46905.233553] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[48285.159186] blk_update_request: I/O error, dev nvme1n1, sector 2507351968
[48285.159207] blk_update_request: I/O error, dev nvme1n1, sector 1301294496
[48285.159226] blk_update_request: I/O error, dev nvme1n1, sector 1947371168
[48285.159239] blk_update_request: I/O error, dev nvme1n1, sector 1891797568
[48285.159252] blk_update_request: I/O error, dev nvme1n1, sector 10833824
[48285.159265] blk_update_request: I/O error, dev nvme1n1, sector 614937152
[48285.159277] blk_update_request: I/O error, dev nvme1n1, sector 1872305088
[48285.159290] blk_update_request: I/O error, dev nvme1n1, sector 1504491040
[48285.159299] blk_update_request: I/O error, dev nvme1n1, sector 1182136128
[48285.159308] blk_update_request: I/O error, dev nvme1n1, sector 1662985792
[48285.191185] nvme nvme1: Reconnecting in 10 seconds...
[48285.191254] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191291] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191305] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191314] ldm_validate_partition_table(): Disk read failed.
[48285.191320] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191327] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191335] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191342] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191347] Dev nvme1n1: unable to read RDB block 0
[48285.191353] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191360] Buffer I/O error on dev nvme1n1, logical block 0, async page read
[48285.191375] Buffer I/O error on dev nvme1n1, logical block 3, async page read
[48285.191389] nvme1n1: unable to read partition table
[48285.223197] nvme1n1: detected capacity change from 1600321314816 to 0
[48289.623192] nvme1n1: detected capacity change from 0 to -65647705833078784
[48289.623411] ldm_validate_partition_table(): Disk read failed.
[48289.623447] Dev nvme1n1: unable to read RDB block 0
[48289.623486] nvme1n1: unable to read partition table
[48289.643305] ldm_validate_partition_table(): Disk read failed.
[48289.643328] Dev nvme1n1: unable to read RDB block 0
[48289.643373] nvme1n1: unable to read partition table
end of thread, other threads:[~2018-03-05 19:32 UTC | newest]
Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-09 17:58 [SPDK] Buffer I/O error on bigger block size running fio Cao, Gang
-- strict thread matches above, loose matches on Subject: below --
2018-03-05 19:32 Wodkowski, PawelX
2018-03-03 4:22 Victor Banh
2017-10-24 5:04 Cao, Gang
2017-10-24 4:57 Victor Banh
2017-10-24 4:25 Cao, Gang
2017-10-21 15:22 Victor Banh
2017-10-21 8:21 Cao, Gang
2017-10-20 22:13 Victor Banh
2017-10-19 7:16 Victor Banh
2017-10-19 6:51 Cao, Gang
2017-10-19 6:13 Victor Banh
2017-10-19 5:59 Cao, Gang
2017-10-19 4:34 Victor Banh
2017-10-19 3:59 Cao, Gang
2017-10-19 3:05 Cao, Gang
2017-10-19 1:43 Victor Banh
2017-10-18 2:37 Victor Banh
2017-10-17 3:51 Cao, Gang
2017-10-16 21:30 Victor Banh
2017-10-06 21:40 Victor Banh
2017-10-06 21:32 Harris, James R
2017-10-05 20:59 Harris, James R
2017-10-05 18:26 Victor Banh