Data corruption when using multiple devices with NVMEoF TCP

* Data corruption when using multiple devices with NVMEoF TCP
@ 2020-12-22 18:09 Hao Wang
  2020-12-22 19:29 ` Sagi Grimberg
  0 siblings, 1 reply; 23+ messages in thread
From: Hao Wang @ 2020-12-22 18:09 UTC (permalink / raw)
  To: Linux-nvme

I'm using kernel 5.2.9 with following related configs enabled:
CONFIG_NVME_CORE=y
CONFIG_BLK_DEV_NVME=y
CONFIG_NVME_MULTIPATH=y
CONFIG_NVME_FABRICS=m
# CONFIG_NVME_FC is not set
CONFIG_NVME_TCP=m
CONFIG_NVME_TARGET=m
CONFIG_NVME_TARGET_LOOP=m
# CONFIG_NVME_TARGET_FC is not set
CONFIG_NVME_TARGET_TCP=m
CONFIG_RTC_NVMEM=y
CONFIG_NVMEM=y
CONFIG_NVMEM_SYSFS=y

On target side, I exported 2 NVMe devices using tcp/ipv6:
[root@rtptest34337.prn2 ~/ext_nvme]# ll
/sys/kernel/config/nvmet/ports/1/subsystems/
total 0
lrwxrwxrwx 1 root root 0 Dec 19 02:08 nvmet-rtptest34337-1 ->
../../../../nvmet/subsystems/nvmet-rtptest34337-1
lrwxrwxrwx 1 root root 0 Dec 19 02:08 nvmet-rtptest34337-2 ->
../../../../nvmet/subsystems/nvmet-rtptest34337-2

On initiator side, I could successfully connect the 2 nvme devices,
nvme1n1 & nvme2n1;
[root@rtptest34206.prn2 /]# nvme list
Node             SN                   Model
Namespace          Usage                      Format           FW Rev
---------------- --------------------
---------------------------------------- ---------
-------------------------- ---------------- --------
/dev/nvme0n1     ***********     INTEL *******          1
256.06  GB / 256.06  GB    512   B +  0 B    PSF119D
/dev/nvme1n1     ***********     Linux                       1
900.19  GB / 900.19  GB      4 KiB +  0 B     5.2.9-0_
/dev/nvme2n1     ***********     Linux                       1
900.19  GB / 900.19  GB      4 KiB +  0 B     5.2.9-0_

Then for each of nvme1n1 & nvme2n1, I created a partition using fdisk;
type is "linux raid autodetect";
Next I created a RAID-0 volume using, created a filesystem on it, and
mounted itL
# mdadm --create /dev/md5 --level=0 --raid-devices=2 --chunk=128
/dev/nvme1n1p1 /dev/nvme2n1p1
# mkfs.xfs -f /dev/md5
# mkdir /flash
# mount -o rw,noatime,discard /dev/md5 /flash/

Now, when I copy a large directory into /flash/, a lot of files under
/flash/ are corrupted.
Specifically, that large directory has a lot of .gz files, and unzip
will fail on many of them;
also diff with original files does show they are different, although
the file size is exactly the same.

Also I found that if I don't create a RAID-0 array, instead just make
a filesystem on either /dev/nvme1n1p1 or /dev/nvme2n1p1, there is no
data corruption.

I'm wondering if there is a known issue, or I'm doing something not
really supported.
Thanks!

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 23+ messages in thread