linux-nvme.lists.infradead.org archive mirror
* [PATCH] nvmet: introduce use_vfs ns-attr
@ 2019-10-23 20:17 Chaitanya Kulkarni
  2019-10-24  2:00 ` Keith Busch
  0 siblings, 1 reply; 19+ messages in thread
From: Chaitanya Kulkarni @ 2019-10-23 20:17 UTC (permalink / raw)
  To: linux-nvme; +Cc: MRuijter, hch, Chaitanya Kulkarni, sagi


From: Mark Ruijter <MRuijter@onestopsystems.com>

With reference to the following issue reported on the mailing list :-
http://lists.infradead.org/pipermail/linux-nvme/2019-October/027604.html
This patch adds a new namespace attribute, use_vfs, so that any block
device can be used through the file backend.
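
For context, the backend selection in nvmet_ns_enable()
(drivers/nvme/target/core.c) already falls back to the file backend when
the bdev backend refuses the device; the io-cmd-bdev.c hunk below relies
on that by returning -ENOTBLK. Roughly (paraphrased, not a verbatim quote
of core.c):

	/*
	 * Existing backend selection, paraphrased: a use_vfs namespace makes
	 * nvmet_bdev_ns_enable() return -ENOTBLK, so the namespace is then
	 * enabled through the file backend instead.
	 */
	ret = nvmet_bdev_ns_enable(ns);
	if (ret == -ENOTBLK)
		ret = nvmet_file_ns_enable(ns);
	if (ret)
		goto out_unlock;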

We can see the following performance improvement in the I/Os with
the setup described in the link above when the new attribute use_vfs=1
and device_path is configured as /dev/md0.

Performance numbers :-

1. With this patch, using /dev/md0 as the namespace backend with use_vfs=0 :-
  write: IOPS=66.1k, BW=258MiB/s (271MB/s)(7750MiB/30002msec)
  write: IOPS=65.8k, BW=257MiB/s (269MB/s)(7709MiB/30002msec)
  write: IOPS=64.8k, BW=253MiB/s (266MB/s)(7599MiB/30002msec)

2. With this patch, using /dev/md0 as the namespace backend with use_vfs=1 :-
  write: IOPS=153k, BW=598MiB/s (627MB/s)(17.5GiB/30001msec)
  write: IOPS=152k, BW=594MiB/s (623MB/s)(17.4GiB/30001msec)
  write: IOPS=151k, BW=589MiB/s (617MB/s)(17.2GiB/30002msec)

Signed-off-by: Mark Ruijter <MRuijter@onestopsystems.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
Hi,

This work was originally done by Mark Ruijter
(MRuijter@onestopsystems.com); I've fixed a couple of coding style issues
and tested and validated the performance numbers with nvme-loop.

Setup info: md0 built from 2 memory-backed null_blk devices :-
# lsblk | grep null 
nullb1            252:1    0    2G  0 disk  
└─nullb1p1        259:1    0    2G  0 part  
nullb0            252:0    0    2G  0 disk  
└─nullb0p1        259:2    0    2G  0 part  
# mdadm -E /dev/nullb0
/dev/nullb0:
   MBR Magic : aa55
Partition[0] :      4192256 sectors at         2048 (type fd)
# mdadm -E /dev/nullb1
/dev/nullb1:
   MBR Magic : aa55
Partition[0] :      4192256 sectors at         2048 (type fd)
# mdadm --detail /dev/md0
/dev/md0:
           Version : 1.2
     Creation Time : Tue Oct 22 15:45:48 2019
        Raid Level : raid1
        Array Size : 2095104 (2046.00 MiB 2145.39 MB)
     Used Dev Size : 2095104 (2046.00 MiB 2145.39 MB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

       Update Time : Tue Oct 22 23:22:22 2019
             State : clean 
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : resync

              Name : cvenusqemu:0  (local to host cvenusqemu)
              UUID : 28141eb1:94d31044:e2692981:08ccd882
            Events : 17

    Number   Major   Minor   RaidDevice State
       0     259        2        0      active sync   /dev/nullb0p1
       1     259        1        1      active sync   /dev/nullb1p1

Performance numbers :-
1. With this patch, using /dev/md0 as the namespace backend with use_vfs=0 :-
  write: IOPS=66.1k, BW=258MiB/s (271MB/s)(7750MiB/30002msec)
  write: IOPS=65.8k, BW=257MiB/s (269MB/s)(7709MiB/30002msec)
  write: IOPS=64.8k, BW=253MiB/s (266MB/s)(7599MiB/30002msec)

2. With this patch, using /dev/md0 as the namespace backend with use_vfs=1 :-
  write: IOPS=153k, BW=598MiB/s (627MB/s)(17.5GiB/30001msec)
  write: IOPS=152k, BW=594MiB/s (623MB/s)(17.4GiB/30001msec)
  write: IOPS=151k, BW=589MiB/s (617MB/s)(17.2GiB/30002msec)

We can see a significant performance improvement when use_vfs=1.

Note:- I've not tested the entire patch with all the corner cases.
Once I get feedback I'll send out a well-tested version.

Regards,
-Chaitanya

---
 drivers/nvme/target/configfs.c    | 29 +++++++++++++++++++++++++++++
 drivers/nvme/target/core.c        |  1 +
 drivers/nvme/target/io-cmd-bdev.c |  5 +++++
 drivers/nvme/target/io-cmd-file.c | 31 +++++++++++++++++++++----------
 drivers/nvme/target/nvmet.h       |  1 +
 5 files changed, 57 insertions(+), 10 deletions(-)

diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index 98613a45bd3b..184555c19c03 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -545,12 +545,41 @@ static ssize_t nvmet_ns_buffered_io_store(struct config_item *item,
 
 CONFIGFS_ATTR(nvmet_ns_, buffered_io);
 
+static ssize_t nvmet_ns_use_vfs_show(struct config_item *item, char *page)
+{
+	return sprintf(page, "%d\n", to_nvmet_ns(item)->use_vfs);
+}
+
+static ssize_t nvmet_ns_use_vfs_store(struct config_item *item,
+		const char *page, size_t count)
+{
+	struct nvmet_ns *ns = to_nvmet_ns(item);
+	bool val;
+
+	if (strtobool(page, &val))
+		return -EINVAL;
+
+	mutex_lock(&ns->subsys->lock);
+	if (ns->enabled) {
+		pr_err("disable ns before setting use_vfs value.\n");
+		mutex_unlock(&ns->subsys->lock);
+		return -EINVAL;
+	}
+
+	ns->use_vfs = val;
+	mutex_unlock(&ns->subsys->lock);
+	return count;
+}
+
+CONFIGFS_ATTR(nvmet_ns_, use_vfs);
+
 static struct configfs_attribute *nvmet_ns_attrs[] = {
 	&nvmet_ns_attr_device_path,
 	&nvmet_ns_attr_device_nguid,
 	&nvmet_ns_attr_device_uuid,
 	&nvmet_ns_attr_ana_grpid,
 	&nvmet_ns_attr_enable,
+	&nvmet_ns_attr_use_vfs,
 	&nvmet_ns_attr_buffered_io,
 #ifdef CONFIG_PCI_P2PDMA
 	&nvmet_ns_attr_p2pmem,
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 6b39cfc6ade1..1d7c6310d5f0 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -653,6 +653,7 @@ struct nvmet_ns *nvmet_ns_alloc(struct nvmet_subsys *subsys, u32 nsid)
 
 	uuid_gen(&ns->uuid);
 	ns->buffered_io = false;
+	ns->use_vfs = false;
 
 	return ns;
 }
diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index f2618dc2ef3a..e0d8079de5c3 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -51,6 +51,11 @@ int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
 {
 	int ret;
 
+	if (ns->use_vfs) {
+		pr_info("Force using the vfs layer\n");
+		return -ENOTBLK;
+	}
+
 	ns->bdev = blkdev_get_by_path(ns->device_path,
 			FMODE_READ | FMODE_WRITE, NULL);
 	if (IS_ERR(ns->bdev)) {
diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
index 05453f5d1448..336ffda3261b 100644
--- a/drivers/nvme/target/io-cmd-file.c
+++ b/drivers/nvme/target/io-cmd-file.c
@@ -30,6 +30,7 @@ void nvmet_file_ns_disable(struct nvmet_ns *ns)
 int nvmet_file_ns_enable(struct nvmet_ns *ns)
 {
 	int flags = O_RDWR | O_LARGEFILE;
+	struct block_device *bdev;
 	struct kstat stat;
 	int ret;
 
@@ -45,17 +46,27 @@ int nvmet_file_ns_enable(struct nvmet_ns *ns)
 
 	ret = vfs_getattr(&ns->file->f_path,
 			&stat, STATX_SIZE, AT_STATX_FORCE_SYNC);
-	if (ret)
-		goto err;
-
-	ns->size = stat.size;
-	/*
-	 * i_blkbits can be greater than the universally accepted upper bound,
-	 * so make sure we export a sane namespace lba_shift.
-	 */
-	ns->blksize_shift = min_t(u8,
-			file_inode(ns->file)->i_blkbits, 12);
+	if (ret) {
+		pr_err("failed to stat device file %s\n",
+			ns->device_path);
+	}
 
+	if (stat.size == 0 && ns->use_vfs) {
+		bdev = blkdev_get_by_path(ns->device_path,
+					  FMODE_READ | FMODE_WRITE, NULL);
+		if (IS_ERR(bdev))
+			goto err;
+		ns->size = i_size_read(bdev->bd_inode);
+		ns->blksize_shift = blksize_bits(bdev_logical_block_size(bdev));
+	} else {
+		/*
+		 * i_blkbits can be greater than the universally accepted upper
+		 * bound, so make sure we export a sane namespace lba_shift.
+		 */
+		ns->size = stat.size;
+		ns->blksize_shift = min_t(u8,
+				file_inode(ns->file)->i_blkbits, 12);
+	}
 	ns->bvec_cache = kmem_cache_create("nvmet-bvec",
 			NVMET_MAX_MPOOL_BVEC * sizeof(struct bio_vec),
 			0, SLAB_HWCACHE_ALIGN, NULL);
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index c51f8dd01dc4..20aa83077765 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -63,6 +63,7 @@ struct nvmet_ns {
 	u32			anagrpid;
 
 	bool			buffered_io;
+	bool			use_vfs;
 	bool			enabled;
 	struct nvmet_subsys	*subsys;
 	const char		*device_path;
-- 
2.22.1




* Re: [PATCH] nvmet: introduce use_vfs ns-attr
  2019-10-23 20:17 [PATCH] nvmet: introduce use_vfs ns-attr Chaitanya Kulkarni
@ 2019-10-24  2:00 ` Keith Busch
  2019-10-24 11:30   ` Mark Ruijter
  2019-10-25  3:29   ` Chaitanya Kulkarni
  0 siblings, 2 replies; 19+ messages in thread
From: Keith Busch @ 2019-10-24  2:00 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: MRuijter, hch, linux-nvme, sagi

On Wed, Oct 23, 2019 at 01:17:15PM -0700, Chaitanya Kulkarni wrote:
> @@ -45,17 +46,27 @@ int nvmet_file_ns_enable(struct nvmet_ns *ns)
>  
>  	ret = vfs_getattr(&ns->file->f_path,
>  			&stat, STATX_SIZE, AT_STATX_FORCE_SYNC);
> -	if (ret)
> -		goto err;
> -
> -	ns->size = stat.size;
> -	/*
> -	 * i_blkbits can be greater than the universally accepted upper bound,
> -	 * so make sure we export a sane namespace lba_shift.
> -	 */
> -	ns->blksize_shift = min_t(u8,
> -			file_inode(ns->file)->i_blkbits, 12);
> +	if (ret) {
> +		pr_err("failed to stat device file %s\n",
> +			ns->device_path);


Why remove the 'goto err' from this condition? The code that proceeds
may be using an uninitialized 'stat' without it.


> +	}
>  
> +	if (stat.size == 0 && ns->use_vfs) {
> +		bdev = blkdev_get_by_path(ns->device_path,
> +					  FMODE_READ | FMODE_WRITE, NULL);
> +		if (IS_ERR(bdev))
> +			goto err;
> +		ns->size = i_size_read(bdev->bd_inode);
> +		ns->blksize_shift = blksize_bits(bdev_logical_block_size(bdev));
> +	} else {
> +		/*
> +		 * i_blkbits can be greater than the universally accepted upper
> +		 * bound, so make sure we export a sane namespace lba_shift.
> +		 */
> +		ns->size = stat.size;
> +		ns->blksize_shift = min_t(u8,
> +				file_inode(ns->file)->i_blkbits, 12);
> +	}
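
(For reference, a minimal sketch of the fix Keith is asking for here,
keeping the function's existing 'err' cleanup label; this is not part of
the posted patch:)

	ret = vfs_getattr(&ns->file->f_path,
			&stat, STATX_SIZE, AT_STATX_FORCE_SYNC);
	if (ret) {
		pr_err("failed to stat device file %s\n", ns->device_path);
		/* bail out so the code below never sees an uninitialized 'stat' */
		goto err;
	}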


* Re: [PATCH] nvmet: introduce use_vfs ns-attr
  2019-10-24  2:00 ` Keith Busch
@ 2019-10-24 11:30   ` Mark Ruijter
  2019-10-25  4:05     ` Keith Busch
  2019-10-25  3:29   ` Chaitanya Kulkarni
  1 sibling, 1 reply; 19+ messages in thread
From: Mark Ruijter @ 2019-10-24 11:30 UTC (permalink / raw)
  To: Keith Busch, Chaitanya Kulkarni; +Cc: hch, linux-nvme, sagi

Hi Keith,

You are right, that check and the goto should be included.

Note that I wrote this patch to prove that a performance problem exists when using raid1.
Either the md raid1 driver or the io-cmd-bdev.c code has issues.
When you add an additional layer like the VFS, the performance should typically drop by 5~10%.
However, in this case the performance increases even though the nvme target uses direct-io and the random writes do not get merged by the VFS.

Thanks,

Mark Ruijter
 
On 24-10-19 04:00, Keith Busch <kbusch@kernel.org> wrote:

    On Wed, Oct 23, 2019 at 01:17:15PM -0700, Chaitanya Kulkarni wrote:
    > @@ -45,17 +46,27 @@ int nvmet_file_ns_enable(struct nvmet_ns *ns)
    >  
    >  	ret = vfs_getattr(&ns->file->f_path,
    >  			&stat, STATX_SIZE, AT_STATX_FORCE_SYNC);
    > -	if (ret)
    > -		goto err;
    > -
    > -	ns->size = stat.size;
    > -	/*
    > -	 * i_blkbits can be greater than the universally accepted upper bound,
    > -	 * so make sure we export a sane namespace lba_shift.
    > -	 */
    > -	ns->blksize_shift = min_t(u8,
    > -			file_inode(ns->file)->i_blkbits, 12);
    > +	if (ret) {
    > +		pr_err("failed to stat device file %s\n",
    > +			ns->device_path);
    
    
    Why remove the 'goto err' from this condition? The code that proceeds
    may be using an uninitialized 'stat' without it.
    
    
    > +	}
    >  
    > +	if (stat.size == 0 && ns->use_vfs) {
    > +		bdev = blkdev_get_by_path(ns->device_path,
    > +					  FMODE_READ | FMODE_WRITE, NULL);
    > +		if (IS_ERR(bdev))
    > +			goto err;
    > +		ns->size = i_size_read(bdev->bd_inode);
    > +		ns->blksize_shift = blksize_bits(bdev_logical_block_size(bdev));
    > +	} else {
    > +		/*
    > +		 * i_blkbits can be greater than the universally accepted upper
    > +		 * bound, so make sure we export a sane namespace lba_shift.
    > +		 */
    > +		ns->size = stat.size;
    > +		ns->blksize_shift = min_t(u8,
    > +				file_inode(ns->file)->i_blkbits, 12);
    > +	}
    


* Re: [PATCH] nvmet: introduce use_vfs ns-attr
  2019-10-24  2:00 ` Keith Busch
  2019-10-24 11:30   ` Mark Ruijter
@ 2019-10-25  3:29   ` Chaitanya Kulkarni
  1 sibling, 0 replies; 19+ messages in thread
From: Chaitanya Kulkarni @ 2019-10-25  3:29 UTC (permalink / raw)
  To: Keith Busch; +Cc: MRuijter, hch, linux-nvme, sagi

On 10/23/2019 07:00 PM, Keith Busch wrote:
> On Wed, Oct 23, 2019 at 01:17:15PM -0700, Chaitanya Kulkarni wrote:
>> >@@ -45,17 +46,27 @@ int nvmet_file_ns_enable(struct nvmet_ns *ns)
>> >
>> >  	ret = vfs_getattr(&ns->file->f_path,
>> >  			&stat, STATX_SIZE, AT_STATX_FORCE_SYNC);
>> >-	if (ret)
>> >-		goto err;
>> >-
>> >-	ns->size = stat.size;
>> >-	/*
>> >-	 * i_blkbits can be greater than the universally accepted upper bound,
>> >-	 * so make sure we export a sane namespace lba_shift.
>> >-	 */
>> >-	ns->blksize_shift = min_t(u8,
>> >-			file_inode(ns->file)->i_blkbits, 12);
>> >+	if (ret) {
>> >+		pr_err("failed to stat device file %s\n",
>> >+			ns->device_path);
>
> Why remove the 'goto err' from this condition? The code that proceeds
> may be using an uninitialized 'stat' without it.
>
>
Yes, I will fix it in the next version. Still waiting for some comments
to see if this makes sense.



* Re: [PATCH] nvmet: introduce use_vfs ns-attr
  2019-10-24 11:30   ` Mark Ruijter
@ 2019-10-25  4:05     ` Keith Busch
  2019-10-25  4:26       ` Keith Busch
  0 siblings, 1 reply; 19+ messages in thread
From: Keith Busch @ 2019-10-25  4:05 UTC (permalink / raw)
  To: Mark Ruijter; +Cc: linux-nvme, hch, Chaitanya Kulkarni, sagi

On Thu, Oct 24, 2019 at 11:30:18AM +0000, Mark Ruijter wrote:
> Note that I wrote this patch to prove that a performance problem exists when using raid1.
> Either the md raid1 driver or the io-cmd-bdev.c code has issues.
> When you add an additional layer like the VFS the performance should typically drop with 5~10%.
> However in this case the performance increases even though the nvme target uses direct-io and the random writes do not get merged by the VFS.

Are we really using direct io when nvme target is going through vfs,
though? That should happen if we've set IOCB_DIRECT in the ki_flags,
but I don't see that happening, and if that's right, then the difference
sounds like it's related to buffered IO.


* Re: [PATCH] nvmet: introduce use_vfs ns-attr
  2019-10-25  4:05     ` Keith Busch
@ 2019-10-25  4:26       ` Keith Busch
  2019-10-25  8:44         ` Mark Ruijter
  0 siblings, 1 reply; 19+ messages in thread
From: Keith Busch @ 2019-10-25  4:26 UTC (permalink / raw)
  To: Mark Ruijter; +Cc: Chaitanya Kulkarni, hch, linux-nvme, sagi

On Fri, Oct 25, 2019 at 01:05:40PM +0900, Keith Busch wrote:
> On Thu, Oct 24, 2019 at 11:30:18AM +0000, Mark Ruijter wrote:
> > Note that I wrote this patch to prove that a performance problem exists when using raid1.
> > Either the md raid1 driver or the io-cmd-bdev.c code has issues.
> > When you add an additional layer like the VFS the performance should typically drop with 5~10%.
> > However in this case the performance increases even though the nvme target uses direct-io and the random writes do not get merged by the VFS.
> 
> Are we really using direct io when nvme target is going through vfs,
> though? That should happen if we've set IOCB_DIRECT in the ki_flags,
> but I don't see that happening, and if that's right, then the difference
> sounds like it's related to buffered IO.

Err, I see we actually default to direct when we open the file. You'd
have to change that through configfs to use buffered, which I assume
you're not doing. My mistake.
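
(For reference, the relevant part of nvmet_file_ns_enable() in
io-cmd-file.c at the time, roughly: unless buffered_io is set via
configfs, the backing file is opened with O_DIRECT:)

	int flags = O_RDWR | O_LARGEFILE;
	...
	if (!ns->buffered_io)
		flags |= O_DIRECT;

	ns->file = filp_open(ns->device_path, flags, 0);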


* Re: [PATCH] nvmet: introduce use_vfs ns-attr
  2019-10-25  4:26       ` Keith Busch
@ 2019-10-25  8:44         ` Mark Ruijter
  2019-10-26  1:06           ` Keith Busch
  2019-10-27 15:03           ` hch
  0 siblings, 2 replies; 19+ messages in thread
From: Mark Ruijter @ 2019-10-25  8:44 UTC (permalink / raw)
  To: Keith Busch; +Cc: Chaitanya Kulkarni, hch, linux-nvme, sagi


Hi Keith,

I am indeed not using buffered io.
Using the VFS increases my 4k random write performance from 200K to 650K when using raid1. 
So the difference is huge and becomes more significant when the underlying drives or raid0 can handle more iops.

Over the next few days I am going to provide a number of patches.

1. Currently a controller id collision can occur when using a clustered HA setup. See this message:
>>> [1122789.054677] nvme nvme1: Duplicate cntlid 4 with nvme0, rejecting.

The controller ID is currently hard wired.

       ret = ida_simple_get(&cntlid_ida,
                             NVME_CNTLID_MIN, NVME_CNTLID_MAX,
                             GFP_KERNEL);

So two nodes exporting the exact same volume using the same port configuration can easily come up with the same controller id.
I would like to propose making it configurable, with the current logic providing the default.
SCST, for example, allows manual target id selection for this reason.

2. The model of the drives has been hard-wired to 'Linux'. As I see it this should be configurable, with 'Linux' as the default value.
I'll provide code that makes that work.

3. An NVMeoF-connected disk on the initiator seems to queue forever when the target dies.
It would be nice if we had the ability to select either 'queue forever' or 'failfast'.

I hope this makes sense,

Mark Ruijter

On 25-10-19 06:27, Keith Busch <kbusch@kernel.org> wrote:

    On Fri, Oct 25, 2019 at 01:05:40PM +0900, Keith Busch wrote:
    > On Thu, Oct 24, 2019 at 11:30:18AM +0000, Mark Ruijter wrote:
    > > Note that I wrote this patch to prove that a performance problem exists when using raid1.
    > > Either the md raid1 driver or the io-cmd-bdev.c code has issues.
    > > When you add an additional layer like the VFS the performance should typically drop with 5~10%.
    > > However in this case the performance increases even though the nvme target uses direct-io and the random writes do not get merged by the VFS.
    > 
    > Are we really using direct io when nvme target is going through vfs,
    > though? That should happen if we've set IOCB_DIRECT in the ki_flags,
    > but I don't see that happening, and if that's right, then the difference
    > sounds like it's related to buffered IO.
    
    Err, I see we actually default to direct when we open the file. You'd
    have to change that through configfs to use buffered, which I assume
    you're not doing. My mistake.
    


* Re: [PATCH] nvmet: introduce use_vfs ns-attr
  2019-10-25  8:44         ` Mark Ruijter
@ 2019-10-26  1:06           ` Keith Busch
  2019-10-27 15:03           ` hch
  1 sibling, 0 replies; 19+ messages in thread
From: Keith Busch @ 2019-10-26  1:06 UTC (permalink / raw)
  To: Mark Ruijter; +Cc: Chaitanya Kulkarni, hch, linux-nvme, sagi

On Fri, Oct 25, 2019 at 08:44:00AM +0000, Mark Ruijter wrote:
> 
> Hi Keith,
> 
> I am indeed not using buffered io.
> Using the VFS increases my 4k random write performance from 200K to 650K when using raid1. 
> So the difference is huge and becomes more significant when the underlying drives or raid0 can handle more iops.

You're observing a difference only when using RAID? I just tried some
simple tests on a machine comparing ramdisk files and loop block devices
as the backing storage, and there appears to be a similar performance
difference there as well.


* Re: [PATCH] nvmet: introduce use_vfs ns-attr
  2019-10-25  8:44         ` Mark Ruijter
  2019-10-26  1:06           ` Keith Busch
@ 2019-10-27 15:03           ` hch
  2019-10-27 16:06             ` Mark Ruijter
  2019-10-28  0:55             ` Keith Busch
  1 sibling, 2 replies; 19+ messages in thread
From: hch @ 2019-10-27 15:03 UTC (permalink / raw)
  To: Mark Ruijter
  Cc: Hannes Reinecke, sagi, Chaitanya Kulkarni, linux-nvme, Keith Busch, hch

On Fri, Oct 25, 2019 at 08:44:00AM +0000, Mark Ruijter wrote:
> 
> Hi Keith,
> 
> I am indeed not using buffered io.
> Using the VFS increases my 4k random write performance from 200K to 650K when using raid1. 
> So the difference is huge and becomes more significant when the underlying drives or raid0 can handle more iops.

Can you try the patch below to use block layer plugging in nvmet?  That
should be the only major difference in how we do I/O.

> 1. Currently a controller id collision can occur when using a clustered HA setup. See this message:
> >>> [1122789.054677] nvme nvme1: Duplicate cntlid 4 with nvme0, rejecting.
> 
> The controller ID is currently hard wired.
> 
>        ret = ida_simple_get(&cntlid_ida,
>                              NVME_CNTLID_MIN, NVME_CNTLID_MAX,
>                              GFP_KERNEL);
> 
> So two nodes exporting the exact same volume using the same port configuration can easily come up with the same controller id.
> I would like to propose to make it configurable, but with the current logic setting a default.
> SCST for example allows manual target id selection for this reason.

We can allow some control there using a new configfs file.  But what
would be even better is an actually integrated cluster manager, which
we'd need to support features such as persistent reservations.
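
(Purely as an illustration of what such a configfs knob could look like,
modeled on the use_vfs attribute in the patch at the top of this thread;
the cntlid_min field on struct nvmet_subsys and its use by the cntlid
allocator are assumptions here, not existing code:)

static ssize_t nvmet_subsys_attr_cntlid_min_show(struct config_item *item,
		char *page)
{
	/* hypothetical: expose the lower bound handed to ida_simple_get() */
	return snprintf(page, PAGE_SIZE, "%u\n", to_subsys(item)->cntlid_min);
}

static ssize_t nvmet_subsys_attr_cntlid_min_store(struct config_item *item,
		const char *page, size_t cnt)
{
	u16 cntlid_min;

	if (sscanf(page, "%hu\n", &cntlid_min) != 1)
		return -EINVAL;
	if (cntlid_min == 0)
		return -EINVAL;

	down_write(&nvmet_config_sem);
	to_subsys(item)->cntlid_min = cntlid_min;
	up_write(&nvmet_config_sem);
	return cnt;
}
CONFIGFS_ATTR(nvmet_subsys_, attr_cntlid_min);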

> 2. The Model of the drives has been hard wired to Linux. As I see it this should be configurable with 'Linux' as default value.
> I'll provide code that makes that work.

Yes, please send a patch.

> 3. A NVMEoF connected disk on the initiator seems to queue forever when the target dies.
> It would be nice if we had the ability to select either 'queue foreever' or 'failfast'.

Making this configurable has been a long time todo list item.  At some
point in the past Hannes (added to Cc) signed up for it, but it seems
to have dropped off his priority list.

---
From 87ab0d6f9e092cde04775452131f90e8b4c46a66 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Sun, 27 Oct 2019 15:59:08 +0100
Subject: nvmet: use block layer plugging in nvmet_bdev_execute_rw

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/target/io-cmd-bdev.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index 04a9cd2a2604..ed1a8d0ab30e 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -147,6 +147,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
 	int sg_cnt = req->sg_cnt;
 	struct bio *bio;
 	struct scatterlist *sg;
+	struct blk_plug plug;
 	sector_t sector;
 	int op, op_flags = 0, i;
 
@@ -185,6 +186,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
 	bio->bi_end_io = nvmet_bio_done;
 	bio_set_op_attrs(bio, op, op_flags);
 
+	blk_start_plug(&plug);
 	for_each_sg(req->sg, sg, req->sg_cnt, i) {
 		while (bio_add_page(bio, sg_page(sg), sg->length, sg->offset)
 				!= sg->length) {
@@ -202,6 +204,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
 		sector += sg->length >> 9;
 		sg_cnt--;
 	}
+	blk_finish_plug(&plug);
 
 	submit_bio(bio);
 }
-- 
2.20.1



* Re: [PATCH] nvmet: introduce use_vfs ns-attr
  2019-10-27 15:03           ` hch
@ 2019-10-27 16:06             ` Mark Ruijter
  2019-10-28  0:55             ` Keith Busch
  1 sibling, 0 replies; 19+ messages in thread
From: Mark Ruijter @ 2019-10-27 16:06 UTC (permalink / raw)
  To: hch; +Cc: Keith Busch, Hannes Reinecke, Chaitanya Kulkarni, linux-nvme, sagi


I will install the patch tomorrow on the systems that I used before and run some tests so that I can compare the performance numbers.

Mark

> On 27 Oct 2019, at 16:03, "hch@lst.de" <hch@lst.de> wrote:
> 
> On Fri, Oct 25, 2019 at 08:44:00AM +0000, Mark Ruijter wrote:
>> 
>> Hi Keith,
>> 
>> I am indeed not using buffered io.
>> Using the VFS increases my 4k random write performance from 200K to 650K when using raid1. 
>> So the difference is huge and becomes more significant when the underlying drives or raid0 can handle more iops.
> 
> Can you try the patch below to use block layer plugging in nvmet?  That
> should be the only major difference in how we do I/O.
> 
>> 1. Currently a controller id collision can occur when using a clustered HA setup. See this message:
>>>>> [1122789.054677] nvme nvme1: Duplicate cntlid 4 with nvme0, rejecting.
>> 
>> The controller ID is currently hard wired.
>> 
>>       ret = ida_simple_get(&cntlid_ida,
>>                             NVME_CNTLID_MIN, NVME_CNTLID_MAX,
>>                             GFP_KERNEL);
>> 
>> So two nodes exporting the exact same volume using the same port configuration can easily come up with the same controller id.
>> I would like to propose to make it configurable, but with the current logic setting a default.
>> SCST for example allows manual target id selection for this reason.
> 
> We can allow some control there using a new configfs file.  But what
> would be even better is an actually integrated cluster manager, which
> we'd need to support features such as persistent reservations.
> 
>> 2. The Model of the drives has been hard wired to Linux. As I see it this should be configurable with 'Linux' as default value.
>> I'll provide code that makes that work.
> 
> Yes, please send a patch.
> 
>> 3. A NVMEoF connected disk on the initiator seems to queue forever when the target dies.
>> It would be nice if we had the ability to select either 'queue foreever' or 'failfast'.
> 
> Making this configurable has been a long time todo list item.  At some
> point in the past Hannes (added to Cc) signed up for it, but it seems
> to have dropped off his priority list.
> 
> ---
> From 87ab0d6f9e092cde04775452131f90e8b4c46a66 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <hch@lst.de>
> Date: Sun, 27 Oct 2019 15:59:08 +0100
> Subject: nvmet: use block layer plugging in nvmet_bdev_execute_rw
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/nvme/target/io-cmd-bdev.c | 3 +++
> 1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
> index 04a9cd2a2604..ed1a8d0ab30e 100644
> --- a/drivers/nvme/target/io-cmd-bdev.c
> +++ b/drivers/nvme/target/io-cmd-bdev.c
> @@ -147,6 +147,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
>    int sg_cnt = req->sg_cnt;
>    struct bio *bio;
>    struct scatterlist *sg;
> +    struct blk_plug plug;
>    sector_t sector;
>    int op, op_flags = 0, i;
> 
> @@ -185,6 +186,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
>    bio->bi_end_io = nvmet_bio_done;
>    bio_set_op_attrs(bio, op, op_flags);
> 
> +    blk_start_plug(&plug);
>    for_each_sg(req->sg, sg, req->sg_cnt, i) {
>        while (bio_add_page(bio, sg_page(sg), sg->length, sg->offset)
>                != sg->length) {
> @@ -202,6 +204,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
>        sector += sg->length >> 9;
>        sg_cnt--;
>    }
> +    blk_finish_plug(&plug);
> 
>    submit_bio(bio);
> }
> -- 
> 2.20.1
> 

* Re: [PATCH] nvmet: introduce use_vfs ns-attr
  2019-10-27 15:03           ` hch
  2019-10-27 16:06             ` Mark Ruijter
@ 2019-10-28  0:55             ` Keith Busch
  2019-10-28  7:26               ` Chaitanya Kulkarni
  2019-10-28  7:32               ` Chaitanya Kulkarni
  1 sibling, 2 replies; 19+ messages in thread
From: Keith Busch @ 2019-10-28  0:55 UTC (permalink / raw)
  To: hch; +Cc: Mark Ruijter, Hannes Reinecke, Chaitanya Kulkarni, linux-nvme, sagi

On Sun, Oct 27, 2019 at 04:03:30PM +0100, hch@lst.de wrote:
> ---
>  drivers/nvme/target/io-cmd-bdev.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
> index 04a9cd2a2604..ed1a8d0ab30e 100644
> --- a/drivers/nvme/target/io-cmd-bdev.c
> +++ b/drivers/nvme/target/io-cmd-bdev.c
> @@ -147,6 +147,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
>  	int sg_cnt = req->sg_cnt;
>  	struct bio *bio;
>  	struct scatterlist *sg;
> +	struct blk_plug plug;
>  	sector_t sector;
>  	int op, op_flags = 0, i;
>  
> @@ -185,6 +186,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
>  	bio->bi_end_io = nvmet_bio_done;
>  	bio_set_op_attrs(bio, op, op_flags);
>  
> +	blk_start_plug(&plug);
>  	for_each_sg(req->sg, sg, req->sg_cnt, i) {
>  		while (bio_add_page(bio, sg_page(sg), sg->length, sg->offset)
>  				!= sg->length) {
> @@ -202,6 +204,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
>  		sector += sg->length >> 9;
>  		sg_cnt--;
>  	}
> +	blk_finish_plug(&plug);
>  
>  	submit_bio(bio);
>  }

The blk_finish_plug() should be after the last submit_bio().

I looked at plugging too since that is a difference between the
submit_bio and write_iter paths, but I thought we needed to plug the
entire IO queue drain. Otherwise this random 4k write workload should
plug a single request, which doesn't sound like it would change anything.

Using the block plug for the entire IO queue drain requires quite a bit
larger change, though. Also, I saw a similar performance difference with
a ramdisk back-end, which doesn't use plugs.


* Re: [PATCH] nvmet: introduce use_vfs ns-attr
  2019-10-28  0:55             ` Keith Busch
@ 2019-10-28  7:26               ` Chaitanya Kulkarni
  2019-10-28  7:32               ` Chaitanya Kulkarni
  1 sibling, 0 replies; 19+ messages in thread
From: Chaitanya Kulkarni @ 2019-10-28  7:26 UTC (permalink / raw)
  To: Keith Busch, hch; +Cc: Mark Ruijter, Hannes Reinecke, sagi, linux-nvme

I've collected the performance numbers with and without the plugging
patch, alongside the earlier use_vfs numbers :-
1. With Plugging patch:-
   write: IOPS=43.6k, BW=170MiB/s (179MB/s)(5112MiB/30002msec)
   write: IOPS=42.8k, BW=167MiB/s (175MB/s)(5014MiB/30002msec)

2. Without this patch :-
   write: IOPS=41.5k, BW=162MiB/s (170MB/s)(4861MiB/30003msec)
   write: IOPS=41.1k, BW=160MiB/s (168MB/s)(4813MiB/30002msec)
   cpu          : usr=0.49%, sys=3.66%, ctx=1244502, majf=0, minf=559
   cpu          : usr=0.53%, sys=3.63%, ctx=1232208, majf=0, minf=581
   slat (usec): min=8, max=437, avg=15.63, stdev= 9.92
   slat (usec): min=8, max=389, avg=15.77, stdev=10.00
   clat (usec): min=56, max=1472, avg=754.31, stdev=172.63
   clat (usec): min=55, max=2405, avg=761.82, stdev=153.19

3. With use_vfs patch where use_vfs=1:-
   write: IOPS=114k, BW=445MiB/s (466MB/s)(13.0GiB/30007msec)
   write: IOPS=114k, BW=445MiB/s (466MB/s)(13.0GiB/30024msec)
   cpu          : usr=1.31%, sys=8.67%, ctx=3415138, majf=0, minf=527
   cpu          : usr=1.28%, sys=8.70%, ctx=3418737, majf=0, minf=570
   slat (usec): min=8, max=6450, avg=13.68, stdev= 8.35
   slat (usec): min=8, max=22847, avg=13.65, stdev=12.77
   clat (usec): min=62, max=6633, avg=265.98, stdev=124.55
   clat (usec): min=69, max=1900, avg=265.70, stdev=125.61

 From the above data we can see a big difference in the fio clat numbers
between #2 and #3 (#1 is close to #2, so I didn't report its details),
while CPU and slat are approximately the same.

Regards,
Chaitanya

On 10/27/19 5:55 PM, Keith Busch wrote:
> On Sun, Oct 27, 2019 at 04:03:30PM +0100, hch@lst.de wrote:
>> ---
>>   drivers/nvme/target/io-cmd-bdev.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
>> index 04a9cd2a2604..ed1a8d0ab30e 100644
>> --- a/drivers/nvme/target/io-cmd-bdev.c
>> +++ b/drivers/nvme/target/io-cmd-bdev.c
>> @@ -147,6 +147,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
>>   	int sg_cnt = req->sg_cnt;
>>   	struct bio *bio;
>>   	struct scatterlist *sg;
>> +	struct blk_plug plug;
>>   	sector_t sector;
>>   	int op, op_flags = 0, i;
>>   
>> @@ -185,6 +186,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
>>   	bio->bi_end_io = nvmet_bio_done;
>>   	bio_set_op_attrs(bio, op, op_flags);
>>   
>> +	blk_start_plug(&plug);
>>   	for_each_sg(req->sg, sg, req->sg_cnt, i) {
>>   		while (bio_add_page(bio, sg_page(sg), sg->length, sg->offset)
>>   				!= sg->length) {
>> @@ -202,6 +204,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
>>   		sector += sg->length >> 9;
>>   		sg_cnt--;
>>   	}
>> +	blk_finish_plug(&plug);
>>   
>>   	submit_bio(bio);
>>   }
> 
> The blk_finish_plug() should be after the last submit_bio().
> 
> I looked at plugging too since that is a difference between the
> submit_bio and write_iter paths, but I thought we needed to plug the
> entire IO queue drain. Otherwise this random 4k write workload should
> plug a single request, which doesn't sound like it would change anything.
> 
> Using the block plug for the entire IO queue drain requires quite a bit
> larger change, though. Also, I saw a similar performance difference with
> a ramdisk back-end, which doesn't use plugs.
> 



* Re: [PATCH] nvmet: introduce use_vfs ns-attr
  2019-10-28  0:55             ` Keith Busch
  2019-10-28  7:26               ` Chaitanya Kulkarni
@ 2019-10-28  7:32               ` Chaitanya Kulkarni
  2019-10-28  7:35                 ` hch
  2019-10-28  8:01                 ` Keith Busch
  1 sibling, 2 replies; 19+ messages in thread
From: Chaitanya Kulkarni @ 2019-10-28  7:32 UTC (permalink / raw)
  To: Keith Busch, hch; +Cc: Mark Ruijter, Hannes Reinecke, sagi, linux-nvme

Just did a quick test with the following patch on top of the
plug patch :-

[root@ioprio nvme]# git diff
diff --git a/drivers/nvme/target/io-cmd-bdev.c 
b/drivers/nvme/target/io-cmd-bdev.c
index ed1a8d0..07e4f8c 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -204,9 +204,9 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
                 sector += sg->length >> 9;
                 sg_cnt--;
         }
-       blk_finish_plug(&plug);

         submit_bio(bio);
+       blk_finish_plug(&plug);
  }

  static void nvmet_bdev_execute_flush(struct nvmet_req *req)

   write: IOPS=123k, BW=479MiB/s (502MB/s)(14.0GiB/30009msec)
   write: IOPS=123k, BW=480MiB/s (504MB/s)(14.1GiB/30002msec)
     slat (usec): min=8, max=8778, avg=13.29, stdev= 5.70
     slat (usec): min=8, max=315, avg=13.28, stdev= 3.38
     clat (usec): min=44, max=9790, avg=246.21, stdev=167.10
     clat (usec): min=27, max=10883, avg=245.59, stdev=164.19


Still need to look into the code to make sure the above change makes sense.

On 10/27/19 5:55 PM, Keith Busch wrote:
> On Sun, Oct 27, 2019 at 04:03:30PM +0100, hch@lst.de wrote:
>> ---
>>   drivers/nvme/target/io-cmd-bdev.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
>> index 04a9cd2a2604..ed1a8d0ab30e 100644
>> --- a/drivers/nvme/target/io-cmd-bdev.c
>> +++ b/drivers/nvme/target/io-cmd-bdev.c
>> @@ -147,6 +147,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
>>   	int sg_cnt = req->sg_cnt;
>>   	struct bio *bio;
>>   	struct scatterlist *sg;
>> +	struct blk_plug plug;
>>   	sector_t sector;
>>   	int op, op_flags = 0, i;
>>   
>> @@ -185,6 +186,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
>>   	bio->bi_end_io = nvmet_bio_done;
>>   	bio_set_op_attrs(bio, op, op_flags);
>>   
>> +	blk_start_plug(&plug);
>>   	for_each_sg(req->sg, sg, req->sg_cnt, i) {
>>   		while (bio_add_page(bio, sg_page(sg), sg->length, sg->offset)
>>   				!= sg->length) {
>> @@ -202,6 +204,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
>>   		sector += sg->length >> 9;
>>   		sg_cnt--;
>>   	}
>> +	blk_finish_plug(&plug);
>>   
>>   	submit_bio(bio);
>>   }
> 
> The blk_finish_plug() should be after the last submit_bio().
> 
> I looked at plugging too since that is a difference between the
> submit_bio and write_iter paths, but I thought we needed to plug the
> entire IO queue drain. Otherwise this random 4k write workload should
> plug a single request, which doesn't sound like it would change anything.
> 
> Using the block plug for the entire IO queue drain requires quite a bit
> larger change, though. Also, I saw a similar performance difference with
> a ramdisk back-end, which doesn't use plugs.
> 



* Re: [PATCH] nvmet: introduce use_vfs ns-attr
  2019-10-28  7:32               ` Chaitanya Kulkarni
@ 2019-10-28  7:35                 ` hch
  2019-10-28  7:38                   ` Chaitanya Kulkarni
  2019-10-28  8:01                 ` Keith Busch
  1 sibling, 1 reply; 19+ messages in thread
From: hch @ 2019-10-28  7:35 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Hannes Reinecke, sagi, linux-nvme, Mark Ruijter, Keith Busch, hch

On Mon, Oct 28, 2019 at 07:32:45AM +0000, Chaitanya Kulkarni wrote:
> Just did a quick test with following patch on the top of
> plug patch :-

Yes, that change makes total sense - in fact Keith also mentioned I
placed the finish_plug incorrectly.


* Re: [PATCH] nvmet: introduce use_vfs ns-attr
  2019-10-28  7:35                 ` hch
@ 2019-10-28  7:38                   ` Chaitanya Kulkarni
  2019-10-28  7:43                     ` hch
  0 siblings, 1 reply; 19+ messages in thread
From: Chaitanya Kulkarni @ 2019-10-28  7:38 UTC (permalink / raw)
  To: hch; +Cc: Keith Busch, Mark Ruijter, Hannes Reinecke, sagi, linux-nvme

On 10/28/19 12:35 AM, hch@lst.de wrote:
> On Mon, Oct 28, 2019 at 07:32:45AM +0000, Chaitanya Kulkarni wrote:
>> Just did a quick test with following patch on the top of
>> plug patch :-
> 
> Yes, that change makes total sense - in fact Keith also mentioned I
> placed the finish_plug incorrectly.
> 

Do you want me to send a new plug patch with an updated patch
description and performance numbers, keeping you as the original
author?


* Re: [PATCH] nvmet: introduce use_vfs ns-attr
  2019-10-28  7:38                   ` Chaitanya Kulkarni
@ 2019-10-28  7:43                     ` hch
  2019-10-28  8:04                       ` Chaitanya Kulkarni
  0 siblings, 1 reply; 19+ messages in thread
From: hch @ 2019-10-28  7:43 UTC (permalink / raw)
  To: Chaitanya Kulkarni
  Cc: Hannes Reinecke, sagi, linux-nvme, Mark Ruijter, Keith Busch, hch

On Mon, Oct 28, 2019 at 07:38:05AM +0000, Chaitanya Kulkarni wrote:
> On 10/28/19 12:35 AM, hch@lst.de wrote:
> > On Mon, Oct 28, 2019 at 07:32:45AM +0000, Chaitanya Kulkarni wrote:
> >> Just did a quick test with following patch on the top of
> >> plug patch :-
> > 
> > Yes, that change makes total sense - in fact Keith also mentioned I
> > placed the finish_plug incorrectly.
> > 
> 
> Do you want me to send a new plug patch with updated patch
> description and performance numbers keeping you original
> author ?

Fine by me.  Or take the authorship yourself; after all, you changed 2
lines of a 3-line patch ;-)


* Re: [PATCH] nvmet: introduce use_vfs ns-attr
  2019-10-28  7:32               ` Chaitanya Kulkarni
  2019-10-28  7:35                 ` hch
@ 2019-10-28  8:01                 ` Keith Busch
  2019-10-28  8:41                   ` Mark Ruijter
  1 sibling, 1 reply; 19+ messages in thread
From: Keith Busch @ 2019-10-28  8:01 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: Mark Ruijter, Hannes Reinecke, hch, linux-nvme, sagi

On Mon, Oct 28, 2019 at 07:32:45AM +0000, Chaitanya Kulkarni wrote:
> diff --git a/drivers/nvme/target/io-cmd-bdev.c 
> b/drivers/nvme/target/io-cmd-bdev.c
> index ed1a8d0..07e4f8c 100644
> --- a/drivers/nvme/target/io-cmd-bdev.c
> +++ b/drivers/nvme/target/io-cmd-bdev.c
> @@ -204,9 +204,9 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
>                  sector += sg->length >> 9;
>                  sg_cnt--;
>          }
> -       blk_finish_plug(&plug);
> 
>          submit_bio(bio);
> +       blk_finish_plug(&plug);
>   }
> 
>   static void nvmet_bdev_execute_flush(struct nvmet_req *req)
> 
>    write: IOPS=123k, BW=479MiB/s (502MB/s)(14.0GiB/30009msec)
>    write: IOPS=123k, BW=480MiB/s (504MB/s)(14.1GiB/30002msec)
>      slat (usec): min=8, max=8778, avg=13.29, stdev= 5.70
>      slat (usec): min=8, max=315, avg=13.28, stdev= 3.38
>      clat (usec): min=44, max=9790, avg=246.21, stdev=167.10
>      clat (usec): min=27, max=10883, avg=245.59, stdev=164.19
> 
> 
> Still need to look into the code to make sure above change make sense.

The block plug code change definitely makes sense. I don't think there's a
better place to plug without a more invasive change, so let's use this. I
don't immediately understand why it measures this much of a difference for
this workload, though.


* Re: [PATCH] nvmet: introduce use_vfs ns-attr
  2019-10-28  7:43                     ` hch
@ 2019-10-28  8:04                       ` Chaitanya Kulkarni
  0 siblings, 0 replies; 19+ messages in thread
From: Chaitanya Kulkarni @ 2019-10-28  8:04 UTC (permalink / raw)
  To: hch; +Cc: Keith Busch, Mark Ruijter, Hannes Reinecke, sagi, linux-nvme

On 10/28/19 12:43 AM, hch@lst.de wrote:
> Fine by me.  Or take the authorship yourself; after all, you changed 2
> lines of a 3-line patch ;-)
> 

;), thanks Christoph.


* Re: [PATCH] nvmet: introduce use_vfs ns-attr
  2019-10-28  8:01                 ` Keith Busch
@ 2019-10-28  8:41                   ` Mark Ruijter
  0 siblings, 0 replies; 19+ messages in thread
From: Mark Ruijter @ 2019-10-28  8:41 UTC (permalink / raw)
  To: Keith Busch, Chaitanya Kulkarni; +Cc: Hannes Reinecke, hch, linux-nvme, sagi



Hi Keith,

As promised, here are the test results.

The proposed patch does indeed solve the performance problem.
I now get 640K IOPS when using the block layer and 632K IOPS when using the VFS layer.
So the VFS only introduces a surprisingly small 1.25% overhead.

Without the patches I used to get < 200K IOPS.

I've attached the full fio job output (VFS vs. PLUG patch) to this email for whoever is interested in the details.

Thanks,

Mark


On 28-10-19 09:01, Keith Busch <kbusch@kernel.org> wrote:

    On Mon, Oct 28, 2019 at 07:32:45AM +0000, Chaitanya Kulkarni wrote:
    > diff --git a/drivers/nvme/target/io-cmd-bdev.c 
    > b/drivers/nvme/target/io-cmd-bdev.c
    > index ed1a8d0..07e4f8c 100644
    > --- a/drivers/nvme/target/io-cmd-bdev.c
    > +++ b/drivers/nvme/target/io-cmd-bdev.c
    > @@ -204,9 +204,9 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
    >                  sector += sg->length >> 9;
    >                  sg_cnt--;
    >          }
    > -       blk_finish_plug(&plug);
    > 
    >          submit_bio(bio);
    > +       blk_finish_plug(&plug);
    >   }
    > 
    >   static void nvmet_bdev_execute_flush(struct nvmet_req *req)
    > 
    >    write: IOPS=123k, BW=479MiB/s (502MB/s)(14.0GiB/30009msec)
    >    write: IOPS=123k, BW=480MiB/s (504MB/s)(14.1GiB/30002msec)
    >      slat (usec): min=8, max=8778, avg=13.29, stdev= 5.70
    >      slat (usec): min=8, max=315, avg=13.28, stdev= 3.38
    >      clat (usec): min=44, max=9790, avg=246.21, stdev=167.10
    >      clat (usec): min=27, max=10883, avg=245.59, stdev=164.19
    > 
    > 
    > Still need to look into the code to make sure above change make sense.
    
    The block plug code change definitely makes sense. I don't think there's a
    better place to plug without a more invasive change, so let's use this. I
    don't immediately understand why it measures this much of a difference for
    this workload, though.
    


[-- Attachment #2: TEST_RESULTS --]


BLOCKIO with the PLUG patch:

root@r11i3:~# fio --name=nvme_tcp_plug --rw=randrw --rwmixwrite=100 --bs=4k --filename=/dev/nvme0n1 --numjobs=32 --iodepth=128 --exitall --direct=1 --group_reporting --time_based --runtime=300 --ioengine=libaio
nvme_tcp: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
...
fio-3.1
Starting 32 processes
Jobs: 32 (f=32): [w(32)][100.0%][r=0KiB/s,w=2553MiB/s][r=0,w=654k IOPS][eta 00m:00s]
nvme_tcp: (groupid=0, jobs=32): err= 0: pid=4757: Mon Oct 28 02:24:17 2019
  write: IOPS=640k, BW=2500MiB/s (2621MB/s)(732GiB/300004msec)
    slat (nsec): min=1516, max=622409k, avg=44039.23, stdev=2005189.82
    clat (usec): min=20, max=665711, avg=6354.09, stdev=23263.39
     lat (usec): min=61, max=665715, avg=6398.35, stdev=23355.16
    clat percentiles (usec):
     |  1.00th=[   351],  5.00th=[   766], 10.00th=[  1074], 20.00th=[  1434],
     | 30.00th=[  1713], 40.00th=[  2008], 50.00th=[  2343], 60.00th=[  2802],
     | 70.00th=[  3425], 80.00th=[  4359], 90.00th=[  7504], 95.00th=[ 13960],
     | 99.00th=[179307], 99.50th=[204473], 99.90th=[221250], 99.95th=[233833],
     | 99.99th=[400557]
   bw (  KiB/s): min= 1188, max=313871, per=3.14%, avg=80503.42, stdev=55919.47, samples=19198
   iops        : min=  297, max=78467, avg=20125.57, stdev=13979.86, samples=19198
  lat (usec)   : 50=0.01%, 100=0.01%, 250=0.35%, 500=1.92%, 750=2.51%
  lat (usec)   : 1000=3.72%
  lat (msec)   : 2=31.15%, 4=37.32%, 10=15.72%, 20=4.08%, 50=1.62%
  lat (msec)   : 100=0.20%, 250=1.38%, 500=0.03%, 750=0.01%
  cpu          : usr=4.54%, sys=12.18%, ctx=209523478, majf=0, minf=296274
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwt: total=0,192004671,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
  WRITE: bw=2500MiB/s (2621MB/s), 2500MiB/s-2500MiB/s (2621MB/s-2621MB/s), io=732GiB (786GB), run=300004-300004msec

USING THE VFS PATCH:

root@r11i3:~# fio --name=nvme_tcp_vfs --rw=randrw --rwmixwrite=100 --bs=4k --filename=/dev/nvme0n1 --numjobs=32 --iodepth=128 --exitall --direct=1 --group_reporting --time_based --runtime=300 --ioengine=libaio
nvme_tcp_vfs: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
...
fio-3.1
Starting 32 processes
Jobs: 32 (f=32): [w(32)][100.0%][r=0KiB/s,w=2627MiB/s][r=0,w=672k IOPS][eta 00m:00s]
nvme_tcp_vfs: (groupid=0, jobs=32): err= 0: pid=4835: Mon Oct 28 02:30:10 2019
  write: IOPS=632k, BW=2470MiB/s (2590MB/s)(724GiB/300120msec)
    slat (nsec): min=1488, max=639771k, avg=44471.44, stdev=1974060.88
    clat (usec): min=43, max=696423, avg=6430.04, stdev=23000.02
     lat (usec): min=61, max=696438, avg=6474.72, stdev=23090.93
    clat percentiles (usec):
     |  1.00th=[   343],  5.00th=[   742], 10.00th=[  1074], 20.00th=[  1450],
     | 30.00th=[  1745], 40.00th=[  2057], 50.00th=[  2409], 60.00th=[  2900],
     | 70.00th=[  3523], 80.00th=[  4555], 90.00th=[  7898], 95.00th=[ 14746],
     | 99.00th=[175113], 99.50th=[204473], 99.90th=[221250], 99.95th=[233833],
     | 99.99th=[408945]
   bw (  KiB/s): min=  641, max=321596, per=3.15%, avg=79587.15, stdev=57295.71, samples=19194
   iops        : min=  160, max=80399, avg=19896.46, stdev=14323.93, samples=19194
  lat (usec)   : 50=0.01%, 100=0.01%, 250=0.37%, 500=2.07%, 750=2.65%
  lat (usec)   : 1000=3.58%
  lat (msec)   : 2=29.55%, 4=37.30%, 10=16.69%, 20=4.36%, 50=1.80%
  lat (msec)   : 100=0.23%, 250=1.34%, 500=0.03%, 750=0.01%
  cpu          : usr=4.45%, sys=12.08%, ctx=206612393, majf=0, minf=276187
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwt: total=0,189772185,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
  WRITE: bw=2470MiB/s (2590MB/s), 2470MiB/s-2470MiB/s (2590MB/s-2590MB/s), io=724GiB (777GB), run=300120-300120msec

Disk stats (read/write):
  nvme0n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%


Thread overview: 19+ messages
2019-10-23 20:17 [PATCH] nvmet: introduce use_vfs ns-attr Chaitanya Kulkarni
2019-10-24  2:00 ` Keith Busch
2019-10-24 11:30   ` Mark Ruijter
2019-10-25  4:05     ` Keith Busch
2019-10-25  4:26       ` Keith Busch
2019-10-25  8:44         ` Mark Ruijter
2019-10-26  1:06           ` Keith Busch
2019-10-27 15:03           ` hch
2019-10-27 16:06             ` Mark Ruijter
2019-10-28  0:55             ` Keith Busch
2019-10-28  7:26               ` Chaitanya Kulkarni
2019-10-28  7:32               ` Chaitanya Kulkarni
2019-10-28  7:35                 ` hch
2019-10-28  7:38                   ` Chaitanya Kulkarni
2019-10-28  7:43                     ` hch
2019-10-28  8:04                       ` Chaitanya Kulkarni
2019-10-28  8:01                 ` Keith Busch
2019-10-28  8:41                   ` Mark Ruijter
2019-10-25  3:29   ` Chaitanya Kulkarni
