From: "hch@lst.de" <hch@lst.de>
To: Mark Ruijter <MRuijter@onestopsystems.com>
Cc: Hannes Reinecke <hare@suse.com>,
	"sagi@grimberg.me" <sagi@grimberg.me>,
	Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
	Keith Busch <kbusch@kernel.org>, "hch@lst.de" <hch@lst.de>
Subject: Re: [PATCH] nvmet: introduce use_vfs ns-attr
Date: Sun, 27 Oct 2019 16:03:30 +0100	[thread overview]
Message-ID: <20191027150330.GA5843@lst.de> (raw)
In-Reply-To: <109617B2-CC73-4CDE-B97A-FDDB12CD22BD@onestopsystems.com>

On Fri, Oct 25, 2019 at 08:44:00AM +0000, Mark Ruijter wrote:
> 
> Hi Keith,
> 
> I am indeed not using buffered I/O.
> Using the VFS increases my 4k random write performance from 200K to 650K IOPS when using RAID1.
> So the difference is huge, and it becomes more significant when the underlying drives or a RAID0 array can handle more IOPS.

Can you try the patch below to use block layer plugging in nvmet?  Plugging
batches the bio submissions so the block layer can dispatch them to the
driver in one go; it should be the only major difference in how we do I/O.

> 1. Currently a controller ID collision can occur when using a clustered HA setup. See this message:
> >>> [1122789.054677] nvme nvme1: Duplicate cntlid 4 with nvme0, rejecting.
> 
> The controller ID allocation is currently hard-wired:
> 
>        ret = ida_simple_get(&cntlid_ida,
>                              NVME_CNTLID_MIN, NVME_CNTLID_MAX,
>                              GFP_KERNEL);
> 
> So two nodes exporting the exact same volume using the same port configuration can easily come up with the same controller ID.
> I would like to propose making it configurable, with the current logic providing the default.
> SCST, for example, allows manual target ID selection for this reason.

We can allow some control there using a new configfs file.  But what
would be even better is an actually integrated cluster manager, which
we'd need anyway for features such as persistent reservations.
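
A rough sketch of what such a configfs knob could look like, assuming new
per-subsystem cntlid_min/cntlid_max fields (the fields, the attribute
wiring, and the idea that the ida_simple_get() call above would then use
the per-subsystem range instead of NVME_CNTLID_MIN/NVME_CNTLID_MAX are all
illustrative assumptions, not existing code):

/* Hypothetical: let two HA nodes exporting the same volume be given
 * disjoint controller ID ranges via configfs. */
static ssize_t nvmet_subsys_attr_cntlid_min_show(struct config_item *item,
		char *page)
{
	return snprintf(page, PAGE_SIZE, "%u\n", to_subsys(item)->cntlid_min);
}

static ssize_t nvmet_subsys_attr_cntlid_min_store(struct config_item *item,
		const char *page, size_t cnt)
{
	u16 cntlid_min;

	if (sscanf(page, "%hu\n", &cntlid_min) != 1)
		return -EINVAL;
	if (cntlid_min == 0)
		return -EINVAL;

	down_write(&nvmet_config_sem);
	if (cntlid_min > to_subsys(item)->cntlid_max)
		goto out_unlock;
	to_subsys(item)->cntlid_min = cntlid_min;
	up_write(&nvmet_config_sem);
	return cnt;

out_unlock:
	up_write(&nvmet_config_sem);
	return -EINVAL;
}
CONFIGFS_ATTR(nvmet_subsys_, attr_cntlid_min);
/* plus an &nvmet_subsys_attr_attr_cntlid_min entry in nvmet_subsys_attrs[] */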

> 2. The model string of the drives is hard-wired to "Linux". As I see it, this should be configurable, with 'Linux' as the default value.
> I'll provide code that makes that work.

Yes, please send a patch.
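
For reference, the identify-controller path currently reports a hard-coded
"Linux" model string; a per-subsystem override would roughly amount to the
sketch below, where subsys->model is a hypothetical configfs-backed string
that does not exist yet and memcpy_and_pad() space-pads id->mn as the spec
requires:

	/*
	 * Sketch only: fall back to the current "Linux" default unless a
	 * hypothetical per-subsystem model attribute has been set.
	 */
	const char *model = subsys->model ? subsys->model : "Linux";

	memcpy_and_pad(id->mn, sizeof(id->mn), model, strlen(model), ' ');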

> 3. An NVMe-oF-connected disk on the initiator seems to queue I/O forever when the target dies.
> It would be nice if we had the ability to select either 'queue forever' or 'fail fast'.

Making this configurable has been on the todo list for a long time.  At
some point in the past Hannes (added to Cc) signed up for it, but it seems
to have dropped off his priority list.
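
As a sketch of one possible shape for this, based on the existing
nvmf_fail_nonready_command() in the fabrics code: gate the "requeue until
the controller comes back" behaviour behind a new connect option.  The
ctrl->opts->failfast field below is a hypothetical new option, not
something that exists today:

blk_status_t nvmf_fail_nonready_command(struct nvme_ctrl *ctrl,
		struct request *rq)
{
	/*
	 * Keep requeueing while the controller may still come back,
	 * unless the (hypothetical) failfast option was requested.
	 */
	if (ctrl->state != NVME_CTRL_DELETING &&
	    ctrl->state != NVME_CTRL_DEAD &&
	    !ctrl->opts->failfast &&
	    !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
		return BLK_STS_RESOURCE;

	/* Otherwise fail the command immediately with a host path error. */
	nvme_req(rq)->status = NVME_SC_HOST_PATH_ERROR;
	blk_mq_start_request(rq);
	nvme_complete_rq(rq);
	return BLK_STS_IOERR;
}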

---
From 87ab0d6f9e092cde04775452131f90e8b4c46a66 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Sun, 27 Oct 2019 15:59:08 +0100
Subject: nvmet: use block layer plugging in nvmet_bdev_execute_rw

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/target/io-cmd-bdev.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index 04a9cd2a2604..ed1a8d0ab30e 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -147,6 +147,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
 	int sg_cnt = req->sg_cnt;
 	struct bio *bio;
 	struct scatterlist *sg;
+	struct blk_plug plug;
 	sector_t sector;
 	int op, op_flags = 0, i;
 
@@ -185,6 +186,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
 	bio->bi_end_io = nvmet_bio_done;
 	bio_set_op_attrs(bio, op, op_flags);
 
+	blk_start_plug(&plug);
 	for_each_sg(req->sg, sg, req->sg_cnt, i) {
 		while (bio_add_page(bio, sg_page(sg), sg->length, sg->offset)
 				!= sg->length) {
@@ -202,6 +204,7 @@ static void nvmet_bdev_execute_rw(struct nvmet_req *req)
 		sector += sg->length >> 9;
 		sg_cnt--;
 	}
 
 	submit_bio(bio);
+	blk_finish_plug(&plug);
 }
-- 
2.20.1


