* nvme multipath support V2
@ 2017-09-18 23:14 ` Christoph Hellwig
  0 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2017-09-18 23:14 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Keith Busch, Sagi Grimberg, linux-nvme, linux-block

Hi all,

this series adds support for multipathing, that is, accessing nvme
namespaces through multiple controllers, to the nvme core driver.

It is a very thin and efficient implementation that relies on
close cooperation with other bits of the nvme driver, and a few small
and simple block helpers.

Compared to dm-multipath the important differences are how management
of the paths is done, and how the I/O path works.

Management of the paths is fully integrated into the nvme driver:
for each newly found nvme controller we check if there are other
controllers that refer to the same subsystem, and if so we link them
up in the nvme driver.  Then for each namespace found we compare the
namespace id and identifiers to determine whether multiple controllers
refer to the same namespace.  For now path availability is based
entirely on the controller status, which at least for fabrics will be
continuously updated based on the mandatory keep alive timer.  Once
the Asymmetric Namespace Access (ANA) proposal passes in NVMe we will
also get per-namespace states in addition to that, but for now any
details of that remain confidential to NVMe members.

The I/O path is very different from the existing multipath drivers,
which is enabled by the fact that NVMe (unlike SCSI) does not support
partial completions - a controller will either complete a whole
command or not, but never complete only parts of it.  Because of that
there is no need to clone bios or requests - the I/O path simply
redirects the I/O to a suitable path.  For successful commands
multipath is not in the completion stack at all.  For failed commands
we decide if the error could be a path failure, and if so remove
the bios from the request structure and requeue them before completing
the request.  Altogether this means there is no performance
degradation compared to normal nvme operation when using the multipath
device node (at least not until I find a dual ported DRAM backed
device :))
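
As a rough sketch of the failover path described above (blk_steal_bios
is added later in this series; the surrounding function and field names
here are made up for illustration only):

	if (error_looks_like_path_failure(req)) {
		/* move the not-yet-completed bios off the failed request */
		blk_steal_bios(&head->requeue_list, req);
		/* complete the request itself without failing the bios */
		blk_mq_end_request(req, 0);
		/* deferred work resubmits the stolen bios on another path */
		kblockd_schedule_work(&head->requeue_work);
	}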

There are a couple of questions left in the individual patches, comments
welcome.

Note that this series requires the previous series that removes bi_bdev;
if in doubt, use the git tree below for testing.

A git tree is available at:

   git://git.infradead.org/users/hch/block.git nvme-mpath

gitweb:

   http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/nvme-mpath

Changes since V1:
  - introduce new nvme_ns_ids structure to clean up identifier handling
  - generic_make_request_fast is now named direct_make_request and calls
    generic_make_request_checks
  - reset bi_disk on resubmission
  - create sysfs links between the existing nvme namespace block devices and
    the new shared mpath device
  - temporarily added the timeout patches from James; these should go into
    nvme-4.14, though

* [PATCH 1/9] nvme: allow timed-out ios to retry
  2017-09-18 23:14 ` Christoph Hellwig
@ 2017-09-18 23:14   ` Christoph Hellwig
  -1 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2017-09-18 23:14 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Keith Busch, Sagi Grimberg, linux-nvme, linux-block, James Smart,
	James Smart

From: James Smart <jsmart2021@gmail.com>

Currently nvme_req_needs_retry() applies several checks to see if
a retry is allowed. One of those is whether the current time has exceeded
the start time of the io plus the timeout length. This check means that,
if an io times out, a retry is never allowed for it, so applications see
the io failure.

Remove this check and allow the io to time out, like it does on other
protocols, so that retries can be made.

On the FC transport, a frame can be lost for an individual io, and there
may be no other errors that escalate for the connection/association.
The io will time out, which causes the transport to escalate into creating
a new association, but the io that timed out has, due to this retry logic,
already been failed back to the application and things are hosed.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/core.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index d470f031e27f..5589f67d2cd8 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -134,8 +134,6 @@ static inline bool nvme_req_needs_retry(struct request *req)
 		return false;
 	if (nvme_req(req)->status & NVME_SC_DNR)
 		return false;
-	if (jiffies - req->start_time >= req->timeout)
-		return false;
 	if (nvme_req(req)->retries >= nvme_max_retries)
 		return false;
 	return true;
-- 
2.14.1

* [PATCH 2/9] block: move REQ_NOWAIT
  2017-09-18 23:14 ` Christoph Hellwig
@ 2017-09-18 23:14   ` Christoph Hellwig
  -1 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2017-09-18 23:14 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Keith Busch, Sagi Grimberg, linux-nvme, linux-block

This flag is a generic request flag, so it should be placed before the
operation-specific REQ_NOUNMAP bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/blk_types.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index a2d2aa709cef..acc2f3cdc2fc 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -224,11 +224,11 @@ enum req_flag_bits {
 	__REQ_PREFLUSH,		/* request for cache flush */
 	__REQ_RAHEAD,		/* read ahead, can fail anytime */
 	__REQ_BACKGROUND,	/* background IO */
+	__REQ_NOWAIT,           /* Don't wait if request will block */
 
 	/* command specific flags for REQ_OP_WRITE_ZEROES: */
 	__REQ_NOUNMAP,		/* do not free blocks when zeroing */
 
-	__REQ_NOWAIT,           /* Don't wait if request will block */
 	__REQ_NR_BITS,		/* stops here */
 };
 
@@ -245,9 +245,9 @@ enum req_flag_bits {
 #define REQ_PREFLUSH		(1ULL << __REQ_PREFLUSH)
 #define REQ_RAHEAD		(1ULL << __REQ_RAHEAD)
 #define REQ_BACKGROUND		(1ULL << __REQ_BACKGROUND)
+#define REQ_NOWAIT		(1ULL << __REQ_NOWAIT)
 
 #define REQ_NOUNMAP		(1ULL << __REQ_NOUNMAP)
-#define REQ_NOWAIT		(1ULL << __REQ_NOWAIT)
 
 #define REQ_FAILFAST_MASK \
 	(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
-- 
2.14.1

* [PATCH 3/9] block: add REQ_DRV bit
  2017-09-18 23:14 ` Christoph Hellwig
@ 2017-09-18 23:14   ` Christoph Hellwig
  -1 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2017-09-18 23:14 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Keith Busch, Sagi Grimberg, linux-nvme, linux-block

Set aside a bit in the request/bio flags for driver use.
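
For illustration, a driver could then layer a private flag on top of this
bit; the flag name below is hypothetical and not part of this patch:

	/* example: give the driver-private bit a driver-specific meaning */
	#define REQ_EXAMPLE_MPATH	REQ_DRV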

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/blk_types.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index acc2f3cdc2fc..7ec2ed097a8a 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -229,6 +229,9 @@ enum req_flag_bits {
 	/* command specific flags for REQ_OP_WRITE_ZEROES: */
 	__REQ_NOUNMAP,		/* do not free blocks when zeroing */
 
+	/* for driver use */
+	__REQ_DRV,
+
 	__REQ_NR_BITS,		/* stops here */
 };
 
@@ -249,6 +252,8 @@ enum req_flag_bits {
 
 #define REQ_NOUNMAP		(1ULL << __REQ_NOUNMAP)
 
+#define REQ_DRV			(1ULL << __REQ_DRV)
+
 #define REQ_FAILFAST_MASK \
 	(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
 
-- 
2.14.1

* [PATCH 4/9] block: provide a direct_make_request helper
  2017-09-18 23:14 ` Christoph Hellwig
@ 2017-09-18 23:14   ` Christoph Hellwig
  -1 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2017-09-18 23:14 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Keith Busch, Sagi Grimberg, linux-nvme, linux-block

This helper allows reinserting a bio into a new queue without much
overhead, but requires all queue limits to be the same for the upper
and lower queues, and it does not provide any recursion prevention.
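
As a rough usage sketch (the example_* names are made up and not part of
this series), a stacking driver whose upper and lower queues share
identical limits could redirect a bio like this:

	static blk_qc_t example_mpath_make_request(struct request_queue *q,
			struct bio *bio)
	{
		struct example_mpath_dev *mdev = q->queuedata;

		/* retarget the bio at the chosen lower device ... */
		bio->bi_disk = mdev->current_path_disk;
		/* ... and hand it straight to that device's queue */
		return direct_make_request(bio);
	}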

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-core.c       | 32 ++++++++++++++++++++++++++++++++
 include/linux/blkdev.h |  1 +
 2 files changed, 33 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index aebe676225e6..f02eb346c910 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2238,6 +2238,38 @@ blk_qc_t generic_make_request(struct bio *bio)
 }
 EXPORT_SYMBOL(generic_make_request);
 
+/**
+ * direct_make_request - hand a buffer directly to its device driver for I/O
+ * @bio:  The bio describing the location in memory and on the device.
+ *
+ * This function behaves like generic_make_request(), but does not protect
+ * against recursion.  Must only be used if the called driver is known
+ * to not re-issue this bio or a split to itself.
+ */
+blk_qc_t direct_make_request(struct bio *bio)
+{
+	struct request_queue *q = bio->bi_disk->queue;
+	bool nowait = bio->bi_opf & REQ_NOWAIT;
+	blk_qc_t ret;
+
+	if (!generic_make_request_checks(bio))
+		return BLK_QC_T_NONE;
+
+	if (unlikely(blk_queue_enter(q, nowait))) {
+		if (nowait && !blk_queue_dying(q))
+			bio->bi_status = BLK_STS_AGAIN;
+		else
+			bio->bi_status = BLK_STS_IOERR;
+		bio_endio(bio);
+		return BLK_QC_T_NONE;
+	}
+
+	ret = q->make_request_fn(q, bio);
+	blk_queue_exit(q);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(direct_make_request);
+
 /**
  * submit_bio - submit a bio to the block device layer for I/O
  * @bio: The &struct bio which describes the I/O
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 460294bb0fa5..86dfeea16d4c 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -935,6 +935,7 @@ do {								\
 extern int blk_register_queue(struct gendisk *disk);
 extern void blk_unregister_queue(struct gendisk *disk);
 extern blk_qc_t generic_make_request(struct bio *bio);
+extern blk_qc_t direct_make_request(struct bio *bio);
 extern void blk_rq_init(struct request_queue *q, struct request *rq);
 extern void blk_init_request_from_bio(struct request *req, struct bio *bio);
 extern void blk_put_request(struct request *);
-- 
2.14.1

* [PATCH 5/9] block: add a blk_steal_bios helper
  2017-09-18 23:14 ` Christoph Hellwig
@ 2017-09-18 23:14   ` Christoph Hellwig
  -1 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2017-09-18 23:14 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Keith Busch, Sagi Grimberg, linux-nvme, linux-block

This helper allows stealing the uncompleted bios from a request so that
they can be reissued on another path.
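
A minimal usage sketch (assuming a blk-mq driver; error handling and
locking omitted):

	struct bio_list requeue_list = BIO_EMPTY_LIST;
	struct bio *bio;

	/* detach all not-yet-completed bios from the failed request */
	blk_steal_bios(&requeue_list, rq);
	/* the request itself can now be completed without failing the bios */
	blk_mq_end_request(rq, BLK_STS_OK);

	/* resubmit the stolen bios, e.g. on another path */
	while ((bio = bio_list_pop(&requeue_list)))
		generic_make_request(bio);

In practice the resubmission would typically be deferred to a work item
rather than done directly from the completion path.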

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
---
 block/blk-core.c       | 20 ++++++++++++++++++++
 include/linux/blkdev.h |  2 ++
 2 files changed, 22 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index f02eb346c910..6f3db2c14843 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2724,6 +2724,26 @@ struct request *blk_fetch_request(struct request_queue *q)
 }
 EXPORT_SYMBOL(blk_fetch_request);
 
+/*
+ * Steal bios from a request.  The request must not have been partially
+ * completed before.
+ */
+void blk_steal_bios(struct bio_list *list, struct request *rq)
+{
+	if (rq->bio) {
+		if (list->tail)
+			list->tail->bi_next = rq->bio;
+		else
+			list->head = rq->bio;
+		list->tail = rq->biotail;
+	}
+
+	rq->bio = NULL;
+	rq->biotail = NULL;
+	rq->__data_len = 0;
+}
+EXPORT_SYMBOL_GPL(blk_steal_bios);
+
 /**
  * blk_update_request - Special helper function for request stacking drivers
  * @req:      the request being processed
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 86dfeea16d4c..0ca5525ee5ce 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1109,6 +1109,8 @@ extern struct request *blk_peek_request(struct request_queue *q);
 extern void blk_start_request(struct request *rq);
 extern struct request *blk_fetch_request(struct request_queue *q);
 
+void blk_steal_bios(struct bio_list *list, struct request *rq);
+
 /*
  * Request completion related functions.
  *
-- 
2.14.1

* [PATCH 6/9] nvme: track subsystems
  2017-09-18 23:14 ` Christoph Hellwig
@ 2017-09-18 23:14   ` Christoph Hellwig
  -1 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2017-09-18 23:14 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Keith Busch, Sagi Grimberg, linux-nvme, linux-block

This adds a new nvme_subsystem structure so that we can track multiple
controllers that belong to a single subsystem.  For now we only use it
to store the NQN, and to check that we don't have duplicate NQNs unless
the involved subsystems support multiple controllers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/core.c    | 111 ++++++++++++++++++++++++++++++++++++++++----
 drivers/nvme/host/fabrics.c |   4 +-
 drivers/nvme/host/nvme.h    |  12 ++++-
 3 files changed, 116 insertions(+), 11 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 5589f67d2cd8..4291119a6bc9 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -71,6 +71,9 @@ MODULE_PARM_DESC(streams, "turn on support for Streams write directives");
 struct workqueue_struct *nvme_wq;
 EXPORT_SYMBOL_GPL(nvme_wq);
 
+static LIST_HEAD(nvme_subsystems);
+static DEFINE_MUTEX(nvme_subsystems_lock);
+
 static LIST_HEAD(nvme_ctrl_list);
 static DEFINE_SPINLOCK(dev_list_lock);
 
@@ -1736,14 +1739,15 @@ static bool quirk_matches(const struct nvme_id_ctrl *id,
 		string_matches(id->fr, q->fr, sizeof(id->fr));
 }
 
-static void nvme_init_subnqn(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
+static void nvme_init_subnqn(struct nvme_subsystem *subsys, struct nvme_ctrl *ctrl,
+		struct nvme_id_ctrl *id)
 {
 	size_t nqnlen;
 	int off;
 
 	nqnlen = strnlen(id->subnqn, NVMF_NQN_SIZE);
 	if (nqnlen > 0 && nqnlen < NVMF_NQN_SIZE) {
-		strcpy(ctrl->subnqn, id->subnqn);
+		strcpy(subsys->subnqn, id->subnqn);
 		return;
 	}
 
@@ -1751,14 +1755,91 @@ static void nvme_init_subnqn(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
 		dev_warn(ctrl->device, "missing or invalid SUBNQN field.\n");
 
 	/* Generate a "fake" NQN per Figure 254 in NVMe 1.3 + ECN 001 */
-	off = snprintf(ctrl->subnqn, NVMF_NQN_SIZE,
+	off = snprintf(subsys->subnqn, NVMF_NQN_SIZE,
 			"nqn.2014.08.org.nvmexpress:%4x%4x",
 			le16_to_cpu(id->vid), le16_to_cpu(id->ssvid));
-	memcpy(ctrl->subnqn + off, id->sn, sizeof(id->sn));
+	memcpy(subsys->subnqn + off, id->sn, sizeof(id->sn));
 	off += sizeof(id->sn);
-	memcpy(ctrl->subnqn + off, id->mn, sizeof(id->mn));
+	memcpy(subsys->subnqn + off, id->mn, sizeof(id->mn));
 	off += sizeof(id->mn);
-	memset(ctrl->subnqn + off, 0, sizeof(ctrl->subnqn) - off);
+	memset(subsys->subnqn + off, 0, sizeof(subsys->subnqn) - off);
+}
+
+static void nvme_destroy_subsystem(struct kref *ref)
+{
+	struct nvme_subsystem *subsys =
+			container_of(ref, struct nvme_subsystem, ref);
+
+	mutex_lock(&nvme_subsystems_lock);
+	list_del(&subsys->entry);
+	mutex_unlock(&nvme_subsystems_lock);
+
+	kfree(subsys);
+}
+
+static void nvme_put_subsystem(struct nvme_subsystem *subsys)
+{
+	kref_put(&subsys->ref, nvme_destroy_subsystem);
+}
+
+static struct nvme_subsystem *__nvme_find_get_subsystem(const char *subsysnqn)
+{
+	struct nvme_subsystem *subsys;
+
+	lockdep_assert_held(&nvme_subsystems_lock);
+
+	list_for_each_entry(subsys, &nvme_subsystems, entry) {
+		if (strcmp(subsys->subnqn, subsysnqn))
+			continue;
+		if (!kref_get_unless_zero(&subsys->ref))
+			continue;
+		return subsys;
+	}
+
+	return NULL;
+}
+
+static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
+{
+	struct nvme_subsystem *subsys, *found;
+
+	subsys = kzalloc(sizeof(*subsys), GFP_KERNEL);
+	if (!subsys)
+		return -ENOMEM;
+	INIT_LIST_HEAD(&subsys->ctrls);
+	kref_init(&subsys->ref);
+	nvme_init_subnqn(subsys, ctrl, id);
+	mutex_init(&subsys->lock);
+
+	mutex_lock(&nvme_subsystems_lock);
+	found = __nvme_find_get_subsystem(subsys->subnqn);
+	if (found) {
+		/*
+		 * Verify that the subsystem actually supports multiple
+		 * controllers, else bail out.
+		 */
+		kfree(subsys);
+		if (!(id->cmic & (1 << 1))) {
+			dev_err(ctrl->device,
+				"ignoring ctrl due to duplicate subnqn (%s).\n",
+				found->subnqn);
+			mutex_unlock(&nvme_subsystems_lock);
+			return -EINVAL;
+		}
+
+		subsys = found;
+	} else {
+		list_add_tail(&subsys->entry, &nvme_subsystems);
+	}
+
+	ctrl->subsys = subsys;
+	mutex_unlock(&nvme_subsystems_lock);
+
+	mutex_lock(&subsys->lock);
+	list_add_tail(&ctrl->subsys_entry, &subsys->ctrls);
+	mutex_unlock(&subsys->lock);
+
+	return 0;
 }
 
 /*
@@ -1796,7 +1877,11 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
 		return -EIO;
 	}
 
-	nvme_init_subnqn(ctrl, id);
+	ret = nvme_init_subsystem(ctrl, id);
+	if (ret) {
+		kfree(id);
+		return ret;
+	}
 
 	if (!ctrl->identified) {
 		/*
@@ -2230,7 +2315,7 @@ static ssize_t nvme_sysfs_show_subsysnqn(struct device *dev,
 {
 	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);
 
-	return snprintf(buf, PAGE_SIZE, "%s\n", ctrl->subnqn);
+	return snprintf(buf, PAGE_SIZE, "%s\n", ctrl->subsys->subnqn);
 }
 static DEVICE_ATTR(subsysnqn, S_IRUGO, nvme_sysfs_show_subsysnqn, NULL);
 
@@ -2770,12 +2855,22 @@ EXPORT_SYMBOL_GPL(nvme_uninit_ctrl);
 static void nvme_free_ctrl(struct kref *kref)
 {
 	struct nvme_ctrl *ctrl = container_of(kref, struct nvme_ctrl, kref);
+	struct nvme_subsystem *subsys = ctrl->subsys;
 
 	put_device(ctrl->device);
 	nvme_release_instance(ctrl);
 	ida_destroy(&ctrl->ns_ida);
 
+	if (subsys) {
+		mutex_lock(&subsys->lock);
+		list_del(&ctrl->subsys_entry);
+		mutex_unlock(&subsys->lock);
+	}
+
 	ctrl->ops->free_ctrl(ctrl);
+
+	if (subsys)
+		nvme_put_subsystem(subsys);
 }
 
 void nvme_put_ctrl(struct nvme_ctrl *ctrl)
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c
index 47307752dc65..ec443dcb289b 100644
--- a/drivers/nvme/host/fabrics.c
+++ b/drivers/nvme/host/fabrics.c
@@ -874,10 +874,10 @@ nvmf_create_ctrl(struct device *dev, const char *buf, size_t count)
 		goto out_unlock;
 	}
 
-	if (strcmp(ctrl->subnqn, opts->subsysnqn)) {
+	if (strcmp(ctrl->subsys->subnqn, opts->subsysnqn)) {
 		dev_warn(ctrl->device,
 			"controller returned incorrect NQN: \"%s\".\n",
-			ctrl->subnqn);
+			ctrl->subsys->subnqn);
 		up_read(&nvmf_transports_rwsem);
 		ctrl->ops->delete_ctrl(ctrl);
 		return ERR_PTR(-EINVAL);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index d3f3c4447515..b96aeaf93dc1 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -138,13 +138,15 @@ struct nvme_ctrl {
 	struct ida ns_ida;
 	struct work_struct reset_work;
 
+	struct nvme_subsystem *subsys;
+	struct list_head subsys_entry;
+
 	struct opal_dev *opal_dev;
 
 	char name[12];
 	char serial[20];
 	char model[40];
 	char firmware_rev[8];
-	char subnqn[NVMF_NQN_SIZE];
 	u16 cntlid;
 
 	u32 ctrl_config;
@@ -197,6 +199,14 @@ struct nvme_ctrl {
 	struct nvmf_ctrl_options *opts;
 };
 
+struct nvme_subsystem {
+	struct list_head	entry;
+	struct mutex		lock;
+	struct list_head	ctrls;
+	struct kref		ref;
+	char			subnqn[NVMF_NQN_SIZE];
+};
+
 struct nvme_ns {
 	struct list_head list;
 
-- 
2.14.1

* [PATCH 7/9] nvme: introduce a nvme_ns_ids structure
  2017-09-18 23:14 ` Christoph Hellwig
@ 2017-09-18 23:14   ` Christoph Hellwig
  -1 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2017-09-18 23:14 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Keith Busch, Sagi Grimberg, linux-nvme, linux-block

This allows us to manage the various unique namespace identifiers
together instead of needing various variables and arguments.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/core.c | 69 +++++++++++++++++++++++++++---------------------
 drivers/nvme/host/nvme.h | 14 +++++++---
 2 files changed, 49 insertions(+), 34 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 4291119a6bc9..9da538d7ca87 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -781,7 +781,7 @@ static int nvme_identify_ctrl(struct nvme_ctrl *dev, struct nvme_id_ctrl **id)
 }
 
 static int nvme_identify_ns_descs(struct nvme_ctrl *ctrl, unsigned nsid,
-		u8 *eui64, u8 *nguid, uuid_t *uuid)
+		struct nvme_ns_ids *ids)
 {
 	struct nvme_command c = { };
 	int status;
@@ -817,7 +817,7 @@ static int nvme_identify_ns_descs(struct nvme_ctrl *ctrl, unsigned nsid,
 				goto free_data;
 			}
 			len = NVME_NIDT_EUI64_LEN;
-			memcpy(eui64, data + pos + sizeof(*cur), len);
+			memcpy(ids->eui64, data + pos + sizeof(*cur), len);
 			break;
 		case NVME_NIDT_NGUID:
 			if (cur->nidl != NVME_NIDT_NGUID_LEN) {
@@ -827,7 +827,7 @@ static int nvme_identify_ns_descs(struct nvme_ctrl *ctrl, unsigned nsid,
 				goto free_data;
 			}
 			len = NVME_NIDT_NGUID_LEN;
-			memcpy(nguid, data + pos + sizeof(*cur), len);
+			memcpy(ids->nguid, data + pos + sizeof(*cur), len);
 			break;
 		case NVME_NIDT_UUID:
 			if (cur->nidl != NVME_NIDT_UUID_LEN) {
@@ -837,7 +837,7 @@ static int nvme_identify_ns_descs(struct nvme_ctrl *ctrl, unsigned nsid,
 				goto free_data;
 			}
 			len = NVME_NIDT_UUID_LEN;
-			uuid_copy(uuid, data + pos + sizeof(*cur));
+			uuid_copy(&ids->uuid, data + pos + sizeof(*cur));
 			break;
 		default:
 			/* Skip unnkown types */
@@ -1178,22 +1178,31 @@ static void nvme_config_discard(struct nvme_ns *ns)
 }
 
 static void nvme_report_ns_ids(struct nvme_ctrl *ctrl, unsigned int nsid,
-		struct nvme_id_ns *id, u8 *eui64, u8 *nguid, uuid_t *uuid)
+		struct nvme_id_ns *id, struct nvme_ns_ids *ids)
 {
+	memset(ids, 0, sizeof(*ids));
+
 	if (ctrl->vs >= NVME_VS(1, 1, 0))
-		memcpy(eui64, id->eui64, sizeof(id->eui64));
+		memcpy(ids->eui64, id->eui64, sizeof(id->eui64));
 	if (ctrl->vs >= NVME_VS(1, 2, 0))
-		memcpy(nguid, id->nguid, sizeof(id->nguid));
+		memcpy(ids->nguid, id->nguid, sizeof(id->nguid));
 	if (ctrl->vs >= NVME_VS(1, 3, 0)) {
 		 /* Don't treat error as fatal we potentially
 		  * already have a NGUID or EUI-64
 		  */
-		if (nvme_identify_ns_descs(ctrl, nsid, eui64, nguid, uuid))
+		if (nvme_identify_ns_descs(ctrl, nsid, ids))
 			dev_warn(ctrl->device,
 				 "%s: Identify Descriptors failed\n", __func__);
 	}
 }
 
+static bool nvme_ns_ids_equal(struct nvme_ns_ids *a, struct nvme_ns_ids *b)
+{
+	return uuid_equal(&a->uuid, &b->uuid) &&
+		memcmp(&a->nguid, &b->nguid, sizeof(a->nguid)) == 0 &&
+		memcmp(&a->eui64, &b->eui64, sizeof(a->eui64)) == 0;
+}
+
 static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 {
 	struct nvme_ns *ns = disk->private_data;
@@ -1234,8 +1243,7 @@ static int nvme_revalidate_disk(struct gendisk *disk)
 	struct nvme_ns *ns = disk->private_data;
 	struct nvme_ctrl *ctrl = ns->ctrl;
 	struct nvme_id_ns *id;
-	u8 eui64[8] = { 0 }, nguid[16] = { 0 };
-	uuid_t uuid = uuid_null;
+	struct nvme_ns_ids ids;
 	int ret = 0;
 
 	if (test_bit(NVME_NS_DEAD, &ns->flags)) {
@@ -1252,10 +1260,8 @@ static int nvme_revalidate_disk(struct gendisk *disk)
 		goto out;
 	}
 
-	nvme_report_ns_ids(ctrl, ns->ns_id, id, eui64, nguid, &uuid);
-	if (!uuid_equal(&ns->uuid, &uuid) ||
-	    memcmp(&ns->nguid, &nguid, sizeof(ns->nguid)) ||
-	    memcmp(&ns->eui, &eui64, sizeof(ns->eui))) {
+	nvme_report_ns_ids(ctrl, ns->ns_id, id, &ids);
+	if (!nvme_ns_ids_equal(&ns->ids, &ids)) {
 		dev_err(ctrl->device,
 			"identifiers changed for nsid %d\n", ns->ns_id);
 		ret = -ENODEV;
@@ -2139,18 +2145,19 @@ static ssize_t wwid_show(struct device *dev, struct device_attribute *attr,
 								char *buf)
 {
 	struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
+	struct nvme_ns_ids *ids = &ns->ids;
 	struct nvme_ctrl *ctrl = ns->ctrl;
 	int serial_len = sizeof(ctrl->serial);
 	int model_len = sizeof(ctrl->model);
 
-	if (!uuid_is_null(&ns->uuid))
-		return sprintf(buf, "uuid.%pU\n", &ns->uuid);
+	if (!uuid_is_null(&ids->uuid))
+		return sprintf(buf, "uuid.%pU\n", &ids->uuid);
 
-	if (memchr_inv(ns->nguid, 0, sizeof(ns->nguid)))
-		return sprintf(buf, "eui.%16phN\n", ns->nguid);
+	if (memchr_inv(ids->nguid, 0, sizeof(ids->nguid)))
+		return sprintf(buf, "eui.%16phN\n", ids->nguid);
 
-	if (memchr_inv(ns->eui, 0, sizeof(ns->eui)))
-		return sprintf(buf, "eui.%8phN\n", ns->eui);
+	if (memchr_inv(ids->eui64, 0, sizeof(ids->eui64)))
+		return sprintf(buf, "eui.%8phN\n", ids->eui64);
 
 	while (serial_len > 0 && (ctrl->serial[serial_len - 1] == ' ' ||
 				  ctrl->serial[serial_len - 1] == '\0'))
@@ -2168,7 +2175,7 @@ static ssize_t nguid_show(struct device *dev, struct device_attribute *attr,
 			  char *buf)
 {
 	struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
-	return sprintf(buf, "%pU\n", ns->nguid);
+	return sprintf(buf, "%pU\n", ns->ids.nguid);
 }
 static DEVICE_ATTR(nguid, S_IRUGO, nguid_show, NULL);
 
@@ -2176,16 +2183,17 @@ static ssize_t uuid_show(struct device *dev, struct device_attribute *attr,
 								char *buf)
 {
 	struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
+	struct nvme_ns_ids *ids = &ns->ids;
 
 	/* For backward compatibility expose the NGUID to userspace if
 	 * we have no UUID set
 	 */
-	if (uuid_is_null(&ns->uuid)) {
+	if (uuid_is_null(&ids->uuid)) {
 		printk_ratelimited(KERN_WARNING
 				   "No UUID available providing old NGUID\n");
-		return sprintf(buf, "%pU\n", ns->nguid);
+		return sprintf(buf, "%pU\n", ids->nguid);
 	}
-	return sprintf(buf, "%pU\n", &ns->uuid);
+	return sprintf(buf, "%pU\n", &ids->uuid);
 }
 static DEVICE_ATTR(uuid, S_IRUGO, uuid_show, NULL);
 
@@ -2193,7 +2201,7 @@ static ssize_t eui_show(struct device *dev, struct device_attribute *attr,
 								char *buf)
 {
 	struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
-	return sprintf(buf, "%8phd\n", ns->eui);
+	return sprintf(buf, "%8phd\n", ns->ids.eui64);
 }
 static DEVICE_ATTR(eui, S_IRUGO, eui_show, NULL);
 
@@ -2219,18 +2227,19 @@ static umode_t nvme_ns_attrs_are_visible(struct kobject *kobj,
 {
 	struct device *dev = container_of(kobj, struct device, kobj);
 	struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
+	struct nvme_ns_ids *ids = &ns->ids;
 
 	if (a == &dev_attr_uuid.attr) {
-		if (uuid_is_null(&ns->uuid) ||
-		    !memchr_inv(ns->nguid, 0, sizeof(ns->nguid)))
+		if (uuid_is_null(&ids->uuid) ||
+		    !memchr_inv(ids->nguid, 0, sizeof(ids->nguid)))
 			return 0;
 	}
 	if (a == &dev_attr_nguid.attr) {
-		if (!memchr_inv(ns->nguid, 0, sizeof(ns->nguid)))
+		if (!memchr_inv(ids->nguid, 0, sizeof(ids->nguid)))
 			return 0;
 	}
 	if (a == &dev_attr_eui.attr) {
-		if (!memchr_inv(ns->eui, 0, sizeof(ns->eui)))
+		if (!memchr_inv(ids->eui64, 0, sizeof(ids->eui64)))
 			return 0;
 	}
 	return a->mode;
@@ -2460,7 +2469,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	if (id->ncap == 0)
 		goto out_free_id;
 
-	nvme_report_ns_ids(ctrl, ns->ns_id, id, ns->eui, ns->nguid, &ns->uuid);
+	nvme_report_ns_ids(ctrl, ns->ns_id, id, &ns->ids);
 
 	if ((ctrl->quirks & NVME_QUIRK_LIGHTNVM) && id->vs[0] == 0x1) {
 		if (nvme_nvm_register(ns, disk_name, node)) {
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index b96aeaf93dc1..1ad5dff473ea 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -207,6 +207,15 @@ struct nvme_subsystem {
 	char			subnqn[NVMF_NQN_SIZE];
 };
 
+/*
+ * Container structure for uniqueue namespace identifiers.
+ */
+struct nvme_ns_ids {
+	u8	eui64[8];
+	u8	nguid[16];
+	uuid_t	uuid;
+};
+
 struct nvme_ns {
 	struct list_head list;
 
@@ -217,11 +226,8 @@ struct nvme_ns {
 	struct kref kref;
 	int instance;
 
-	u8 eui[8];
-	u8 nguid[16];
-	uuid_t uuid;
-
 	unsigned ns_id;
+	struct nvme_ns_ids ids;
 	int lba_shift;
 	u16 ms;
 	u16 sgs;
-- 
2.14.1

* [PATCH 8/9] nvme: track shared namespaces
  2017-09-18 23:14 ` Christoph Hellwig
@ 2017-09-18 23:14   ` Christoph Hellwig
  -1 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2017-09-18 23:14 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Keith Busch, Sagi Grimberg, linux-nvme, linux-block

Introduce a new struct nvme_ns_head [1] that holds information about
an actual namespace, unlike struct nvme_ns, which only holds the
per-controller namespace information.  For private namespaces there
is a 1:1 relation between the two, but for shared namespaces this lets
us discover all the paths to a namespace.  For now only the identifiers
are moved to the new structure, but most of the information in struct
nvme_ns should eventually move over.

To allow lockless path lookup the list of nvme_ns structures per
nvme_ns_head is protected by SRCU, which requires freeing the nvme_ns
structure through call_srcu.
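
The read side of such a lookup then looks roughly like this (a sketch
only; the list and member names below are assumed rather than taken
verbatim from the patch):

	int srcu_idx;
	struct nvme_ns *ns, *found = NULL;

	srcu_idx = srcu_read_lock(&head->srcu);
	list_for_each_entry_rcu(ns, &head->list, siblings) {
		if (ns->ctrl->state == NVME_CTRL_LIVE) {
			found = ns;	/* usable path */
			break;
		}
	}
	/* ... issue I/O via "found" while the read lock is held ... */
	srcu_read_unlock(&head->srcu, srcu_idx);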

[1] comments welcome if you have a better name for it, the current one is
    horrible.  One idea would be to rename the current struct nvme_ns
    to struct nvme_ns_link or similar and use the nvme_ns name for the
    new structure.  But that would involve a lot of churn.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/core.c     | 193 +++++++++++++++++++++++++++++++++++++------
 drivers/nvme/host/lightnvm.c |  14 ++--
 drivers/nvme/host/nvme.h     |  21 ++++-
 3 files changed, 193 insertions(+), 35 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 9da538d7ca87..3e8405fd57a9 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -247,10 +247,28 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 }
 EXPORT_SYMBOL_GPL(nvme_change_ctrl_state);
 
+static void nvme_destroy_ns_head(struct kref *ref)
+{
+	struct nvme_ns_head *head =
+		container_of(ref, struct nvme_ns_head, ref);
+
+	list_del_init(&head->entry);
+	cleanup_srcu_struct(&head->srcu);
+	kfree(head);
+}
+
+static void nvme_put_ns_head(struct nvme_ns_head *head)
+{
+	kref_put(&head->ref, nvme_destroy_ns_head);
+}
+
 static void nvme_free_ns(struct kref *kref)
 {
 	struct nvme_ns *ns = container_of(kref, struct nvme_ns, kref);
 
+	if (ns->head)
+		nvme_put_ns_head(ns->head);
+
 	if (ns->ndev)
 		nvme_nvm_unregister(ns);
 
@@ -420,7 +438,7 @@ static inline void nvme_setup_flush(struct nvme_ns *ns,
 {
 	memset(cmnd, 0, sizeof(*cmnd));
 	cmnd->common.opcode = nvme_cmd_flush;
-	cmnd->common.nsid = cpu_to_le32(ns->ns_id);
+	cmnd->common.nsid = cpu_to_le32(ns->head->ns_id);
 }
 
 static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
@@ -451,7 +469,7 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
 
 	memset(cmnd, 0, sizeof(*cmnd));
 	cmnd->dsm.opcode = nvme_cmd_dsm;
-	cmnd->dsm.nsid = cpu_to_le32(ns->ns_id);
+	cmnd->dsm.nsid = cpu_to_le32(ns->head->ns_id);
 	cmnd->dsm.nr = cpu_to_le32(segments - 1);
 	cmnd->dsm.attributes = cpu_to_le32(NVME_DSMGMT_AD);
 
@@ -490,7 +508,7 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
 
 	memset(cmnd, 0, sizeof(*cmnd));
 	cmnd->rw.opcode = (rq_data_dir(req) ? nvme_cmd_write : nvme_cmd_read);
-	cmnd->rw.nsid = cpu_to_le32(ns->ns_id);
+	cmnd->rw.nsid = cpu_to_le32(ns->head->ns_id);
 	cmnd->rw.slba = cpu_to_le64(nvme_block_nr(ns, blk_rq_pos(req)));
 	cmnd->rw.length = cpu_to_le16((blk_rq_bytes(req) >> ns->lba_shift) - 1);
 
@@ -971,7 +989,7 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio)
 	memset(&c, 0, sizeof(c));
 	c.rw.opcode = io.opcode;
 	c.rw.flags = io.flags;
-	c.rw.nsid = cpu_to_le32(ns->ns_id);
+	c.rw.nsid = cpu_to_le32(ns->head->ns_id);
 	c.rw.slba = cpu_to_le64(io.slba);
 	c.rw.length = cpu_to_le16(io.nblocks);
 	c.rw.control = cpu_to_le16(io.control);
@@ -1036,7 +1054,7 @@ static int nvme_ioctl(struct block_device *bdev, fmode_t mode,
 	switch (cmd) {
 	case NVME_IOCTL_ID:
 		force_successful_syscall_return();
-		return ns->ns_id;
+		return ns->head->ns_id;
 	case NVME_IOCTL_ADMIN_CMD:
 		return nvme_user_cmd(ns->ctrl, NULL, (void __user *)arg);
 	case NVME_IOCTL_IO_CMD:
@@ -1196,6 +1214,13 @@ static void nvme_report_ns_ids(struct nvme_ctrl *ctrl, unsigned int nsid,
 	}
 }
 
+static bool nvme_ns_ids_valid(struct nvme_ns_ids *ids)
+{
+	return !uuid_is_null(&ids->uuid) ||
+		memchr_inv(ids->nguid, 0, sizeof(ids->nguid)) ||
+		memchr_inv(ids->eui64, 0, sizeof(ids->eui64));
+}
+
 static bool nvme_ns_ids_equal(struct nvme_ns_ids *a, struct nvme_ns_ids *b)
 {
 	return uuid_equal(&a->uuid, &b->uuid) &&
@@ -1251,7 +1276,7 @@ static int nvme_revalidate_disk(struct gendisk *disk)
 		return -ENODEV;
 	}
 
-	id = nvme_identify_ns(ctrl, ns->ns_id);
+	id = nvme_identify_ns(ctrl, ns->head->ns_id);
 	if (!id)
 		return -ENODEV;
 
@@ -1260,10 +1285,10 @@ static int nvme_revalidate_disk(struct gendisk *disk)
 		goto out;
 	}
 
-	nvme_report_ns_ids(ctrl, ns->ns_id, id, &ids);
-	if (!nvme_ns_ids_equal(&ns->ids, &ids)) {
+	nvme_report_ns_ids(ctrl, ns->head->ns_id, id, &ids);
+	if (!nvme_ns_ids_equal(&ns->head->ids, &ids)) {
 		dev_err(ctrl->device,
-			"identifiers changed for nsid %d\n", ns->ns_id);
+			"identifiers changed for nsid %d\n", ns->head->ns_id);
 		ret = -ENODEV;
 	}
 
@@ -1304,7 +1329,7 @@ static int nvme_pr_command(struct block_device *bdev, u32 cdw10,
 
 	memset(&c, 0, sizeof(c));
 	c.common.opcode = op;
-	c.common.nsid = cpu_to_le32(ns->ns_id);
+	c.common.nsid = cpu_to_le32(ns->head->ns_id);
 	c.common.cdw10[0] = cpu_to_le32(cdw10);
 
 	return nvme_submit_sync_cmd(ns->queue, &c, data, 16);
@@ -1813,6 +1838,7 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
 	if (!subsys)
 		return -ENOMEM;
 	INIT_LIST_HEAD(&subsys->ctrls);
+	INIT_LIST_HEAD(&subsys->nsheads);
 	kref_init(&subsys->ref);
 	nvme_init_subnqn(subsys, ctrl, id);
 	mutex_init(&subsys->lock);
@@ -2145,7 +2171,7 @@ static ssize_t wwid_show(struct device *dev, struct device_attribute *attr,
 								char *buf)
 {
 	struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
-	struct nvme_ns_ids *ids = &ns->ids;
+	struct nvme_ns_ids *ids = &ns->head->ids;
 	struct nvme_ctrl *ctrl = ns->ctrl;
 	int serial_len = sizeof(ctrl->serial);
 	int model_len = sizeof(ctrl->model);
@@ -2167,7 +2193,8 @@ static ssize_t wwid_show(struct device *dev, struct device_attribute *attr,
 		model_len--;
 
 	return sprintf(buf, "nvme.%04x-%*phN-%*phN-%08x\n", ctrl->vid,
-		serial_len, ctrl->serial, model_len, ctrl->model, ns->ns_id);
+		serial_len, ctrl->serial, model_len, ctrl->model,
+		ns->head->ns_id);
 }
 static DEVICE_ATTR(wwid, S_IRUGO, wwid_show, NULL);
 
@@ -2175,7 +2202,7 @@ static ssize_t nguid_show(struct device *dev, struct device_attribute *attr,
 			  char *buf)
 {
 	struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
-	return sprintf(buf, "%pU\n", ns->ids.nguid);
+	return sprintf(buf, "%pU\n", ns->head->ids.nguid);
 }
 static DEVICE_ATTR(nguid, S_IRUGO, nguid_show, NULL);
 
@@ -2183,7 +2210,7 @@ static ssize_t uuid_show(struct device *dev, struct device_attribute *attr,
 								char *buf)
 {
 	struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
-	struct nvme_ns_ids *ids = &ns->ids;
+	struct nvme_ns_ids *ids = &ns->head->ids;
 
 	/* For backward compatibility expose the NGUID to userspace if
 	 * we have no UUID set
@@ -2201,7 +2228,7 @@ static ssize_t eui_show(struct device *dev, struct device_attribute *attr,
 								char *buf)
 {
 	struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
-	return sprintf(buf, "%8phd\n", ns->ids.eui64);
+	return sprintf(buf, "%8phd\n", ns->head->ids.eui64);
 }
 static DEVICE_ATTR(eui, S_IRUGO, eui_show, NULL);
 
@@ -2209,7 +2236,7 @@ static ssize_t nsid_show(struct device *dev, struct device_attribute *attr,
 								char *buf)
 {
 	struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
-	return sprintf(buf, "%d\n", ns->ns_id);
+	return sprintf(buf, "%d\n", ns->head->ns_id);
 }
 static DEVICE_ATTR(nsid, S_IRUGO, nsid_show, NULL);
 
@@ -2227,7 +2254,7 @@ static umode_t nvme_ns_attrs_are_visible(struct kobject *kobj,
 {
 	struct device *dev = container_of(kobj, struct device, kobj);
 	struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
-	struct nvme_ns_ids *ids = &ns->ids;
+	struct nvme_ns_ids *ids = &ns->head->ids;
 
 	if (a == &dev_attr_uuid.attr) {
 		if (uuid_is_null(&ids->uuid) ||
@@ -2377,12 +2404,114 @@ static const struct attribute_group *nvme_dev_attr_groups[] = {
 	NULL,
 };
 
+static struct nvme_ns_head *__nvme_find_ns_head(struct nvme_subsystem *subsys,
+		unsigned nsid)
+{
+	struct nvme_ns_head *h;
+
+	lockdep_assert_held(&subsys->lock);
+
+	list_for_each_entry(h, &subsys->nsheads, entry) {
+		if (h->ns_id == nsid && kref_get_unless_zero(&h->ref))
+			return h;
+	}
+
+	return NULL;
+}
+
+static int __nvme_check_ids(struct nvme_subsystem *subsys,
+		struct nvme_ns_head *new)
+{
+	struct nvme_ns_head *h;
+
+	lockdep_assert_held(&subsys->lock);
+
+	list_for_each_entry(h, &subsys->nsheads, entry) {
+		if (nvme_ns_ids_valid(&new->ids) &&
+		    nvme_ns_ids_equal(&new->ids, &h->ids))
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
+		unsigned nsid, struct nvme_id_ns *id)
+{
+	struct nvme_ns_head *head;
+	int ret = -ENOMEM;
+
+	head = kzalloc(sizeof(*head), GFP_KERNEL);
+	if (!head)
+		goto out;
+
+	INIT_LIST_HEAD(&head->list);
+	head->ns_id = nsid;
+	init_srcu_struct(&head->srcu);
+	kref_init(&head->ref);
+
+	nvme_report_ns_ids(ctrl, nsid, id, &head->ids);
+
+	ret = __nvme_check_ids(ctrl->subsys, head);
+	if (ret) {
+		dev_err(ctrl->device,
+			"duplicate IDs for nsid %d\n", nsid);
+		goto out_free_head;
+	}
+
+	list_add_tail(&head->entry, &ctrl->subsys->nsheads);
+	return head;
+out_free_head:
+	cleanup_srcu_struct(&head->srcu);
+	kfree(head);
+out:
+	return ERR_PTR(ret);
+}
+
+static int nvme_init_ns_head(struct nvme_ns *ns, unsigned nsid,
+		struct nvme_id_ns *id)
+{
+	struct nvme_ctrl *ctrl = ns->ctrl;
+	bool is_shared = id->nmic & (1 << 0);
+	struct nvme_ns_head *head = NULL;
+	int ret = 0;
+
+	mutex_lock(&ctrl->subsys->lock);
+	if (is_shared)
+		head = __nvme_find_ns_head(ctrl->subsys, nsid);
+	if (!head) {
+		head = nvme_alloc_ns_head(ctrl, nsid, id);
+		if (IS_ERR(head)) {
+			ret = PTR_ERR(head);
+			goto out_unlock;
+		}
+	} else {
+		struct nvme_ns_ids ids;
+
+		nvme_report_ns_ids(ctrl, nsid, id, &ids);
+		if (!nvme_ns_ids_equal(&head->ids, &ids)) {
+			dev_err(ctrl->device,
+				"IDs don't match for shared namespace %d\n",
+					nsid);
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+	}
+
+	list_add_tail(&ns->siblings, &head->list);
+	ns->head = head;
+
+out_unlock:
+	mutex_unlock(&ctrl->subsys->lock);
+	return ret;
+}
+
 static int ns_cmp(void *priv, struct list_head *a, struct list_head *b)
 {
 	struct nvme_ns *nsa = container_of(a, struct nvme_ns, list);
 	struct nvme_ns *nsb = container_of(b, struct nvme_ns, list);
 
-	return nsa->ns_id - nsb->ns_id;
+	return nsa->head->ns_id - nsb->head->ns_id;
 }
 
 static struct nvme_ns *nvme_find_get_ns(struct nvme_ctrl *ctrl, unsigned nsid)
@@ -2391,12 +2520,12 @@ static struct nvme_ns *nvme_find_get_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 
 	mutex_lock(&ctrl->namespaces_mutex);
 	list_for_each_entry(ns, &ctrl->namespaces, list) {
-		if (ns->ns_id == nsid) {
+		if (ns->head->ns_id == nsid) {
 			kref_get(&ns->kref);
 			ret = ns;
 			break;
 		}
-		if (ns->ns_id > nsid)
+		if (ns->head->ns_id > nsid)
 			break;
 	}
 	mutex_unlock(&ctrl->namespaces_mutex);
@@ -2411,7 +2540,7 @@ static int nvme_setup_streams_ns(struct nvme_ctrl *ctrl, struct nvme_ns *ns)
 	if (!ctrl->nr_streams)
 		return 0;
 
-	ret = nvme_get_stream_params(ctrl, &s, ns->ns_id);
+	ret = nvme_get_stream_params(ctrl, &s, ns->head->ns_id);
 	if (ret)
 		return ret;
 
@@ -2453,7 +2582,6 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	ns->ctrl = ctrl;
 
 	kref_init(&ns->kref);
-	ns->ns_id = nsid;
 	ns->lba_shift = 9; /* set to a default value for 512 until disk is validated */
 
 	blk_queue_logical_block_size(ns->queue, 1 << ns->lba_shift);
@@ -2469,18 +2597,19 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	if (id->ncap == 0)
 		goto out_free_id;
 
-	nvme_report_ns_ids(ctrl, ns->ns_id, id, &ns->ids);
+	if (nvme_init_ns_head(ns, nsid, id))
+		goto out_free_id;
 
 	if ((ctrl->quirks & NVME_QUIRK_LIGHTNVM) && id->vs[0] == 0x1) {
 		if (nvme_nvm_register(ns, disk_name, node)) {
 			dev_warn(ctrl->device, "LightNVM init failure\n");
-			goto out_free_id;
+			goto out_unlink_ns;
 		}
 	}
 
 	disk = alloc_disk_node(0, node);
 	if (!disk)
-		goto out_free_id;
+		goto out_unlink_ns;
 
 	disk->fops = &nvme_fops;
 	disk->private_data = ns;
@@ -2508,6 +2637,10 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 		pr_warn("%s: failed to register lightnvm sysfs group for identification\n",
 			ns->disk->disk_name);
 	return;
+ out_unlink_ns:
+	mutex_lock(&ctrl->subsys->lock);
+	list_del_rcu(&ns->siblings);
+	mutex_unlock(&ctrl->subsys->lock);
  out_free_id:
 	kfree(id);
  out_free_queue:
@@ -2520,6 +2653,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 
 static void nvme_ns_remove(struct nvme_ns *ns)
 {
+	struct nvme_ns_head *head = ns->head;
+
 	if (test_and_set_bit(NVME_NS_REMOVING, &ns->flags))
 		return;
 
@@ -2534,10 +2669,16 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 		blk_cleanup_queue(ns->queue);
 	}
 
+	mutex_lock(&ns->ctrl->subsys->lock);
+	if (head)
+		list_del_rcu(&ns->siblings);
+	mutex_unlock(&ns->ctrl->subsys->lock);
+
 	mutex_lock(&ns->ctrl->namespaces_mutex);
 	list_del_init(&ns->list);
 	mutex_unlock(&ns->ctrl->namespaces_mutex);
 
+	synchronize_srcu(&head->srcu);
 	nvme_put_ns(ns);
 }
 
@@ -2560,7 +2701,7 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
 	struct nvme_ns *ns, *next;
 
 	list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
-		if (ns->ns_id > nsid)
+		if (ns->head->ns_id > nsid)
 			nvme_ns_remove(ns);
 	}
 }
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index 1f79e3f141e6..44e46276319c 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -305,7 +305,7 @@ static int nvme_nvm_identity(struct nvm_dev *nvmdev, struct nvm_id *nvm_id)
 	int ret;
 
 	c.identity.opcode = nvme_nvm_admin_identity;
-	c.identity.nsid = cpu_to_le32(ns->ns_id);
+	c.identity.nsid = cpu_to_le32(ns->head->ns_id);
 	c.identity.chnl_off = 0;
 
 	nvme_nvm_id = kmalloc(sizeof(struct nvme_nvm_id), GFP_KERNEL);
@@ -344,7 +344,7 @@ static int nvme_nvm_get_l2p_tbl(struct nvm_dev *nvmdev, u64 slba, u32 nlb,
 	int ret = 0;
 
 	c.l2p.opcode = nvme_nvm_admin_get_l2p_tbl;
-	c.l2p.nsid = cpu_to_le32(ns->ns_id);
+	c.l2p.nsid = cpu_to_le32(ns->head->ns_id);
 	entries = kmalloc(len, GFP_KERNEL);
 	if (!entries)
 		return -ENOMEM;
@@ -402,7 +402,7 @@ static int nvme_nvm_get_bb_tbl(struct nvm_dev *nvmdev, struct ppa_addr ppa,
 	int ret = 0;
 
 	c.get_bb.opcode = nvme_nvm_admin_get_bb_tbl;
-	c.get_bb.nsid = cpu_to_le32(ns->ns_id);
+	c.get_bb.nsid = cpu_to_le32(ns->head->ns_id);
 	c.get_bb.spba = cpu_to_le64(ppa.ppa);
 
 	bb_tbl = kzalloc(tblsz, GFP_KERNEL);
@@ -452,7 +452,7 @@ static int nvme_nvm_set_bb_tbl(struct nvm_dev *nvmdev, struct ppa_addr *ppas,
 	int ret = 0;
 
 	c.set_bb.opcode = nvme_nvm_admin_set_bb_tbl;
-	c.set_bb.nsid = cpu_to_le32(ns->ns_id);
+	c.set_bb.nsid = cpu_to_le32(ns->head->ns_id);
 	c.set_bb.spba = cpu_to_le64(ppas->ppa);
 	c.set_bb.nlb = cpu_to_le16(nr_ppas - 1);
 	c.set_bb.value = type;
@@ -469,7 +469,7 @@ static inline void nvme_nvm_rqtocmd(struct nvm_rq *rqd, struct nvme_ns *ns,
 				    struct nvme_nvm_command *c)
 {
 	c->ph_rw.opcode = rqd->opcode;
-	c->ph_rw.nsid = cpu_to_le32(ns->ns_id);
+	c->ph_rw.nsid = cpu_to_le32(ns->head->ns_id);
 	c->ph_rw.spba = cpu_to_le64(rqd->ppa_addr.ppa);
 	c->ph_rw.metadata = cpu_to_le64(rqd->dma_meta_list);
 	c->ph_rw.control = cpu_to_le16(rqd->flags);
@@ -691,7 +691,7 @@ static int nvme_nvm_submit_vio(struct nvme_ns *ns,
 
 	memset(&c, 0, sizeof(c));
 	c.ph_rw.opcode = vio.opcode;
-	c.ph_rw.nsid = cpu_to_le32(ns->ns_id);
+	c.ph_rw.nsid = cpu_to_le32(ns->head->ns_id);
 	c.ph_rw.control = cpu_to_le16(vio.control);
 	c.ph_rw.length = cpu_to_le16(vio.nppas);
 
@@ -728,7 +728,7 @@ static int nvme_nvm_user_vcmd(struct nvme_ns *ns, int admin,
 
 	memset(&c, 0, sizeof(c));
 	c.common.opcode = vcmd.opcode;
-	c.common.nsid = cpu_to_le32(ns->ns_id);
+	c.common.nsid = cpu_to_le32(ns->head->ns_id);
 	c.common.cdw2[0] = cpu_to_le32(vcmd.cdw2);
 	c.common.cdw2[1] = cpu_to_le32(vcmd.cdw3);
 	/* cdw11-12 */
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 1ad5dff473ea..a724d2597c4c 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -203,6 +203,7 @@ struct nvme_subsystem {
 	struct list_head	entry;
 	struct mutex		lock;
 	struct list_head	ctrls;
+	struct list_head	nsheads;
 	struct kref		ref;
 	char			subnqn[NVMF_NQN_SIZE];
 };
@@ -216,18 +217,34 @@ struct nvme_ns_ids {
 	uuid_t	uuid;
 };
 
+/*
+ * Anchor structure for namespaces.  There is one for each namespace in a
+ * NVMe subsystem that any of our controllers can see, and the namespace
+ * structure for each controller is chained off it.  For private namespaces
+ * there is a 1:1 relation to our namespace structures, that is ->list
+ * only ever has a single entry for private namespaces.
+ */
+struct nvme_ns_head {
+	struct list_head	list;
+	struct srcu_struct      srcu;
+	unsigned		ns_id;
+	struct nvme_ns_ids	ids;
+	struct list_head	entry;
+	struct kref		ref;
+};
+
 struct nvme_ns {
 	struct list_head list;
 
 	struct nvme_ctrl *ctrl;
 	struct request_queue *queue;
 	struct gendisk *disk;
+	struct list_head siblings;
 	struct nvm_dev *ndev;
 	struct kref kref;
+	struct nvme_ns_head *head;
 	int instance;
 
-	unsigned ns_id;
-	struct nvme_ns_ids ids;
 	int lba_shift;
 	u16 ms;
 	u16 sgs;
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 8/9] nvme: track shared namespaces
@ 2017-09-18 23:14   ` Christoph Hellwig
  0 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2017-09-18 23:14 UTC (permalink / raw)


Introduce a new struct nvme_ns_head [1] that holds information about
an actual namespace, unlike struct nvme_ns, which only holds the
per-controller namespace information.  For private namespaces there
is a 1:1 relation between the two, but for shared namespaces this lets us
discover all the paths to it.  For now only the identifiers are moved
to the new structure, but most of the information in struct nvme_ns
should eventually move over.

To allow lockless path lookup the list of nvme_ns structures per
nvme_ns_head is protected by SRCU, which requires freeing the nvme_ns
structure through call_srcu.

[1] comments welcome if you have a better name for it, the current one is
    horrible.  One idea would be to rename the current struct nvme_ns
    to struct nvme_ns_link or similar and use the nvme_ns name for the
    new structure.  But that would involve a lot of churn.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/core.c     | 193 +++++++++++++++++++++++++++++++++++++------
 drivers/nvme/host/lightnvm.c |  14 ++--
 drivers/nvme/host/nvme.h     |  21 ++++-
 3 files changed, 193 insertions(+), 35 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 9da538d7ca87..3e8405fd57a9 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -247,10 +247,28 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 }
 EXPORT_SYMBOL_GPL(nvme_change_ctrl_state);
 
+static void nvme_destroy_ns_head(struct kref *ref)
+{
+	struct nvme_ns_head *head =
+		container_of(ref, struct nvme_ns_head, ref);
+
+	list_del_init(&head->entry);
+	cleanup_srcu_struct(&head->srcu);
+	kfree(head);
+}
+
+static void nvme_put_ns_head(struct nvme_ns_head *head)
+{
+	kref_put(&head->ref, nvme_destroy_ns_head);
+}
+
 static void nvme_free_ns(struct kref *kref)
 {
 	struct nvme_ns *ns = container_of(kref, struct nvme_ns, kref);
 
+	if (ns->head)
+		nvme_put_ns_head(ns->head);
+
 	if (ns->ndev)
 		nvme_nvm_unregister(ns);
 
@@ -420,7 +438,7 @@ static inline void nvme_setup_flush(struct nvme_ns *ns,
 {
 	memset(cmnd, 0, sizeof(*cmnd));
 	cmnd->common.opcode = nvme_cmd_flush;
-	cmnd->common.nsid = cpu_to_le32(ns->ns_id);
+	cmnd->common.nsid = cpu_to_le32(ns->head->ns_id);
 }
 
 static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
@@ -451,7 +469,7 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
 
 	memset(cmnd, 0, sizeof(*cmnd));
 	cmnd->dsm.opcode = nvme_cmd_dsm;
-	cmnd->dsm.nsid = cpu_to_le32(ns->ns_id);
+	cmnd->dsm.nsid = cpu_to_le32(ns->head->ns_id);
 	cmnd->dsm.nr = cpu_to_le32(segments - 1);
 	cmnd->dsm.attributes = cpu_to_le32(NVME_DSMGMT_AD);
 
@@ -490,7 +508,7 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
 
 	memset(cmnd, 0, sizeof(*cmnd));
 	cmnd->rw.opcode = (rq_data_dir(req) ? nvme_cmd_write : nvme_cmd_read);
-	cmnd->rw.nsid = cpu_to_le32(ns->ns_id);
+	cmnd->rw.nsid = cpu_to_le32(ns->head->ns_id);
 	cmnd->rw.slba = cpu_to_le64(nvme_block_nr(ns, blk_rq_pos(req)));
 	cmnd->rw.length = cpu_to_le16((blk_rq_bytes(req) >> ns->lba_shift) - 1);
 
@@ -971,7 +989,7 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio)
 	memset(&c, 0, sizeof(c));
 	c.rw.opcode = io.opcode;
 	c.rw.flags = io.flags;
-	c.rw.nsid = cpu_to_le32(ns->ns_id);
+	c.rw.nsid = cpu_to_le32(ns->head->ns_id);
 	c.rw.slba = cpu_to_le64(io.slba);
 	c.rw.length = cpu_to_le16(io.nblocks);
 	c.rw.control = cpu_to_le16(io.control);
@@ -1036,7 +1054,7 @@ static int nvme_ioctl(struct block_device *bdev, fmode_t mode,
 	switch (cmd) {
 	case NVME_IOCTL_ID:
 		force_successful_syscall_return();
-		return ns->ns_id;
+		return ns->head->ns_id;
 	case NVME_IOCTL_ADMIN_CMD:
 		return nvme_user_cmd(ns->ctrl, NULL, (void __user *)arg);
 	case NVME_IOCTL_IO_CMD:
@@ -1196,6 +1214,13 @@ static void nvme_report_ns_ids(struct nvme_ctrl *ctrl, unsigned int nsid,
 	}
 }
 
+static bool nvme_ns_ids_valid(struct nvme_ns_ids *ids)
+{
+	return !uuid_is_null(&ids->uuid) ||
+		memchr_inv(ids->nguid, 0, sizeof(ids->nguid)) ||
+		memchr_inv(ids->eui64, 0, sizeof(ids->eui64));
+}
+
 static bool nvme_ns_ids_equal(struct nvme_ns_ids *a, struct nvme_ns_ids *b)
 {
 	return uuid_equal(&a->uuid, &b->uuid) &&
@@ -1251,7 +1276,7 @@ static int nvme_revalidate_disk(struct gendisk *disk)
 		return -ENODEV;
 	}
 
-	id = nvme_identify_ns(ctrl, ns->ns_id);
+	id = nvme_identify_ns(ctrl, ns->head->ns_id);
 	if (!id)
 		return -ENODEV;
 
@@ -1260,10 +1285,10 @@ static int nvme_revalidate_disk(struct gendisk *disk)
 		goto out;
 	}
 
-	nvme_report_ns_ids(ctrl, ns->ns_id, id, &ids);
-	if (!nvme_ns_ids_equal(&ns->ids, &ids)) {
+	nvme_report_ns_ids(ctrl, ns->head->ns_id, id, &ids);
+	if (!nvme_ns_ids_equal(&ns->head->ids, &ids)) {
 		dev_err(ctrl->device,
-			"identifiers changed for nsid %d\n", ns->ns_id);
+			"identifiers changed for nsid %d\n", ns->head->ns_id);
 		ret = -ENODEV;
 	}
 
@@ -1304,7 +1329,7 @@ static int nvme_pr_command(struct block_device *bdev, u32 cdw10,
 
 	memset(&c, 0, sizeof(c));
 	c.common.opcode = op;
-	c.common.nsid = cpu_to_le32(ns->ns_id);
+	c.common.nsid = cpu_to_le32(ns->head->ns_id);
 	c.common.cdw10[0] = cpu_to_le32(cdw10);
 
 	return nvme_submit_sync_cmd(ns->queue, &c, data, 16);
@@ -1813,6 +1838,7 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
 	if (!subsys)
 		return -ENOMEM;
 	INIT_LIST_HEAD(&subsys->ctrls);
+	INIT_LIST_HEAD(&subsys->nsheads);
 	kref_init(&subsys->ref);
 	nvme_init_subnqn(subsys, ctrl, id);
 	mutex_init(&subsys->lock);
@@ -2145,7 +2171,7 @@ static ssize_t wwid_show(struct device *dev, struct device_attribute *attr,
 								char *buf)
 {
 	struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
-	struct nvme_ns_ids *ids = &ns->ids;
+	struct nvme_ns_ids *ids = &ns->head->ids;
 	struct nvme_ctrl *ctrl = ns->ctrl;
 	int serial_len = sizeof(ctrl->serial);
 	int model_len = sizeof(ctrl->model);
@@ -2167,7 +2193,8 @@ static ssize_t wwid_show(struct device *dev, struct device_attribute *attr,
 		model_len--;
 
 	return sprintf(buf, "nvme.%04x-%*phN-%*phN-%08x\n", ctrl->vid,
-		serial_len, ctrl->serial, model_len, ctrl->model, ns->ns_id);
+		serial_len, ctrl->serial, model_len, ctrl->model,
+		ns->head->ns_id);
 }
 static DEVICE_ATTR(wwid, S_IRUGO, wwid_show, NULL);
 
@@ -2175,7 +2202,7 @@ static ssize_t nguid_show(struct device *dev, struct device_attribute *attr,
 			  char *buf)
 {
 	struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
-	return sprintf(buf, "%pU\n", ns->ids.nguid);
+	return sprintf(buf, "%pU\n", ns->head->ids.nguid);
 }
 static DEVICE_ATTR(nguid, S_IRUGO, nguid_show, NULL);
 
@@ -2183,7 +2210,7 @@ static ssize_t uuid_show(struct device *dev, struct device_attribute *attr,
 								char *buf)
 {
 	struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
-	struct nvme_ns_ids *ids = &ns->ids;
+	struct nvme_ns_ids *ids = &ns->head->ids;
 
 	/* For backward compatibility expose the NGUID to userspace if
 	 * we have no UUID set
@@ -2201,7 +2228,7 @@ static ssize_t eui_show(struct device *dev, struct device_attribute *attr,
 								char *buf)
 {
 	struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
-	return sprintf(buf, "%8phd\n", ns->ids.eui64);
+	return sprintf(buf, "%8phd\n", ns->head->ids.eui64);
 }
 static DEVICE_ATTR(eui, S_IRUGO, eui_show, NULL);
 
@@ -2209,7 +2236,7 @@ static ssize_t nsid_show(struct device *dev, struct device_attribute *attr,
 								char *buf)
 {
 	struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
-	return sprintf(buf, "%d\n", ns->ns_id);
+	return sprintf(buf, "%d\n", ns->head->ns_id);
 }
 static DEVICE_ATTR(nsid, S_IRUGO, nsid_show, NULL);
 
@@ -2227,7 +2254,7 @@ static umode_t nvme_ns_attrs_are_visible(struct kobject *kobj,
 {
 	struct device *dev = container_of(kobj, struct device, kobj);
 	struct nvme_ns *ns = nvme_get_ns_from_dev(dev);
-	struct nvme_ns_ids *ids = &ns->ids;
+	struct nvme_ns_ids *ids = &ns->head->ids;
 
 	if (a == &dev_attr_uuid.attr) {
 		if (uuid_is_null(&ids->uuid) ||
@@ -2377,12 +2404,114 @@ static const struct attribute_group *nvme_dev_attr_groups[] = {
 	NULL,
 };
 
+static struct nvme_ns_head *__nvme_find_ns_head(struct nvme_subsystem *subsys,
+		unsigned nsid)
+{
+	struct nvme_ns_head *h;
+
+	lockdep_assert_held(&subsys->lock);
+
+	list_for_each_entry(h, &subsys->nsheads, entry) {
+		if (h->ns_id == nsid && kref_get_unless_zero(&h->ref))
+			return h;
+	}
+
+	return NULL;
+}
+
+static int __nvme_check_ids(struct nvme_subsystem *subsys,
+		struct nvme_ns_head *new)
+{
+	struct nvme_ns_head *h;
+
+	lockdep_assert_held(&subsys->lock);
+
+	list_for_each_entry(h, &subsys->nsheads, entry) {
+		if (nvme_ns_ids_valid(&new->ids) &&
+		    nvme_ns_ids_equal(&new->ids, &h->ids))
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
+		unsigned nsid, struct nvme_id_ns *id)
+{
+	struct nvme_ns_head *head;
+	int ret = -ENOMEM;
+
+	head = kzalloc(sizeof(*head), GFP_KERNEL);
+	if (!head)
+		goto out;
+
+	INIT_LIST_HEAD(&head->list);
+	head->ns_id = nsid;
+	init_srcu_struct(&head->srcu);
+	kref_init(&head->ref);
+
+	nvme_report_ns_ids(ctrl, nsid, id, &head->ids);
+
+	ret = __nvme_check_ids(ctrl->subsys, head);
+	if (ret) {
+		dev_err(ctrl->device,
+			"duplicate IDs for nsid %d\n", nsid);
+		goto out_free_head;
+	}
+
+	list_add_tail(&head->entry, &ctrl->subsys->nsheads);
+	return head;
+out_free_head:
+	cleanup_srcu_struct(&head->srcu);
+	kfree(head);
+out:
+	return ERR_PTR(ret);
+}
+
+static int nvme_init_ns_head(struct nvme_ns *ns, unsigned nsid,
+		struct nvme_id_ns *id)
+{
+	struct nvme_ctrl *ctrl = ns->ctrl;
+	bool is_shared = id->nmic & (1 << 0);
+	struct nvme_ns_head *head = NULL;
+	int ret = 0;
+
+	mutex_lock(&ctrl->subsys->lock);
+	if (is_shared)
+		head = __nvme_find_ns_head(ctrl->subsys, nsid);
+	if (!head) {
+		head = nvme_alloc_ns_head(ctrl, nsid, id);
+		if (IS_ERR(head)) {
+			ret = PTR_ERR(head);
+			goto out_unlock;
+		}
+	} else {
+		struct nvme_ns_ids ids;
+
+		nvme_report_ns_ids(ctrl, nsid, id, &ids);
+		if (!nvme_ns_ids_equal(&head->ids, &ids)) {
+			dev_err(ctrl->device,
+				"IDs don't match for shared namespace %d\n",
+					nsid);
+			ret = -EINVAL;
+			goto out_unlock;
+		}
+	}
+
+	list_add_tail(&ns->siblings, &head->list);
+	ns->head = head;
+
+out_unlock:
+	mutex_unlock(&ctrl->subsys->lock);
+	return ret;
+}
+
 static int ns_cmp(void *priv, struct list_head *a, struct list_head *b)
 {
 	struct nvme_ns *nsa = container_of(a, struct nvme_ns, list);
 	struct nvme_ns *nsb = container_of(b, struct nvme_ns, list);
 
-	return nsa->ns_id - nsb->ns_id;
+	return nsa->head->ns_id - nsb->head->ns_id;
 }
 
 static struct nvme_ns *nvme_find_get_ns(struct nvme_ctrl *ctrl, unsigned nsid)
@@ -2391,12 +2520,12 @@ static struct nvme_ns *nvme_find_get_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 
 	mutex_lock(&ctrl->namespaces_mutex);
 	list_for_each_entry(ns, &ctrl->namespaces, list) {
-		if (ns->ns_id == nsid) {
+		if (ns->head->ns_id == nsid) {
 			kref_get(&ns->kref);
 			ret = ns;
 			break;
 		}
-		if (ns->ns_id > nsid)
+		if (ns->head->ns_id > nsid)
 			break;
 	}
 	mutex_unlock(&ctrl->namespaces_mutex);
@@ -2411,7 +2540,7 @@ static int nvme_setup_streams_ns(struct nvme_ctrl *ctrl, struct nvme_ns *ns)
 	if (!ctrl->nr_streams)
 		return 0;
 
-	ret = nvme_get_stream_params(ctrl, &s, ns->ns_id);
+	ret = nvme_get_stream_params(ctrl, &s, ns->head->ns_id);
 	if (ret)
 		return ret;
 
@@ -2453,7 +2582,6 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	ns->ctrl = ctrl;
 
 	kref_init(&ns->kref);
-	ns->ns_id = nsid;
 	ns->lba_shift = 9; /* set to a default value for 512 until disk is validated */
 
 	blk_queue_logical_block_size(ns->queue, 1 << ns->lba_shift);
@@ -2469,18 +2597,19 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	if (id->ncap == 0)
 		goto out_free_id;
 
-	nvme_report_ns_ids(ctrl, ns->ns_id, id, &ns->ids);
+	if (nvme_init_ns_head(ns, nsid, id))
+		goto out_free_id;
 
 	if ((ctrl->quirks & NVME_QUIRK_LIGHTNVM) && id->vs[0] == 0x1) {
 		if (nvme_nvm_register(ns, disk_name, node)) {
 			dev_warn(ctrl->device, "LightNVM init failure\n");
-			goto out_free_id;
+			goto out_unlink_ns;
 		}
 	}
 
 	disk = alloc_disk_node(0, node);
 	if (!disk)
-		goto out_free_id;
+		goto out_unlink_ns;
 
 	disk->fops = &nvme_fops;
 	disk->private_data = ns;
@@ -2508,6 +2637,10 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 		pr_warn("%s: failed to register lightnvm sysfs group for identification\n",
 			ns->disk->disk_name);
 	return;
+ out_unlink_ns:
+	mutex_lock(&ctrl->subsys->lock);
+	list_del_rcu(&ns->siblings);
+	mutex_unlock(&ctrl->subsys->lock);
  out_free_id:
 	kfree(id);
  out_free_queue:
@@ -2520,6 +2653,8 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 
 static void nvme_ns_remove(struct nvme_ns *ns)
 {
+	struct nvme_ns_head *head = ns->head;
+
 	if (test_and_set_bit(NVME_NS_REMOVING, &ns->flags))
 		return;
 
@@ -2534,10 +2669,16 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 		blk_cleanup_queue(ns->queue);
 	}
 
+	mutex_lock(&ns->ctrl->subsys->lock);
+	if (head)
+		list_del_rcu(&ns->siblings);
+	mutex_unlock(&ns->ctrl->subsys->lock);
+
 	mutex_lock(&ns->ctrl->namespaces_mutex);
 	list_del_init(&ns->list);
 	mutex_unlock(&ns->ctrl->namespaces_mutex);
 
+	synchronize_srcu(&head->srcu);
 	nvme_put_ns(ns);
 }
 
@@ -2560,7 +2701,7 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
 	struct nvme_ns *ns, *next;
 
 	list_for_each_entry_safe(ns, next, &ctrl->namespaces, list) {
-		if (ns->ns_id > nsid)
+		if (ns->head->ns_id > nsid)
 			nvme_ns_remove(ns);
 	}
 }
diff --git a/drivers/nvme/host/lightnvm.c b/drivers/nvme/host/lightnvm.c
index 1f79e3f141e6..44e46276319c 100644
--- a/drivers/nvme/host/lightnvm.c
+++ b/drivers/nvme/host/lightnvm.c
@@ -305,7 +305,7 @@ static int nvme_nvm_identity(struct nvm_dev *nvmdev, struct nvm_id *nvm_id)
 	int ret;
 
 	c.identity.opcode = nvme_nvm_admin_identity;
-	c.identity.nsid = cpu_to_le32(ns->ns_id);
+	c.identity.nsid = cpu_to_le32(ns->head->ns_id);
 	c.identity.chnl_off = 0;
 
 	nvme_nvm_id = kmalloc(sizeof(struct nvme_nvm_id), GFP_KERNEL);
@@ -344,7 +344,7 @@ static int nvme_nvm_get_l2p_tbl(struct nvm_dev *nvmdev, u64 slba, u32 nlb,
 	int ret = 0;
 
 	c.l2p.opcode = nvme_nvm_admin_get_l2p_tbl;
-	c.l2p.nsid = cpu_to_le32(ns->ns_id);
+	c.l2p.nsid = cpu_to_le32(ns->head->ns_id);
 	entries = kmalloc(len, GFP_KERNEL);
 	if (!entries)
 		return -ENOMEM;
@@ -402,7 +402,7 @@ static int nvme_nvm_get_bb_tbl(struct nvm_dev *nvmdev, struct ppa_addr ppa,
 	int ret = 0;
 
 	c.get_bb.opcode = nvme_nvm_admin_get_bb_tbl;
-	c.get_bb.nsid = cpu_to_le32(ns->ns_id);
+	c.get_bb.nsid = cpu_to_le32(ns->head->ns_id);
 	c.get_bb.spba = cpu_to_le64(ppa.ppa);
 
 	bb_tbl = kzalloc(tblsz, GFP_KERNEL);
@@ -452,7 +452,7 @@ static int nvme_nvm_set_bb_tbl(struct nvm_dev *nvmdev, struct ppa_addr *ppas,
 	int ret = 0;
 
 	c.set_bb.opcode = nvme_nvm_admin_set_bb_tbl;
-	c.set_bb.nsid = cpu_to_le32(ns->ns_id);
+	c.set_bb.nsid = cpu_to_le32(ns->head->ns_id);
 	c.set_bb.spba = cpu_to_le64(ppas->ppa);
 	c.set_bb.nlb = cpu_to_le16(nr_ppas - 1);
 	c.set_bb.value = type;
@@ -469,7 +469,7 @@ static inline void nvme_nvm_rqtocmd(struct nvm_rq *rqd, struct nvme_ns *ns,
 				    struct nvme_nvm_command *c)
 {
 	c->ph_rw.opcode = rqd->opcode;
-	c->ph_rw.nsid = cpu_to_le32(ns->ns_id);
+	c->ph_rw.nsid = cpu_to_le32(ns->head->ns_id);
 	c->ph_rw.spba = cpu_to_le64(rqd->ppa_addr.ppa);
 	c->ph_rw.metadata = cpu_to_le64(rqd->dma_meta_list);
 	c->ph_rw.control = cpu_to_le16(rqd->flags);
@@ -691,7 +691,7 @@ static int nvme_nvm_submit_vio(struct nvme_ns *ns,
 
 	memset(&c, 0, sizeof(c));
 	c.ph_rw.opcode = vio.opcode;
-	c.ph_rw.nsid = cpu_to_le32(ns->ns_id);
+	c.ph_rw.nsid = cpu_to_le32(ns->head->ns_id);
 	c.ph_rw.control = cpu_to_le16(vio.control);
 	c.ph_rw.length = cpu_to_le16(vio.nppas);
 
@@ -728,7 +728,7 @@ static int nvme_nvm_user_vcmd(struct nvme_ns *ns, int admin,
 
 	memset(&c, 0, sizeof(c));
 	c.common.opcode = vcmd.opcode;
-	c.common.nsid = cpu_to_le32(ns->ns_id);
+	c.common.nsid = cpu_to_le32(ns->head->ns_id);
 	c.common.cdw2[0] = cpu_to_le32(vcmd.cdw2);
 	c.common.cdw2[1] = cpu_to_le32(vcmd.cdw3);
 	/* cdw11-12 */
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 1ad5dff473ea..a724d2597c4c 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -203,6 +203,7 @@ struct nvme_subsystem {
 	struct list_head	entry;
 	struct mutex		lock;
 	struct list_head	ctrls;
+	struct list_head	nsheads;
 	struct kref		ref;
 	char			subnqn[NVMF_NQN_SIZE];
 };
@@ -216,18 +217,34 @@ struct nvme_ns_ids {
 	uuid_t	uuid;
 };
 
+/*
+ * Anchor structure for namespaces.  There is one for each namespace in a
+ * NVMe subsystem that any of our controllers can see, and the namespace
+ * structure for each controller is chained off it.  For private namespaces
+ * there is a 1:1 relation to our namespace structures, that is ->list
+ * only ever has a single entry for private namespaces.
+ */
+struct nvme_ns_head {
+	struct list_head	list;
+	struct srcu_struct      srcu;
+	unsigned		ns_id;
+	struct nvme_ns_ids	ids;
+	struct list_head	entry;
+	struct kref		ref;
+};
+
 struct nvme_ns {
 	struct list_head list;
 
 	struct nvme_ctrl *ctrl;
 	struct request_queue *queue;
 	struct gendisk *disk;
+	struct list_head siblings;
 	struct nvm_dev *ndev;
 	struct kref kref;
+	struct nvme_ns_head *head;
 	int instance;
 
-	unsigned ns_id;
-	struct nvme_ns_ids ids;
 	int lba_shift;
 	u16 ms;
 	u16 sgs;
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 9/9] nvme: implement multipath access to nvme subsystems
  2017-09-18 23:14 ` Christoph Hellwig
@ 2017-09-18 23:14   ` Christoph Hellwig
  -1 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2017-09-18 23:14 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Keith Busch, Sagi Grimberg, linux-nvme, linux-block

This patch adds initial multipath support to the nvme driver.  For each
namespace we create a new block device node, which can be used to access
that namespace through any of the controllers that refer to it.

Currently we will always send I/O to the first available path, this will
be changed once the NVMe Asymmetric Namespace Access (ANA) TP is
ratified and implemented, at which point we will look at the ANA state
for each namespace.  Another possibility that was prototyped is to
use the path that is closest to the submitting NUMA node, which will be
mostly interesting for PCI, but might also be useful for RDMA or FC
transports in the future.  There is no plan to implement round robin
or I/O service time path selectors, as those are not scalable with
the performance rates provided by NVMe.
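
Condensed, the submission side described above amounts to the sketch below
(a simplified excerpt of nvme_make_request()/nvme_find_path() from the hunk
further down; the case where the path list is completely empty, which fails
the I/O, is omitted here):

	struct nvme_ns_head *head = q->queuedata;	/* q, bio as in nvme_make_request() */
	struct nvme_ns *ns;
	int srcu_idx;

	srcu_idx = srcu_read_lock(&head->srcu);
	ns = srcu_dereference(head->current_path, &head->srcu);
	if (!ns || ns->ctrl->state != NVME_CTRL_LIVE)
		ns = nvme_find_path(head);	/* first live path wins */
	if (ns) {
		bio->bi_disk = ns->disk;	/* redirect, no bio cloning */
		bio->bi_opf |= REQ_NVME_MPATH;
		direct_make_request(bio);
	} else {
		/* no usable path right now: park the bio for a later requeue */
		spin_lock_irq(&head->requeue_lock);
		bio_list_add(&head->requeue_list, bio);
		spin_unlock_irq(&head->requeue_lock);
	}
	srcu_read_unlock(&head->srcu, srcu_idx);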

The multipath device will go away once all paths to it disappear;
any delay to keep it alive needs to be implemented at the controller
level.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/core.c | 264 ++++++++++++++++++++++++++++++++++++++++++++---
 drivers/nvme/host/nvme.h |  11 ++
 2 files changed, 259 insertions(+), 16 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 3e8405fd57a9..5449c83a9dc3 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -77,6 +77,8 @@ static DEFINE_MUTEX(nvme_subsystems_lock);
 static LIST_HEAD(nvme_ctrl_list);
 static DEFINE_SPINLOCK(dev_list_lock);
 
+static DEFINE_IDA(nvme_disk_ida);
+
 static struct class *nvme_class;
 
 static __le32 nvme_get_log_dw10(u8 lid, size_t size)
@@ -104,6 +106,19 @@ static int nvme_reset_ctrl_sync(struct nvme_ctrl *ctrl)
 	return ret;
 }
 
+static void nvme_failover_req(struct request *req)
+{
+	struct nvme_ns *ns = req->q->queuedata;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ns->head->requeue_lock, flags);
+	blk_steal_bios(&ns->head->requeue_list, req);
+	spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
+
+	nvme_reset_ctrl(ns->ctrl);
+	kblockd_schedule_work(&ns->head->requeue_work);
+}
+
 static blk_status_t nvme_error_status(struct request *req)
 {
 	switch (nvme_req(req)->status & 0x7ff) {
@@ -131,6 +146,53 @@ static blk_status_t nvme_error_status(struct request *req)
 	}
 }
 
+static bool nvme_req_needs_failover(struct request *req)
+{
+	if (!(req->cmd_flags & REQ_NVME_MPATH))
+		return false;
+
+	switch (nvme_req(req)->status & 0x7ff) {
+	/*
+	 * Generic command status:
+	 */
+	case NVME_SC_INVALID_OPCODE:
+	case NVME_SC_INVALID_FIELD:
+	case NVME_SC_INVALID_NS:
+	case NVME_SC_LBA_RANGE:
+	case NVME_SC_CAP_EXCEEDED:
+	case NVME_SC_RESERVATION_CONFLICT:
+		return false;
+
+	/*
+	 * I/O command set specific error.  Unfortunately these values are
+	 * reused for fabrics commands, but those should never get here.
+	 */
+	case NVME_SC_BAD_ATTRIBUTES:
+	case NVME_SC_INVALID_PI:
+	case NVME_SC_READ_ONLY:
+	case NVME_SC_ONCS_NOT_SUPPORTED:
+		WARN_ON_ONCE(nvme_req(req)->cmd->common.opcode ==
+			nvme_fabrics_command);
+		return false;
+
+	/*
+	 * Media and Data Integrity Errors:
+	 */
+	case NVME_SC_WRITE_FAULT:
+	case NVME_SC_READ_ERROR:
+	case NVME_SC_GUARD_CHECK:
+	case NVME_SC_APPTAG_CHECK:
+	case NVME_SC_REFTAG_CHECK:
+	case NVME_SC_COMPARE_FAILED:
+	case NVME_SC_ACCESS_DENIED:
+	case NVME_SC_UNWRITTEN_BLOCK:
+		return false;
+	}
+
+	/* Everything else could be a path failure, so should be retried */
+	return true;
+}
+
 static inline bool nvme_req_needs_retry(struct request *req)
 {
 	if (blk_noretry_request(req))
@@ -145,6 +207,11 @@ static inline bool nvme_req_needs_retry(struct request *req)
 void nvme_complete_rq(struct request *req)
 {
 	if (unlikely(nvme_req(req)->status && nvme_req_needs_retry(req))) {
+		if (nvme_req_needs_failover(req)) {
+			nvme_failover_req(req);
+			return;
+		}
+
 		nvme_req(req)->retries++;
 		blk_mq_requeue_request(req, true);
 		return;
@@ -173,6 +240,18 @@ void nvme_cancel_request(struct request *req, void *data, bool reserved)
 }
 EXPORT_SYMBOL_GPL(nvme_cancel_request);
 
+static void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl)
+{
+	struct nvme_ns *ns;
+
+	mutex_lock(&ctrl->namespaces_mutex);
+	list_for_each_entry(ns, &ctrl->namespaces, list) {
+		if (ns->head)
+			kblockd_schedule_work(&ns->head->requeue_work);
+	}
+	mutex_unlock(&ctrl->namespaces_mutex);
+}
+
 bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 		enum nvme_ctrl_state new_state)
 {
@@ -240,9 +319,10 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 
 	if (changed)
 		ctrl->state = new_state;
-
 	spin_unlock_irqrestore(&ctrl->lock, flags);
 
+	if (changed && ctrl->state == NVME_CTRL_LIVE)
+		nvme_kick_requeue_lists(ctrl);
 	return changed;
 }
 EXPORT_SYMBOL_GPL(nvme_change_ctrl_state);
@@ -252,6 +332,15 @@ static void nvme_destroy_ns_head(struct kref *ref)
 	struct nvme_ns_head *head =
 		container_of(ref, struct nvme_ns_head, ref);
 
+	del_gendisk(head->disk);
+	blk_set_queue_dying(head->disk->queue);
+	/* make sure all pending bios are cleaned up */
+	kblockd_schedule_work(&head->requeue_work);
+	flush_work(&head->requeue_work);
+	blk_cleanup_queue(head->disk->queue);
+	put_disk(head->disk);
+	ida_simple_remove(&nvme_disk_ida, head->instance);
+
 	list_del_init(&head->entry);
 	cleanup_srcu_struct(&head->srcu);
 	kfree(head);
@@ -1123,8 +1212,10 @@ static void nvme_prep_integrity(struct gendisk *disk, struct nvme_id_ns *id,
 	if (blk_get_integrity(disk) &&
 	    (ns->pi_type != pi_type || ns->ms != old_ms ||
 	     bs != queue_logical_block_size(disk->queue) ||
-	     (ns->ms && ns->ext)))
+	     (ns->ms && ns->ext))) {
 		blk_integrity_unregister(disk);
+		blk_integrity_unregister(ns->head->disk);
+	}
 
 	ns->pi_type = pi_type;
 }
@@ -1152,7 +1243,9 @@ static void nvme_init_integrity(struct nvme_ns *ns)
 	}
 	integrity.tuple_size = ns->ms;
 	blk_integrity_register(ns->disk, &integrity);
+	blk_integrity_register(ns->head->disk, &integrity);
 	blk_queue_max_integrity_segments(ns->queue, 1);
+	blk_queue_max_integrity_segments(ns->head->disk->queue, 1);
 }
 #else
 static void nvme_prep_integrity(struct gendisk *disk, struct nvme_id_ns *id,
@@ -1170,7 +1263,7 @@ static void nvme_set_chunk_size(struct nvme_ns *ns)
 	blk_queue_chunk_sectors(ns->queue, rounddown_pow_of_two(chunk_size));
 }
 
-static void nvme_config_discard(struct nvme_ns *ns)
+static void nvme_config_discard(struct nvme_ns *ns, struct request_queue *queue)
 {
 	struct nvme_ctrl *ctrl = ns->ctrl;
 	u32 logical_block_size = queue_logical_block_size(ns->queue);
@@ -1181,18 +1274,18 @@ static void nvme_config_discard(struct nvme_ns *ns)
 	if (ctrl->nr_streams && ns->sws && ns->sgs) {
 		unsigned int sz = logical_block_size * ns->sws * ns->sgs;
 
-		ns->queue->limits.discard_alignment = sz;
-		ns->queue->limits.discard_granularity = sz;
+		queue->limits.discard_alignment = sz;
+		queue->limits.discard_granularity = sz;
 	} else {
 		ns->queue->limits.discard_alignment = logical_block_size;
 		ns->queue->limits.discard_granularity = logical_block_size;
 	}
-	blk_queue_max_discard_sectors(ns->queue, UINT_MAX);
-	blk_queue_max_discard_segments(ns->queue, NVME_DSM_MAX_RANGES);
-	queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, ns->queue);
+	blk_queue_max_discard_sectors(queue, UINT_MAX);
+	blk_queue_max_discard_segments(queue, NVME_DSM_MAX_RANGES);
+	queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, queue);
 
 	if (ctrl->quirks & NVME_QUIRK_DEALLOCATE_ZEROES)
-		blk_queue_max_write_zeroes_sectors(ns->queue, UINT_MAX);
+		blk_queue_max_write_zeroes_sectors(queue, UINT_MAX);
 }
 
 static void nvme_report_ns_ids(struct nvme_ctrl *ctrl, unsigned int nsid,
@@ -1249,17 +1342,25 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 	if (ctrl->ops->flags & NVME_F_METADATA_SUPPORTED)
 		nvme_prep_integrity(disk, id, bs);
 	blk_queue_logical_block_size(ns->queue, bs);
+	blk_queue_logical_block_size(ns->head->disk->queue, bs);
 	if (ns->noiob)
 		nvme_set_chunk_size(ns);
 	if (ns->ms && !blk_get_integrity(disk) && !ns->ext)
 		nvme_init_integrity(ns);
-	if (ns->ms && !(ns->ms == 8 && ns->pi_type) && !blk_get_integrity(disk))
+	if (ns->ms && !(ns->ms == 8 && ns->pi_type) && !blk_get_integrity(disk)) {
 		set_capacity(disk, 0);
-	else
+		if (ns->head)
+			set_capacity(ns->head->disk, 0);
+	} else {
 		set_capacity(disk, le64_to_cpup(&id->nsze) << (ns->lba_shift - 9));
+		if (ns->head)
+			set_capacity(ns->head->disk, le64_to_cpup(&id->nsze) << (ns->lba_shift - 9));
+	}
 
-	if (ctrl->oncs & NVME_CTRL_ONCS_DSM)
-		nvme_config_discard(ns);
+	if (ctrl->oncs & NVME_CTRL_ONCS_DSM) {
+		nvme_config_discard(ns, ns->queue);
+		nvme_config_discard(ns, ns->head->disk->queue);
+	}
 	blk_mq_unfreeze_queue(disk->queue);
 }
 
@@ -2404,6 +2505,80 @@ static const struct attribute_group *nvme_dev_attr_groups[] = {
 	NULL,
 };
 
+static struct nvme_ns *nvme_find_path(struct nvme_ns_head *head)
+{
+	struct nvme_ns *ns;
+
+	list_for_each_entry_rcu(ns, &head->list, siblings) {
+		if (ns->ctrl->state == NVME_CTRL_LIVE) {
+			rcu_assign_pointer(head->current_path, ns);
+			return ns;
+		}
+	}
+
+	return NULL;
+}
+
+static blk_qc_t nvme_make_request(struct request_queue *q, struct bio *bio)
+{
+	struct nvme_ns_head *head = q->queuedata;
+	struct device *dev = disk_to_dev(head->disk);
+	struct nvme_ns *ns;
+	blk_qc_t ret = BLK_QC_T_NONE;
+	int srcu_idx;
+
+	srcu_idx = srcu_read_lock(&head->srcu);
+	ns = srcu_dereference(head->current_path, &head->srcu);
+	if (unlikely(!ns || ns->ctrl->state != NVME_CTRL_LIVE))
+		ns = nvme_find_path(head);
+	if (likely(ns)) {
+		bio->bi_disk = ns->disk;
+		bio->bi_opf |= REQ_NVME_MPATH;
+		ret = direct_make_request(bio);
+	} else if (!list_empty_careful(&head->list)) {
+		dev_warn_ratelimited(dev, "no path available - requeuing I/O\n");
+
+		spin_lock_irq(&head->requeue_lock);
+		bio_list_add(&head->requeue_list, bio);
+		spin_unlock_irq(&head->requeue_lock);
+	} else {
+		dev_warn_ratelimited(dev, "no path - failing I/O\n");
+
+		bio->bi_status = BLK_STS_IOERR;
+		bio_endio(bio);
+	}
+
+	srcu_read_unlock(&head->srcu, srcu_idx);
+	return ret;
+}
+
+static const struct block_device_operations nvme_subsys_ops = {
+	.owner		= THIS_MODULE,
+};
+
+static void nvme_requeue_work(struct work_struct *work)
+{
+	struct nvme_ns_head *head =
+		container_of(work, struct nvme_ns_head, requeue_work);
+	struct bio *bio, *next;
+
+	spin_lock_irq(&head->requeue_lock);
+	next = bio_list_get(&head->requeue_list);
+	spin_unlock_irq(&head->requeue_lock);
+
+	while ((bio = next) != NULL) {
+		next = bio->bi_next;
+		bio->bi_next = NULL;
+
+		/*
+		 * Reset disk to the mpath node and resubmit to select a new
+		 * path.
+		 */
+		bio->bi_disk = head->disk;
+		direct_make_request(bio);
+	}
+}
+
 static struct nvme_ns_head *__nvme_find_ns_head(struct nvme_subsystem *subsys,
 		unsigned nsid)
 {
@@ -2439,6 +2614,7 @@ static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
 		unsigned nsid, struct nvme_id_ns *id)
 {
 	struct nvme_ns_head *head;
+	struct request_queue *q;
 	int ret = -ENOMEM;
 
 	head = kzalloc(sizeof(*head), GFP_KERNEL);
@@ -2447,6 +2623,9 @@ static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
 
 	INIT_LIST_HEAD(&head->list);
 	head->ns_id = nsid;
+	bio_list_init(&head->requeue_list);
+	spin_lock_init(&head->requeue_lock);
+	INIT_WORK(&head->requeue_work, nvme_requeue_work);
 	init_srcu_struct(&head->srcu);
 	kref_init(&head->ref);
 
@@ -2459,8 +2638,37 @@ static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
 		goto out_free_head;
 	}
 
+	ret = -ENOMEM;
+	q = blk_alloc_queue_node(GFP_KERNEL, NUMA_NO_NODE);
+	if (!q)
+		goto out_free_head;
+	q->queuedata = head;
+	blk_queue_make_request(q, nvme_make_request);
+	queue_flag_set_unlocked(QUEUE_FLAG_NONROT, q);
+	/* set to a default value for 512 until disk is validated */
+	blk_queue_logical_block_size(q, 512);
+	nvme_set_queue_limits(ctrl, q);
+
+	head->instance = ida_simple_get(&nvme_disk_ida, 1, 0, GFP_KERNEL);
+	if (head->instance < 0)
+		goto out_cleanup_queue;
+
+	head->disk = alloc_disk(0);
+	if (!head->disk)
+		goto out_ida_remove;
+	head->disk->fops = &nvme_subsys_ops;
+	head->disk->private_data = head;
+	head->disk->queue = q;
+	head->disk->flags = GENHD_FL_EXT_DEVT;
+	sprintf(head->disk->disk_name, "nvme/ns%d", head->instance);
+
 	list_add_tail(&head->entry, &ctrl->subsys->nsheads);
 	return head;
+
+out_ida_remove:
+	ida_simple_remove(&nvme_disk_ida, head->instance);
+out_cleanup_queue:
+	blk_cleanup_queue(q);
 out_free_head:
 	cleanup_srcu_struct(&head->srcu);
 	kfree(head);
@@ -2469,7 +2677,7 @@ static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
 }
 
 static int nvme_init_ns_head(struct nvme_ns *ns, unsigned nsid,
-		struct nvme_id_ns *id)
+		struct nvme_id_ns *id, bool *new)
 {
 	struct nvme_ctrl *ctrl = ns->ctrl;
 	bool is_shared = id->nmic & (1 << 0);
@@ -2485,6 +2693,8 @@ static int nvme_init_ns_head(struct nvme_ns *ns, unsigned nsid,
 			ret = PTR_ERR(head);
 			goto out_unlock;
 		}
+
+		*new = true;
 	} else {
 		struct nvme_ns_ids ids;
 
@@ -2496,6 +2706,8 @@ static int nvme_init_ns_head(struct nvme_ns *ns, unsigned nsid,
 			ret = -EINVAL;
 			goto out_unlock;
 		}
+
+		*new = false;
 	}
 
 	list_add_tail(&ns->siblings, &head->list);
@@ -2565,6 +2777,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	struct nvme_id_ns *id;
 	char disk_name[DISK_NAME_LEN];
 	int node = dev_to_node(ctrl->dev);
+	bool new = true;
 
 	ns = kzalloc_node(sizeof(*ns), GFP_KERNEL, node);
 	if (!ns)
@@ -2597,7 +2810,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	if (id->ncap == 0)
 		goto out_free_id;
 
-	if (nvme_init_ns_head(ns, nsid, id))
+	if (nvme_init_ns_head(ns, nsid, id, &new))
 		goto out_free_id;
 
 	if ((ctrl->quirks & NVME_QUIRK_LIGHTNVM) && id->vs[0] == 0x1) {
@@ -2636,6 +2849,19 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	if (ns->ndev && nvme_nvm_register_sysfs(ns))
 		pr_warn("%s: failed to register lightnvm sysfs group for identification\n",
 			ns->disk->disk_name);
+
+	if (new)
+		add_disk(ns->head->disk);
+
+	if (sysfs_create_link(&disk_to_dev(ns->disk)->kobj,
+			&disk_to_dev(ns->head->disk)->kobj, "mpath"))
+		pr_warn("%s: failed to create sysfs link to mpath device\n",
+			ns->disk->disk_name);
+	if (sysfs_create_link(&disk_to_dev(ns->head->disk)->kobj,
+			&disk_to_dev(ns->disk)->kobj, ns->disk->disk_name))
+		pr_warn("%s: failed to create sysfs link from mpath device\n",
+			ns->disk->disk_name);
+
 	return;
  out_unlink_ns:
 	mutex_lock(&ctrl->subsys->lock);
@@ -2663,6 +2889,9 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 			blk_integrity_unregister(ns->disk);
 		sysfs_remove_group(&disk_to_dev(ns->disk)->kobj,
 					&nvme_ns_attr_group);
+		sysfs_remove_link(&disk_to_dev(ns->disk)->kobj, "mpath");
+		sysfs_remove_link(&disk_to_dev(ns->head->disk)->kobj,
+				ns->disk->disk_name);
 		if (ns->ndev)
 			nvme_nvm_unregister_sysfs(ns);
 		del_gendisk(ns->disk);
@@ -2670,8 +2899,10 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 	}
 
 	mutex_lock(&ns->ctrl->subsys->lock);
-	if (head)
+	if (head) {
+		rcu_assign_pointer(head->current_path, NULL);
 		list_del_rcu(&ns->siblings);
+	}
 	mutex_unlock(&ns->ctrl->subsys->lock);
 
 	mutex_lock(&ns->ctrl->namespaces_mutex);
@@ -3222,6 +3453,7 @@ int __init nvme_core_init(void)
 
 void nvme_core_exit(void)
 {
+	ida_destroy(&nvme_disk_ida);
 	class_destroy(nvme_class);
 	__unregister_chrdev(nvme_char_major, 0, NVME_MINORS, "nvme");
 	destroy_workqueue(nvme_wq);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a724d2597c4c..2062e62c9769 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -94,6 +94,11 @@ struct nvme_request {
 	u16			status;
 };
 
+/*
+ * Mark a bio as coming in through the mpath node.
+ */
+#define REQ_NVME_MPATH		REQ_DRV
+
 enum {
 	NVME_REQ_CANCELLED		= (1 << 0),
 };
@@ -225,12 +230,18 @@ struct nvme_ns_ids {
  * only ever has a single entry for private namespaces.
  */
 struct nvme_ns_head {
+	struct nvme_ns __rcu	*current_path;
+	struct gendisk		*disk;
 	struct list_head	list;
 	struct srcu_struct      srcu;
+	struct bio_list		requeue_list;
+	spinlock_t		requeue_lock;
+	struct work_struct	requeue_work;
 	unsigned		ns_id;
 	struct nvme_ns_ids	ids;
 	struct list_head	entry;
 	struct kref		ref;
+	int			instance;
 };
 
 struct nvme_ns {
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 9/9] nvme: implement multipath access to nvme subsystems
@ 2017-09-18 23:14   ` Christoph Hellwig
  0 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2017-09-18 23:14 UTC (permalink / raw)


This patch adds initial multipath support to the nvme driver.  For each
namespace we create a new block device node, which can be used to access
that namespace through any of the controllers that refer to it.

Currently we will always send I/O to the first available path, this will
be changed once the NVMe Asymmetric Namespace Access (ANA) TP is
ratified and implemented, at which point we will look at the ANA state
for each namespace.  Another possibility that was prototyped is to
use the path that is closest to the submitting NUMA node, which will be
mostly interesting for PCI, but might also be useful for RDMA or FC
transports in the future.  There is no plan to implement round robin
or I/O service time path selectors, as those are not scalable with
the performance rates provided by NVMe.

The multipath device will go away once all paths to it disappear;
any delay to keep it alive needs to be implemented at the controller
level.

Signed-off-by: Christoph Hellwig <hch at lst.de>
---
 drivers/nvme/host/core.c | 264 ++++++++++++++++++++++++++++++++++++++++++++---
 drivers/nvme/host/nvme.h |  11 ++
 2 files changed, 259 insertions(+), 16 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 3e8405fd57a9..5449c83a9dc3 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -77,6 +77,8 @@ static DEFINE_MUTEX(nvme_subsystems_lock);
 static LIST_HEAD(nvme_ctrl_list);
 static DEFINE_SPINLOCK(dev_list_lock);
 
+static DEFINE_IDA(nvme_disk_ida);
+
 static struct class *nvme_class;
 
 static __le32 nvme_get_log_dw10(u8 lid, size_t size)
@@ -104,6 +106,19 @@ static int nvme_reset_ctrl_sync(struct nvme_ctrl *ctrl)
 	return ret;
 }
 
+static void nvme_failover_req(struct request *req)
+{
+	struct nvme_ns *ns = req->q->queuedata;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ns->head->requeue_lock, flags);
+	blk_steal_bios(&ns->head->requeue_list, req);
+	spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
+
+	nvme_reset_ctrl(ns->ctrl);
+	kblockd_schedule_work(&ns->head->requeue_work);
+}
+
 static blk_status_t nvme_error_status(struct request *req)
 {
 	switch (nvme_req(req)->status & 0x7ff) {
@@ -131,6 +146,53 @@ static blk_status_t nvme_error_status(struct request *req)
 	}
 }
 
+static bool nvme_req_needs_failover(struct request *req)
+{
+	if (!(req->cmd_flags & REQ_NVME_MPATH))
+		return false;
+
+	switch (nvme_req(req)->status & 0x7ff) {
+	/*
+	 * Generic command status:
+	 */
+	case NVME_SC_INVALID_OPCODE:
+	case NVME_SC_INVALID_FIELD:
+	case NVME_SC_INVALID_NS:
+	case NVME_SC_LBA_RANGE:
+	case NVME_SC_CAP_EXCEEDED:
+	case NVME_SC_RESERVATION_CONFLICT:
+		return false;
+
+	/*
+	 * I/O command set specific error.  Unfortunately these values are
+	 * reused for fabrics commands, but those should never get here.
+	 */
+	case NVME_SC_BAD_ATTRIBUTES:
+	case NVME_SC_INVALID_PI:
+	case NVME_SC_READ_ONLY:
+	case NVME_SC_ONCS_NOT_SUPPORTED:
+		WARN_ON_ONCE(nvme_req(req)->cmd->common.opcode ==
+			nvme_fabrics_command);
+		return false;
+
+	/*
+	 * Media and Data Integrity Errors:
+	 */
+	case NVME_SC_WRITE_FAULT:
+	case NVME_SC_READ_ERROR:
+	case NVME_SC_GUARD_CHECK:
+	case NVME_SC_APPTAG_CHECK:
+	case NVME_SC_REFTAG_CHECK:
+	case NVME_SC_COMPARE_FAILED:
+	case NVME_SC_ACCESS_DENIED:
+	case NVME_SC_UNWRITTEN_BLOCK:
+		return false;
+	}
+
+	/* Everything else could be a path failure, so should be retried */
+	return true;
+}
+
 static inline bool nvme_req_needs_retry(struct request *req)
 {
 	if (blk_noretry_request(req))
@@ -145,6 +207,11 @@ static inline bool nvme_req_needs_retry(struct request *req)
 void nvme_complete_rq(struct request *req)
 {
 	if (unlikely(nvme_req(req)->status && nvme_req_needs_retry(req))) {
+		if (nvme_req_needs_failover(req)) {
+			nvme_failover_req(req);
+			return;
+		}
+
 		nvme_req(req)->retries++;
 		blk_mq_requeue_request(req, true);
 		return;
@@ -173,6 +240,18 @@ void nvme_cancel_request(struct request *req, void *data, bool reserved)
 }
 EXPORT_SYMBOL_GPL(nvme_cancel_request);
 
+static void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl)
+{
+	struct nvme_ns *ns;
+
+	mutex_lock(&ctrl->namespaces_mutex);
+	list_for_each_entry(ns, &ctrl->namespaces, list) {
+		if (ns->head)
+			kblockd_schedule_work(&ns->head->requeue_work);
+	}
+	mutex_unlock(&ctrl->namespaces_mutex);
+}
+
 bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 		enum nvme_ctrl_state new_state)
 {
@@ -240,9 +319,10 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
 
 	if (changed)
 		ctrl->state = new_state;
-
 	spin_unlock_irqrestore(&ctrl->lock, flags);
 
+	if (changed && ctrl->state == NVME_CTRL_LIVE)
+		nvme_kick_requeue_lists(ctrl);
 	return changed;
 }
 EXPORT_SYMBOL_GPL(nvme_change_ctrl_state);
@@ -252,6 +332,15 @@ static void nvme_destroy_ns_head(struct kref *ref)
 	struct nvme_ns_head *head =
 		container_of(ref, struct nvme_ns_head, ref);
 
+	del_gendisk(head->disk);
+	blk_set_queue_dying(head->disk->queue);
+	/* make sure all pending bios are cleaned up */
+	kblockd_schedule_work(&head->requeue_work);
+	flush_work(&head->requeue_work);
+	blk_cleanup_queue(head->disk->queue);
+	put_disk(head->disk);
+	ida_simple_remove(&nvme_disk_ida, head->instance);
+
 	list_del_init(&head->entry);
 	cleanup_srcu_struct(&head->srcu);
 	kfree(head);
@@ -1123,8 +1212,10 @@ static void nvme_prep_integrity(struct gendisk *disk, struct nvme_id_ns *id,
 	if (blk_get_integrity(disk) &&
 	    (ns->pi_type != pi_type || ns->ms != old_ms ||
 	     bs != queue_logical_block_size(disk->queue) ||
-	     (ns->ms && ns->ext)))
+	     (ns->ms && ns->ext))) {
 		blk_integrity_unregister(disk);
+		blk_integrity_unregister(ns->head->disk);
+	}
 
 	ns->pi_type = pi_type;
 }
@@ -1152,7 +1243,9 @@ static void nvme_init_integrity(struct nvme_ns *ns)
 	}
 	integrity.tuple_size = ns->ms;
 	blk_integrity_register(ns->disk, &integrity);
+	blk_integrity_register(ns->head->disk, &integrity);
 	blk_queue_max_integrity_segments(ns->queue, 1);
+	blk_queue_max_integrity_segments(ns->head->disk->queue, 1);
 }
 #else
 static void nvme_prep_integrity(struct gendisk *disk, struct nvme_id_ns *id,
@@ -1170,7 +1263,7 @@ static void nvme_set_chunk_size(struct nvme_ns *ns)
 	blk_queue_chunk_sectors(ns->queue, rounddown_pow_of_two(chunk_size));
 }
 
-static void nvme_config_discard(struct nvme_ns *ns)
+static void nvme_config_discard(struct nvme_ns *ns, struct request_queue *queue)
 {
 	struct nvme_ctrl *ctrl = ns->ctrl;
 	u32 logical_block_size = queue_logical_block_size(ns->queue);
@@ -1181,18 +1274,18 @@ static void nvme_config_discard(struct nvme_ns *ns)
 	if (ctrl->nr_streams && ns->sws && ns->sgs) {
 		unsigned int sz = logical_block_size * ns->sws * ns->sgs;
 
-		ns->queue->limits.discard_alignment = sz;
-		ns->queue->limits.discard_granularity = sz;
+		queue->limits.discard_alignment = sz;
+		queue->limits.discard_granularity = sz;
 	} else {
 		ns->queue->limits.discard_alignment = logical_block_size;
 		ns->queue->limits.discard_granularity = logical_block_size;
 	}
-	blk_queue_max_discard_sectors(ns->queue, UINT_MAX);
-	blk_queue_max_discard_segments(ns->queue, NVME_DSM_MAX_RANGES);
-	queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, ns->queue);
+	blk_queue_max_discard_sectors(queue, UINT_MAX);
+	blk_queue_max_discard_segments(queue, NVME_DSM_MAX_RANGES);
+	queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, queue);
 
 	if (ctrl->quirks & NVME_QUIRK_DEALLOCATE_ZEROES)
-		blk_queue_max_write_zeroes_sectors(ns->queue, UINT_MAX);
+		blk_queue_max_write_zeroes_sectors(queue, UINT_MAX);
 }
 
 static void nvme_report_ns_ids(struct nvme_ctrl *ctrl, unsigned int nsid,
@@ -1249,17 +1342,25 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
 	if (ctrl->ops->flags & NVME_F_METADATA_SUPPORTED)
 		nvme_prep_integrity(disk, id, bs);
 	blk_queue_logical_block_size(ns->queue, bs);
+	blk_queue_logical_block_size(ns->head->disk->queue, bs);
 	if (ns->noiob)
 		nvme_set_chunk_size(ns);
 	if (ns->ms && !blk_get_integrity(disk) && !ns->ext)
 		nvme_init_integrity(ns);
-	if (ns->ms && !(ns->ms == 8 && ns->pi_type) && !blk_get_integrity(disk))
+	if (ns->ms && !(ns->ms == 8 && ns->pi_type) && !blk_get_integrity(disk)) {
 		set_capacity(disk, 0);
-	else
+		if (ns->head)
+			set_capacity(ns->head->disk, 0);
+	} else {
 		set_capacity(disk, le64_to_cpup(&id->nsze) << (ns->lba_shift - 9));
+		if (ns->head)
+			set_capacity(ns->head->disk, le64_to_cpup(&id->nsze) << (ns->lba_shift - 9));
+	}
 
-	if (ctrl->oncs & NVME_CTRL_ONCS_DSM)
-		nvme_config_discard(ns);
+	if (ctrl->oncs & NVME_CTRL_ONCS_DSM) {
+		nvme_config_discard(ns, ns->queue);
+		nvme_config_discard(ns, ns->head->disk->queue);
+	}
 	blk_mq_unfreeze_queue(disk->queue);
 }
 
@@ -2404,6 +2505,80 @@ static const struct attribute_group *nvme_dev_attr_groups[] = {
 	NULL,
 };
 
+static struct nvme_ns *nvme_find_path(struct nvme_ns_head *head)
+{
+	struct nvme_ns *ns;
+
+	list_for_each_entry_rcu(ns, &head->list, siblings) {
+		if (ns->ctrl->state == NVME_CTRL_LIVE) {
+			rcu_assign_pointer(head->current_path, ns);
+			return ns;
+		}
+	}
+
+	return NULL;
+}
+
+static blk_qc_t nvme_make_request(struct request_queue *q, struct bio *bio)
+{
+	struct nvme_ns_head *head = q->queuedata;
+	struct device *dev = disk_to_dev(head->disk);
+	struct nvme_ns *ns;
+	blk_qc_t ret = BLK_QC_T_NONE;
+	int srcu_idx;
+
+	srcu_idx = srcu_read_lock(&head->srcu);
+	ns = srcu_dereference(head->current_path, &head->srcu);
+	if (unlikely(!ns || ns->ctrl->state != NVME_CTRL_LIVE))
+		ns = nvme_find_path(head);
+	if (likely(ns)) {
+		bio->bi_disk = ns->disk;
+		bio->bi_opf |= REQ_NVME_MPATH;
+		ret = direct_make_request(bio);
+	} else if (!list_empty_careful(&head->list)) {
+		dev_warn_ratelimited(dev, "no path available - requeuing I/O\n");
+
+		spin_lock_irq(&head->requeue_lock);
+		bio_list_add(&head->requeue_list, bio);
+		spin_unlock_irq(&head->requeue_lock);
+	} else {
+		dev_warn_ratelimited(dev, "no path - failing I/O\n");
+
+		bio->bi_status = BLK_STS_IOERR;
+		bio_endio(bio);
+	}
+
+	srcu_read_unlock(&head->srcu, srcu_idx);
+	return ret;
+}
+
+static const struct block_device_operations nvme_subsys_ops = {
+	.owner		= THIS_MODULE,
+};
+
+static void nvme_requeue_work(struct work_struct *work)
+{
+	struct nvme_ns_head *head =
+		container_of(work, struct nvme_ns_head, requeue_work);
+	struct bio *bio, *next;
+
+	spin_lock_irq(&head->requeue_lock);
+	next = bio_list_get(&head->requeue_list);
+	spin_unlock_irq(&head->requeue_lock);
+
+	while ((bio = next) != NULL) {
+		next = bio->bi_next;
+		bio->bi_next = NULL;
+
+		/*
+		 * Reset disk to the mpath node and resubmit to select a new
+		 * path.
+		 */
+		bio->bi_disk = head->disk;
+		direct_make_request(bio);
+	}
+}
+
 static struct nvme_ns_head *__nvme_find_ns_head(struct nvme_subsystem *subsys,
 		unsigned nsid)
 {
@@ -2439,6 +2614,7 @@ static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
 		unsigned nsid, struct nvme_id_ns *id)
 {
 	struct nvme_ns_head *head;
+	struct request_queue *q;
 	int ret = -ENOMEM;
 
 	head = kzalloc(sizeof(*head), GFP_KERNEL);
@@ -2447,6 +2623,9 @@ static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
 
 	INIT_LIST_HEAD(&head->list);
 	head->ns_id = nsid;
+	bio_list_init(&head->requeue_list);
+	spin_lock_init(&head->requeue_lock);
+	INIT_WORK(&head->requeue_work, nvme_requeue_work);
 	init_srcu_struct(&head->srcu);
 	kref_init(&head->ref);
 
@@ -2459,8 +2638,37 @@ static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
 		goto out_free_head;
 	}
 
+	ret = -ENOMEM;
+	q = blk_alloc_queue_node(GFP_KERNEL, NUMA_NO_NODE);
+	if (!q)
+		goto out_free_head;
+	q->queuedata = head;
+	blk_queue_make_request(q, nvme_make_request);
+	queue_flag_set_unlocked(QUEUE_FLAG_NONROT, q);
+	/* set to a default value for 512 until disk is validated */
+	blk_queue_logical_block_size(q, 512);
+	nvme_set_queue_limits(ctrl, q);
+
+	head->instance = ida_simple_get(&nvme_disk_ida, 1, 0, GFP_KERNEL);
+	if (head->instance < 0)
+		goto out_cleanup_queue;
+
+	head->disk = alloc_disk(0);
+	if (!head->disk)
+		goto out_ida_remove;
+	head->disk->fops = &nvme_subsys_ops;
+	head->disk->private_data = head;
+	head->disk->queue = q;
+	head->disk->flags = GENHD_FL_EXT_DEVT;
+	sprintf(head->disk->disk_name, "nvme/ns%d", head->instance);
+
 	list_add_tail(&head->entry, &ctrl->subsys->nsheads);
 	return head;
+
+out_ida_remove:
+	ida_simple_remove(&nvme_disk_ida, head->instance);
+out_cleanup_queue:
+	blk_cleanup_queue(q);
 out_free_head:
 	cleanup_srcu_struct(&head->srcu);
 	kfree(head);
@@ -2469,7 +2677,7 @@ static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
 }
 
 static int nvme_init_ns_head(struct nvme_ns *ns, unsigned nsid,
-		struct nvme_id_ns *id)
+		struct nvme_id_ns *id, bool *new)
 {
 	struct nvme_ctrl *ctrl = ns->ctrl;
 	bool is_shared = id->nmic & (1 << 0);
@@ -2485,6 +2693,8 @@ static int nvme_init_ns_head(struct nvme_ns *ns, unsigned nsid,
 			ret = PTR_ERR(head);
 			goto out_unlock;
 		}
+
+		*new = true;
 	} else {
 		struct nvme_ns_ids ids;
 
@@ -2496,6 +2706,8 @@ static int nvme_init_ns_head(struct nvme_ns *ns, unsigned nsid,
 			ret = -EINVAL;
 			goto out_unlock;
 		}
+
+		*new = false;
 	}
 
 	list_add_tail(&ns->siblings, &head->list);
@@ -2565,6 +2777,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	struct nvme_id_ns *id;
 	char disk_name[DISK_NAME_LEN];
 	int node = dev_to_node(ctrl->dev);
+	bool new = true;
 
 	ns = kzalloc_node(sizeof(*ns), GFP_KERNEL, node);
 	if (!ns)
@@ -2597,7 +2810,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	if (id->ncap == 0)
 		goto out_free_id;
 
-	if (nvme_init_ns_head(ns, nsid, id))
+	if (nvme_init_ns_head(ns, nsid, id, &new))
 		goto out_free_id;
 
 	if ((ctrl->quirks & NVME_QUIRK_LIGHTNVM) && id->vs[0] == 0x1) {
@@ -2636,6 +2849,19 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
 	if (ns->ndev && nvme_nvm_register_sysfs(ns))
 		pr_warn("%s: failed to register lightnvm sysfs group for identification\n",
 			ns->disk->disk_name);
+
+	if (new)
+		add_disk(ns->head->disk);
+
+	if (sysfs_create_link(&disk_to_dev(ns->disk)->kobj,
+			&disk_to_dev(ns->head->disk)->kobj, "mpath"))
+		pr_warn("%s: failed to create sysfs link to mpath device\n",
+			ns->disk->disk_name);
+	if (sysfs_create_link(&disk_to_dev(ns->head->disk)->kobj,
+			&disk_to_dev(ns->disk)->kobj, ns->disk->disk_name))
+		pr_warn("%s: failed to create sysfs link from mpath device\n",
+			ns->disk->disk_name);
+
 	return;
  out_unlink_ns:
 	mutex_lock(&ctrl->subsys->lock);
@@ -2663,6 +2889,9 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 			blk_integrity_unregister(ns->disk);
 		sysfs_remove_group(&disk_to_dev(ns->disk)->kobj,
 					&nvme_ns_attr_group);
+		sysfs_remove_link(&disk_to_dev(ns->disk)->kobj, "mpath");
+		sysfs_remove_link(&disk_to_dev(ns->head->disk)->kobj,
+				ns->disk->disk_name);
 		if (ns->ndev)
 			nvme_nvm_unregister_sysfs(ns);
 		del_gendisk(ns->disk);
@@ -2670,8 +2899,10 @@ static void nvme_ns_remove(struct nvme_ns *ns)
 	}
 
 	mutex_lock(&ns->ctrl->subsys->lock);
-	if (head)
+	if (head) {
+		rcu_assign_pointer(head->current_path, NULL);
 		list_del_rcu(&ns->siblings);
+	}
 	mutex_unlock(&ns->ctrl->subsys->lock);
 
 	mutex_lock(&ns->ctrl->namespaces_mutex);
@@ -3222,6 +3453,7 @@ int __init nvme_core_init(void)
 
 void nvme_core_exit(void)
 {
+	ida_destroy(&nvme_disk_ida);
 	class_destroy(nvme_class);
 	__unregister_chrdev(nvme_char_major, 0, NVME_MINORS, "nvme");
 	destroy_workqueue(nvme_wq);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a724d2597c4c..2062e62c9769 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -94,6 +94,11 @@ struct nvme_request {
 	u16			status;
 };
 
+/*
+ * Mark a bio as coming in through the mpath node.
+ */
+#define REQ_NVME_MPATH		REQ_DRV
+
 enum {
 	NVME_REQ_CANCELLED		= (1 << 0),
 };
@@ -225,12 +230,18 @@ struct nvme_ns_ids {
  * only ever has a single entry for private namespaces.
  */
 struct nvme_ns_head {
+	struct nvme_ns __rcu	*current_path;
+	struct gendisk		*disk;
 	struct list_head	list;
 	struct srcu_struct      srcu;
+	struct bio_list		requeue_list;
+	spinlock_t		requeue_lock;
+	struct work_struct	requeue_work;
 	unsigned		ns_id;
 	struct nvme_ns_ids	ids;
 	struct list_head	entry;
 	struct kref		ref;
+	int			instance;
 };
 
 struct nvme_ns {
-- 
2.14.1

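A minimal, self-contained sketch of the cached-path scheme that nvme_find_path()
and nvme_make_request() above implement: plain userspace C rather than kernel
code, with SRCU, the bio plumbing and the requeue list left out, and with
submit_on() as a made-up stand-in for direct_make_request().

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct path {
	const char *name;
	bool live;			/* stands in for ctrl->state == NVME_CTRL_LIVE */
};

struct ns_head {
	struct path *paths;
	size_t nr_paths;
	struct path *current_path;	/* cached last-known-good path */
};

/* made-up stand-in for direct_make_request() on the per-path queue */
static void submit_on(struct path *p, int bio)
{
	printf("bio %d -> %s\n", bio, p->name);
}

/* like nvme_find_path(): pick the first live sibling and cache it */
static struct path *find_path(struct ns_head *head)
{
	size_t i;

	for (i = 0; i < head->nr_paths; i++) {
		if (head->paths[i].live) {
			head->current_path = &head->paths[i];
			return head->current_path;
		}
	}
	return NULL;
}

/* like nvme_make_request(): reuse the cached path, re-select only if it died */
static int head_submit(struct ns_head *head, int bio)
{
	struct path *p = head->current_path;

	if (!p || !p->live)
		p = find_path(head);
	if (!p)
		return -1;	/* the real code requeues or fails the bio here */
	submit_on(p, bio);
	return 0;
}

int main(void)
{
	struct path paths[] = { { "nvme0n1", true }, { "nvme1n1", true } };
	struct ns_head head = { paths, 2, NULL };

	head_submit(&head, 0);	/* picks and caches nvme0n1 */
	paths[0].live = false;	/* first controller goes away */
	head_submit(&head, 1);	/* falls over to nvme1n1 */
	return 0;
}

Even in this toy version the point of the design is visible: the common case is
a single check on the cached path, and a walk over the sibling list only
happens once a controller has actually left the LIVE state.
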
^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH 9/9] nvme: implement multipath access to nvme subsystems
  2017-09-18 23:14   ` Christoph Hellwig
@ 2017-09-20  0:18     ` Tony Yang
  -1 siblings, 0 replies; 66+ messages in thread
From: Tony Yang @ 2017-09-20  0:18 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Keith Busch, linux-block, Sagi Grimberg, linux-nvme

Hi, Christoph

     I used the above code to recompile the kernel, and the following errors
occurred. I can't find the blk_steal_bios function. What's the reason
for that? I hope to get your help, thank you


[root@scst1 nvme-nvme-4.13]# make
  CHK     include/config/kernel.release
  CHK     include/generated/uapi/linux/version.h
  CHK     include/generated/utsrelease.h
  CHK     include/generated/bounds.h
  CHK     include/generated/timeconst.h
  CHK     include/generated/asm-offsets.h
  CALL    scripts/checksyscalls.sh
  CHK     scripts/mod/devicetable-offsets.h
  CHK     include/generated/compile.h
  CC [M]  drivers/nvme/host/core.o
drivers/nvme/host/core.c: In function 'nvme_failover_req':
drivers/nvme/host/core.c:115:2: error: implicit declaration of
function 'blk_steal_bios' [-Werror=implicit-function-declaration]
  blk_steal_bios(&ns->head->requeue_list, req);
  ^
In file included from drivers/nvme/host/core.c:32:0:
drivers/nvme/host/core.c: In function 'nvme_req_needs_failover':
drivers/nvme/host/nvme.h:100:25: error: 'REQ_DRV' undeclared (first
use in this function)
 #define REQ_NVME_MPATH  REQ_DRV
                         ^
drivers/nvme/host/core.c:151:25: note: in expansion of macro 'REQ_NVME_MPATH'
  if (!(req->cmd_flags & REQ_NVME_MPATH))
                         ^
drivers/nvme/host/nvme.h:100:25: note: each undeclared identifier is
reported only once for each function it appears in
 #define REQ_NVME_MPATH  REQ_DRV
                         ^
drivers/nvme/host/core.c:151:25: note: in expansion of macro 'REQ_NVME_MPATH'
  if (!(req->cmd_flags & REQ_NVME_MPATH))
                         ^
In file included from ./include/linux/byteorder/little_endian.h:4:0,
                 from ./arch/x86/include/uapi/asm/byteorder.h:4,
                 from ./include/asm-generic/bitops/le.h:5,
                 from ./arch/x86/include/asm/bitops.h:517,
                 from ./include/linux/bitops.h:36,
                 from ./include/linux/kernel.h:10,
                 from ./arch/x86/include/asm/percpu.h:44,
                 from ./arch/x86/include/asm/current.h:5,
                 from ./include/linux/sched.h:11,
                 from ./include/linux/blkdev.h:4,
                 from drivers/nvme/host/core.c:15:
drivers/nvme/host/core.c: In function 'nvme_toggle_streams':
drivers/nvme/host/core.c:433:33: error: 'NVME_NSID_ALL' undeclared
(first use in this function)
  c.directive.nsid = cpu_to_le32(NVME_NSID_ALL);
                                 ^
./include/uapi/linux/byteorder/little_endian.h:32:51: note: in
definition of macro '__cpu_to_le32'
 #define __cpu_to_le32(x) ((__force __le32)(__u32)(x))
                                                   ^
drivers/nvme/host/core.c:433:21: note: in expansion of macro 'cpu_to_le32'
  c.directive.nsid = cpu_to_le32(NVME_NSID_ALL);
                     ^
drivers/nvme/host/core.c: In function 'nvme_configure_directives':
drivers/nvme/host/core.c:483:41: error: 'NVME_NSID_ALL' undeclared
(first use in this function)
  ret = nvme_get_stream_params(ctrl, &s, NVME_NSID_ALL);
                                         ^
drivers/nvme/host/core.c: In function 'nvme_submit_user_cmd':
drivers/nvme/host/core.c:770:6: error: 'struct bio' has no member
named 'bi_disk'
   bio->bi_disk = disk;
      ^
drivers/nvme/host/core.c: In function 'nvme_enable_ctrl':
drivers/nvme/host/core.c:1598:23: error: 'NVME_CC_AMS_RR' undeclared
(first use in this function)
  ctrl->ctrl_config |= NVME_CC_AMS_RR | NVME_CC_SHN_NONE;
                       ^
drivers/nvme/host/core.c: In function 'nvme_configure_timestamp':
drivers/nvme/host/core.c:1665:21: error: 'NVME_CTRL_ONCS_TIMESTAMP'
undeclared (first use in this function)
  if (!(ctrl->oncs & NVME_CTRL_ONCS_TIMESTAMP))
                     ^
drivers/nvme/host/core.c:1669:32: error: 'NVME_FEAT_TIMESTAMP'
undeclared (first use in this function)
  ret = nvme_set_features(ctrl, NVME_FEAT_TIMESTAMP, 0, &ts, sizeof(ts),
                                ^
In file included from ./include/linux/byteorder/little_endian.h:4:0,
                 from ./arch/x86/include/uapi/asm/byteorder.h:4,
                 from ./include/asm-generic/bitops/le.h:5,
                 from ./arch/x86/include/asm/bitops.h:517,
                 from ./include/linux/bitops.h:36,
                 from ./include/linux/kernel.h:10,
                 from ./arch/x86/include/asm/percpu.h:44,
                 from ./arch/x86/include/asm/current.h:5,
                 from ./include/linux/sched.h:11,
                 from ./include/linux/blkdev.h:4,
                 from drivers/nvme/host/core.c:15:
drivers/nvme/host/core.c: In function 'nvme_init_identify':
drivers/nvme/host/core.c:2116:33: error: 'struct nvme_id_ctrl' has no
member named 'hmminds'
   ctrl->hmminds = le32_to_cpu(id->hmminds);
                                 ^
./include/uapi/linux/byteorder/little_endian.h:33:51: note: in
definition of macro '__le32_to_cpu'
 #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
                                                   ^
drivers/nvme/host/core.c:2116:19: note: in expansion of macro 'le32_to_cpu'
   ctrl->hmminds = le32_to_cpu(id->hmminds);
                   ^
drivers/nvme/host/core.c:2117:32: error: 'struct nvme_id_ctrl' has no
member named 'hmmaxd'
   ctrl->hmmaxd = le16_to_cpu(id->hmmaxd);
                                ^
./include/uapi/linux/byteorder/little_endian.h:35:51: note: in
definition of macro '__le16_to_cpu'
 #define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
                                                   ^
drivers/nvme/host/core.c:2117:18: note: in expansion of macro 'le16_to_cpu'
   ctrl->hmmaxd = le16_to_cpu(id->hmmaxd);
                  ^
drivers/nvme/host/core.c: In function 'nvme_make_request':
drivers/nvme/host/core.c:2535:6: error: 'struct bio' has no member
named 'bi_disk'
   bio->bi_disk = ns->disk;
      ^
In file included from drivers/nvme/host/core.c:32:0:
drivers/nvme/host/nvme.h:100:25: error: 'REQ_DRV' undeclared (first
use in this function)
 #define REQ_NVME_MPATH  REQ_DRV
                         ^
drivers/nvme/host/core.c:2536:18: note: in expansion of macro 'REQ_NVME_MPATH'
   bio->bi_opf |= REQ_NVME_MPATH;
                  ^
drivers/nvme/host/core.c:2537:3: error: implicit declaration of
function 'direct_make_request' [-Werror=implicit-function-declaration]
   ret = direct_make_request(bio);
   ^
drivers/nvme/host/core.c: In function 'nvme_requeue_work':
drivers/nvme/host/core.c:2577:6: error: 'struct bio' has no member
named 'bi_disk'
   bio->bi_disk = head->disk;
      ^
drivers/nvme/host/core.c: In function 'nvme_ctrl_pp_status':
drivers/nvme/host/core.c:3078:58: error: 'NVME_CSTS_PP' undeclared
(first use in this function)
  return ((ctrl->ctrl_config & NVME_CC_ENABLE) && (csts & NVME_CSTS_PP));
                                                          ^
drivers/nvme/host/core.c: In function 'nvme_get_fw_slot_info':
drivers/nvme/host/core.c:3086:23: error: dereferencing pointer to
incomplete type
  log = kmalloc(sizeof(*log), GFP_KERNEL);
                       ^
In file included from ./include/linux/byteorder/little_endian.h:4:0,
                 from ./arch/x86/include/uapi/asm/byteorder.h:4,
                 from ./include/asm-generic/bitops/le.h:5,
                 from ./arch/x86/include/asm/bitops.h:517,
                 from ./include/linux/bitops.h:36,
                 from ./include/linux/kernel.h:10,
                 from ./arch/x86/include/asm/percpu.h:44,
                 from ./arch/x86/include/asm/current.h:5,
                 from ./include/linux/sched.h:11,
                 from ./include/linux/blkdev.h:4,
                 from drivers/nvme/host/core.c:15:
drivers/nvme/host/core.c:3091:30: error: 'NVME_NSID_ALL' undeclared
(first use in this function)
  c.common.nsid = cpu_to_le32(NVME_NSID_ALL);
                              ^
./include/uapi/linux/byteorder/little_endian.h:32:51: note: in
definition of macro '__cpu_to_le32'
 #define __cpu_to_le32(x) ((__force __le32)(__u32)(x))
                                                   ^
drivers/nvme/host/core.c:3091:18: note: in expansion of macro 'cpu_to_le32'
  c.common.nsid = cpu_to_le32(NVME_NSID_ALL);
                  ^
drivers/nvme/host/core.c:3092:65: error: dereferencing pointer to
incomplete type
  c.common.cdw10[0] = nvme_get_log_dw10(NVME_LOG_FW_SLOT, sizeof(*log));
                                                                 ^
drivers/nvme/host/core.c:3094:59: error: dereferencing pointer to
incomplete type
  if (!nvme_submit_sync_cmd(ctrl->admin_q, &c, log, sizeof(*log)))
                                                           ^
drivers/nvme/host/core.c: In function 'nvme_complete_async_event':
drivers/nvme/host/core.c:3159:7: error:
'NVME_AER_NOTICE_FW_ACT_STARTING' undeclared (first use in this
function)
  case NVME_AER_NOTICE_FW_ACT_STARTING:
       ^
drivers/nvme/host/core.c: In function 'nvme_ctrl_pp_status':
drivers/nvme/host/core.c:3079:1: warning: control reaches end of
non-void function [-Wreturn-type]
 }
 ^
cc1: some warnings being treated as errors
make[3]: *** [drivers/nvme/host/core.o] Error 1
make[2]: *** [drivers/nvme/host] Error 2
make[1]: *** [drivers/nvme] Error 2
make: *** [drivers] Error 2
[root@scst1 nvme-nvme-4.13]#

2017-09-19 7:14 GMT+08:00 Christoph Hellwig <hch@lst.de>:
> This patch adds initial multipath support to the nvme driver.  For each
> namespace we create a new block device node, which can be used to access
> that namespace through any of the controllers that refer to it.
>
> Currently we will always send I/O to the first available path, this will
> be changed once the NVMe Asynchronous Namespace Access (ANA) TP is
> ratified and implemented, at which point we will look at the ANA state
> for each namespace.  Another possibility that was prototyped is to
> use the path that is closest to the submitting NUMA node, which will be
> mostly interesting for PCI, but might also be useful for RDMA or FC
> transports in the future.  There is no plan to implement round robin
> or I/O service time path selectors, as those are not scalable with
> the performance rates provided by NVMe.
>
> The multipath device will go away once all paths to it disappear,
> any delay to keep it alive needs to be implemented at the controller
> level.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/nvme/host/core.c | 264 ++++++++++++++++++++++++++++++++++++++++++++---
>  drivers/nvme/host/nvme.h |  11 ++
>  2 files changed, 259 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 3e8405fd57a9..5449c83a9dc3 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -77,6 +77,8 @@ static DEFINE_MUTEX(nvme_subsystems_lock);
>  static LIST_HEAD(nvme_ctrl_list);
>  static DEFINE_SPINLOCK(dev_list_lock);
>
> +static DEFINE_IDA(nvme_disk_ida);
> +
>  static struct class *nvme_class;
>
>  static __le32 nvme_get_log_dw10(u8 lid, size_t size)
> @@ -104,6 +106,19 @@ static int nvme_reset_ctrl_sync(struct nvme_ctrl *ctrl)
>         return ret;
>  }
>
> +static void nvme_failover_req(struct request *req)
> +{
> +       struct nvme_ns *ns = req->q->queuedata;
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&ns->head->requeue_lock, flags);
> +       blk_steal_bios(&ns->head->requeue_list, req);
> +       spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
> +
> +       nvme_reset_ctrl(ns->ctrl);
> +       kblockd_schedule_work(&ns->head->requeue_work);
> +}
> +
>  static blk_status_t nvme_error_status(struct request *req)
>  {
>         switch (nvme_req(req)->status & 0x7ff) {
> @@ -131,6 +146,53 @@ static blk_status_t nvme_error_status(struct request *req)
>         }
>  }
>
> +static bool nvme_req_needs_failover(struct request *req)
> +{
> +       if (!(req->cmd_flags & REQ_NVME_MPATH))
> +               return false;
> +
> +       switch (nvme_req(req)->status & 0x7ff) {
> +       /*
> +        * Generic command status:
> +        */
> +       case NVME_SC_INVALID_OPCODE:
> +       case NVME_SC_INVALID_FIELD:
> +       case NVME_SC_INVALID_NS:
> +       case NVME_SC_LBA_RANGE:
> +       case NVME_SC_CAP_EXCEEDED:
> +       case NVME_SC_RESERVATION_CONFLICT:
> +               return false;
> +
> +       /*
> +        * I/O command set specific error.  Unfortunately these values are
> +        * reused for fabrics commands, but those should never get here.
> +        */
> +       case NVME_SC_BAD_ATTRIBUTES:
> +       case NVME_SC_INVALID_PI:
> +       case NVME_SC_READ_ONLY:
> +       case NVME_SC_ONCS_NOT_SUPPORTED:
> +               WARN_ON_ONCE(nvme_req(req)->cmd->common.opcode ==
> +                       nvme_fabrics_command);
> +               return false;
> +
> +       /*
> +        * Media and Data Integrity Errors:
> +        */
> +       case NVME_SC_WRITE_FAULT:
> +       case NVME_SC_READ_ERROR:
> +       case NVME_SC_GUARD_CHECK:
> +       case NVME_SC_APPTAG_CHECK:
> +       case NVME_SC_REFTAG_CHECK:
> +       case NVME_SC_COMPARE_FAILED:
> +       case NVME_SC_ACCESS_DENIED:
> +       case NVME_SC_UNWRITTEN_BLOCK:
> +               return false;
> +       }
> +
> +       /* Everything else could be a path failure, so should be retried */
> +       return true;
> +}
> +
>  static inline bool nvme_req_needs_retry(struct request *req)
>  {
>         if (blk_noretry_request(req))
> @@ -145,6 +207,11 @@ static inline bool nvme_req_needs_retry(struct request *req)
>  void nvme_complete_rq(struct request *req)
>  {
>         if (unlikely(nvme_req(req)->status && nvme_req_needs_retry(req))) {
> +               if (nvme_req_needs_failover(req)) {
> +                       nvme_failover_req(req);
> +                       return;
> +               }
> +
>                 nvme_req(req)->retries++;
>                 blk_mq_requeue_request(req, true);
>                 return;
> @@ -173,6 +240,18 @@ void nvme_cancel_request(struct request *req, void *data, bool reserved)
>  }
>  EXPORT_SYMBOL_GPL(nvme_cancel_request);
>
> +static void nvme_kick_requeue_lists(struct nvme_ctrl *ctrl)
> +{
> +       struct nvme_ns *ns;
> +
> +       mutex_lock(&ctrl->namespaces_mutex);
> +       list_for_each_entry(ns, &ctrl->namespaces, list) {
> +               if (ns->head)
> +                       kblockd_schedule_work(&ns->head->requeue_work);
> +       }
> +       mutex_unlock(&ctrl->namespaces_mutex);
> +}
> +
>  bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
>                 enum nvme_ctrl_state new_state)
>  {
> @@ -240,9 +319,10 @@ bool nvme_change_ctrl_state(struct nvme_ctrl *ctrl,
>
>         if (changed)
>                 ctrl->state = new_state;
> -
>         spin_unlock_irqrestore(&ctrl->lock, flags);
>
> +       if (changed && ctrl->state == NVME_CTRL_LIVE)
> +               nvme_kick_requeue_lists(ctrl);
>         return changed;
>  }
>  EXPORT_SYMBOL_GPL(nvme_change_ctrl_state);
> @@ -252,6 +332,15 @@ static void nvme_destroy_ns_head(struct kref *ref)
>         struct nvme_ns_head *head =
>                 container_of(ref, struct nvme_ns_head, ref);
>
> +       del_gendisk(head->disk);
> +       blk_set_queue_dying(head->disk->queue);
> +       /* make sure all pending bios are cleaned up */
> +       kblockd_schedule_work(&head->requeue_work);
> +       flush_work(&head->requeue_work);
> +       blk_cleanup_queue(head->disk->queue);
> +       put_disk(head->disk);
> +       ida_simple_remove(&nvme_disk_ida, head->instance);
> +
>         list_del_init(&head->entry);
>         cleanup_srcu_struct(&head->srcu);
>         kfree(head);
> @@ -1123,8 +1212,10 @@ static void nvme_prep_integrity(struct gendisk *disk, struct nvme_id_ns *id,
>         if (blk_get_integrity(disk) &&
>             (ns->pi_type != pi_type || ns->ms != old_ms ||
>              bs != queue_logical_block_size(disk->queue) ||
> -            (ns->ms && ns->ext)))
> +            (ns->ms && ns->ext))) {
>                 blk_integrity_unregister(disk);
> +               blk_integrity_unregister(ns->head->disk);
> +       }
>
>         ns->pi_type = pi_type;
>  }
> @@ -1152,7 +1243,9 @@ static void nvme_init_integrity(struct nvme_ns *ns)
>         }
>         integrity.tuple_size = ns->ms;
>         blk_integrity_register(ns->disk, &integrity);
> +       blk_integrity_register(ns->head->disk, &integrity);
>         blk_queue_max_integrity_segments(ns->queue, 1);
> +       blk_queue_max_integrity_segments(ns->head->disk->queue, 1);
>  }
>  #else
>  static void nvme_prep_integrity(struct gendisk *disk, struct nvme_id_ns *id,
> @@ -1170,7 +1263,7 @@ static void nvme_set_chunk_size(struct nvme_ns *ns)
>         blk_queue_chunk_sectors(ns->queue, rounddown_pow_of_two(chunk_size));
>  }
>
> -static void nvme_config_discard(struct nvme_ns *ns)
> +static void nvme_config_discard(struct nvme_ns *ns, struct request_queue *queue)
>  {
>         struct nvme_ctrl *ctrl = ns->ctrl;
>         u32 logical_block_size = queue_logical_block_size(ns->queue);
> @@ -1181,18 +1274,18 @@ static void nvme_config_discard(struct nvme_ns *ns)
>         if (ctrl->nr_streams && ns->sws && ns->sgs) {
>                 unsigned int sz = logical_block_size * ns->sws * ns->sgs;
>
> -               ns->queue->limits.discard_alignment = sz;
> -               ns->queue->limits.discard_granularity = sz;
> +               queue->limits.discard_alignment = sz;
> +               queue->limits.discard_granularity = sz;
>         } else {
>                 ns->queue->limits.discard_alignment = logical_block_size;
>                 ns->queue->limits.discard_granularity = logical_block_size;
>         }
> -       blk_queue_max_discard_sectors(ns->queue, UINT_MAX);
> -       blk_queue_max_discard_segments(ns->queue, NVME_DSM_MAX_RANGES);
> -       queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, ns->queue);
> +       blk_queue_max_discard_sectors(queue, UINT_MAX);
> +       blk_queue_max_discard_segments(queue, NVME_DSM_MAX_RANGES);
> +       queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, queue);
>
>         if (ctrl->quirks & NVME_QUIRK_DEALLOCATE_ZEROES)
> -               blk_queue_max_write_zeroes_sectors(ns->queue, UINT_MAX);
> +               blk_queue_max_write_zeroes_sectors(queue, UINT_MAX);
>  }
>
>  static void nvme_report_ns_ids(struct nvme_ctrl *ctrl, unsigned int nsid,
> @@ -1249,17 +1342,25 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
>         if (ctrl->ops->flags & NVME_F_METADATA_SUPPORTED)
>                 nvme_prep_integrity(disk, id, bs);
>         blk_queue_logical_block_size(ns->queue, bs);
> +       blk_queue_logical_block_size(ns->head->disk->queue, bs);
>         if (ns->noiob)
>                 nvme_set_chunk_size(ns);
>         if (ns->ms && !blk_get_integrity(disk) && !ns->ext)
>                 nvme_init_integrity(ns);
> -       if (ns->ms && !(ns->ms == 8 && ns->pi_type) && !blk_get_integrity(disk))
> +       if (ns->ms && !(ns->ms == 8 && ns->pi_type) && !blk_get_integrity(disk)) {
>                 set_capacity(disk, 0);
> -       else
> +               if (ns->head)
> +                       set_capacity(ns->head->disk, 0);
> +       } else {
>                 set_capacity(disk, le64_to_cpup(&id->nsze) << (ns->lba_shift - 9));
> +               if (ns->head)
> +                       set_capacity(ns->head->disk, le64_to_cpup(&id->nsze) << (ns->lba_shift - 9));
> +       }
>
> -       if (ctrl->oncs & NVME_CTRL_ONCS_DSM)
> -               nvme_config_discard(ns);
> +       if (ctrl->oncs & NVME_CTRL_ONCS_DSM) {
> +               nvme_config_discard(ns, ns->queue);
> +               nvme_config_discard(ns, ns->head->disk->queue);
> +       }
>         blk_mq_unfreeze_queue(disk->queue);
>  }
>
> @@ -2404,6 +2505,80 @@ static const struct attribute_group *nvme_dev_attr_groups[] = {
>         NULL,
>  };
>
> +static struct nvme_ns *nvme_find_path(struct nvme_ns_head *head)
> +{
> +       struct nvme_ns *ns;
> +
> +       list_for_each_entry_rcu(ns, &head->list, siblings) {
> +               if (ns->ctrl->state == NVME_CTRL_LIVE) {
> +                       rcu_assign_pointer(head->current_path, ns);
> +                       return ns;
> +               }
> +       }
> +
> +       return NULL;
> +}
> +
> +static blk_qc_t nvme_make_request(struct request_queue *q, struct bio *bio)
> +{
> +       struct nvme_ns_head *head = q->queuedata;
> +       struct device *dev = disk_to_dev(head->disk);
> +       struct nvme_ns *ns;
> +       blk_qc_t ret = BLK_QC_T_NONE;
> +       int srcu_idx;
> +
> +       srcu_idx = srcu_read_lock(&head->srcu);
> +       ns = srcu_dereference(head->current_path, &head->srcu);
> +       if (unlikely(!ns || ns->ctrl->state != NVME_CTRL_LIVE))
> +               ns = nvme_find_path(head);
> +       if (likely(ns)) {
> +               bio->bi_disk = ns->disk;
> +               bio->bi_opf |= REQ_NVME_MPATH;
> +               ret = direct_make_request(bio);
> +       } else if (!list_empty_careful(&head->list)) {
> +               dev_warn_ratelimited(dev, "no path available - requeuing I/O\n");
> +
> +               spin_lock_irq(&head->requeue_lock);
> +               bio_list_add(&head->requeue_list, bio);
> +               spin_unlock_irq(&head->requeue_lock);
> +       } else {
> +               dev_warn_ratelimited(dev, "no path - failing I/O\n");
> +
> +               bio->bi_status = BLK_STS_IOERR;
> +               bio_endio(bio);
> +       }
> +
> +       srcu_read_unlock(&head->srcu, srcu_idx);
> +       return ret;
> +}
> +
> +static const struct block_device_operations nvme_subsys_ops = {
> +       .owner          = THIS_MODULE,
> +};
> +
> +static void nvme_requeue_work(struct work_struct *work)
> +{
> +       struct nvme_ns_head *head =
> +               container_of(work, struct nvme_ns_head, requeue_work);
> +       struct bio *bio, *next;
> +
> +       spin_lock_irq(&head->requeue_lock);
> +       next = bio_list_get(&head->requeue_list);
> +       spin_unlock_irq(&head->requeue_lock);
> +
> +       while ((bio = next) != NULL) {
> +               next = bio->bi_next;
> +               bio->bi_next = NULL;
> +
> +               /*
> +                * Reset disk to the mpath node and resubmit to select a new
> +                * path.
> +                */
> +               bio->bi_disk = head->disk;
> +               direct_make_request(bio);
> +       }
> +}
> +
>  static struct nvme_ns_head *__nvme_find_ns_head(struct nvme_subsystem *subsys,
>                 unsigned nsid)
>  {
> @@ -2439,6 +2614,7 @@ static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
>                 unsigned nsid, struct nvme_id_ns *id)
>  {
>         struct nvme_ns_head *head;
> +       struct request_queue *q;
>         int ret = -ENOMEM;
>
>         head = kzalloc(sizeof(*head), GFP_KERNEL);
> @@ -2447,6 +2623,9 @@ static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
>
>         INIT_LIST_HEAD(&head->list);
>         head->ns_id = nsid;
> +       bio_list_init(&head->requeue_list);
> +       spin_lock_init(&head->requeue_lock);
> +       INIT_WORK(&head->requeue_work, nvme_requeue_work);
>         init_srcu_struct(&head->srcu);
>         kref_init(&head->ref);
>
> @@ -2459,8 +2638,37 @@ static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
>                 goto out_free_head;
>         }
>
> +       ret = -ENOMEM;
> +       q = blk_alloc_queue_node(GFP_KERNEL, NUMA_NO_NODE);
> +       if (!q)
> +               goto out_free_head;
> +       q->queuedata = head;
> +       blk_queue_make_request(q, nvme_make_request);
> +       queue_flag_set_unlocked(QUEUE_FLAG_NONROT, q);
> +       /* set to a default value for 512 until disk is validated */
> +       blk_queue_logical_block_size(q, 512);
> +       nvme_set_queue_limits(ctrl, q);
> +
> +       head->instance = ida_simple_get(&nvme_disk_ida, 1, 0, GFP_KERNEL);
> +       if (head->instance < 0)
> +               goto out_cleanup_queue;
> +
> +       head->disk = alloc_disk(0);
> +       if (!head->disk)
> +               goto out_ida_remove;
> +       head->disk->fops = &nvme_subsys_ops;
> +       head->disk->private_data = head;
> +       head->disk->queue = q;
> +       head->disk->flags = GENHD_FL_EXT_DEVT;
> +       sprintf(head->disk->disk_name, "nvme/ns%d", head->instance);
> +
>         list_add_tail(&head->entry, &ctrl->subsys->nsheads);
>         return head;
> +
> +out_ida_remove:
> +       ida_simple_remove(&nvme_disk_ida, head->instance);
> +out_cleanup_queue:
> +       blk_cleanup_queue(q);
>  out_free_head:
>         cleanup_srcu_struct(&head->srcu);
>         kfree(head);
> @@ -2469,7 +2677,7 @@ static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
>  }
>
>  static int nvme_init_ns_head(struct nvme_ns *ns, unsigned nsid,
> -               struct nvme_id_ns *id)
> +               struct nvme_id_ns *id, bool *new)
>  {
>         struct nvme_ctrl *ctrl = ns->ctrl;
>         bool is_shared = id->nmic & (1 << 0);
> @@ -2485,6 +2693,8 @@ static int nvme_init_ns_head(struct nvme_ns *ns, unsigned nsid,
>                         ret = PTR_ERR(head);
>                         goto out_unlock;
>                 }
> +
> +               *new = true;
>         } else {
>                 struct nvme_ns_ids ids;
>
> @@ -2496,6 +2706,8 @@ static int nvme_init_ns_head(struct nvme_ns *ns, unsigned nsid,
>                         ret = -EINVAL;
>                         goto out_unlock;
>                 }
> +
> +               *new = false;
>         }
>
>         list_add_tail(&ns->siblings, &head->list);
> @@ -2565,6 +2777,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
>         struct nvme_id_ns *id;
>         char disk_name[DISK_NAME_LEN];
>         int node = dev_to_node(ctrl->dev);
> +       bool new = true;
>
>         ns = kzalloc_node(sizeof(*ns), GFP_KERNEL, node);
>         if (!ns)
> @@ -2597,7 +2810,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
>         if (id->ncap == 0)
>                 goto out_free_id;
>
> -       if (nvme_init_ns_head(ns, nsid, id))
> +       if (nvme_init_ns_head(ns, nsid, id, &new))
>                 goto out_free_id;
>
>         if ((ctrl->quirks & NVME_QUIRK_LIGHTNVM) && id->vs[0] == 0x1) {
> @@ -2636,6 +2849,19 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
>         if (ns->ndev && nvme_nvm_register_sysfs(ns))
>                 pr_warn("%s: failed to register lightnvm sysfs group for identification\n",
>                         ns->disk->disk_name);
> +
> +       if (new)
> +               add_disk(ns->head->disk);
> +
> +       if (sysfs_create_link(&disk_to_dev(ns->disk)->kobj,
> +                       &disk_to_dev(ns->head->disk)->kobj, "mpath"))
> +               pr_warn("%s: failed to create sysfs link to mpath device\n",
> +                       ns->disk->disk_name);
> +       if (sysfs_create_link(&disk_to_dev(ns->head->disk)->kobj,
> +                       &disk_to_dev(ns->disk)->kobj, ns->disk->disk_name))
> +               pr_warn("%s: failed to create sysfs link from mpath device\n",
> +                       ns->disk->disk_name);
> +
>         return;
>   out_unlink_ns:
>         mutex_lock(&ctrl->subsys->lock);
> @@ -2663,6 +2889,9 @@ static void nvme_ns_remove(struct nvme_ns *ns)
>                         blk_integrity_unregister(ns->disk);
>                 sysfs_remove_group(&disk_to_dev(ns->disk)->kobj,
>                                         &nvme_ns_attr_group);
> +               sysfs_remove_link(&disk_to_dev(ns->disk)->kobj, "mpath");
> +               sysfs_remove_link(&disk_to_dev(ns->head->disk)->kobj,
> +                               ns->disk->disk_name);
>                 if (ns->ndev)
>                         nvme_nvm_unregister_sysfs(ns);
>                 del_gendisk(ns->disk);
> @@ -2670,8 +2899,10 @@ static void nvme_ns_remove(struct nvme_ns *ns)
>         }
>
>         mutex_lock(&ns->ctrl->subsys->lock);
> -       if (head)
> +       if (head) {
> +               rcu_assign_pointer(head->current_path, NULL);
>                 list_del_rcu(&ns->siblings);
> +       }
>         mutex_unlock(&ns->ctrl->subsys->lock);
>
>         mutex_lock(&ns->ctrl->namespaces_mutex);
> @@ -3222,6 +3453,7 @@ int __init nvme_core_init(void)
>
>  void nvme_core_exit(void)
>  {
> +       ida_destroy(&nvme_disk_ida);
>         class_destroy(nvme_class);
>         __unregister_chrdev(nvme_char_major, 0, NVME_MINORS, "nvme");
>         destroy_workqueue(nvme_wq);
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index a724d2597c4c..2062e62c9769 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -94,6 +94,11 @@ struct nvme_request {
>         u16                     status;
>  };
>
> +/*
> + * Mark a bio as coming in through the mpath node.
> + */
> +#define REQ_NVME_MPATH         REQ_DRV
> +
>  enum {
>         NVME_REQ_CANCELLED              = (1 << 0),
>  };
> @@ -225,12 +230,18 @@ struct nvme_ns_ids {
>   * only ever has a single entry for private namespaces.
>   */
>  struct nvme_ns_head {
> +       struct nvme_ns __rcu    *current_path;
> +       struct gendisk          *disk;
>         struct list_head        list;
>         struct srcu_struct      srcu;
> +       struct bio_list         requeue_list;
> +       spinlock_t              requeue_lock;
> +       struct work_struct      requeue_work;
>         unsigned                ns_id;
>         struct nvme_ns_ids      ids;
>         struct list_head        entry;
>         struct kref             ref;
> +       int                     instance;
>  };
>
>  struct nvme_ns {
> --
> 2.14.1
>
>
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme

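For reference, a matching userspace sketch of the failover side shown in the
nvme_failover_req() and nvme_requeue_work() hunks quoted above: when a command
fails on a path, its bios are moved onto the head's requeue list and later
resubmitted through the multipath node so that path selection runs again.  The
struct bio and list helpers below are simplified stand-ins, not the real block
layer API.

#include <stdio.h>

/* simplified single-linked bio and bio_list, just enough to show the flow */
struct bio {
	int id;
	struct bio *next;
};

struct bio_list {
	struct bio *head;
	struct bio *tail;
};

static void bio_list_add(struct bio_list *l, struct bio *b)
{
	b->next = NULL;
	if (l->tail)
		l->tail->next = b;
	else
		l->head = b;
	l->tail = b;
}

/* like blk_steal_bios(): move every bio of the failed request onto the
 * head's requeue list (the real helper also detaches them from the request) */
static void steal_bios(struct bio_list *requeue_list, struct bio **req_bios,
		int nr)
{
	int i;

	for (i = 0; i < nr; i++)
		bio_list_add(requeue_list, req_bios[i]);
}

/* like bio_list_get(): hand the whole list over for draining */
static struct bio *bio_list_get(struct bio_list *l)
{
	struct bio *b = l->head;

	l->head = l->tail = NULL;
	return b;
}

/* like nvme_requeue_work(): resubmit every stolen bio through the mpath
 * node, which re-runs path selection for each of them */
static void requeue_work(struct bio_list *requeue_list)
{
	struct bio *bio, *next;

	for (bio = bio_list_get(requeue_list); bio; bio = next) {
		next = bio->next;
		bio->next = NULL;
		/* real code: bio->bi_disk = head->disk; direct_make_request(bio); */
		printf("resubmitting bio %d via the mpath node\n", bio->id);
	}
}

int main(void)
{
	struct bio_list requeue_list = { NULL, NULL };
	struct bio a = { 1, NULL }, b = { 2, NULL };
	struct bio *failed_req[] = { &a, &b };

	/* path error: steal the bios, reset the controller, kick the work */
	steal_bios(&requeue_list, failed_req, 2);
	requeue_work(&requeue_list);
	return 0;
}

The resubmission deliberately goes back through the head node's make_request
rather than the controller that just failed, which is what lets each requeued
bio pick a different live path.
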
^ permalink raw reply	[flat|nested] 66+ messages in thread

>                 set_capacity(disk, 0);
> -       else
> +               if (ns->head)
> +                       set_capacity(ns->head->disk, 0);
> +       } else {
>                 set_capacity(disk, le64_to_cpup(&id->nsze) << (ns->lba_shift - 9));
> +               if (ns->head)
> +                       set_capacity(ns->head->disk, le64_to_cpup(&id->nsze) << (ns->lba_shift - 9));
> +       }
>
> -       if (ctrl->oncs & NVME_CTRL_ONCS_DSM)
> -               nvme_config_discard(ns);
> +       if (ctrl->oncs & NVME_CTRL_ONCS_DSM) {
> +               nvme_config_discard(ns, ns->queue);
> +               nvme_config_discard(ns, ns->head->disk->queue);
> +       }
>         blk_mq_unfreeze_queue(disk->queue);
>  }
>
> @@ -2404,6 +2505,80 @@ static const struct attribute_group *nvme_dev_attr_groups[] = {
>         NULL,
>  };
>
> +static struct nvme_ns *nvme_find_path(struct nvme_ns_head *head)
> +{
> +       struct nvme_ns *ns;
> +
> +       list_for_each_entry_rcu(ns, &head->list, siblings) {
> +               if (ns->ctrl->state == NVME_CTRL_LIVE) {
> +                       rcu_assign_pointer(head->current_path, ns);
> +                       return ns;
> +               }
> +       }
> +
> +       return NULL;
> +}
> +
> +static blk_qc_t nvme_make_request(struct request_queue *q, struct bio *bio)
> +{
> +       struct nvme_ns_head *head = q->queuedata;
> +       struct device *dev = disk_to_dev(head->disk);
> +       struct nvme_ns *ns;
> +       blk_qc_t ret = BLK_QC_T_NONE;
> +       int srcu_idx;
> +
> +       srcu_idx = srcu_read_lock(&head->srcu);
> +       ns = srcu_dereference(head->current_path, &head->srcu);
> +       if (unlikely(!ns || ns->ctrl->state != NVME_CTRL_LIVE))
> +               ns = nvme_find_path(head);
> +       if (likely(ns)) {
> +               bio->bi_disk = ns->disk;
> +               bio->bi_opf |= REQ_NVME_MPATH;
> +               ret = direct_make_request(bio);
> +       } else if (!list_empty_careful(&head->list)) {
> +               dev_warn_ratelimited(dev, "no path available - requeing I/O\n");
> +
> +               spin_lock_irq(&head->requeue_lock);
> +               bio_list_add(&head->requeue_list, bio);
> +               spin_unlock_irq(&head->requeue_lock);
> +       } else {
> +               dev_warn_ratelimited(dev, "no path - failing I/O\n");
> +
> +               bio->bi_status = BLK_STS_IOERR;
> +               bio_endio(bio);
> +       }
> +
> +       srcu_read_unlock(&head->srcu, srcu_idx);
> +       return ret;
> +}
> +
> +static const struct block_device_operations nvme_subsys_ops = {
> +       .owner          = THIS_MODULE,
> +};
> +
> +static void nvme_requeue_work(struct work_struct *work)
> +{
> +       struct nvme_ns_head *head =
> +               container_of(work, struct nvme_ns_head, requeue_work);
> +       struct bio *bio, *next;
> +
> +       spin_lock_irq(&head->requeue_lock);
> +       next = bio_list_get(&head->requeue_list);
> +       spin_unlock_irq(&head->requeue_lock);
> +
> +       while ((bio = next) != NULL) {
> +               next = bio->bi_next;
> +               bio->bi_next = NULL;
> +
> +               /*
> +                * Reset disk to the mpath node and resubmit to select a new
> +                * path.
> +                */
> +               bio->bi_disk = head->disk;
> +               direct_make_request(bio);
> +       }
> +}
> +
>  static struct nvme_ns_head *__nvme_find_ns_head(struct nvme_subsystem *subsys,
>                 unsigned nsid)
>  {
> @@ -2439,6 +2614,7 @@ static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
>                 unsigned nsid, struct nvme_id_ns *id)
>  {
>         struct nvme_ns_head *head;
> +       struct request_queue *q;
>         int ret = -ENOMEM;
>
>         head = kzalloc(sizeof(*head), GFP_KERNEL);
> @@ -2447,6 +2623,9 @@ static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
>
>         INIT_LIST_HEAD(&head->list);
>         head->ns_id = nsid;
> +       bio_list_init(&head->requeue_list);
> +       spin_lock_init(&head->requeue_lock);
> +       INIT_WORK(&head->requeue_work, nvme_requeue_work);
>         init_srcu_struct(&head->srcu);
>         kref_init(&head->ref);
>
> @@ -2459,8 +2638,37 @@ static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
>                 goto out_free_head;
>         }
>
> +       ret = -ENOMEM;
> +       q = blk_alloc_queue_node(GFP_KERNEL, NUMA_NO_NODE);
> +       if (!q)
> +               goto out_free_head;
> +       q->queuedata = head;
> +       blk_queue_make_request(q, nvme_make_request);
> +       queue_flag_set_unlocked(QUEUE_FLAG_NONROT, q);
> +       /* set to a default value for 512 until disk is validated */
> +       blk_queue_logical_block_size(q, 512);
> +       nvme_set_queue_limits(ctrl, q);
> +
> +       head->instance = ida_simple_get(&nvme_disk_ida, 1, 0, GFP_KERNEL);
> +       if (head->instance < 0)
> +               goto out_cleanup_queue;
> +
> +       head->disk = alloc_disk(0);
> +       if (!head->disk)
> +               goto out_ida_remove;
> +       head->disk->fops = &nvme_subsys_ops;
> +       head->disk->private_data = head;
> +       head->disk->queue = q;
> +       head->disk->flags = GENHD_FL_EXT_DEVT;
> +       sprintf(head->disk->disk_name, "nvme/ns%d", head->instance);
> +
>         list_add_tail(&head->entry, &ctrl->subsys->nsheads);
>         return head;
> +
> +out_ida_remove:
> +       ida_simple_remove(&nvme_disk_ida, head->instance);
> +out_cleanup_queue:
> +       blk_cleanup_queue(q);
>  out_free_head:
>         cleanup_srcu_struct(&head->srcu);
>         kfree(head);
> @@ -2469,7 +2677,7 @@ static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
>  }
>
>  static int nvme_init_ns_head(struct nvme_ns *ns, unsigned nsid,
> -               struct nvme_id_ns *id)
> +               struct nvme_id_ns *id, bool *new)
>  {
>         struct nvme_ctrl *ctrl = ns->ctrl;
>         bool is_shared = id->nmic & (1 << 0);
> @@ -2485,6 +2693,8 @@ static int nvme_init_ns_head(struct nvme_ns *ns, unsigned nsid,
>                         ret = PTR_ERR(head);
>                         goto out_unlock;
>                 }
> +
> +               *new = true;
>         } else {
>                 struct nvme_ns_ids ids;
>
> @@ -2496,6 +2706,8 @@ static int nvme_init_ns_head(struct nvme_ns *ns, unsigned nsid,
>                         ret = -EINVAL;
>                         goto out_unlock;
>                 }
> +
> +               *new = false;
>         }
>
>         list_add_tail(&ns->siblings, &head->list);
> @@ -2565,6 +2777,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
>         struct nvme_id_ns *id;
>         char disk_name[DISK_NAME_LEN];
>         int node = dev_to_node(ctrl->dev);
> +       bool new = true;
>
>         ns = kzalloc_node(sizeof(*ns), GFP_KERNEL, node);
>         if (!ns)
> @@ -2597,7 +2810,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
>         if (id->ncap == 0)
>                 goto out_free_id;
>
> -       if (nvme_init_ns_head(ns, nsid, id))
> +       if (nvme_init_ns_head(ns, nsid, id, &new))
>                 goto out_free_id;
>
>         if ((ctrl->quirks & NVME_QUIRK_LIGHTNVM) && id->vs[0] == 0x1) {
> @@ -2636,6 +2849,19 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid)
>         if (ns->ndev && nvme_nvm_register_sysfs(ns))
>                 pr_warn("%s: failed to register lightnvm sysfs group for identification\n",
>                         ns->disk->disk_name);
> +
> +       if (new)
> +               add_disk(ns->head->disk);
> +
> +       if (sysfs_create_link(&disk_to_dev(ns->disk)->kobj,
> +                       &disk_to_dev(ns->head->disk)->kobj, "mpath"))
> +               pr_warn("%s: failed to create sysfs link to mpath device\n",
> +                       ns->disk->disk_name);
> +       if (sysfs_create_link(&disk_to_dev(ns->head->disk)->kobj,
> +                       &disk_to_dev(ns->disk)->kobj, ns->disk->disk_name))
> +               pr_warn("%s: failed to create sysfs link from mpath device\n",
> +                       ns->disk->disk_name);
> +
>         return;
>   out_unlink_ns:
>         mutex_lock(&ctrl->subsys->lock);
> @@ -2663,6 +2889,9 @@ static void nvme_ns_remove(struct nvme_ns *ns)
>                         blk_integrity_unregister(ns->disk);
>                 sysfs_remove_group(&disk_to_dev(ns->disk)->kobj,
>                                         &nvme_ns_attr_group);
> +               sysfs_remove_link(&disk_to_dev(ns->disk)->kobj, "mpath");
> +               sysfs_remove_link(&disk_to_dev(ns->head->disk)->kobj,
> +                               ns->disk->disk_name);
>                 if (ns->ndev)
>                         nvme_nvm_unregister_sysfs(ns);
>                 del_gendisk(ns->disk);
> @@ -2670,8 +2899,10 @@ static void nvme_ns_remove(struct nvme_ns *ns)
>         }
>
>         mutex_lock(&ns->ctrl->subsys->lock);
> -       if (head)
> +       if (head) {
> +               rcu_assign_pointer(head->current_path, NULL);
>                 list_del_rcu(&ns->siblings);
> +       }
>         mutex_unlock(&ns->ctrl->subsys->lock);
>
>         mutex_lock(&ns->ctrl->namespaces_mutex);
> @@ -3222,6 +3453,7 @@ int __init nvme_core_init(void)
>
>  void nvme_core_exit(void)
>  {
> +       ida_destroy(&nvme_disk_ida);
>         class_destroy(nvme_class);
>         __unregister_chrdev(nvme_char_major, 0, NVME_MINORS, "nvme");
>         destroy_workqueue(nvme_wq);
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index a724d2597c4c..2062e62c9769 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -94,6 +94,11 @@ struct nvme_request {
>         u16                     status;
>  };
>
> +/*
> + * Mark a bio as coming in through the mpath node.
> + */
> +#define REQ_NVME_MPATH         REQ_DRV
> +
>  enum {
>         NVME_REQ_CANCELLED              = (1 << 0),
>  };
> @@ -225,12 +230,18 @@ struct nvme_ns_ids {
>   * only ever has a single entry for private namespaces.
>   */
>  struct nvme_ns_head {
> +       struct nvme_ns __rcu    *current_path;
> +       struct gendisk          *disk;
>         struct list_head        list;
>         struct srcu_struct      srcu;
> +       struct bio_list         requeue_list;
> +       spinlock_t              requeue_lock;
> +       struct work_struct      requeue_work;
>         unsigned                ns_id;
>         struct nvme_ns_ids      ids;
>         struct list_head        entry;
>         struct kref             ref;
> +       int                     instance;
>  };
>
>  struct nvme_ns {
> --
> 2.14.1
>
>
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 9/9] nvme: implement multipath access to nvme subsystems
  2017-09-20  0:18     ` Tony Yang
@ 2017-09-20  8:15       ` Johannes Thumshirn
  -1 siblings, 0 replies; 66+ messages in thread
From: Johannes Thumshirn @ 2017-09-20  8:15 UTC (permalink / raw)
  To: Tony Yang
  Cc: Christoph Hellwig, Jens Axboe, Keith Busch, linux-block,
	Sagi Grimberg, linux-nvme

On Wed, Sep 20, 2017 at 08:18:41AM +0800, Tony Yang wrote:
> Hi , Christoph
> 
>      I use the above code to recompile the kernel. The following error
> occurred. I can't find the blk_steal_bios function. What's the reason
> for that? Hope to get your help, thank you

Hi Tony,

have you pulled it from Christoph's git? I just did a git pull and it builds
fine.

Byte,
	Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH 9/9] nvme: implement multipath access to nvme subsystems
@ 2017-09-20  8:15       ` Johannes Thumshirn
  0 siblings, 0 replies; 66+ messages in thread
From: Johannes Thumshirn @ 2017-09-20  8:15 UTC (permalink / raw)


On Wed, Sep 20, 2017@08:18:41AM +0800, Tony Yang wrote:
> Hi , Christoph
> 
>      I use the above code to recompile the kernel. The following error
> occurred. I can't find the blk_steal_bios function. What's the reason
> for that? Hope to get your help, thank you

Hi Tony,

have you pulled it from Christoph's git? I just did a git pull and it builds
fine.

Byte,
	Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 7/9] nvme: introduce a nvme_ns_ids structure
  2017-09-18 23:14   ` Christoph Hellwig
@ 2017-09-20  8:27     ` Johannes Thumshirn
  -1 siblings, 0 replies; 66+ messages in thread
From: Johannes Thumshirn @ 2017-09-20  8:27 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Keith Busch, Sagi Grimberg, linux-nvme, linux-block

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH 7/9] nvme: introduce a nvme_ns_ids structure
@ 2017-09-20  8:27     ` Johannes Thumshirn
  0 siblings, 0 replies; 66+ messages in thread
From: Johannes Thumshirn @ 2017-09-20  8:27 UTC (permalink / raw)


Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>
-- 
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 8/9] nvme: track shared namespaces
  2017-09-18 23:14   ` Christoph Hellwig
@ 2017-09-20  8:36     ` Johannes Thumshirn
  -1 siblings, 0 replies; 66+ messages in thread
From: Johannes Thumshirn @ 2017-09-20  8:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Keith Busch, Sagi Grimberg, linux-nvme, linux-block

On Mon, Sep 18, 2017 at 04:14:52PM -0700, Christoph Hellwig wrote:
> Introduce a new struct nvme_ns_head [1] that holds information about
> an actual namespace, unlike struct nvme_ns, which only holds the
> per-controller namespace information.  For private namespaces there
> is a 1:1 relation of the two, but for shared namespaces this lets us
> discover all the paths to it.  For now only the identifiers are moved
> to the new structure, but most of the information in struct nvme_ns
> should eventually move over.
> 
> To allow lockless path lookup the list of nvme_ns structures per
> nvme_ns_head is protected by SRCU, which requires freeing the nvme_ns
> structure through call_srcu.
> 
> [1] comments welcome if you have a better name for it, the current one is
>     horrible.  One idea would be to rename the current struct nvme_ns
>     to struct nvme_ns_link or similar and use the nvme_ns name for the
>     new structure.  But that would involve a lot of churn.

Being one of the persons who has to backport a lot of NVMe code to older
kernels I'm not a huge fan of renaming nvme_ns.

That said, I don't have a better name for nvme_ns_head (yet) but I'll try to
think of one as well. OTOH looking at nvme_ns_head it actually is the list
head of the nvme_ns list.

Byte,
	Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH 8/9] nvme: track shared namespaces
@ 2017-09-20  8:36     ` Johannes Thumshirn
  0 siblings, 0 replies; 66+ messages in thread
From: Johannes Thumshirn @ 2017-09-20  8:36 UTC (permalink / raw)


On Mon, Sep 18, 2017@04:14:52PM -0700, Christoph Hellwig wrote:
> Introduce a new struct nvme_ns_head [1] that holds information about
> an actual namespace, unlike struct nvme_ns, which only holds the
> per-controller namespace information.  For private namespaces there
> is a 1:1 relation of the two, but for shared namespaces this lets us
> discover all the paths to it.  For now only the identifiers are moved
> to the new structure, but most of the information in struct nvme_ns
> should eventually move over.
> 
> To allow lockless path lookup the list of nvme_ns structures per
> nvme_ns_head is protected by SRCU, which requires freeing the nvme_ns
> structure through call_srcu.
> 
> [1] comments welcome if you have a better name for it, the current one is
>     horrible.  One idea would be to rename the current struct nvme_ns
>     to struct nvme_ns_link or similar and use the nvme_ns name for the
>     new structure.  But that would involve a lot of churn.

Being one of the persons who has to backport a lot of NVMe code to older
kernels I'm not a huge fan of renaming nvme_ns.

That said, I don't have a better name for nvme_ns_head (yet) but I'll try to
think of one as well. OTOH looking at nvme_ns_head it actually is the list
head of the nvme_ns list.

Byte,
	Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 9/9] nvme: implement multipath access to nvme subsystems
  2017-09-18 23:14   ` Christoph Hellwig
@ 2017-09-20  9:42     ` Johannes Thumshirn
  -1 siblings, 0 replies; 66+ messages in thread
From: Johannes Thumshirn @ 2017-09-20  9:42 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Keith Busch, Sagi Grimberg, linux-nvme, linux-block

On Mon, Sep 18, 2017 at 04:14:53PM -0700, Christoph Hellwig wrote:
> This patch adds initial multipath support to the nvme driver.  For each
> namespace we create a new block device node, which can be used to access
> that namespace through any of the controllers that refer to it.
> 
> Currently we will always send I/O to the first available path, this will
> be changed once the NVMe Asynchronous Namespace Access (ANA) TP is
> ratified and implemented, at which point we will look at the ANA state
> for each namespace.  Another possibility that was prototyped is to
> use the path that is closes to the submitting NUMA code, which will be
> mostly interesting for PCI, but might also be useful for RDMA or FC
> transports in the future.  There is not plan to implement round robin
                                    no ^
> or I/O service time path selectors, as those are not scalable with
> the performance rates provided by NVMe.
> 
> The multipath device will go away once all paths to it disappear,
> any delay to keep it alive needs to be implemented at the controller
> level.

Sorry for the typo-only mail, I'll get around to a real review soon, but I
wanted to test the patchset in my nvme-rdma setup and just hit a panic in
ib_core with your branch merged into -rc1.

Byte,
	Johannes

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH 9/9] nvme: implement multipath access to nvme subsystems
@ 2017-09-20  9:42     ` Johannes Thumshirn
  0 siblings, 0 replies; 66+ messages in thread
From: Johannes Thumshirn @ 2017-09-20  9:42 UTC (permalink / raw)


On Mon, Sep 18, 2017@04:14:53PM -0700, Christoph Hellwig wrote:
> This patch adds initial multipath support to the nvme driver.  For each
> namespace we create a new block device node, which can be used to access
> that namespace through any of the controllers that refer to it.
> 
> Currently we will always send I/O to the first available path, this will
> be changed once the NVMe Asynchronous Namespace Access (ANA) TP is
> ratified and implemented, at which point we will look at the ANA state
> for each namespace.  Another possibility that was prototyped is to
> use the path that is closes to the submitting NUMA code, which will be
> mostly interesting for PCI, but might also be useful for RDMA or FC
> transports in the future.  There is not plan to implement round robin
                                    no ^
> or I/O service time path selectors, as those are not scalable with
> the performance rates provided by NVMe.
> 
> The multipath device will go away once all paths to it disappear,
> any delay to keep it alive needs to be implemented at the controller
> level.

Sorry for the typo-only mail, I'll get around to a real review soon, but I
wanted to test the patchset in my nvme-rdma setup and just hit a panic in
ib_core with your branch merged into -rc1.

Byte,
	Johannes

-- 
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: nvme multipath support V2
  2017-09-18 23:14 ` Christoph Hellwig
@ 2017-09-20 11:09   ` Johannes Thumshirn
  -1 siblings, 0 replies; 66+ messages in thread
From: Johannes Thumshirn @ 2017-09-20 11:09 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Keith Busch, Sagi Grimberg, linux-nvme, linux-block

Hi Christoph,

I wanted to test your patches, but I fail to see how I have to set it up.

I do have a host with two RDMA HCAs connected to the target (Linux).  For a
"normal dm-mpath" test I do an nvme connect with the host traddr argument for
each of the two HCAs and get two nvme block devices which I can add to
multipath.
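
(For reference, a rough sketch of that double connect; the transport
addresses and the subsystem NQN below are only placeholders, not my
actual setup:)

  nvme connect --transport=rdma --nqn=testnqn \
       --traddr=192.168.1.100 --trsvcid=4420 --host-traddr=192.168.2.1
  nvme connect --transport=rdma --nqn=testnqn \
       --traddr=192.168.1.100 --trsvcid=4420 --host-traddr=192.168.2.2

i.e. one connect per HCA, differing only in the host-traddr argument.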

Using your patchset and doing the same double connect trick I get the same
two block devices of course.

How do I connect using both paths?

Thanks,
	Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 66+ messages in thread

* nvme multipath support V2
@ 2017-09-20 11:09   ` Johannes Thumshirn
  0 siblings, 0 replies; 66+ messages in thread
From: Johannes Thumshirn @ 2017-09-20 11:09 UTC (permalink / raw)


Hi Christoph,

I wanted to test your patches, but I fail to see how I have to set it up.

I do have a host with two RDMA HCAs connected to the target (Linux).  For a
"normal dm-mpath" test I do an nvme connect with the host traddr argument for
each of the two HCAs and get two nvme block devices which I can add to
multipath.

Using your patchset and doing the same double connect trick I get the same
two block devices of course.

How do I connect using both paths?

Thanks,
	Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 8/9] nvme: track shared namespaces
  2017-09-20  8:36     ` Johannes Thumshirn
@ 2017-09-20 14:54       ` Christoph Hellwig
  -1 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2017-09-20 14:54 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg,
	linux-nvme, linux-block

On Wed, Sep 20, 2017 at 10:36:43AM +0200, Johannes Thumshirn wrote:
> Being one of the persons who has to backport a lot of NVMe code to older
> kernels I'm not a huge fan of renaming nmve_ns.

The churn is my main worry.  Well, and that I don't have a really good
name for what currently is nvme_ns either :)

> That said, I don't have a better name for nvme_ns_head (yet) but I'll try to
> think of one as well. OTOH looking at nvme_ns_head it actaully is the list
> head of the nvme_ns list.

But head also has connotations in the SAN world.  Maybe nvme_ns_chain?

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH 8/9] nvme: track shared namespaces
@ 2017-09-20 14:54       ` Christoph Hellwig
  0 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2017-09-20 14:54 UTC (permalink / raw)


On Wed, Sep 20, 2017@10:36:43AM +0200, Johannes Thumshirn wrote:
> Being one of the persons who has to backport a lot of NVMe code to older
> kernels I'm not a huge fan of renaming nmve_ns.

The churn is my main worry.  Well, and that I don't have a really good
name for what currently is nvme_ns either :)

> That said, I don't have a better name for nvme_ns_head (yet) but I'll try to
> think of one as well. OTOH looking at nvme_ns_head it actaully is the list
> head of the nvme_ns list.

But head also has connotations in the SAN world.  Maybe nvme_ns_chain?

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: nvme multipath support V2
  2017-09-20 11:09   ` Johannes Thumshirn
@ 2017-09-20 14:55     ` Christoph Hellwig
  -1 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2017-09-20 14:55 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg,
	linux-nvme, linux-block

On Wed, Sep 20, 2017 at 01:09:59PM +0200, Johannes Thumshirn wrote:
> Using your patchset and doing the sanme double connect trick I get the same
> two block devices of cause.
> 
> How do I connect using both paths?

Same as above.  But now in addition to the /dev/nvmeXnY devices you
should also see a /dev/nvme/nsZ device.  The sysfs entry for it shows
which paths it belongs to.
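
Roughly what you should see with two paths (instance numbers are made up):

  /dev/nvme0n1  /dev/nvme1n1     <- one block device per path, as before
  /dev/nvme/ns1                  <- new shared node for the namespace

  # each per-path block device gets an "mpath" sysfs link back to the
  # shared disk:
  ls -l /sys/block/nvme0n1/mpath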

^ permalink raw reply	[flat|nested] 66+ messages in thread

* nvme multipath support V2
@ 2017-09-20 14:55     ` Christoph Hellwig
  0 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2017-09-20 14:55 UTC (permalink / raw)


On Wed, Sep 20, 2017@01:09:59PM +0200, Johannes Thumshirn wrote:
> Using your patchset and doing the sanme double connect trick I get the same
> two block devices of cause.
> 
> How do I connect using both paths?

Same as above.  But now in addition to the /dev/nvmeXnY devices you
should also see a /dev/nvme/nsZ device.  The sysfs entry for it shows
which paths it belongs to.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 9/9] nvme: implement multipath access to nvme subsystems
  2017-09-18 23:14   ` Christoph Hellwig
@ 2017-09-20 22:58     ` Keith Busch
  -1 siblings, 0 replies; 66+ messages in thread
From: Keith Busch @ 2017-09-20 22:58 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Sagi Grimberg, linux-nvme, linux-block

On Mon, Sep 18, 2017 at 04:14:53PM -0700, Christoph Hellwig wrote:

This is awesome! Looks great, just a minor comment:

> +	sprintf(head->disk->disk_name, "nvme/ns%d", head->instance);

Naming it 'nvme/ns<#>', kobject_set_name_vargs is going to change that
'/' into a '!', so the sysfs entry is named 'nvme!ns<#>'. Not a big
deal I suppose, but I just thought it looked odd since '!' has special
meaning in shells.
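
In other words, assuming instance 1 you end up with:

  /dev/nvme/ns1           <- device node, the '/' survives
  /sys/block/nvme!ns1     <- sysfs entry, the '/' turned into '!'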

Otherwise, this is looking really solid, and tests well on my single-ported
NVMe. I had some trouble getting dual-ported ones, but I have some now and
will run tests on those tomorrow with some failure injection.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH 9/9] nvme: implement multipath access to nvme subsystems
@ 2017-09-20 22:58     ` Keith Busch
  0 siblings, 0 replies; 66+ messages in thread
From: Keith Busch @ 2017-09-20 22:58 UTC (permalink / raw)


On Mon, Sep 18, 2017@04:14:53PM -0700, Christoph Hellwig wrote:

This is awesome! Looks great, just a minor comment:

> +	sprintf(head->disk->disk_name, "nvme/ns%d", head->instance);

Naming it 'nvme/ns<#>', kobject_set_name_vargs is going to change that
'/' into a '!', so the sysfs entry is named 'nvme!ns<#>'. Not a big
deal I suppose, but I just thought it looked odd since '!' has special
meaning in shells.

Otherwise, this is looking really solid, and tests well on my single-ported
NVMe. I had some trouble getting dual-ported ones, but I have some now and
will run tests on those tomorrow with some failure injection.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 9/9] nvme: implement multipath access to nvme subsystems
  2017-09-20 22:58     ` Keith Busch
@ 2017-09-20 23:52       ` Christoph Hellwig
  -1 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2017-09-20 23:52 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, Jens Axboe, Sagi Grimberg, linux-nvme, linux-block

On Wed, Sep 20, 2017 at 06:58:22PM -0400, Keith Busch wrote:
> > +	sprintf(head->disk->disk_name, "nvme/ns%d", head->instance);
> 
> Naming it 'nvme/ns<#>', kobject_set_name_vargs is going to change that
> '/' into a '!', so the sysfs entry is named 'nvme!ns<#>'. Not a big
> deal I suppose, but I just thought it looked odd since '!' has special
> meaning in shells.

I noticed the odd renaming in sysfs and thought about getting rid
of the /dev/nvme/ directory.  I just need to come up with a good
name for the device nodes - the name can't contain /dev/nvme* as
nvme-cli would break if it sees files that start with nvme in /dev/.

/dev/nvmN ?  

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH 9/9] nvme: implement multipath access to nvme subsystems
@ 2017-09-20 23:52       ` Christoph Hellwig
  0 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2017-09-20 23:52 UTC (permalink / raw)


On Wed, Sep 20, 2017@06:58:22PM -0400, Keith Busch wrote:
> > +	sprintf(head->disk->disk_name, "nvme/ns%d", head->instance);
> 
> Naming it 'nvme/ns<#>', kobject_set_name_vargs is going to change that
> '/' into a '!', so the sysfs entry is named 'nvme!ns<#>'. Not a big
> deal I suppose, but I just thought it looked odd since '!' has special
> meaning in shells.

I noticed the odd renaming in sysfs and thought about getting rid
of the /dev/nvme/ directory.  I just need to come up with a good
name for the device nodes - the name can't contain /dev/nvme* as
nvme-cli would break if it sees files that start with nvme in /dev/.

/dev/nvmN ?  

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 8/9] nvme: track shared namespaces
  2017-09-20 14:54       ` Christoph Hellwig
@ 2017-09-21  5:22         ` Johannes Thumshirn
  -1 siblings, 0 replies; 66+ messages in thread
From: Johannes Thumshirn @ 2017-09-21  5:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Keith Busch, Sagi Grimberg, linux-nvme, linux-block

On Wed, Sep 20, 2017 at 04:54:36PM +0200, Christoph Hellwig wrote:
> On Wed, Sep 20, 2017 at 10:36:43AM +0200, Johannes Thumshirn wrote:
> > Being one of the persons who has to backport a lot of NVMe code to older
> > kernels I'm not a huge fan of renaming nmve_ns.
> 
> The churn is main main worry.  Well and that I don't have a reall good
> name for what currently is nvme_ns either :)
> 
> > That said, I don't have a better name for nvme_ns_head (yet) but I'll try to
> > think of one as well. OTOH looking at nvme_ns_head it actaully is the list
> > head of the nvme_ns list.
> 
> But head also has connotations in the SAN world.  Maybe nvme_ns_chain?

I know that's why I didn't really like it all too much in the first place as
well. For nvme_ns_chain, it's not a chain really (the list itself is a chain,
the structure really is the list head...), but I suck at naming things so.

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH 8/9] nvme: track shared namespaces
@ 2017-09-21  5:22         ` Johannes Thumshirn
  0 siblings, 0 replies; 66+ messages in thread
From: Johannes Thumshirn @ 2017-09-21  5:22 UTC (permalink / raw)


On Wed, Sep 20, 2017@04:54:36PM +0200, Christoph Hellwig wrote:
> On Wed, Sep 20, 2017@10:36:43AM +0200, Johannes Thumshirn wrote:
> > Being one of the persons who has to backport a lot of NVMe code to older
> > kernels I'm not a huge fan of renaming nmve_ns.
> 
> The churn is main main worry.  Well and that I don't have a reall good
> name for what currently is nvme_ns either :)
> 
> > That said, I don't have a better name for nvme_ns_head (yet) but I'll try to
> > think of one as well. OTOH looking at nvme_ns_head it actaully is the list
> > head of the nvme_ns list.
> 
> But head also has connotations in the SAN world.  Maybe nvme_ns_chain?

I know that's why I didn't really like it all too much in the first place as
well. For nvme_ns_chain, it's not a chain really (the list itself is a chain,
the structure really is the list head...), but I suck at naming things so.

-- 
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: nvme multipath support V2
  2017-09-20 14:55     ` Christoph Hellwig
@ 2017-09-21  5:23       ` Johannes Thumshirn
  -1 siblings, 0 replies; 66+ messages in thread
From: Johannes Thumshirn @ 2017-09-21  5:23 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Keith Busch, Sagi Grimberg, linux-nvme, linux-block

On Wed, Sep 20, 2017 at 04:55:46PM +0200, Christoph Hellwig wrote:
> On Wed, Sep 20, 2017 at 01:09:59PM +0200, Johannes Thumshirn wrote:
> > Using your patchset and doing the sanme double connect trick I get the same
> > two block devices of cause.
> > 
> > How do I connect using both paths?
> 
> Same as above.  But now in addition to the /dev/nvmeXnY devices you
> should also see a /dev/nvme/nsZ device.  The sysfs entry for it shows
> which paths it belongs to.

Ah OK, we maybe should update nvme-cli to recognize it as well then. I just
looked at the output of nvme list and obviously didn't find it.

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 66+ messages in thread

* nvme multipath support V2
@ 2017-09-21  5:23       ` Johannes Thumshirn
  0 siblings, 0 replies; 66+ messages in thread
From: Johannes Thumshirn @ 2017-09-21  5:23 UTC (permalink / raw)


On Wed, Sep 20, 2017@04:55:46PM +0200, Christoph Hellwig wrote:
> On Wed, Sep 20, 2017@01:09:59PM +0200, Johannes Thumshirn wrote:
> > Using your patchset and doing the sanme double connect trick I get the same
> > two block devices of cause.
> > 
> > How do I connect using both paths?
> 
> Same as above.  But now in addition to the /dev/nvmeXnY devices you
> should also see a /dev/nvme/nsZ device.  The sysfs entry for it shows
> which paths it belongs to.

Ah OK, we maybe should update nvme-cli to recognize it as well then. I just
looked at the output of nvme list and obviously didn't find it.

-- 
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 8/9] nvme: track shared namespaces
  2017-09-21  5:22         ` Johannes Thumshirn
@ 2017-09-21 14:37           ` Christoph Hellwig
  -1 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2017-09-21 14:37 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg,
	linux-nvme, linux-block

On Thu, Sep 21, 2017 at 07:22:17AM +0200, Johannes Thumshirn wrote:
> > But head also has connotations in the SAN world.  Maybe nvme_ns_chain?
> 
> I know that's why I didn't really like it all too much in the first place as
> well. For nvme_ns_chain, it's not a chain really (the list itself is a chain,
> the structure really is the list head...), but I suck at naming things so.

Well, it _is_ the structure for the namespace, and that's the fundamental
problem here given that we use that name for something else at the
moment.

We could have nvme_namespace and nvme_ns, but I'm not sure that this
helps clarity...

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH 8/9] nvme: track shared namespaces
@ 2017-09-21 14:37           ` Christoph Hellwig
  0 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2017-09-21 14:37 UTC (permalink / raw)


On Thu, Sep 21, 2017@07:22:17AM +0200, Johannes Thumshirn wrote:
> > But head also has connotations in the SAN world.  Maybe nvme_ns_chain?
> 
> I know that's why I didn't really like it all too much in the first place as
> well. For nvme_ns_chain, it's not a chain really (the list itself is a chain,
> the structure really is the list head...), but I suck at naming things so.

Well, it _is_ the structure for the namespace, and that's the fundamental
problem here given that we use that name for something else at the
moment.

We could have nvme_namespace and nvme_ns, but I'm not sure that this
helps clarity...

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: nvme multipath support V2
  2017-09-21  5:23       ` Johannes Thumshirn
@ 2017-09-21 14:50         ` Christoph Hellwig
  -1 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2017-09-21 14:50 UTC (permalink / raw)
  To: Johannes Thumshirn
  Cc: Christoph Hellwig, Jens Axboe, Keith Busch, Sagi Grimberg,
	linux-nvme, linux-block

On Thu, Sep 21, 2017 at 07:23:45AM +0200, Johannes Thumshirn wrote:
> Ah OK, we maybe should update nvme-cli to recognize it as well then. I just
> looked at the output of nvme list and obviously didn't find it.

Overloading the new per-subsystem nodes into nvme list would be
very confusing I think.  We could add a new command to list subsystems
instead of controllers, if just doing a ls in /dev or sysfs is too hard.
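
Until then, something along these lines shows the shared nodes (names are
illustrative):

  ls /dev/nvme/
  ls /sys/block/ | grep 'nvme!'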

^ permalink raw reply	[flat|nested] 66+ messages in thread

* nvme multipath support V2
@ 2017-09-21 14:50         ` Christoph Hellwig
  0 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2017-09-21 14:50 UTC (permalink / raw)


On Thu, Sep 21, 2017@07:23:45AM +0200, Johannes Thumshirn wrote:
> Ah OK, we maybe should update nvme-cli to recognize it as well then. I just
> looked at the output of nvme list and obviously didn't find it.

Overloading the new per-subsystem nodes into nvme list would be
very confusing I think.  We could add a new command to list subsystems
instead of controllers, if just doing a ls in /dev or sysfs is too hard.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 8/9] nvme: track shared namespaces
  2017-09-21 14:37           ` Christoph Hellwig
@ 2017-09-21 15:16             ` Keith Busch
  -1 siblings, 0 replies; 66+ messages in thread
From: Keith Busch @ 2017-09-21 15:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Johannes Thumshirn, Jens Axboe, Sagi Grimberg, linux-nvme, linux-block

On Thu, Sep 21, 2017 at 04:37:48PM +0200, Christoph Hellwig wrote:
> On Thu, Sep 21, 2017 at 07:22:17AM +0200, Johannes Thumshirn wrote:
> > > But head also has connotations in the SAN world.  Maybe nvme_ns_chain?
> > 
> > I know that's why I didn't really like it all too much in the first place as
> > well. For nvme_ns_chain, it's not a chain really (the list itself is a chain,
> > the structure really is the list head...), but I suck at naming things so.
> 
> Well, it _is_ the structure for the namespace, and that's the fundamental
> problem here given that we use that name for something else at the
> moment.
> 
> We could hav nvme_namespace and nvme_ns, but I'm not sure that this
> helps clarity..

If there weren't resistance to renaming structs, it would be more
aligned with how the specification names these if we renamed nvme_ns to
nvme_ns_path, and what you're calling nvme_ns_head should just be
the nvme_ns.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH 8/9] nvme: track shared namespaces
@ 2017-09-21 15:16             ` Keith Busch
  0 siblings, 0 replies; 66+ messages in thread
From: Keith Busch @ 2017-09-21 15:16 UTC (permalink / raw)


On Thu, Sep 21, 2017@04:37:48PM +0200, Christoph Hellwig wrote:
> On Thu, Sep 21, 2017@07:22:17AM +0200, Johannes Thumshirn wrote:
> > > But head also has connotations in the SAN world.  Maybe nvme_ns_chain?
> > 
> > I know that's why I didn't really like it all too much in the first place as
> > well. For nvme_ns_chain, it's not a chain really (the list itself is a chain,
> > the structure really is the list head...), but I suck at naming things so.
> 
> Well, it _is_ the structure for the namespace, and that's the fundamental
> problem here given that we use that name for something else at the
> moment.
> 
> We could hav nvme_namespace and nvme_ns, but I'm not sure that this
> helps clarity..

If there weren't resistance to renaming structs, it would be more
aligned with how the specification names these if we renamed nvme_ns to
nvme_ns_path, and what you're calling nvme_ns_head should just be
the nvme_ns.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 9/9] nvme: implement multipath access to nvme subsystems
  2017-09-20 23:52       ` Christoph Hellwig
@ 2017-09-21 21:12         ` Keith Busch
  -1 siblings, 0 replies; 66+ messages in thread
From: Keith Busch @ 2017-09-21 21:12 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Sagi Grimberg, linux-nvme, linux-block

On Thu, Sep 21, 2017 at 01:52:50AM +0200, Christoph Hellwig wrote:
> I noticed the odd renaming in sysfs and though about gettind rid
> of the /dev/nvme/ directory.  I just need to come up with a good
> name for the device nodes - the name can't contain /dev/nvme* as
> nvme-cli would break if it sees files that start with nvme in /dev/.

Ah, didn't know that. I'll fix nvme-cli to handle this.

We can still use a different naming convention for the multipath handles.

> /dev/nvmN ?

Looks safe.

BTW, considered persistent naming rules to symlink these from
/dev/disk/by-id/? May need to add an attribute to the multipath object
to assist that.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH 9/9] nvme: implement multipath access to nvme subsystems
@ 2017-09-21 21:12         ` Keith Busch
  0 siblings, 0 replies; 66+ messages in thread
From: Keith Busch @ 2017-09-21 21:12 UTC (permalink / raw)


On Thu, Sep 21, 2017@01:52:50AM +0200, Christoph Hellwig wrote:
> I noticed the odd renaming in sysfs and though about gettind rid
> of the /dev/nvme/ directory.  I just need to come up with a good
> name for the device nodes - the name can't contain /dev/nvme* as
> nvme-cli would break if it sees files that start with nvme in /dev/.

Ah, didn't know that. I'll fix nvme-cli to handle this.

We can still use a different naming convention for the multipath handles.

> /dev/nvmN ?

Looks safe.

BTW, considered persistent naming rules to symlink these from
/dev/disk/by-id/? May need to add an attribute to the multipath object
to assist that.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 6/9] nvme: track subsystems
  2017-09-18 23:14   ` Christoph Hellwig
@ 2017-09-21 22:52     ` Keith Busch
  -1 siblings, 0 replies; 66+ messages in thread
From: Keith Busch @ 2017-09-21 22:52 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Sagi Grimberg, linux-nvme, linux-block

On Mon, Sep 18, 2017 at 04:14:50PM -0700, Christoph Hellwig wrote:
> +static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
> +{
> +	struct nvme_subsystem *subsys, *found;
> +
> +	subsys = kzalloc(sizeof(*subsys), GFP_KERNEL);
> +	if (!subsys)
> +		return -ENOMEM;
> +	INIT_LIST_HEAD(&subsys->ctrls);
> +	kref_init(&subsys->ref);
> +	nvme_init_subnqn(subsys, ctrl, id);
> +	mutex_init(&subsys->lock);
> +
> +	mutex_lock(&nvme_subsystems_lock);
> +	found = __nvme_find_get_subsystem(subsys->subnqn);
> +	if (found) {
> +		/*
> +		 * Verify that the subsystem actually supports multiple
> +		 * controllers, else bail out.
> +		 */
> +		kfree(subsys);
> +		if (!(id->cmic & (1 << 1))) {
> +			dev_err(ctrl->device,
> +				"ignoring ctrl due to duplicate subnqn (%s).\n",
> +				found->subnqn);
> +			mutex_unlock(&nvme_subsystems_lock);
> +			return -EINVAL;
> +		}
> +
> +		subsys = found;
> +	} else {
> +		list_add_tail(&subsys->entry, &nvme_subsystems);
> +	}
> +
> +	ctrl->subsys = subsys;
> +	mutex_unlock(&nvme_subsystems_lock);
> +
> +	mutex_lock(&subsys->lock);
> +	list_add_tail(&ctrl->subsys_entry, &subsys->ctrls);
> +	mutex_unlock(&subsys->lock);

This function is called every time nvme_init_identify is called, which
happens on every controller reset.  The controller does not remove itself
from the subsystem's list of controllers on reset, so its entry gets
added a second time after a controller reset.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH 6/9] nvme: track subsystems
@ 2017-09-21 22:52     ` Keith Busch
  0 siblings, 0 replies; 66+ messages in thread
From: Keith Busch @ 2017-09-21 22:52 UTC (permalink / raw)


On Mon, Sep 18, 2017@04:14:50PM -0700, Christoph Hellwig wrote:
> +static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
> +{
> +	struct nvme_subsystem *subsys, *found;
> +
> +	subsys = kzalloc(sizeof(*subsys), GFP_KERNEL);
> +	if (!subsys)
> +		return -ENOMEM;
> +	INIT_LIST_HEAD(&subsys->ctrls);
> +	kref_init(&subsys->ref);
> +	nvme_init_subnqn(subsys, ctrl, id);
> +	mutex_init(&subsys->lock);
> +
> +	mutex_lock(&nvme_subsystems_lock);
> +	found = __nvme_find_get_subsystem(subsys->subnqn);
> +	if (found) {
> +		/*
> +		 * Verify that the subsystem actually supports multiple
> +		 * controllers, else bail out.
> +		 */
> +		kfree(subsys);
> +		if (!(id->cmic & (1 << 1))) {
> +			dev_err(ctrl->device,
> +				"ignoring ctrl due to duplicate subnqn (%s).\n",
> +				found->subnqn);
> +			mutex_unlock(&nvme_subsystems_lock);
> +			return -EINVAL;
> +		}
> +
> +		subsys = found;
> +	} else {
> +		list_add_tail(&subsys->entry, &nvme_subsystems);
> +	}
> +
> +	ctrl->subsys = subsys;
> +	mutex_unlock(&nvme_subsystems_lock);
> +
> +	mutex_lock(&subsys->lock);
> +	list_add_tail(&ctrl->subsys_entry, &subsys->ctrls);
> +	mutex_unlock(&subsys->lock);

This function is called every time nvme_init_identify is called, which
happens on every controller reset.  The controller does not remove itself
from the subsystem's list of controllers on reset, so its entry gets
added a second time after a controller reset.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: nvme multipath support V2
  2017-09-21 14:50         ` Christoph Hellwig
@ 2017-09-22  0:21           ` Tony Yang
  -1 siblings, 0 replies; 66+ messages in thread
From: Tony Yang @ 2017-09-22  0:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Johannes Thumshirn, Jens Axboe, linux-block, Sagi Grimberg,
	linux-nvme, Keith Busch

Excuse me,
   may I ask a junior question: how do you get a complete clone of the nvme
mpath tree?  I used git clone, but after it completed I found no nvme directory.


[root@scst1 soft]#  git clone
git://git.infradead.org/users/hch/block.git nvme-mpath
Cloning into 'nvme-mpath'...
remote: Counting objects: 5623436, done.
remote: Compressing objects: 100% (922468/922468), done.
remote: Total 5623436 (delta 4757577), reused 5524606 (delta 4658819)
Receiving objects: 100% (5623436/5623436), 1.12 GiB | 3.17 MiB/s, done.
Resolving deltas: 100% (4757577/4757577), done.
Checking out files: 100% (50795/50795), done.


[root@scst1 drivers]# pwd
/u01/soft/nvme-mpath/drivers

[root@scst1 drivers]# ls nvme
ls: cannot access nvme: No such file or directory

Thanks

2017-09-21 22:50 GMT+08:00 Christoph Hellwig <hch@lst.de>:
> On Thu, Sep 21, 2017 at 07:23:45AM +0200, Johannes Thumshirn wrote:
>> Ah OK, we maybe should update nvme-cli to recognize it as well then. I just
>> looked at the output of nvme list and obviously didn't find it.
>
> Overloading the new per-subsystem nodes into nvme list would be
> very confusing I think.  We could add a new command to list subsystems
> instead of controllers, if just doing a ls in /dev or sysfs is too hard.
>
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 66+ messages in thread

* nvme multipath support V2
@ 2017-09-22  0:21           ` Tony Yang
  0 siblings, 0 replies; 66+ messages in thread
From: Tony Yang @ 2017-09-22  0:21 UTC (permalink / raw)


Excuse me,
   may I ask a junior question: how do you get a complete clone of the nvme
mpath tree?  I used git clone, but after it completed I found no nvme directory.


[root at scst1 soft]#  git clone
git://git.infradead.org/users/hch/block.git nvme-mpath
Cloning into 'nvme-mpath'...
remote: Counting objects: 5623436, done.
remote: Compressing objects: 100% (922468/922468), done.
remote: Total 5623436 (delta 4757577), reused 5524606 (delta 4658819)
Receiving objects: 100% (5623436/5623436), 1.12 GiB | 3.17 MiB/s, done.
Resolving deltas: 100% (4757577/4757577), done.
Checking out files: 100% (50795/50795), done.


[root at scst1 drivers]# pwd
/u01/soft/nvme-mpath/drivers

[root at scst1 drivers]# ls nvme
ls: cannot access nvme: No such file or directory

Thanks

2017-09-21 22:50 GMT+08:00 Christoph Hellwig <hch at lst.de>:
> On Thu, Sep 21, 2017@07:23:45AM +0200, Johannes Thumshirn wrote:
>> Ah OK, we maybe should update nvme-cli to recognize it as well then. I just
>> looked at the output of nvme list and obviously didn't find it.
>
> Overloading the new per-subsystem nodes into nvme list would be
> very confusing I think.  We could add a new command to list subsystems
> instead of controllers, if just doing a ls in /dev or sysfs is too hard.
>
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 9/9] nvme: implement multipath access to nvme subsystems
  2017-09-18 23:14   ` Christoph Hellwig
@ 2017-09-22 15:09     ` Keith Busch
  -1 siblings, 0 replies; 66+ messages in thread
From: Keith Busch @ 2017-09-22 15:09 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Jens Axboe, Sagi Grimberg, linux-nvme, linux-block

On Mon, Sep 18, 2017 at 04:14:53PM -0700, Christoph Hellwig wrote:
> +static void nvme_failover_req(struct request *req)
> +{
> +	struct nvme_ns *ns = req->q->queuedata;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&ns->head->requeue_lock, flags);
> +	blk_steal_bios(&ns->head->requeue_list, req);
> +	spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
> +
> +	nvme_reset_ctrl(ns->ctrl);
> +	kblockd_schedule_work(&ns->head->requeue_work);
> +}

Need to call blk_mq_free_request after stealing all its bios to prevent
leaking the entered request.

Altogether, I needed just two small changes to the entire series to
get everything testing successfully.  Here's the diff on top of your set:

---
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 5449c83..55620ba 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -115,6 +115,8 @@ static void nvme_failover_req(struct request *req)
 	blk_steal_bios(&ns->head->requeue_list, req);
 	spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
 
+	blk_mq_free_request(req);
+
 	nvme_reset_ctrl(ns->ctrl);
 	kblockd_schedule_work(&ns->head->requeue_work);
 }
@@ -1935,6 +1937,9 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
 {
 	struct nvme_subsystem *subsys, *found;
 
+	if (ctrl->identified)
+		return 0;
+
 	subsys = kzalloc(sizeof(*subsys), GFP_KERNEL);
 	if (!subsys)
 		return -ENOMEM;
--

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: nvme multipath support V2
  2017-09-22  0:21           ` Tony Yang
@ 2017-09-24 14:41             ` Christoph Hellwig
  -1 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2017-09-24 14:41 UTC (permalink / raw)
  To: Tony Yang
  Cc: Christoph Hellwig, Johannes Thumshirn, Jens Axboe, linux-block,
	Sagi Grimberg, linux-nvme, Keith Busch

On Fri, Sep 22, 2017 at 08:21:09AM +0800, Tony Yang wrote:
> Excuse me,
>    may I ask a junior question: how do you complete the nvme mpath
> clone?  I used git clone, but after it completed I found no nvme directory.
> 
> 
> [root@scst1 soft]#  git clone
> git://git.infradead.org/users/hch/block.git nvme-mpath
> Cloning into 'nvme-mpath'...
> remote: Counting objects: 5623436, done.
> remote: Compressing objects: 100% (922468/922468), done.
> remote: Total 5623436 (delta 4757577), reused 5524606 (delta 4658819)
> Receiving objects: 100% (5623436/5623436), 1.12 GiB | 3.17 MiB/s, done.
> Resolving deltas: 100% (4757577/4757577), done.
> Checking out files: 100% (50795/50795), done.

git checkout nvme-mpath
make oldconfig
<answer question>
<make sure all options with NVME in the name are enabled in .config>
make

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 9/9] nvme: implement multipath access to nvme subsystems
  2017-09-21 21:12         ` Keith Busch
@ 2017-09-24 14:42           ` Christoph Hellwig
  -1 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2017-09-24 14:42 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, Jens Axboe, Sagi Grimberg, linux-nvme, linux-block

On Thu, Sep 21, 2017 at 05:12:46PM -0400, Keith Busch wrote:
> BTW, have you considered persistent naming rules to symlink these from
> /dev/disk/by-id/?  We may need to add an attribute to the multipath object
> to assist with that.

Yes, we do.  I've just been trying to push out udev/systemd work
as long as possible in the hopes that someone else will do it,
as it always seems painful.

But I think we should also offer versions of our id attributes
on this device to make life easier.
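
A minimal sketch of what such an attribute could look like on the
per-subsystem block node, purely for illustration: it assumes the head
disk's private_data points at the nvme_ns_head from patch 8/9 and that
the head carries the nvme_ns_ids introduced in patch 7/9, neither of
which the posted series necessarily wires up this way:

/*
 * Illustrative sketch, not part of the series: expose the namespace UUID
 * on the multipath node so udev can build /dev/disk/by-id/ symlinks.
 * Reaching the nvme_ns_head via dev_to_disk(dev)->private_data and the
 * head->ids.uuid field name are assumptions, not the final upstream code.
 */
static ssize_t uuid_show(struct device *dev, struct device_attribute *attr,
		char *buf)
{
	struct nvme_ns_head *head = dev_to_disk(dev)->private_data;

	return sprintf(buf, "%pU\n", &head->ids.uuid);
}
static DEVICE_ATTR_RO(uuid);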

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 9/9] nvme: implement multipath access to nvme subsystems
  2017-09-22 15:09     ` Keith Busch
@ 2017-09-24 14:48       ` Christoph Hellwig
  -1 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2017-09-24 14:48 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, Jens Axboe, Sagi Grimberg, linux-nvme, linux-block

On Fri, Sep 22, 2017 at 11:09:16AM -0400, Keith Busch wrote:
> On Mon, Sep 18, 2017 at 04:14:53PM -0700, Christoph Hellwig wrote:
> > +static void nvme_failover_req(struct request *req)
> > +{
> > +	struct nvme_ns *ns = req->q->queuedata;
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&ns->head->requeue_lock, flags);
> > +	blk_steal_bios(&ns->head->requeue_list, req);
> > +	spin_unlock_irqrestore(&ns->head->requeue_lock, flags);
> > +
> > +	nvme_reset_ctrl(ns->ctrl);
> > +	kblockd_schedule_work(&ns->head->requeue_work);
> > +}
> 
> Need to call blk_mq_free_request after stealing all its bios to prevent
> leaking that entered request.

I think this should be a blk_mq_end_request actually.  The difference
is that blk_mq_end_request gets the I/O accounting right, and
treats the case of having an ->end_io handler correctly as well.
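
A minimal sketch of how the failover helper might look with that change,
keeping the locking and requeue handling exactly as in the posted patch
(the 0 status passed to blk_mq_end_request stands for BLK_STS_OK; whether
the original status should be propagated instead is a separate question):

static void nvme_failover_req(struct request *req)
{
	struct nvme_ns *ns = req->q->queuedata;
	unsigned long flags;

	spin_lock_irqsave(&ns->head->requeue_lock, flags);
	blk_steal_bios(&ns->head->requeue_list, req);
	spin_unlock_irqrestore(&ns->head->requeue_lock, flags);

	/*
	 * End the now bio-less request instead of just freeing it, so that
	 * the I/O accounting is updated and any ->end_io handler runs.
	 */
	blk_mq_end_request(req, 0);

	nvme_reset_ctrl(ns->ctrl);
	kblockd_schedule_work(&ns->head->requeue_work);
}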

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 8/9] nvme: track shared namespaces
  2017-09-21 15:16             ` Keith Busch
@ 2017-09-24 15:25               ` Christoph Hellwig
  -1 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2017-09-24 15:25 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, Johannes Thumshirn, Jens Axboe, Sagi Grimberg,
	linux-nvme, linux-block

On Thu, Sep 21, 2017 at 11:16:31AM -0400, Keith Busch wrote:
> If there weren't resistance to renaming structs, it would be more
> aligned with how the specification names these if we renamed nvme_ns to
> nvme_ns_path, and what you're calling nvme_ns_head should just be
> the nvme_ns.

Then we'd still need a good name for what's now nvme_ns :)

But seriously speaking - I'm a little worried because Linus is really
pissed at me for doing major refactoring in block land (by proxy of
Jens who gets all the heat).  I just don't feel comfortable with a
rename at the moment, even if it's the right thing.

^ permalink raw reply	[flat|nested] 66+ messages in thread

end of thread, other threads:[~2017-09-24 15:25 UTC | newest]

Thread overview: 66+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-18 23:14 nvme multipath support V2 Christoph Hellwig
2017-09-18 23:14 ` Christoph Hellwig
2017-09-18 23:14 ` [PATCH 1/9] nvme: allow timed-out ios to retry Christoph Hellwig
2017-09-18 23:14   ` Christoph Hellwig
2017-09-18 23:14 ` [PATCH 2/9] block: move REQ_NOWAIT Christoph Hellwig
2017-09-18 23:14   ` Christoph Hellwig
2017-09-18 23:14 ` [PATCH 3/9] block: add REQ_DRV bit Christoph Hellwig
2017-09-18 23:14   ` Christoph Hellwig
2017-09-18 23:14 ` [PATCH 4/9] block: provide a direct_make_request helper Christoph Hellwig
2017-09-18 23:14   ` Christoph Hellwig
2017-09-18 23:14 ` [PATCH 5/9] block: add a blk_steal_bios helper Christoph Hellwig
2017-09-18 23:14   ` Christoph Hellwig
2017-09-18 23:14 ` [PATCH 6/9] nvme: track subsystems Christoph Hellwig
2017-09-18 23:14   ` Christoph Hellwig
2017-09-21 22:52   ` Keith Busch
2017-09-21 22:52     ` Keith Busch
2017-09-18 23:14 ` [PATCH 7/9] nvme: introduce a nvme_ns_ids structure Christoph Hellwig
2017-09-18 23:14   ` Christoph Hellwig
2017-09-20  8:27   ` Johannes Thumshirn
2017-09-20  8:27     ` Johannes Thumshirn
2017-09-18 23:14 ` [PATCH 8/9] nvme: track shared namespaces Christoph Hellwig
2017-09-18 23:14   ` Christoph Hellwig
2017-09-20  8:36   ` Johannes Thumshirn
2017-09-20  8:36     ` Johannes Thumshirn
2017-09-20 14:54     ` Christoph Hellwig
2017-09-20 14:54       ` Christoph Hellwig
2017-09-21  5:22       ` Johannes Thumshirn
2017-09-21  5:22         ` Johannes Thumshirn
2017-09-21 14:37         ` Christoph Hellwig
2017-09-21 14:37           ` Christoph Hellwig
2017-09-21 15:16           ` Keith Busch
2017-09-21 15:16             ` Keith Busch
2017-09-24 15:25             ` Christoph Hellwig
2017-09-24 15:25               ` Christoph Hellwig
2017-09-18 23:14 ` [PATCH 9/9] nvme: implement multipath access to nvme subsystems Christoph Hellwig
2017-09-18 23:14   ` Christoph Hellwig
2017-09-20  0:18   ` Tony Yang
2017-09-20  0:18     ` Tony Yang
2017-09-20  8:15     ` Johannes Thumshirn
2017-09-20  8:15       ` Johannes Thumshirn
2017-09-20  9:42   ` Johannes Thumshirn
2017-09-20  9:42     ` Johannes Thumshirn
2017-09-20 22:58   ` Keith Busch
2017-09-20 22:58     ` Keith Busch
2017-09-20 23:52     ` Christoph Hellwig
2017-09-20 23:52       ` Christoph Hellwig
2017-09-21 21:12       ` Keith Busch
2017-09-21 21:12         ` Keith Busch
2017-09-24 14:42         ` Christoph Hellwig
2017-09-24 14:42           ` Christoph Hellwig
2017-09-22 15:09   ` Keith Busch
2017-09-22 15:09     ` Keith Busch
2017-09-24 14:48     ` Christoph Hellwig
2017-09-24 14:48       ` Christoph Hellwig
2017-09-20 11:09 ` nvme multipath support V2 Johannes Thumshirn
2017-09-20 11:09   ` Johannes Thumshirn
2017-09-20 14:55   ` Christoph Hellwig
2017-09-20 14:55     ` Christoph Hellwig
2017-09-21  5:23     ` Johannes Thumshirn
2017-09-21  5:23       ` Johannes Thumshirn
2017-09-21 14:50       ` Christoph Hellwig
2017-09-21 14:50         ` Christoph Hellwig
2017-09-22  0:21         ` Tony Yang
2017-09-22  0:21           ` Tony Yang
2017-09-24 14:41           ` Christoph Hellwig
2017-09-24 14:41             ` Christoph Hellwig
