linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V15 0/5] nvmet: add ZBD backened support
@ 2021-06-10  1:32 Chaitanya Kulkarni
  2021-06-10  1:32 ` [PATCH V15 1/5] block: export blk_next_bio() Chaitanya Kulkarni
                   ` (5 more replies)
  0 siblings, 6 replies; 9+ messages in thread
From: Chaitanya Kulkarni @ 2021-06-10  1:32 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: axboe, sagi, hch, Chaitanya Kulkarni

Hi,

NVMeOF Host is capable of handling the NVMe Protocol based Zoned Block
Devices (ZBD) in the Zoned Namespaces (ZNS) mode with the passthru
backend. There is no support for a generic block device backend to
handle the ZBDs which are not NVMe protocol compliant.

This adds support to export the ZBDs (which are not NVMe drives) to host
from the target via NVMeOF using the host side ZNS interface.

Note: This patch-series is based on nvme-5.14.

-ck

Chaitanya Kulkarni (5):
  block: export blk_next_bio()
  nvmet: add req cns error complete helper
  nvmet: add nvmet_req_bio put helper for backends
  nvmet: add Command Set Identifier support
  nvmet: add ZBD over ZNS backend support

 block/blk-lib.c                   |   1 +
 drivers/nvme/target/Makefile      |   1 +
 drivers/nvme/target/admin-cmd.c   | 121 +++++-
 drivers/nvme/target/core.c        |  43 ++-
 drivers/nvme/target/io-cmd-bdev.c |  29 +-
 drivers/nvme/target/nvmet.h       |  35 ++
 drivers/nvme/target/passthru.c    |   3 +-
 drivers/nvme/target/zns.c         | 622 ++++++++++++++++++++++++++++++
 include/linux/bio.h               |   2 +
 include/linux/nvme.h              |   8 +
 10 files changed, 828 insertions(+), 37 deletions(-)
 create mode 100644 drivers/nvme/target/zns.c

Changes since V14:-

1. Remove the unsingned long cast in the call to set_bit().
2. Update reviewed-by tag.

Changes since V13:-

1. Move the code cleanup patches upfront and export blk_next_bio().
2. Remove NVM in the subject line for the CSS patch.
3. Remove the switch from nvmet_execute_identify_desclist_csi()
   and use ns->csi directly.
4. Add csi switch in the nvmet_parse_io_cmd().
5. Rename nvmet_cc_css_check() -> nvmet_css_supported().
6. Remove switch from nvmet_set_csi_zns_effects().
7. Rename nvmet_set_csi_nvm_effects() ->
   nvmet_get_cmd_effects_nvm().
7. Add separate parse case for the different csi.
8. Remove the bdev_is_zoned().
9. Remove the csi check from the admin cmd handlers and move
   to the caller.
10. Add Zone State change emulation and add state machine check
    to handle zone mgmt send command.
12. Abort the zone append command with error when transfer len = 0.
13. Don't allocate the buffer for zone mgmt recv report zones instead
    use the command data buffer.
14. Overall small cleanups to reflect the new code changes.
15. Offload zone-mgmt commands to workqueue.

Changes since V12:-

1. Remove the comment :-
  /* bdev_is_zoned() is stubbed out of CONFIG_BLK_DEV_ZONED */
2. Update copyright header.
3. Use folliwng expression in nvmet_zasl(). 
   return ilog2(zone_append_sects >> (NVMET_MPSMIN_SHIFT - 9));
4. Add comment in nvmet_bdev_zns_enable().
5. Add zone filtering and partial bit handling.
6. Use if else pattern in nvmet_bdev_execute_zone_mgmt_send().
7. Make zone append async. 
8. Add a comment for nvmet_check_transfer_len().
9. Add a helper nvmet_req_cns_error_compplete() to remove the duplicate
   code.

Changes from V11:-

1. Use bdev_logical_block_size() in nvmet_bdev_zns_enable().
2. Return error if nr_zones calculation ends up having value 0.
4. Drop the comment patch from this series :-
   nvme: add comments to nvme_zns_alloc_report_buffer

Changes from V10:-

1.  Move CONFIG_BLK_DEV_ZONED check into the caller of
    nvmet_set_csi_zns_effects().
2.  Move ZNS related csi code from csi patch to its own ZNS backend
    patch.
3.  For ZNS command effects logs set default command effects with 
    nvmet_set_csi_nvm_effects() along with nvmet_set_csi_zns_effects().
4.  Use goto for failure case in the nvmet_set_csi_zns_effects().
5.  Return directly from swicth in nvmet_execute_identify_desclist_csi().
6.  Merge Patch 2nd/3rd into one patch and move ZNS related code
    into its own :-
     [PATCH V10 2/8] nvmet: add NVM Command Set Identifier support
     [PATCH V10 3/8] nvmet: add command set supported ctrl cap
    Merged into new patch minux ZNS calls :-
     [PATCH V11 1/4] nvmet: add NVM Command Set Identifier support
7.  Move req->cmd->identify.csi == NVME_CSI_ZNS checks in to respective
    caller in nvmet_execute_identify().
8.  Update error log page in nvmet_bdev_zns_checks().
9.  Remove the terney expression nvmet_zns_update_zasl().
10. Drop following patches:-
     [PATCH V10 1/8] nvmet: trim args for nvmet_copy_ns_identifier()
     [PATCH V10 6/8] nvme-core: check ctrl css before setting up zns
     [PATCH V10 7/8] nvme-core: add a helper to print css related error

Changes from V9:-

1.  Use bio_add_zone_append_page() for REQ_OP_ZONE_APPEND.
2.  Add a separate prep patch to reduce the arguments for
    nvmet_copy_ns_identifier().
3.  Add a separate patch for nvmet CSI support.
4.  Add a separate patch for nvmet CSS support.
5.  Move nvmet_cc_css_check() to multi-css supported patch.
6.  Return error in csi cmd effects helper when !CONFIG_BLK_DEV_ZONED.
7.  Return error in id desc list helper when !CONFIG_BLK_DEV_ZONED.
8.  Remove goto and return from nvmet_bdev_zns_checks().
9.  Move nr_zones calculations near call to blkdev_report_zones() in
    nvmet_bdev_execute_zone_mgmt_recv().
10. Split command effects logs into respective CSI helpers.
11. Don't use the local variables to pass NVME_CSI_XXX values, instead
    use req->ns->csi, also move this from ZBD support patch to nvmet
    CSI support.
12. Fix the bug that is chekcing cns value instead of csi in identify.
13. bdev_is_zoned() is stubbed out the CONFIG_BLK_DEV_ZONED, so remove
    the check for CONFIG_BLK_DEV_ZONED before calling bdev_is_zoned().
14  Drop following patches :-
    [PATCH V9 1/9] block: export bio_add_hw_pages().
    [PATCH V9 5/9] nvmet: add bio get helper.
    [PATCH V9 6/9] nvmet: add bio init helper.
    [PATCH V9 8/9] nvmet: add common I/O length check helper.
    [PATCH V9 9/9] nvmet: call nvmet_bio_done() for zone append.
15. Add a patch to check for ctrl bits to make sure ctrl supports the
    multi css on the host side when setting up Zoned Namespace.
17. Add a documentation patch for the host side when calculating the
    buffer size allocation size for the report zones.
16. Rebase and retest 5.12-rc1.

Changes from V8:-

1. Rebase and retest on latest nvme-5.11.
2. Export ctrl->cap csi support only if CONFIG_BLK_DEV_ZONE is set.
3. Add a fix to admin ns-desc list handler for handling default csi.

Changes from V7:-

1. Just like what block layer provides an API for bio_init(), provide
   nvmet_bio_init() such that we move bio initialization code for
   nvme-read-write commands from bdev and zns backend into the centralize
   helper. 
2. With bdev/zns/file now we have three backends that are checking for
   req->sg_cnt and calling nvmet_check_transfer_len() before we process
   nvme-read-write commands. Move this duplicate code from three
   backeneds into the helper.
3. Export and use nvmet_bio_done() callback in
   nvmet_execute_zone_append() instead of the open coding the function.
   This also avoids code duplication for bio & request completion with
   error log page update.
4. Add zonefs tests log for dm linear device created on the top of SMR HDD
   exported with NVMeOF ZNS backend with the help of nvme-loop.

Changes from V6:-

1. Instead of calling report zones to find conventional zones in the 
   loop use the loop inside LLD blkdev_report_zones()->LLD_report_zones,
   that also simplifies the report zone callback.
2. Fix the bug in nvmet_bdev_has_conv_zones().
3. Remove conditional operators in the nvmet_bdev_execute_zone_append().

Changes from V5:-

1.  Use bio->bi_iter.bi_sector for result of the REQ_OP_ZONE_APPEND
    command.
2.  Add endianness to the helper nvmet_sect_to_lba().
3.  Make bufsize u32 in zone mgmt recv command handler.
4.  Add __GFP_ZERO for report zone data buffer to return clean buffer.

Changes from V4:-

1.  Don't use bio_iov_iter_get_pages() instead add a patch to export
    bio_add_hw_page() and call it directly for zone append.
2.  Add inline vector optimization for append bio.
3.  Update the commit logs for the patches.
4.  Remove ZNS related identify data structures, use individual members.
5.  Add a comment for macro NVMET_MPSMIN_SHIFT.
6.  Remove nvmet_bdev() helper.
7.  Move the command set identifier code into common code.
8.  Use IS_ENABLED() and move helpers fomr zns.c into common code.
9.  Add a patch to support Command Set identifiers.
10. Open code nvmet_bdev_validate_zns_zones().
11. Remove the per namespace min zasl calculation and don't allow
    namespaces with zasl value > the first ns zasl value.
12. Move the stubs into the header file.
13. Add lba to/from sector conversion helpers and update the
    io-cmd-bdev.c to avoid the code duplication.
14. Add everything into one patch for zns command handlers and 
    respective calls from the target code.
15. Remove the trim ns-desclist admin callback patch from this series.
16. Add bio get and put helpers patches to reduce the duplicate code in
    generic bdev, passthru, and generic zns backend.

Changes from V3:-

1.  Get rid of the bio_max_zasl check.
2.  Remove extra lines.
3.  Remove the block layer api export patch.
4.  Remove the bvec check in the bio_iov_iter_get_pages() for
    REQ_OP_ZONE_APPEND so that we can reuse the code.

Changes from V2:-

1.  Move conventional zone bitmap check into 
    nvmet_bdev_validate_zns_zones(). 
2.  Don't use report zones call to check the runt zone.
3.  Trim nvmet_zasl() helper.
4.  Fix typo in the nvmet_zns_update_zasl().
5.  Remove the comment and fix the mdts calculation in
    nvmet_execute_identify_cns_cs_ctrl().
6.  Use u64 for bufsize in nvmet_bdev_execute_zone_mgmt_recv().
7.  Remove nvmet_zones_to_desc_size() and fix the nr_zones
    calculation.
8.  Remove the op variable in nvmet_bdev_execute_zone_append().
9.  Fix the nr_zones calculation nvmet_bdev_execute_zone_mgmt_recv().
10. Update cover letter subject.

Changes from V1:-

1.  Remove the nvmet-$(CONFIG_BLK_DEV_ZONED) += zns.o.
2.  Mark helpers inline.
3.  Fix typos in the comments and update the comments.
4.  Get rid of the curly brackets.
5.  Don't allow drives with last smaller zones.
6.  Calculate the zasl as a function of ax_zone_append_sectors,
    bio_max_pages so we don't have to split the bio.
7.  Add global subsys->zasl and update the zasl when new namespace
    is enabled.
8.  Rmove the loop in the nvmet_bdev_execute_zone_mgmt_recv() and
    move functionality in to the report zone callback.
9.  Add goto for default case in nvmet_bdev_execute_zone_mgmt_send().
10. Allocate the zones buffer with zones size instead of bdev nr_zones.

-- 
2.22.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH V15 1/5] block: export blk_next_bio()
  2021-06-10  1:32 [PATCH V15 0/5] nvmet: add ZBD backened support Chaitanya Kulkarni
@ 2021-06-10  1:32 ` Chaitanya Kulkarni
  2021-06-10  1:32 ` [PATCH V15 2/5] nvmet: add req cns error complete helper Chaitanya Kulkarni
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Chaitanya Kulkarni @ 2021-06-10  1:32 UTC (permalink / raw)
  To: linux-block, linux-nvme
  Cc: axboe, sagi, hch, Chaitanya Kulkarni, Damien Le Moal

The block layer provides emulation of zone management operations
targeting all zones of a zoned block device only for the zone reset
operation (REQ_OP_ZONE_RESET). In order to correctly implement
exporting of zoned block devices with NVMeOF, emulating zone management
operations targeting all zones of a device is also necessary for the
open, close and finish zone operations (REQ_OP_ZONE_OPEN,
REQ_OP_ZONE_CLOSE and REQ_OP_ZONE_FINISH).

Instead of duplicating the code, export the existing helper from block
layer so we can use a bio chaining pattern that is present in the block
layer for REQ_OP_ZONE RESET all emulation in the NVMeOF zoned block
device backend.

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 block/blk-lib.c     | 1 +
 include/linux/bio.h | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 7b256131b20b..9f09beadcbe3 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -21,6 +21,7 @@ struct bio *blk_next_bio(struct bio *bio, unsigned int nr_pages, gfp_t gfp)
 
 	return new;
 }
+EXPORT_SYMBOL_GPL(blk_next_bio);
 
 int __blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, int flags,
diff --git a/include/linux/bio.h b/include/linux/bio.h
index a0b4cfdf62a4..b2491ead22a0 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -822,4 +822,6 @@ static inline void bio_set_polled(struct bio *bio, struct kiocb *kiocb)
 		bio->bi_opf |= REQ_NOWAIT;
 }
 
+struct bio *blk_next_bio(struct bio *bio, unsigned int nr_pages, gfp_t gfp);
+
 #endif /* __LINUX_BIO_H */
-- 
2.22.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH V15 2/5] nvmet: add req cns error complete helper
  2021-06-10  1:32 [PATCH V15 0/5] nvmet: add ZBD backened support Chaitanya Kulkarni
  2021-06-10  1:32 ` [PATCH V15 1/5] block: export blk_next_bio() Chaitanya Kulkarni
@ 2021-06-10  1:32 ` Chaitanya Kulkarni
  2021-06-10  1:32 ` [PATCH V15 3/5] nvmet: add nvmet_req_bio put helper for backends Chaitanya Kulkarni
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Chaitanya Kulkarni @ 2021-06-10  1:32 UTC (permalink / raw)
  To: linux-block, linux-nvme
  Cc: axboe, sagi, hch, Chaitanya Kulkarni, Damien Le Moal

We report error and complete the request when identify cns value is not
handled in nvmet_execute_identify(). This error reporting is also needed
for Zone Block Device backend for NVMeOF target.

Add a helper nvmet_req_cns_error_compplete() to report an error and
complete the request when idenitfy command cns not handled value.

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/admin-cmd.c | 5 +----
 drivers/nvme/target/nvmet.h     | 8 ++++++++
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index cd60a8184d04..3de6a6c99b01 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -637,10 +637,7 @@ static void nvmet_execute_identify(struct nvmet_req *req)
 		return nvmet_execute_identify_desclist(req);
 	}
 
-	pr_debug("unhandled identify cns %d on qid %d\n",
-	       req->cmd->identify.cns, req->sq->qid);
-	req->error_loc = offsetof(struct nvme_identify, cns);
-	nvmet_req_complete(req, NVME_SC_INVALID_FIELD | NVME_SC_DNR);
+	nvmet_req_cns_error_complete(req);
 }
 
 /*
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index bd0a0b91d843..f53ce96df498 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -624,4 +624,12 @@ static inline bool nvmet_use_inline_bvec(struct nvmet_req *req)
 	       req->sg_cnt <= NVMET_MAX_INLINE_BIOVEC;
 }
 
+static inline void nvmet_req_cns_error_complete(struct nvmet_req *req)
+{
+	pr_debug("unhandled identify cns %d on qid %d\n",
+	       req->cmd->identify.cns, req->sq->qid);
+	req->error_loc = offsetof(struct nvme_identify, cns);
+	nvmet_req_complete(req, NVME_SC_INVALID_FIELD | NVME_SC_DNR);
+}
+
 #endif /* _NVMET_H */
-- 
2.22.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH V15 3/5] nvmet: add nvmet_req_bio put helper for backends
  2021-06-10  1:32 [PATCH V15 0/5] nvmet: add ZBD backened support Chaitanya Kulkarni
  2021-06-10  1:32 ` [PATCH V15 1/5] block: export blk_next_bio() Chaitanya Kulkarni
  2021-06-10  1:32 ` [PATCH V15 2/5] nvmet: add req cns error complete helper Chaitanya Kulkarni
@ 2021-06-10  1:32 ` Chaitanya Kulkarni
  2021-06-10  1:32 ` [PATCH V15 4/5] nvmet: add Command Set Identifier support Chaitanya Kulkarni
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 9+ messages in thread
From: Chaitanya Kulkarni @ 2021-06-10  1:32 UTC (permalink / raw)
  To: linux-block, linux-nvme
  Cc: axboe, sagi, hch, Chaitanya Kulkarni, Damien Le Moal

In current code there exists two backends which are using inline bio
optimization, that adds a duplicate code for freeing the bio.

For Zoned Block Device backend we also use the same optimzation and it
will lead to having duplicate code in the three backends: generic
bdev, passsthru, and generic zns.

Add a helper function to avoid duplicate code and update the respective
backends.

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/io-cmd-bdev.c | 3 +--
 drivers/nvme/target/nvmet.h       | 6 ++++++
 drivers/nvme/target/passthru.c    | 3 +--
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index f673679d258a..40534d9a86d1 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -164,8 +164,7 @@ static void nvmet_bio_done(struct bio *bio)
 	struct nvmet_req *req = bio->bi_private;
 
 	nvmet_req_complete(req, blk_to_nvme_status(req, bio->bi_status));
-	if (bio != &req->b.inline_bio)
-		bio_put(bio);
+	nvmet_req_bio_put(req, bio);
 }
 
 #ifdef CONFIG_BLK_DEV_INTEGRITY
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index f53ce96df498..d4bcfeb570e5 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -632,4 +632,10 @@ static inline void nvmet_req_cns_error_complete(struct nvmet_req *req)
 	nvmet_req_complete(req, NVME_SC_INVALID_FIELD | NVME_SC_DNR);
 }
 
+static inline void nvmet_req_bio_put(struct nvmet_req *req, struct bio *bio)
+{
+	if (bio != &req->b.inline_bio)
+		bio_put(bio);
+}
+
 #endif /* _NVMET_H */
diff --git a/drivers/nvme/target/passthru.c b/drivers/nvme/target/passthru.c
index 39b1473f7204..fced52de33ce 100644
--- a/drivers/nvme/target/passthru.c
+++ b/drivers/nvme/target/passthru.c
@@ -206,8 +206,7 @@ static int nvmet_passthru_map_sg(struct nvmet_req *req, struct request *rq)
 	for_each_sg(req->sg, sg, req->sg_cnt, i) {
 		if (bio_add_pc_page(rq->q, bio, sg_page(sg), sg->length,
 				    sg->offset) < sg->length) {
-			if (bio != &req->p.inline_bio)
-				bio_put(bio);
+			nvmet_req_bio_put(req, bio);
 			return -EINVAL;
 		}
 	}
-- 
2.22.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH V15 4/5] nvmet: add Command Set Identifier support
  2021-06-10  1:32 [PATCH V15 0/5] nvmet: add ZBD backened support Chaitanya Kulkarni
                   ` (2 preceding siblings ...)
  2021-06-10  1:32 ` [PATCH V15 3/5] nvmet: add nvmet_req_bio put helper for backends Chaitanya Kulkarni
@ 2021-06-10  1:32 ` Chaitanya Kulkarni
  2021-06-11  2:14   ` Wu Bo
  2021-06-10  1:32 ` [PATCH V15 5/5] nvmet: add ZBD over ZNS backend support Chaitanya Kulkarni
  2021-06-15 16:25 ` [PATCH V15 0/5] nvmet: add ZBD backened support Christoph Hellwig
  5 siblings, 1 reply; 9+ messages in thread
From: Chaitanya Kulkarni @ 2021-06-10  1:32 UTC (permalink / raw)
  To: linux-block, linux-nvme; +Cc: axboe, sagi, hch, Chaitanya Kulkarni

NVMe TP 4056 allows controllers to support different command sets.
NVMeoF target currently only supports namespaces that contain
traditional logical blocks that may be randomly read and written. In
some applications there is a value in exposing namespaces that contain
logical blocks that have special access rules (e.g. sequentially write
required namespace such as Zoned Namespace (ZNS)).

In order to support the Zoned Block Devices (ZBD) backend, controllers
need to have support for ZNS Command Set Identifier (CSI).

In this preparation patch, we adjust the code such that it can now
support the default command set identifier. We update the namespace data
structure to store the CSI value which defaults to NVME_CSI_NVM
that represents traditional logical blocks namespace type.

The CSI support is required to implement the ZBD backend for NVMeOF
with host side NVMe ZNS interface, since ZNS commands belong to
the different command set than the default one.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/admin-cmd.c | 75 +++++++++++++++++++++++++++------
 drivers/nvme/target/core.c      | 28 +++++++++---
 drivers/nvme/target/nvmet.h     |  1 +
 include/linux/nvme.h            |  1 +
 4 files changed, 87 insertions(+), 18 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 3de6a6c99b01..93aaa7479e71 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -162,15 +162,8 @@ static void nvmet_execute_get_log_page_smart(struct nvmet_req *req)
 	nvmet_req_complete(req, status);
 }
 
-static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
+static void nvmet_get_cmd_effects_nvm(struct nvme_effects_log *log)
 {
-	u16 status = NVME_SC_INTERNAL;
-	struct nvme_effects_log *log;
-
-	log = kzalloc(sizeof(*log), GFP_KERNEL);
-	if (!log)
-		goto out;
-
 	log->acs[nvme_admin_get_log_page]	= cpu_to_le32(1 << 0);
 	log->acs[nvme_admin_identify]		= cpu_to_le32(1 << 0);
 	log->acs[nvme_admin_abort_cmd]		= cpu_to_le32(1 << 0);
@@ -184,9 +177,30 @@ static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
 	log->iocs[nvme_cmd_flush]		= cpu_to_le32(1 << 0);
 	log->iocs[nvme_cmd_dsm]			= cpu_to_le32(1 << 0);
 	log->iocs[nvme_cmd_write_zeroes]	= cpu_to_le32(1 << 0);
+}
 
-	status = nvmet_copy_to_sgl(req, 0, log, sizeof(*log));
+static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
+{
+	struct nvme_effects_log *log;
+	u16 status = NVME_SC_SUCCESS;
+
+	log = kzalloc(sizeof(*log), GFP_KERNEL);
+	if (!log) {
+		status = NVME_SC_INTERNAL;
+		goto out;
+	}
+
+	switch (req->cmd->get_log_page.csi) {
+	case NVME_CSI_NVM:
+		nvmet_get_cmd_effects_nvm(log);
+		break;
+	default:
+		status = NVME_SC_INVALID_LOG_PAGE;
+		goto free;
+	}
 
+	status = nvmet_copy_to_sgl(req, 0, log, sizeof(*log));
+free:
 	kfree(log);
 out:
 	nvmet_req_complete(req, status);
@@ -613,6 +627,12 @@ static void nvmet_execute_identify_desclist(struct nvmet_req *req)
 			goto out;
 	}
 
+	status = nvmet_copy_ns_identifier(req, NVME_NIDT_CSI,
+					  NVME_NIDT_CSI_LEN,
+					  &req->ns->csi, &off);
+	if (status)
+		goto out;
+
 	if (sg_zero_buffer(req->sg, req->sg_cnt, NVME_IDENTIFY_DATA_SIZE - off,
 			off) != NVME_IDENTIFY_DATA_SIZE - off)
 		status = NVME_SC_INTERNAL | NVME_SC_DNR;
@@ -621,6 +641,17 @@ static void nvmet_execute_identify_desclist(struct nvmet_req *req)
 	nvmet_req_complete(req, status);
 }
 
+static bool nvmet_handle_identify_desclist(struct nvmet_req *req)
+{
+	switch (req->cmd->identify.csi) {
+	case NVME_CSI_NVM:
+		nvmet_execute_identify_desclist(req);
+		return true;
+	default:
+		return false;
+	}
+}
+
 static void nvmet_execute_identify(struct nvmet_req *req)
 {
 	if (!nvmet_check_transfer_len(req, NVME_IDENTIFY_DATA_SIZE))
@@ -628,13 +659,31 @@ static void nvmet_execute_identify(struct nvmet_req *req)
 
 	switch (req->cmd->identify.cns) {
 	case NVME_ID_CNS_NS:
-		return nvmet_execute_identify_ns(req);
+		switch (req->cmd->identify.csi) {
+		case NVME_CSI_NVM:
+			return nvmet_execute_identify_ns(req);
+		default:
+			break;
+		}
+		break;
 	case NVME_ID_CNS_CTRL:
-		return nvmet_execute_identify_ctrl(req);
+		switch (req->cmd->identify.csi) {
+		case NVME_CSI_NVM:
+			return nvmet_execute_identify_ctrl(req);
+		}
+		break;
 	case NVME_ID_CNS_NS_ACTIVE_LIST:
-		return nvmet_execute_identify_nslist(req);
+		switch (req->cmd->identify.csi) {
+		case NVME_CSI_NVM:
+			return nvmet_execute_identify_nslist(req);
+		default:
+			break;
+		}
+		break;
 	case NVME_ID_CNS_NS_DESC_LIST:
-		return nvmet_execute_identify_desclist(req);
+		if (nvmet_handle_identify_desclist(req) == true)
+			return;
+		break;
 	}
 
 	nvmet_req_cns_error_complete(req);
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 146909486b8f..ac53a1307ce4 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -692,6 +692,7 @@ struct nvmet_ns *nvmet_ns_alloc(struct nvmet_subsys *subsys, u32 nsid)
 
 	uuid_gen(&ns->uuid);
 	ns->buffered_io = false;
+	ns->csi = NVME_CSI_NVM;
 
 	return ns;
 }
@@ -887,10 +888,14 @@ static u16 nvmet_parse_io_cmd(struct nvmet_req *req)
 		return ret;
 	}
 
-	if (req->ns->file)
-		return nvmet_file_parse_io_cmd(req);
-
-	return nvmet_bdev_parse_io_cmd(req);
+	switch (req->ns->csi) {
+	case NVME_CSI_NVM:
+		if (req->ns->file)
+			return nvmet_file_parse_io_cmd(req);
+		return nvmet_bdev_parse_io_cmd(req);
+	default:
+		return NVME_SC_INVALID_IO_CMD_SET;
+	}
 }
 
 bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
@@ -1112,6 +1117,17 @@ static inline u8 nvmet_cc_iocqes(u32 cc)
 	return (cc >> NVME_CC_IOCQES_SHIFT) & 0xf;
 }
 
+static inline bool nvmet_css_supported(u8 cc_css)
+{
+	switch (cc_css <<= NVME_CC_CSS_SHIFT) {
+	case NVME_CC_CSS_NVM:
+	case NVME_CC_CSS_CSI:
+		return true;
+	default:
+		return false;
+	}
+}
+
 static void nvmet_start_ctrl(struct nvmet_ctrl *ctrl)
 {
 	lockdep_assert_held(&ctrl->lock);
@@ -1131,7 +1147,7 @@ static void nvmet_start_ctrl(struct nvmet_ctrl *ctrl)
 
 	if (nvmet_cc_mps(ctrl->cc) != 0 ||
 	    nvmet_cc_ams(ctrl->cc) != 0 ||
-	    nvmet_cc_css(ctrl->cc) != 0) {
+	    !nvmet_css_supported(nvmet_cc_css(ctrl->cc))) {
 		ctrl->csts = NVME_CSTS_CFS;
 		return;
 	}
@@ -1182,6 +1198,8 @@ static void nvmet_init_cap(struct nvmet_ctrl *ctrl)
 {
 	/* command sets supported: NVMe command set: */
 	ctrl->cap = (1ULL << 37);
+	/* Controller supports one or more I/O Command Sets */
+	ctrl->cap |= (1ULL << 43);
 	/* CC.EN timeout in 500msec units: */
 	ctrl->cap |= (15ULL << 24);
 	/* maximum queue entries supported: */
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index d4bcfeb570e5..4ca558b94955 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -83,6 +83,7 @@ struct nvmet_ns {
 	struct pci_dev		*p2p_dev;
 	int			pi_type;
 	int			metadata_size;
+	u8			csi;
 };
 
 static inline struct nvmet_ns *to_nvmet_ns(struct config_item *item)
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index edcbd60b88b9..c7ba83144d52 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -1504,6 +1504,7 @@ enum {
 	NVME_SC_NS_WRITE_PROTECTED	= 0x20,
 	NVME_SC_CMD_INTERRUPTED		= 0x21,
 	NVME_SC_TRANSIENT_TR_ERR	= 0x22,
+	NVME_SC_INVALID_IO_CMD_SET	= 0x2C,
 
 	NVME_SC_LBA_RANGE		= 0x80,
 	NVME_SC_CAP_EXCEEDED		= 0x81,
-- 
2.22.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH V15 5/5] nvmet: add ZBD over ZNS backend support
  2021-06-10  1:32 [PATCH V15 0/5] nvmet: add ZBD backened support Chaitanya Kulkarni
                   ` (3 preceding siblings ...)
  2021-06-10  1:32 ` [PATCH V15 4/5] nvmet: add Command Set Identifier support Chaitanya Kulkarni
@ 2021-06-10  1:32 ` Chaitanya Kulkarni
  2021-06-15 16:25 ` [PATCH V15 0/5] nvmet: add ZBD backened support Christoph Hellwig
  5 siblings, 0 replies; 9+ messages in thread
From: Chaitanya Kulkarni @ 2021-06-10  1:32 UTC (permalink / raw)
  To: linux-block, linux-nvme
  Cc: axboe, sagi, hch, Chaitanya Kulkarni, Damien Le Moal

NVMe TP 4053 – Zoned Namespaces (ZNS) allows host software to
communicate with a non-volatile memory subsystem using zones for NVMe
protocol-based controllers. NVMeOF already support the ZNS NVMe
Protocol compliant devices on the target in the passthru mode. There
are generic zoned block devices like  Shingled Magnetic Recording (SMR)
HDDs that are not based on the NVMe protocol.

This patch adds ZNS backend support for non-ZNS zoned block devices as
NVMeOF targets.

This support includes implementing the new command set NVME_CSI_ZNS,
adding different command handlers for ZNS command set such as NVMe
Identify Controller, NVMe Identify Namespace, NVMe Zone Append,
NVMe Zone Management Send and NVMe Zone Management Receive.

With the new command set identifier, we also update the target command
effects logs to reflect the ZNS compliant commands.

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/Makefile      |   1 +
 drivers/nvme/target/admin-cmd.c   |  41 ++
 drivers/nvme/target/core.c        |  15 +-
 drivers/nvme/target/io-cmd-bdev.c |  26 +-
 drivers/nvme/target/nvmet.h       |  20 +
 drivers/nvme/target/zns.c         | 622 ++++++++++++++++++++++++++++++
 include/linux/nvme.h              |   7 +
 7 files changed, 721 insertions(+), 11 deletions(-)
 create mode 100644 drivers/nvme/target/zns.c

diff --git a/drivers/nvme/target/Makefile b/drivers/nvme/target/Makefile
index ebf91fc4c72e..9837e580fa7e 100644
--- a/drivers/nvme/target/Makefile
+++ b/drivers/nvme/target/Makefile
@@ -12,6 +12,7 @@ obj-$(CONFIG_NVME_TARGET_TCP)		+= nvmet-tcp.o
 nvmet-y		+= core.o configfs.o admin-cmd.o fabrics-cmd.o \
 			discovery.o io-cmd-file.o io-cmd-bdev.o
 nvmet-$(CONFIG_NVME_TARGET_PASSTHRU)	+= passthru.o
+nvmet-$(CONFIG_BLK_DEV_ZONED)		+= zns.o
 nvme-loop-y	+= loop.o
 nvmet-rdma-y	+= rdma.o
 nvmet-fc-y	+= fc.o
diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 93aaa7479e71..363e357d2f20 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -179,6 +179,13 @@ static void nvmet_get_cmd_effects_nvm(struct nvme_effects_log *log)
 	log->iocs[nvme_cmd_write_zeroes]	= cpu_to_le32(1 << 0);
 }
 
+static void nvmet_get_cmd_effects_zns(struct nvme_effects_log *log)
+{
+	log->iocs[nvme_cmd_zone_append]		= cpu_to_le32(1 << 0);
+	log->iocs[nvme_cmd_zone_mgmt_send]	= cpu_to_le32(1 << 0);
+	log->iocs[nvme_cmd_zone_mgmt_recv]	= cpu_to_le32(1 << 0);
+}
+
 static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
 {
 	struct nvme_effects_log *log;
@@ -194,6 +201,14 @@ static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
 	case NVME_CSI_NVM:
 		nvmet_get_cmd_effects_nvm(log);
 		break;
+	case NVME_CSI_ZNS:
+		if (!IS_ENABLED(CONFIG_BLK_DEV_ZONED)) {
+			status = NVME_SC_INVALID_IO_CMD_SET;
+			goto free;
+		}
+		nvmet_get_cmd_effects_nvm(log);
+		nvmet_get_cmd_effects_zns(log);
+		break;
 	default:
 		status = NVME_SC_INVALID_LOG_PAGE;
 		goto free;
@@ -647,6 +662,12 @@ static bool nvmet_handle_identify_desclist(struct nvmet_req *req)
 	case NVME_CSI_NVM:
 		nvmet_execute_identify_desclist(req);
 		return true;
+	case NVME_CSI_ZNS:
+		if (IS_ENABLED(CONFIG_BLK_DEV_ZONED)) {
+			nvmet_execute_identify_desclist(req);
+			return true;
+		}
+		return false;
 	default:
 		return false;
 	}
@@ -666,12 +687,32 @@ static void nvmet_execute_identify(struct nvmet_req *req)
 			break;
 		}
 		break;
+	case NVME_ID_CNS_CS_NS:
+		if (IS_ENABLED(CONFIG_BLK_DEV_ZONED)) {
+			switch (req->cmd->identify.csi) {
+			case NVME_CSI_ZNS:
+				return nvmet_execute_identify_cns_cs_ns(req);
+			default:
+				break;
+			}
+		}
+		break;
 	case NVME_ID_CNS_CTRL:
 		switch (req->cmd->identify.csi) {
 		case NVME_CSI_NVM:
 			return nvmet_execute_identify_ctrl(req);
 		}
 		break;
+	case NVME_ID_CNS_CS_CTRL:
+		if (IS_ENABLED(CONFIG_BLK_DEV_ZONED)) {
+			switch (req->cmd->identify.csi) {
+			case NVME_CSI_ZNS:
+				return nvmet_execute_identify_cns_cs_ctrl(req);
+			default:
+				break;
+			}
+		}
+		break;
 	case NVME_ID_CNS_NS_ACTIVE_LIST:
 		switch (req->cmd->identify.csi) {
 		case NVME_CSI_NVM:
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index ac53a1307ce4..d061e9946568 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -16,6 +16,7 @@
 #include "nvmet.h"
 
 struct workqueue_struct *buffered_io_wq;
+struct workqueue_struct *zbd_wq;
 static const struct nvmet_fabrics_ops *nvmet_transports[NVMF_TRTYPE_MAX];
 static DEFINE_IDA(cntlid_ida);
 
@@ -893,6 +894,10 @@ static u16 nvmet_parse_io_cmd(struct nvmet_req *req)
 		if (req->ns->file)
 			return nvmet_file_parse_io_cmd(req);
 		return nvmet_bdev_parse_io_cmd(req);
+	case NVME_CSI_ZNS:
+		if (IS_ENABLED(CONFIG_BLK_DEV_ZONED))
+			return nvmet_bdev_zns_parse_io_cmd(req);
+		return NVME_SC_INVALID_IO_CMD_SET;
 	default:
 		return NVME_SC_INVALID_IO_CMD_SET;
 	}
@@ -1602,11 +1607,15 @@ static int __init nvmet_init(void)
 
 	nvmet_ana_group_enabled[NVMET_DEFAULT_ANA_GRPID] = 1;
 
+	zbd_wq = alloc_workqueue("nvmet-zbd-wq", WQ_MEM_RECLAIM, 0);
+	if (!zbd_wq)
+		return -ENOMEM;
+
 	buffered_io_wq = alloc_workqueue("nvmet-buffered-io-wq",
 			WQ_MEM_RECLAIM, 0);
 	if (!buffered_io_wq) {
 		error = -ENOMEM;
-		goto out;
+		goto out_free_zbd_work_queue;
 	}
 
 	error = nvmet_init_discovery();
@@ -1622,7 +1631,8 @@ static int __init nvmet_init(void)
 	nvmet_exit_discovery();
 out_free_work_queue:
 	destroy_workqueue(buffered_io_wq);
-out:
+out_free_zbd_work_queue:
+	destroy_workqueue(zbd_wq);
 	return error;
 }
 
@@ -1632,6 +1642,7 @@ static void __exit nvmet_exit(void)
 	nvmet_exit_discovery();
 	ida_destroy(&cntlid_ida);
 	destroy_workqueue(buffered_io_wq);
+	destroy_workqueue(zbd_wq);
 
 	BUILD_BUG_ON(sizeof(struct nvmf_disc_rsp_page_entry) != 1024);
 	BUILD_BUG_ON(sizeof(struct nvmf_disc_rsp_page_hdr) != 1024);
diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index 40534d9a86d1..57f619aa7cba 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -47,6 +47,14 @@ void nvmet_bdev_set_limits(struct block_device *bdev, struct nvme_id_ns *id)
 	id->nows = to0based(ql->io_opt / ql->logical_block_size);
 }
 
+void nvmet_bdev_ns_disable(struct nvmet_ns *ns)
+{
+	if (ns->bdev) {
+		blkdev_put(ns->bdev, FMODE_WRITE | FMODE_READ);
+		ns->bdev = NULL;
+	}
+}
+
 static void nvmet_bdev_ns_enable_integrity(struct nvmet_ns *ns)
 {
 	struct blk_integrity *bi = bdev_get_integrity(ns->bdev);
@@ -86,15 +94,15 @@ int nvmet_bdev_ns_enable(struct nvmet_ns *ns)
 	if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY_T10))
 		nvmet_bdev_ns_enable_integrity(ns);
 
-	return 0;
-}
-
-void nvmet_bdev_ns_disable(struct nvmet_ns *ns)
-{
-	if (ns->bdev) {
-		blkdev_put(ns->bdev, FMODE_WRITE | FMODE_READ);
-		ns->bdev = NULL;
+	if (bdev_is_zoned(ns->bdev)) {
+		if (!nvmet_bdev_zns_enable(ns)) {
+			nvmet_bdev_ns_disable(ns);
+			return -EINVAL;
+		}
+		ns->csi = NVME_CSI_ZNS;
 	}
+
+	return 0;
 }
 
 void nvmet_bdev_ns_revalidate(struct nvmet_ns *ns)
@@ -102,7 +110,7 @@ void nvmet_bdev_ns_revalidate(struct nvmet_ns *ns)
 	ns->size = i_size_read(ns->bdev->bd_inode);
 }
 
-static u16 blk_to_nvme_status(struct nvmet_req *req, blk_status_t blk_sts)
+u16 blk_to_nvme_status(struct nvmet_req *req, blk_status_t blk_sts)
 {
 	u16 status = NVME_SC_SUCCESS;
 
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 4ca558b94955..f7fbf9690ce8 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -250,6 +250,10 @@ struct nvmet_subsys {
 	unsigned int		admin_timeout;
 	unsigned int		io_timeout;
 #endif /* CONFIG_NVME_TARGET_PASSTHRU */
+
+#ifdef CONFIG_BLK_DEV_ZONED
+	u8			zasl;
+#endif /* CONFIG_BLK_DEV_ZONED */
 };
 
 static inline struct nvmet_subsys *to_subsys(struct config_item *item)
@@ -335,6 +339,12 @@ struct nvmet_req {
 			struct work_struct      work;
 			bool			use_workqueue;
 		} p;
+#ifdef CONFIG_BLK_DEV_ZONED
+		struct {
+			struct bio		inline_bio;
+			struct work_struct	zmgmt_work;
+		} z;
+#endif /* CONFIG_BLK_DEV_ZONED */
 	};
 	int			sg_cnt;
 	int			metadata_sg_cnt;
@@ -354,6 +364,7 @@ struct nvmet_req {
 };
 
 extern struct workqueue_struct *buffered_io_wq;
+extern struct workqueue_struct *zbd_wq;
 
 static inline void nvmet_set_result(struct nvmet_req *req, u32 result)
 {
@@ -403,6 +414,7 @@ u16 nvmet_parse_connect_cmd(struct nvmet_req *req);
 void nvmet_bdev_set_limits(struct block_device *bdev, struct nvme_id_ns *id);
 u16 nvmet_bdev_parse_io_cmd(struct nvmet_req *req);
 u16 nvmet_file_parse_io_cmd(struct nvmet_req *req);
+u16 nvmet_bdev_zns_parse_io_cmd(struct nvmet_req *req);
 u16 nvmet_parse_admin_cmd(struct nvmet_req *req);
 u16 nvmet_parse_discovery_cmd(struct nvmet_req *req);
 u16 nvmet_parse_fabrics_cmd(struct nvmet_req *req);
@@ -530,6 +542,14 @@ void nvmet_ns_changed(struct nvmet_subsys *subsys, u32 nsid);
 void nvmet_bdev_ns_revalidate(struct nvmet_ns *ns);
 int nvmet_file_ns_revalidate(struct nvmet_ns *ns);
 void nvmet_ns_revalidate(struct nvmet_ns *ns);
+u16 blk_to_nvme_status(struct nvmet_req *req, blk_status_t blk_sts);
+
+bool nvmet_bdev_zns_enable(struct nvmet_ns *ns);
+void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req);
+void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req);
+void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req);
+void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req);
+void nvmet_bdev_execute_zone_append(struct nvmet_req *req);
 
 static inline u32 nvmet_rw_data_len(struct nvmet_req *req)
 {
diff --git a/drivers/nvme/target/zns.c b/drivers/nvme/target/zns.c
new file mode 100644
index 000000000000..b8ff75293e53
--- /dev/null
+++ b/drivers/nvme/target/zns.c
@@ -0,0 +1,622 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * NVMe ZNS-ZBD command implementation.
+ * Copyright (C) 2021 Western Digital Corporation or its affiliates.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/nvme.h>
+#include <linux/blkdev.h>
+#include "nvmet.h"
+
+/*
+ * We set the Memory Page Size Minimum (MPSMIN) for target controller to 0
+ * which gets added by 12 in the nvme_enable_ctrl() which results in 2^12 = 4k
+ * as page_shift value. When calculating the ZASL use shift by 12.
+ */
+#define NVMET_MPSMIN_SHIFT	12
+
+static inline u8 nvmet_zasl(unsigned int zone_append_sects)
+{
+	/*
+	 * Zone Append Size Limit (zasl) is expressed as a power of 2 value
+	 * with the minimum memory page size (i.e. 12) as unit.
+	 */
+	return ilog2(zone_append_sects >> (NVMET_MPSMIN_SHIFT - 9));
+}
+
+static int validate_conv_zones_cb(struct blk_zone *z,
+				  unsigned int i, void *data)
+{
+	if (z->type == BLK_ZONE_TYPE_CONVENTIONAL)
+		return -EOPNOTSUPP;
+	return 0;
+}
+
+bool nvmet_bdev_zns_enable(struct nvmet_ns *ns)
+{
+	struct request_queue *q = ns->bdev->bd_disk->queue;
+	u8 zasl = nvmet_zasl(queue_max_zone_append_sectors(q));
+	struct gendisk *bd_disk = ns->bdev->bd_disk;
+	int ret;
+
+	if (ns->subsys->zasl) {
+		if (ns->subsys->zasl > zasl)
+			return false;
+	}
+	ns->subsys->zasl = zasl;
+
+	/*
+	 * Generic zoned block devices may have a smaller last zone which is
+	 * not supported by ZNS. Exclude zoned drives that have such smaller
+	 * last zone.
+	 */
+	if (get_capacity(bd_disk) & (bdev_zone_sectors(ns->bdev) - 1))
+		return false;
+	/*
+	 * ZNS does not define a conventional zone type. If the underlying
+	 * device has a bitmap set indicating the existence of conventional
+	 * zones, reject the device. Otherwise, use report zones to detect if
+	 * the device has conventional zones.
+	 */
+	if (ns->bdev->bd_disk->queue->conv_zones_bitmap)
+		return false;
+
+	ret = blkdev_report_zones(ns->bdev, 0, blkdev_nr_zones(bd_disk),
+				  validate_conv_zones_cb, NULL);
+	if (ret < 0)
+		return false;
+
+	ns->blksize_shift = blksize_bits(bdev_logical_block_size(ns->bdev));
+
+	return true;
+}
+
+void nvmet_execute_identify_cns_cs_ctrl(struct nvmet_req *req)
+{
+	u8 zasl = req->sq->ctrl->subsys->zasl;
+	struct nvmet_ctrl *ctrl = req->sq->ctrl;
+	struct nvme_id_ctrl_zns *id;
+	u16 status;
+
+	id = kzalloc(sizeof(*id), GFP_KERNEL);
+	if (!id) {
+		status = NVME_SC_INTERNAL;
+		goto out;
+	}
+
+	if (ctrl->ops->get_mdts)
+		id->zasl = min_t(u8, ctrl->ops->get_mdts(ctrl), zasl);
+	else
+		id->zasl = zasl;
+
+	status = nvmet_copy_to_sgl(req, 0, id, sizeof(*id));
+
+	kfree(id);
+out:
+	nvmet_req_complete(req, status);
+}
+
+void nvmet_execute_identify_cns_cs_ns(struct nvmet_req *req)
+{
+	struct nvme_id_ns_zns *id_zns;
+	u64 zsze;
+	u16 status;
+
+	if (le32_to_cpu(req->cmd->identify.nsid) == NVME_NSID_ALL) {
+		req->error_loc = offsetof(struct nvme_identify, nsid);
+		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
+		goto out;
+	}
+
+	id_zns = kzalloc(sizeof(*id_zns), GFP_KERNEL);
+	if (!id_zns) {
+		status = NVME_SC_INTERNAL;
+		goto out;
+	}
+
+	status = nvmet_req_find_ns(req);
+	if (status) {
+		status = NVME_SC_INTERNAL;
+		goto done;
+	}
+
+	if (!bdev_is_zoned(req->ns->bdev)) {
+		req->error_loc = offsetof(struct nvme_identify, nsid);
+		status = NVME_SC_INVALID_NS | NVME_SC_DNR;
+		goto done;
+	}
+
+	nvmet_ns_revalidate(req->ns);
+	zsze = (bdev_zone_sectors(req->ns->bdev) << 9) >>
+					req->ns->blksize_shift;
+	id_zns->lbafe[0].zsze = cpu_to_le64(zsze);
+	id_zns->mor = cpu_to_le32(bdev_max_open_zones(req->ns->bdev));
+	id_zns->mar = cpu_to_le32(bdev_max_active_zones(req->ns->bdev));
+
+done:
+	status = nvmet_copy_to_sgl(req, 0, id_zns, sizeof(*id_zns));
+	kfree(id_zns);
+out:
+	nvmet_req_complete(req, status);
+}
+
+static u16 nvmet_bdev_validate_zone_mgmt_recv(struct nvmet_req *req)
+{
+	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zmr.slba);
+	u32 out_bufsize = (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2;
+
+	if (sect >= get_capacity(req->ns->bdev->bd_disk)) {
+		req->error_loc = offsetof(struct nvme_zone_mgmt_recv_cmd, slba);
+		return NVME_SC_LBA_RANGE | NVME_SC_DNR;
+	}
+
+	if (out_bufsize < sizeof(struct nvme_zone_report)) {
+		req->error_loc = offsetof(struct nvme_zone_mgmt_recv_cmd, numd);
+		return NVME_SC_INVALID_FIELD | NVME_SC_DNR;
+	}
+
+	if (req->cmd->zmr.zra != NVME_ZRA_ZONE_REPORT) {
+		req->error_loc = offsetof(struct nvme_zone_mgmt_recv_cmd, zra);
+		return NVME_SC_INVALID_FIELD | NVME_SC_DNR;
+	}
+
+	switch (req->cmd->zmr.pr) {
+	case 0:
+	case 1:
+		break;
+	default:
+		req->error_loc = offsetof(struct nvme_zone_mgmt_recv_cmd, pr);
+		return NVME_SC_INVALID_FIELD | NVME_SC_DNR;
+	}
+
+	switch (req->cmd->zmr.zrasf) {
+	case NVME_ZRASF_ZONE_REPORT_ALL:
+	case NVME_ZRASF_ZONE_STATE_EMPTY:
+	case NVME_ZRASF_ZONE_STATE_IMP_OPEN:
+	case NVME_ZRASF_ZONE_STATE_EXP_OPEN:
+	case NVME_ZRASF_ZONE_STATE_CLOSED:
+	case NVME_ZRASF_ZONE_STATE_FULL:
+	case NVME_ZRASF_ZONE_STATE_READONLY:
+	case NVME_ZRASF_ZONE_STATE_OFFLINE:
+		break;
+	default:
+		req->error_loc =
+			offsetof(struct nvme_zone_mgmt_recv_cmd, zrasf);
+		return NVME_SC_INVALID_FIELD | NVME_SC_DNR;
+	}
+
+	return NVME_SC_SUCCESS;
+}
+
+struct nvmet_report_zone_data {
+	struct nvmet_req *req;
+	u64 out_buf_offset;
+	u64 out_nr_zones;
+	u64 nr_zones;
+	u8 zrasf;
+};
+
+static int nvmet_bdev_report_zone_cb(struct blk_zone *z, unsigned i, void *d)
+{
+	static const unsigned int nvme_zrasf_to_blk_zcond[] = {
+		[NVME_ZRASF_ZONE_STATE_EMPTY]	 = BLK_ZONE_COND_EMPTY,
+		[NVME_ZRASF_ZONE_STATE_IMP_OPEN] = BLK_ZONE_COND_IMP_OPEN,
+		[NVME_ZRASF_ZONE_STATE_EXP_OPEN] = BLK_ZONE_COND_EXP_OPEN,
+		[NVME_ZRASF_ZONE_STATE_CLOSED]	 = BLK_ZONE_COND_CLOSED,
+		[NVME_ZRASF_ZONE_STATE_READONLY] = BLK_ZONE_COND_READONLY,
+		[NVME_ZRASF_ZONE_STATE_FULL]	 = BLK_ZONE_COND_FULL,
+		[NVME_ZRASF_ZONE_STATE_OFFLINE]	 = BLK_ZONE_COND_OFFLINE,
+	};
+	struct nvmet_report_zone_data *rz = d;
+
+	if (rz->zrasf != NVME_ZRASF_ZONE_REPORT_ALL &&
+	    z->cond != nvme_zrasf_to_blk_zcond[rz->zrasf])
+		return 0;
+
+	if (rz->nr_zones < rz->out_nr_zones) {
+		struct nvme_zone_descriptor zdesc = { };
+		u16 status;
+
+		zdesc.zcap = nvmet_sect_to_lba(rz->req->ns, z->capacity);
+		zdesc.zslba = nvmet_sect_to_lba(rz->req->ns, z->start);
+		zdesc.wp = nvmet_sect_to_lba(rz->req->ns, z->wp);
+		zdesc.za = z->reset ? 1 << 2 : 0;
+		zdesc.zs = z->cond << 4;
+		zdesc.zt = z->type;
+
+		status = nvmet_copy_to_sgl(rz->req, rz->out_buf_offset, &zdesc,
+					   sizeof(zdesc));
+		if (status)
+			return -EINVAL;
+
+		rz->out_buf_offset += sizeof(zdesc);
+	}
+
+	rz->nr_zones++;
+
+	return 0;
+}
+
+static unsigned long nvmet_req_nr_zones_from_slba(struct nvmet_req *req)
+{
+	unsigned int sect = nvmet_lba_to_sect(req->ns, req->cmd->zmr.slba);
+
+	return blkdev_nr_zones(req->ns->bdev->bd_disk) -
+		(sect >> ilog2(bdev_zone_sectors(req->ns->bdev)));
+}
+
+static unsigned long get_nr_zones_from_buf(struct nvmet_req *req, u32 bufsize)
+{
+	if (bufsize <= sizeof(struct nvme_zone_report))
+		return 0;
+
+	return (bufsize - sizeof(struct nvme_zone_report)) /
+		sizeof(struct nvme_zone_descriptor);
+}
+
+void nvmet_bdev_zone_zmgmt_recv_work(struct work_struct *w)
+{
+	struct nvmet_req *req = container_of(w, struct nvmet_req, z.zmgmt_work);
+	sector_t start_sect = nvmet_lba_to_sect(req->ns, req->cmd->zmr.slba);
+	unsigned long req_slba_nr_zones = nvmet_req_nr_zones_from_slba(req);
+	u32 out_bufsize = (le32_to_cpu(req->cmd->zmr.numd) + 1) << 2;
+	__le64 nr_zones;
+	u16 status;
+	int ret;
+	struct nvmet_report_zone_data rz_data = {
+		.out_nr_zones = get_nr_zones_from_buf(req, out_bufsize),
+		/* leave the place for report zone header */
+		.out_buf_offset = sizeof(struct nvme_zone_report),
+		.zrasf = req->cmd->zmr.zrasf,
+		.nr_zones = 0,
+		.req = req,
+	};
+
+	status = nvmet_bdev_validate_zone_mgmt_recv(req);
+	if (status)
+		goto out;
+
+	if (!req_slba_nr_zones) {
+		status = NVME_SC_SUCCESS;
+		goto out;
+	}
+
+	ret = blkdev_report_zones(req->ns->bdev, start_sect, req_slba_nr_zones,
+				 nvmet_bdev_report_zone_cb, &rz_data);
+	if (ret < 0) {
+		status = NVME_SC_INTERNAL;
+		goto out;
+	}
+
+	/*
+	 * When partial bit is set nr_zones must indicate the number of zone
+	 * descriptors actually transferred.
+	 */
+	if (req->cmd->zmr.pr)
+		rz_data.nr_zones = min(rz_data.nr_zones, rz_data.out_nr_zones);
+
+	nr_zones = cpu_to_le64(rz_data.nr_zones);
+	status = nvmet_copy_to_sgl(req, 0, &nr_zones, sizeof(nr_zones));
+
+out:
+	nvmet_req_complete(req, status);
+}
+
+void nvmet_bdev_execute_zone_mgmt_recv(struct nvmet_req *req)
+{
+	INIT_WORK(&req->z.zmgmt_work, nvmet_bdev_zone_zmgmt_recv_work);
+	queue_work(zbd_wq, &req->z.zmgmt_work);
+}
+
+static inline enum req_opf zsa_req_op(u8 zsa)
+{
+	switch (zsa) {
+	case NVME_ZONE_OPEN:
+		return REQ_OP_ZONE_OPEN;
+	case NVME_ZONE_CLOSE:
+		return REQ_OP_ZONE_CLOSE;
+	case NVME_ZONE_FINISH:
+		return REQ_OP_ZONE_FINISH;
+	case NVME_ZONE_RESET:
+		return REQ_OP_ZONE_RESET;
+	default:
+		return REQ_OP_LAST;
+	}
+}
+
+static u16 blkdev_zone_mgmt_errno_to_nvme_status(int ret)
+{
+	switch (ret) {
+	case 0:
+		return NVME_SC_SUCCESS;
+	case -EINVAL:
+	case -EIO:
+		return NVME_SC_ZONE_INVALID_TRANSITION | NVME_SC_DNR;
+	default:
+		return NVME_SC_INTERNAL;
+	}
+}
+
+struct nvmet_zone_mgmt_send_all_data {
+	unsigned long *zbitmap;
+	struct nvmet_req *req;
+};
+
+static int zmgmt_send_scan_cb(struct blk_zone *z, unsigned i, void *d)
+{
+	struct nvmet_zone_mgmt_send_all_data *data = d;
+
+	switch (zsa_req_op(data->req->cmd->zms.zsa)) {
+	case REQ_OP_ZONE_OPEN:
+		switch (z->cond) {
+		case BLK_ZONE_COND_CLOSED:
+			break;
+		default:
+			return 0;
+		}
+		break;
+	case REQ_OP_ZONE_CLOSE:
+		switch (z->cond) {
+		case BLK_ZONE_COND_IMP_OPEN:
+		case BLK_ZONE_COND_EXP_OPEN:
+			break;
+		default:
+			return 0;
+		}
+		break;
+	case REQ_OP_ZONE_FINISH:
+		switch (z->cond) {
+		case BLK_ZONE_COND_IMP_OPEN:
+		case BLK_ZONE_COND_EXP_OPEN:
+		case BLK_ZONE_COND_CLOSED:
+			break;
+		default:
+			return 0;
+		}
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	set_bit(i, data->zbitmap);
+
+	return 0;
+}
+
+static u16 nvmet_bdev_zone_mgmt_emulate_all(struct nvmet_req *req)
+{
+	struct block_device *bdev = req->ns->bdev;
+	unsigned int nr_zones = blkdev_nr_zones(bdev->bd_disk);
+	struct request_queue *q = bdev_get_queue(bdev);
+	struct bio *bio = NULL;
+	sector_t sector = 0;
+	int ret;
+	struct nvmet_zone_mgmt_send_all_data d = {
+		.req = req,
+	};
+
+	d.zbitmap = kcalloc_node(BITS_TO_LONGS(nr_zones), sizeof(*(d.zbitmap)),
+				 GFP_NOIO, q->node);
+	if (!d.zbitmap) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	/* Scan and build bitmap of the eligible zones */
+	ret = blkdev_report_zones(bdev, 0, nr_zones, zmgmt_send_scan_cb, &d);
+	if (ret != nr_zones) {
+		if (ret > 0)
+			ret = -EIO;
+		goto out;
+	} else {
+		/* We scanned all the zones */
+		ret = 0;
+	}
+
+	while (sector < get_capacity(bdev->bd_disk)) {
+		if (test_bit(blk_queue_zone_no(q, sector), d.zbitmap)) {
+			bio = blk_next_bio(bio, 0, GFP_KERNEL);
+			bio->bi_opf = zsa_req_op(req->cmd->zms.zsa) | REQ_SYNC;
+			bio->bi_iter.bi_sector = sector;
+			bio_set_dev(bio, bdev);
+			/* This may take a while, so be nice to others */
+			cond_resched();
+		}
+		sector += blk_queue_zone_sectors(q);
+	}
+
+	if (bio) {
+		ret = submit_bio_wait(bio);
+		bio_put(bio);
+	}
+
+out:
+	kfree(d.zbitmap);
+
+	return blkdev_zone_mgmt_errno_to_nvme_status(ret);
+}
+
+static u16 nvmet_bdev_execute_zmgmt_send_all(struct nvmet_req *req)
+{
+	int ret;
+
+	switch (zsa_req_op(req->cmd->zms.zsa)) {
+	case REQ_OP_ZONE_RESET:
+		ret = blkdev_zone_mgmt(req->ns->bdev, REQ_OP_ZONE_RESET, 0,
+				       get_capacity(req->ns->bdev->bd_disk),
+				       GFP_KERNEL);
+		if (ret < 0)
+			return blkdev_zone_mgmt_errno_to_nvme_status(ret);
+		break;
+	case REQ_OP_ZONE_OPEN:
+	case REQ_OP_ZONE_CLOSE:
+	case REQ_OP_ZONE_FINISH:
+		return nvmet_bdev_zone_mgmt_emulate_all(req);
+	default:
+		/* this is needed to quiet compiler warning */
+		req->error_loc = offsetof(struct nvme_zone_mgmt_send_cmd, zsa);
+		return NVME_SC_INVALID_FIELD | NVME_SC_DNR;
+	}
+
+	return NVME_SC_SUCCESS;
+}
+
+static void nvmet_bdev_zmgmt_send_work(struct work_struct *w)
+{
+	struct nvmet_req *req = container_of(w, struct nvmet_req, z.zmgmt_work);
+	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->zms.slba);
+	enum req_opf op = zsa_req_op(req->cmd->zms.zsa);
+	struct block_device *bdev = req->ns->bdev;
+	sector_t zone_sectors = bdev_zone_sectors(bdev);
+	u16 status = NVME_SC_SUCCESS;
+	int ret;
+
+	if (op == REQ_OP_LAST) {
+		req->error_loc = offsetof(struct nvme_zone_mgmt_send_cmd, zsa);
+		status = NVME_SC_ZONE_INVALID_TRANSITION | NVME_SC_DNR;
+		goto out;
+	}
+
+	/* when select all bit is set slba field is ignored */
+	if (req->cmd->zms.select_all) {
+		status = nvmet_bdev_execute_zmgmt_send_all(req);
+		goto out;
+	}
+
+	if (sect >= get_capacity(bdev->bd_disk)) {
+		req->error_loc = offsetof(struct nvme_zone_mgmt_send_cmd, slba);
+		status = NVME_SC_LBA_RANGE | NVME_SC_DNR;
+		goto out;
+	}
+
+	if (sect & (zone_sectors - 1)) {
+		req->error_loc = offsetof(struct nvme_zone_mgmt_send_cmd, slba);
+		status = NVME_SC_INVALID_FIELD | NVME_SC_DNR;
+		goto out;
+	}
+
+	ret = blkdev_zone_mgmt(bdev, op, sect, zone_sectors, GFP_KERNEL);
+	if (ret < 0)
+		status = blkdev_zone_mgmt_errno_to_nvme_status(ret);
+
+out:
+	nvmet_req_complete(req, status);
+}
+
+void nvmet_bdev_execute_zone_mgmt_send(struct nvmet_req *req)
+{
+	INIT_WORK(&req->z.zmgmt_work, nvmet_bdev_zmgmt_send_work);
+	queue_work(zbd_wq, &req->z.zmgmt_work);
+}
+
+static void nvmet_bdev_zone_append_bio_done(struct bio *bio)
+{
+	struct nvmet_req *req = bio->bi_private;
+
+	if (bio->bi_status == BLK_STS_OK) {
+		req->cqe->result.u64 =
+			nvmet_sect_to_lba(req->ns, bio->bi_iter.bi_sector);
+	}
+
+	nvmet_req_complete(req, blk_to_nvme_status(req, bio->bi_status));
+	nvmet_req_bio_put(req, bio);
+}
+
+void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
+{
+	sector_t sect = nvmet_lba_to_sect(req->ns, req->cmd->rw.slba);
+	u16 nr_sect = nvmet_lba_to_sect(req->ns, req->cmd->rw.length);
+	u16 status = NVME_SC_SUCCESS;
+	unsigned int total_len = 0;
+	struct scatterlist *sg;
+	struct bio *bio;
+	int sg_cnt;
+
+	/* Request is completed on len mismatch in nvmet_check_transter_len() */
+	if (!nvmet_check_transfer_len(req, nvmet_rw_data_len(req)))
+		return;
+
+	if (!req->sg_cnt) {
+		nvmet_req_complete(req, 0);
+		return;
+	}
+
+	if (sect >= get_capacity(req->ns->bdev->bd_disk)) {
+		req->error_loc = offsetof(struct nvme_rw_command, slba);
+		status = NVME_SC_LBA_RANGE | NVME_SC_DNR;
+		goto out;
+	}
+
+	if ((sect + nr_sect > get_capacity(req->ns->bdev->bd_disk))) {
+		req->error_loc = offsetof(struct nvme_rw_command, length);
+		status = NVME_SC_LBA_RANGE | NVME_SC_DNR;
+		goto out;
+	}
+
+	if (sect & (bdev_zone_sectors(req->ns->bdev) - 1)) {
+		req->error_loc = offsetof(struct nvme_rw_command, slba);
+		status = NVME_SC_INVALID_FIELD | NVME_SC_DNR;
+		goto out;
+	}
+
+	if (nvmet_use_inline_bvec(req)) {
+		bio = &req->z.inline_bio;
+		bio_init(bio, req->inline_bvec, ARRAY_SIZE(req->inline_bvec));
+	} else {
+		bio = bio_alloc(GFP_KERNEL, req->sg_cnt);
+	}
+
+	bio->bi_opf = REQ_OP_ZONE_APPEND | REQ_SYNC | REQ_IDLE;
+	bio->bi_end_io = nvmet_bdev_zone_append_bio_done;
+	bio_set_dev(bio, req->ns->bdev);
+	bio->bi_iter.bi_sector = sect;
+	bio->bi_private = req;
+	if (req->cmd->rw.control & cpu_to_le16(NVME_RW_FUA))
+		bio->bi_opf |= REQ_FUA;
+
+	for_each_sg(req->sg, sg, req->sg_cnt, sg_cnt) {
+		struct page *p = sg_page(sg);
+		unsigned int l = sg->length;
+		unsigned int o = sg->offset;
+		unsigned int ret;
+
+		ret = bio_add_zone_append_page(bio, p, l, o);
+		if (ret != sg->length) {
+			status = NVME_SC_INTERNAL;
+			goto out_put_bio;
+		}
+		total_len += sg->length;
+	}
+
+	if (total_len != nvmet_rw_data_len(req)) {
+		status = NVME_SC_INTERNAL | NVME_SC_DNR;
+		goto out_put_bio;
+	}
+
+	submit_bio(bio);
+	return;
+
+out_put_bio:
+	nvmet_req_bio_put(req, bio);
+out:
+	nvmet_req_complete(req, status);
+}
+
+u16 nvmet_bdev_zns_parse_io_cmd(struct nvmet_req *req)
+{
+	struct nvme_command *cmd = req->cmd;
+
+	switch (cmd->common.opcode) {
+	case nvme_cmd_zone_append:
+		req->execute = nvmet_bdev_execute_zone_append;
+		return 0;
+	case nvme_cmd_zone_mgmt_recv:
+		req->execute = nvmet_bdev_execute_zone_mgmt_recv;
+		return 0;
+	case nvme_cmd_zone_mgmt_send:
+		req->execute = nvmet_bdev_execute_zone_mgmt_send;
+		return 0;
+	default:
+		return nvmet_bdev_parse_io_cmd(req);
+	}
+}
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index c7ba83144d52..cb1197f1cfed 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -944,6 +944,13 @@ struct nvme_zone_mgmt_recv_cmd {
 enum {
 	NVME_ZRA_ZONE_REPORT		= 0,
 	NVME_ZRASF_ZONE_REPORT_ALL	= 0,
+	NVME_ZRASF_ZONE_STATE_EMPTY	= 0x01,
+	NVME_ZRASF_ZONE_STATE_IMP_OPEN	= 0x02,
+	NVME_ZRASF_ZONE_STATE_EXP_OPEN	= 0x03,
+	NVME_ZRASF_ZONE_STATE_CLOSED	= 0x04,
+	NVME_ZRASF_ZONE_STATE_READONLY	= 0x05,
+	NVME_ZRASF_ZONE_STATE_FULL	= 0x06,
+	NVME_ZRASF_ZONE_STATE_OFFLINE	= 0x07,
 	NVME_REPORT_ZONE_PARTIAL	= 1,
 };
 
-- 
2.22.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH V15 4/5] nvmet: add Command Set Identifier support
  2021-06-10  1:32 ` [PATCH V15 4/5] nvmet: add Command Set Identifier support Chaitanya Kulkarni
@ 2021-06-11  2:14   ` Wu Bo
  2021-06-11  2:35     ` Chaitanya Kulkarni
  0 siblings, 1 reply; 9+ messages in thread
From: Wu Bo @ 2021-06-11  2:14 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-block, linux-nvme; +Cc: axboe, sagi, hch

On 2021/6/10 9:32, Chaitanya Kulkarni wrote:
> NVMe TP 4056 allows controllers to support different command sets.
> NVMeoF target currently only supports namespaces that contain
> traditional logical blocks that may be randomly read and written. In
> some applications there is a value in exposing namespaces that contain
> logical blocks that have special access rules (e.g. sequentially write
> required namespace such as Zoned Namespace (ZNS)).
> 
> In order to support the Zoned Block Devices (ZBD) backend, controllers
> need to have support for ZNS Command Set Identifier (CSI).
> 
> In this preparation patch, we adjust the code such that it can now
> support the default command set identifier. We update the namespace data
> structure to store the CSI value which defaults to NVME_CSI_NVM
> that represents traditional logical blocks namespace type.
> 
> The CSI support is required to implement the ZBD backend for NVMeOF
> with host side NVMe ZNS interface, since ZNS commands belong to
> the different command set than the default one.
> 
> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> ---
>   drivers/nvme/target/admin-cmd.c | 75 +++++++++++++++++++++++++++------
>   drivers/nvme/target/core.c      | 28 +++++++++---
>   drivers/nvme/target/nvmet.h     |  1 +
>   include/linux/nvme.h            |  1 +
>   4 files changed, 87 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
> index 3de6a6c99b01..93aaa7479e71 100644
> --- a/drivers/nvme/target/admin-cmd.c
> +++ b/drivers/nvme/target/admin-cmd.c
> @@ -162,15 +162,8 @@ static void nvmet_execute_get_log_page_smart(struct nvmet_req *req)
>   	nvmet_req_complete(req, status);
>   }
>   
> -static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
> +static void nvmet_get_cmd_effects_nvm(struct nvme_effects_log *log)
>   {
> -	u16 status = NVME_SC_INTERNAL;
> -	struct nvme_effects_log *log;
> -
> -	log = kzalloc(sizeof(*log), GFP_KERNEL);
> -	if (!log)
> -		goto out;
> -
>   	log->acs[nvme_admin_get_log_page]	= cpu_to_le32(1 << 0);
>   	log->acs[nvme_admin_identify]		= cpu_to_le32(1 << 0);
>   	log->acs[nvme_admin_abort_cmd]		= cpu_to_le32(1 << 0);
> @@ -184,9 +177,30 @@ static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
>   	log->iocs[nvme_cmd_flush]		= cpu_to_le32(1 << 0);
>   	log->iocs[nvme_cmd_dsm]			= cpu_to_le32(1 << 0);
>   	log->iocs[nvme_cmd_write_zeroes]	= cpu_to_le32(1 << 0);
> +}
>   
> -	status = nvmet_copy_to_sgl(req, 0, log, sizeof(*log));
> +static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
> +{
> +	struct nvme_effects_log *log;
> +	u16 status = NVME_SC_SUCCESS;
> +
> +	log = kzalloc(sizeof(*log), GFP_KERNEL);
> +	if (!log) {
> +		status = NVME_SC_INTERNAL;
> +		goto out;
> +	}
> +
> +	switch (req->cmd->get_log_page.csi) {
> +	case NVME_CSI_NVM:
> +		nvmet_get_cmd_effects_nvm(log);
> +		break;
> +	default:
> +		status = NVME_SC_INVALID_LOG_PAGE;
> +		goto free;
> +	}
>   
> +	status = nvmet_copy_to_sgl(req, 0, log, sizeof(*log));
> +free:
>   	kfree(log);
>   out:
>   	nvmet_req_complete(req, status);
> @@ -613,6 +627,12 @@ static void nvmet_execute_identify_desclist(struct nvmet_req *req)
>   			goto out;
>   	}
>   
> +	status = nvmet_copy_ns_identifier(req, NVME_NIDT_CSI,
> +					  NVME_NIDT_CSI_LEN,
> +					  &req->ns->csi, &off);
> +	if (status)
> +		goto out;
> +
>   	if (sg_zero_buffer(req->sg, req->sg_cnt, NVME_IDENTIFY_DATA_SIZE - off,
>   			off) != NVME_IDENTIFY_DATA_SIZE - off)
>   		status = NVME_SC_INTERNAL | NVME_SC_DNR;
> @@ -621,6 +641,17 @@ static void nvmet_execute_identify_desclist(struct nvmet_req *req)
>   	nvmet_req_complete(req, status);
>   }
>   
> +static bool nvmet_handle_identify_desclist(struct nvmet_req *req)
> +{
> +	switch (req->cmd->identify.csi) {
> +	case NVME_CSI_NVM:
> +		nvmet_execute_identify_desclist(req);
> +		return true;
> +	default:
> +		return false;
> +	}
> +}
> +
>   static void nvmet_execute_identify(struct nvmet_req *req)
>   {
>   	if (!nvmet_check_transfer_len(req, NVME_IDENTIFY_DATA_SIZE))
> @@ -628,13 +659,31 @@ static void nvmet_execute_identify(struct nvmet_req *req)
>   
>   	switch (req->cmd->identify.cns) {
>   	case NVME_ID_CNS_NS:
> -		return nvmet_execute_identify_ns(req);
> +		switch (req->cmd->identify.csi) {
> +		case NVME_CSI_NVM:
> +			return nvmet_execute_identify_ns(req);
> +		default:
> +			break;
> +		}
> +		break;
>   	case NVME_ID_CNS_CTRL:
> -		return nvmet_execute_identify_ctrl(req);
> +		switch (req->cmd->identify.csi) {
> +		case NVME_CSI_NVM:
> +			return nvmet_execute_identify_ctrl(req);
> +		}

Is it missing to add the default branch here ?

> +		break;
>   	case NVME_ID_CNS_NS_ACTIVE_LIST:
> -		return nvmet_execute_identify_nslist(req);
> +		switch (req->cmd->identify.csi) {
> +		case NVME_CSI_NVM:
> +			return nvmet_execute_identify_nslist(req);
> +		default:
> +			break;
> +		}
> +		break;
>   	case NVME_ID_CNS_NS_DESC_LIST:
> -		return nvmet_execute_identify_desclist(req);
> +		if (nvmet_handle_identify_desclist(req) == true)
> +			return;
> +		break;
>   	}
>   
>   	nvmet_req_cns_error_complete(req);
> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
> index 146909486b8f..ac53a1307ce4 100644
> --- a/drivers/nvme/target/core.c
> +++ b/drivers/nvme/target/core.c
> @@ -692,6 +692,7 @@ struct nvmet_ns *nvmet_ns_alloc(struct nvmet_subsys *subsys, u32 nsid)
>   
>   	uuid_gen(&ns->uuid);
>   	ns->buffered_io = false;
> +	ns->csi = NVME_CSI_NVM;
>   
>   	return ns;
>   }
> @@ -887,10 +888,14 @@ static u16 nvmet_parse_io_cmd(struct nvmet_req *req)
>   		return ret;
>   	}
>   
> -	if (req->ns->file)
> -		return nvmet_file_parse_io_cmd(req);
> -
> -	return nvmet_bdev_parse_io_cmd(req);
> +	switch (req->ns->csi) {
> +	case NVME_CSI_NVM:
> +		if (req->ns->file)
> +			return nvmet_file_parse_io_cmd(req);
> +		return nvmet_bdev_parse_io_cmd(req);
> +	default:
> +		return NVME_SC_INVALID_IO_CMD_SET;
> +	}
>   }
>   
>   bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
> @@ -1112,6 +1117,17 @@ static inline u8 nvmet_cc_iocqes(u32 cc)
>   	return (cc >> NVME_CC_IOCQES_SHIFT) & 0xf;
>   }
>   
> +static inline bool nvmet_css_supported(u8 cc_css)
> +{
> +	switch (cc_css <<= NVME_CC_CSS_SHIFT) {
> +	case NVME_CC_CSS_NVM:
> +	case NVME_CC_CSS_CSI:
> +		return true;
> +	default:
> +		return false;
> +	}
> +}
> +
>   static void nvmet_start_ctrl(struct nvmet_ctrl *ctrl)
>   {
>   	lockdep_assert_held(&ctrl->lock);
> @@ -1131,7 +1147,7 @@ static void nvmet_start_ctrl(struct nvmet_ctrl *ctrl)
>   
>   	if (nvmet_cc_mps(ctrl->cc) != 0 ||
>   	    nvmet_cc_ams(ctrl->cc) != 0 ||
> -	    nvmet_cc_css(ctrl->cc) != 0) {
> +	    !nvmet_css_supported(nvmet_cc_css(ctrl->cc))) {
>   		ctrl->csts = NVME_CSTS_CFS;
>   		return;
>   	}
> @@ -1182,6 +1198,8 @@ static void nvmet_init_cap(struct nvmet_ctrl *ctrl)
>   {
>   	/* command sets supported: NVMe command set: */
>   	ctrl->cap = (1ULL << 37);
> +	/* Controller supports one or more I/O Command Sets */
> +	ctrl->cap |= (1ULL << 43);
>   	/* CC.EN timeout in 500msec units: */
>   	ctrl->cap |= (15ULL << 24);
>   	/* maximum queue entries supported: */
> diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
> index d4bcfeb570e5..4ca558b94955 100644
> --- a/drivers/nvme/target/nvmet.h
> +++ b/drivers/nvme/target/nvmet.h
> @@ -83,6 +83,7 @@ struct nvmet_ns {
>   	struct pci_dev		*p2p_dev;
>   	int			pi_type;
>   	int			metadata_size;
> +	u8			csi;
>   };
>   
>   static inline struct nvmet_ns *to_nvmet_ns(struct config_item *item)
> diff --git a/include/linux/nvme.h b/include/linux/nvme.h
> index edcbd60b88b9..c7ba83144d52 100644
> --- a/include/linux/nvme.h
> +++ b/include/linux/nvme.h
> @@ -1504,6 +1504,7 @@ enum {
>   	NVME_SC_NS_WRITE_PROTECTED	= 0x20,
>   	NVME_SC_CMD_INTERRUPTED		= 0x21,
>   	NVME_SC_TRANSIENT_TR_ERR	= 0x22,
> +	NVME_SC_INVALID_IO_CMD_SET	= 0x2C,
>   
>   	NVME_SC_LBA_RANGE		= 0x80,
>   	NVME_SC_CAP_EXCEEDED		= 0x81,
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH V15 4/5] nvmet: add Command Set Identifier support
  2021-06-11  2:14   ` Wu Bo
@ 2021-06-11  2:35     ` Chaitanya Kulkarni
  0 siblings, 0 replies; 9+ messages in thread
From: Chaitanya Kulkarni @ 2021-06-11  2:35 UTC (permalink / raw)
  To: Wu Bo, linux-block, linux-nvme; +Cc: axboe, sagi, hch

On 6/10/21 19:14, Wu Bo wrote:
>>   	case NVME_ID_CNS_CTRL:
>> -		return nvmet_execute_identify_ctrl(req);
>> +		switch (req->cmd->identify.csi) {
>> +		case NVME_CSI_NVM:
>> +			return nvmet_execute_identify_ctrl(req);
>> +		}
> Is it missing to add the default branch here ?
>

maybe for the consistency



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH V15 0/5] nvmet: add ZBD backened support
  2021-06-10  1:32 [PATCH V15 0/5] nvmet: add ZBD backened support Chaitanya Kulkarni
                   ` (4 preceding siblings ...)
  2021-06-10  1:32 ` [PATCH V15 5/5] nvmet: add ZBD over ZNS backend support Chaitanya Kulkarni
@ 2021-06-15 16:25 ` Christoph Hellwig
  5 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2021-06-15 16:25 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: linux-block, linux-nvme, axboe, sagi, hch

Thanks,

applied to nvme-5.14.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-06-15 16:25 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-10  1:32 [PATCH V15 0/5] nvmet: add ZBD backened support Chaitanya Kulkarni
2021-06-10  1:32 ` [PATCH V15 1/5] block: export blk_next_bio() Chaitanya Kulkarni
2021-06-10  1:32 ` [PATCH V15 2/5] nvmet: add req cns error complete helper Chaitanya Kulkarni
2021-06-10  1:32 ` [PATCH V15 3/5] nvmet: add nvmet_req_bio put helper for backends Chaitanya Kulkarni
2021-06-10  1:32 ` [PATCH V15 4/5] nvmet: add Command Set Identifier support Chaitanya Kulkarni
2021-06-11  2:14   ` Wu Bo
2021-06-11  2:35     ` Chaitanya Kulkarni
2021-06-10  1:32 ` [PATCH V15 5/5] nvmet: add ZBD over ZNS backend support Chaitanya Kulkarni
2021-06-15 16:25 ` [PATCH V15 0/5] nvmet: add ZBD backened support Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).